Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

ParameterNet: Parameters Are All You Need

About

The large-scale visual pretraining has significantly improve the performance of large vision models. However, we observe the \emph{low FLOPs pitfall} that the existing low-FLOPs models cannot benefit from large-scale pretraining. In this paper, we introduce a novel design principle, termed ParameterNet, aimed at augmenting the number of parameters in large-scale visual pretraining models while minimizing the increase in FLOPs. We leverage dynamic convolutions to incorporate additional parameters into the networks with only a marginal rise in FLOPs. The ParameterNet approach allows low-FLOPs networks to take advantage of large-scale visual pretraining. Furthermore, we extend the ParameterNet concept to the language domain to enhance inference results while preserving inference speed. Experiments on the large-scale ImageNet-22K have shown the superiority of our ParameterNet scheme. For example, ParameterNet-600M can achieve higher accuracy on ImageNet than the widely-used Swin Transformer (81.6\% \emph{vs.} 80.9\%) and has much lower FLOPs (0.6G \emph{vs.} 4.5G). In the language domain, LLaMA-1B enhanced with ParameterNet achieves 2\% higher accuracy over vanilla LLaMA. The code will be released at \url{https://parameternet.github.io/}.

Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu• 2023

Related benchmarks

TaskDatasetResultRank
Commonsense ReasoningHellaSwag
Accuracy60.56
1460
Boolean Question AnsweringBoolQ
Accuracy60.56
307
Common Sense ReasoningARC Easy
ARC (easy) Accuracy49.58
52
Sentiment AnalysisSST-2
SST-2 Accuracy90.34
7
Showing 4 of 4 rows

Other info

Code

Follow for update