
SSA: Improving Performance With a Better Scoring Function

About

While transformer models exhibit strong in-context learning (ICL) abilities, they often fail to generalize under simple distribution shifts. We analyze these failures and identify Softmax, the scoring function in the attention mechanism, as a contributing factor. We propose Scaled Signed Averaging (SSA), a novel attention scoring function that mitigates these failures. SSA significantly improves performance on our ICL tasks and outperforms transformer models with Softmax on several NLP benchmarks and linguistic probing tasks, in both decoder-only and encoder-only architectures.
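For readers unfamiliar with the term "scoring function", the following is a minimal sketch of the standard Softmax baseline that the abstract refers to: scaled dot-product scores between one query and several keys, normalized with Softmax. The function names and the toy vectors are illustrative assumptions, not taken from the paper. SSA itself is not defined on this page, so it is not sketched here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Softmax attention weights of one query over a list of key vectors.

    Hypothetical helper for illustration: scaled dot-product scores
    (q . k / sqrt(d)) followed by softmax.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)

# Toy example: one 2-dimensional query against three keys.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = attention_weights(query, keys)
print(weights)
```

Softmax weights are always strictly positive and sum to 1, so even the key pointing away from the query receives some attention. A replacement scoring function such as SSA would substitute for the `softmax` step in this sketch.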

Omar Naim, Swarnadeep Bhar, Jérôme Bolte, Nicholas Asher • 2025

Related benchmarks

Task                        Dataset                  Metric               Result  Rank
Commonsense Reasoning       WinoGrande               Accuracy             51.78   1442
Commonsense Reasoning       HellaSwag                HellaSwag Accuracy   32.83   711
Question Answering          OpenBookQA               Accuracy             30.4    305
Word Sense Disambiguation   WiC                      Avg Accuracy         50.78   261
Common Sense Reasoning      COPA                     Accuracy             64      256
Boolean Question Answering  BoolQ                    Accuracy             56.18   27
Reading Comprehension       MultiRC                  MultiRC Accuracy     43.5    25
Question Answering          ARC Easy                 Normalized Accuracy  53.87   20
Reading Comprehension       ReCoRD                   F1 Score             24.82   6
Language Modeling           FineWeb in-distribution  Perplexity (PPL)     19.73   2
Showing 10 of 12 rows
