Natural Language Understanding and Reasoning on Standard Downstream Benchmarks Two-Shot (val)

56.86 ARC-E Accuracy (Normalized)

AdaGC

Updated 4mo ago

Evaluation Results

Method	Links	ARC-E Accuracy (Normalized)
AdaGC 2025.02		56.86	29.61	59.36	57.89	33.6	73.99	57.62	26.46	85.9	53.47
GlobalGC 2025.02		55.81	28.58	60.7	56.54	33	73.72	56.75	25.51	83.2	52.64
AdaGC 2025.02		53.83	28.42	58.69	55.66	33.8	73.07	54.14	25.12	81.8	51.61
AGC 2025.02		52.95	28.67	56.15	55.69	35.4	73.07	56.43	26.88	82.8	52
Clippy 2025.02		52.86	29.1	56.48	53.76	31.8	73.07	55.72	26.03	82.6	51.27
ClipByValue 2025.02		51.94	26.88	57.55	53.36	32.4	72.31	54.14	26.63	81.6	50.75
GlobalGC 2025.02		50.34	27.39	58.81	52.96	34.2	71.16	54.06	25.37	79.9	50.47
GlobalGC 2025.02		47.26	25.6	50.31	46.44	32.2	69.64	52.33	25.07	77.8	47.41
ClipByValue 2025.02		47.1	25.77	56.54	43.97	30	68.88	52.96	26.09	77.2	47.61
Clippy 2025.02		46.55	25.85	49.76	45.71	30	70.02	53.2	25.69	77.7	47.16
AdaGC 2025.02		46.04	26.19	49.72	47.51	31	69.7	54.38	24.98	78.5	47.56