Long-context language modeling evaluation on FDA (test)

Score: 0.8004

GA-S2

[Chart: score by date; Dec 23, 2025]
Updated 4d ago

Evaluation Results

Date     Score
2025.12  0.8004
2025.12  0.7822
2025.12  0.7641
2025.12  0.7613
2025.12  0.7595
2025.12  0.7559
2025.12  0.7532
2025.12  0.7514
2025.12  0.7514
2025.12  0.7505
2025.12  0.7505
2025.12  0.7468
2025.12  0.7459
2025.12  0.7432
2025.12  0.7359
2025.12  0.7314
2025.12  0.7241
2025.12  0.7196
2025.12  0.7178
2025.12  0.7169
2025.12  0.7096
2025.12  0.7069
2025.12  0.706
2025.12  0.7051
2025.12  0.7042
2025.12  0.7033
2025.12  0.696
2025.12  0.6942
2025.12  0.6933
2025.12  0.686
2025.12  0.6842
2025.12  0.6833
2025.12  0.6806
2025.12  0.6788
2025.12  0.6779
2025.12  0.676
2025.12  0.6751
2025.12  0.6697
2025.12  0.6679
2025.12  0.6679
2025.12  0.6642
2025.12  0.6633
2025.12  0.6588
2025.12  0.6461
2025.12  0.6443
2025.12  0.6416
2025.12  0.6379
2025.12  0.6334
2025.12  0.6279
2025.12  0.6261
2025.12  0.6234
2025.12  0.6207
2025.12  0.6025
2025.12  0.6016
2025.12  0.6007
2025.12  0.588
2025.12  0.5771
2025.12  0.5726
2025.12  0.5672
2025.12  0.5644
2025.12  0.5572
2025.12  0.5563
2025.12  0.5508
2025.12  0.5345
2025.12  0.5299
2025.12  0.5227
2025.12  0.5163
2025.12  0.51
2025.12  0.5054
2025.12  0.5045
2025.12  0.5
2025.12  0.5
2025.12  0.4809
2025.12  0.4809
2025.12  0.4773
2025.12  0.4746
2025.12  0.4728
2025.12  0.4628
2025.12  0.4601
2025.12  0.4465
2025.12  0.4374
2025.12  0.4365
2025.12  0.4247
2025.12  0.4201
2025.12  0.4183
2025.12  0.4165
2025.12  0.4074
2025.12  0.4011
2025.12  0.3684
2025.12  0.3648
2025.12  0.3648
2025.12  0.3612
2025.12  0.3548
2025.12  0.3448
2025.12  0.3385
2025.12  0.3385
2025.12  0.3303
2025.12  0.3276
2025.12  0.3258
2025.12  0.3149
Showing 100 of 120 rows