
Long-context language modeling on RULER 1.0 (test)

0.977 Accuracy (4K Context)

MInference

[Chart: Accuracy (4K Context) by date; Jul 2, 2024]
Updated 4d ago

Evaluation Results

Method  Links

Date     Accuracy (4K Context), then further score columns
2024.07  0.977  0.912  0.885  0.85   0.823  0.776  0.87
2024.07  0.972  0.918  0.873  0.808  0.774  0.722  0.844
2024.07  0.972  0.381  0.375  0.172  0.142  0.094  0.35
2024.07  0.947  0.895  0.764  0.665  0.568  0.535  0.729
2024.07  0.946  0.931  0.91   0.896  0.855  0.84   0.896
2024.07  0.938  0.916  0.893  0.874  0.852  0.808  0.88
2024.07  0.938  0.669  0.585  0.514  0.459  0.391  0.593
2024.07  0.923  0.897  0.79   0.738  0.647  0.569  0.747
2024.07  0.919  0.902  0.788  0.763  0.681  0.629  0.781
2024.07  0.919  0.378  0.339  0.186  0.13   0.128  0.343
2024.07  0.894  0.798  0.701  0.556  0.43   0.395  0.629
2024.07  0.803  0.839  0.607  0.452  0.386  0.302  0.565
2024.07  0.448  0.428  0.385  0.298  0.268  0.239  0.344
2024.07  0.234  0.007  0.014  0.188  0.165  0.156  0.127
2024.07  0.026  0.007  0.006  0.006  0.012  0.005  0.011
2024.07  0.02   0.007  0.006  0.006  0.007  0.013  0.01