
Long-form generation on LongBench Write-en (Binned Context Length Metrics)

Average Sequence Length Success Rate (top score): 92.2

GPT-4o

[Chart: score progression over time; x-axis date Feb 4, 2025]

Evaluation Results

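Each row is one evaluated method, listed in the original ranking order, with its date where given. Column 1 is the Average Sequence Length Success Rate (the ranking metric). Columns 2-12 are the remaining binned context-length metrics. "-" means no reported value.

Date     1      2      3      4      5      6      7      8      9      10     11     12
2025.02  92.2   -      -      -      -      -      -      -      -      93.5   -      -
2025.02  90.59  -      -      -      -      -      -      -      -      94.31  -      -
-        88.6   -      -      -      -      -      -      -      -      84.4   -      -
2025.02  87.38  90.68  86.27  77.23  91.25  93.35  90.53  88.25  85.06  88.28  -      -
2025.02  85.69  -      -      -      -      -      -      -      -      90.53  -      -
2025.02  85.55  90.93  85.78  76.67  85.46  90.01  90.53  81.07  80.9   85.66  -      -
2025.02  83.54  88.93  91.91  85.47  91.25  88.63  85.6   71.14  85.41  88.54  -      -
2025.02  83.12  88.1   86     74.5   86.9   89.1   88.3   80.8   79.2   85.1   -      -
2025.02  81.83  -      -      -      -      -      -      -      -      91.91  -      -
2025.02  81.79  -      -      -      -      -      -      -      -      92.72  -      -
2025.02  81.3   86.32  88.23  88.71  89.16  89.28  84.09  60.89  78.82  85.07  -      -
2025.02  79.51  90.8   87.99  81.27  84.37  81.21  84.84  58.69  78.13  85.08  -      -
-        78.2   -      -      -      -      -      -      -      -      80.6   -      -
-        -      -      -      -      -      -      -      -      -      -      56.8   76.3
-        -      -      -      -      -      -      -      -      -      -      59     80.3
2025.02  -      -      -      -      -      -      -      -      -      -      67.8   90.9
2025.02  -      -      -      -      -      -      -      -      -      -      53.27  93.01
2025.02  -      -      -      -      -      -      -      -      -      -      65.29  87.88
2025.02  -      -      -      -      -      -      -      -      -      -      65.19  87.48
2025.02  -      -      -      -      -      -      -      -      -      -      69.43  89.21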