User Simulation on HUMANUAL (test)

40.58 News Score

HUMANLM

Updated 4mo ago

Evaluation Results

Method	Links
HUMANLM 2026.02		40.58	57.1	46.21	40.68	46.21	43.63	45.7
GRPO-think 2026.02		38.07	55.48	46.33	40.06	39.7	42.3	43.7
Qwen3-8b-think 2026.02		36.33	55.35	44.5	39.78	38.17	40.7	42.5