Autonomous LLM Fine-tuning on HoC
Best result: 89.7 Macro-F1 (TREX)
[Chart: Macro-F1 by date; date shown: Apr 15, 2026]
Updated 3d ago
Evaluation Results
Method            Details                     Date      Macro-F1
TREX              Researcher Backend=Gem...   2026.04   89.7
TREX              Researcher Backend=Qwe...   2026.04   89.6
Qwen3-235B-2507   Model Category=Ref Model    2026.04   64.5
Qwen3-1.7B        Model Category=Base Model   2026.04   46.2