Autonomous LLM Fine-tuning on TOMG-Bench
[Figure: Validation Score (Accuracy) chart. Best: TREX, 68.1 (Apr 15, 2026). Updated 2d ago.]
Evaluation Results
Method            Details                      Date     Validation Score  Accuracy
TREX              Researcher Backend=Gem...    2026.04  68.1              68.1
Qwen3-235B-2507   Model Category=Ref Model     2026.04  64.2              64.2
TREX              Researcher Backend=Qwe...    2026.04  55.7              55.7
Qwen3-1.7B        Model Category=Base Model    2026.04  18.2              18.2