Context Learning Task-Solving on CL-Bench
Chart: Overall Score (top entry: ContextGuard, 15.8, May 26, 2026). Metric options: Overall Score, Domain Knowledge Reasoning Score, Rule System Application Score, Procedural Task Execution Score, Empirical Discovery & Simulation Score.
Evaluation Results
| Method | Model | Date | Overall Score | Domain Knowledge Reasoning | Rule System Application | Procedural Task Execution | Empirical Discovery & Simulation |
|---|---|---|---|---|---|---|---|
| ContextGuard | Qwen3.5-9B | 2026.05 | 15.8 | 17.65 | 14.13 | 16.14 | 13.57 |
| ContextGuard | Qwen3.5-4B | 2026.05 | 13.85 | 14.48 | 13.07 | 14.01 | 13.57 |
| Self-Refine | Qwen3.5-4B | 2026.05 | 10.48 | 11.76 | 10.42 | 9.34 | 9.05 |
| Baseline | Qwen3.5-9B | 2026.05 | 10.43 | 11.31 | 10.25 | 10.62 | 7.54 |
| Baseline | Qwen3.5-4B | 2026.05 | 9.64 | 9.8 | 9.9 | 9.13 | 9.55 |
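Each method is evaluated against a Baseline run on the same backbone model, so the absolute gain of a method is its score minus the Baseline score for that model. The snippet below is a minimal sketch of that comparison. The scores are copied from the Evaluation Results table; the names `RESULTS`, `CATEGORIES`, and `gain_over_baseline` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical helper: scores copied from the CL-Bench Evaluation Results table.
# Column order: overall, then the four category scores.
CATEGORIES = [
    "Overall",
    "Domain Knowledge Reasoning",
    "Rule System Application",
    "Procedural Task Execution",
    "Empirical Discovery & Simulation",
]

RESULTS = {
    ("ContextGuard", "Qwen3.5-9B"): [15.8, 17.65, 14.13, 16.14, 13.57],
    ("ContextGuard", "Qwen3.5-4B"): [13.85, 14.48, 13.07, 14.01, 13.57],
    ("Self-Refine", "Qwen3.5-4B"): [10.48, 11.76, 10.42, 9.34, 9.05],
    ("Baseline", "Qwen3.5-9B"): [10.43, 11.31, 10.25, 10.62, 7.54],
    ("Baseline", "Qwen3.5-4B"): [9.64, 9.8, 9.9, 9.13, 9.55],
}


def gain_over_baseline(method: str, model: str) -> dict[str, float]:
    """Absolute score gain of `method` over the Baseline row for the same model."""
    scores = RESULTS[(method, model)]
    base = RESULTS[("Baseline", model)]
    # Round to two decimals to match the precision reported in the table.
    return {cat: round(s - b, 2) for cat, s, b in zip(CATEGORIES, scores, base)}


if __name__ == "__main__":
    for method, model in RESULTS:
        if method == "Baseline":
            continue  # a baseline has no gain over itself
        gains = gain_over_baseline(method, model)
        print(f"{method} ({model}): overall {gains['Overall']:+.2f}")
```

On these numbers ContextGuard gains 5.37 points overall on Qwen3.5-9B and 4.21 on Qwen3.5-4B, while Self-Refine gains 0.84 on Qwen3.5-4B. Note that the Overall Score is not the unweighted mean of the four category scores (for ContextGuard on Qwen3.5-9B that mean is about 15.37, not 15.8), so it is presumably aggregated differently, for example weighted by task count. The page does not say, so the reported Overall value should be used as-is rather than recomputed.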