Cross-task Generalization on Cooking (test)
Top result: OOWM 3-Stage, Similarity 0.6889 (Feb 25, 2026)
Updated 5d ago
Evaluation Results
Method                 Description                Date     Similarity  Precision  Recall  F1 Score
OOWM 3-Stage           Training Configuration...  2026.02  0.6889      34.47      44.48   38.24
Unstructured Baseline  -                          2026.02  0.6357      20.1       42.19   26.94
OOWM 2-Stage           Training Configuration...  2026.02  0.6058      41.19      40.59   30.76
Hybrid Strategy        -                          2026.02  0.5705      26.83      42.69   32.13