Multi-turn conversation evaluation on Lost-in-Conversation (test)
[Chart: Actions Score over time. Highlighted point: 95 (Full†), May 26, 2026. Selectable metrics: Actions Score, Math Score, Code Score, Average Score.]
Updated 7d ago
Evaluation Results
| Method | Model | Date | Actions Score | Math Score | Code Score | Average Score |
|---|---|---|---|---|---|---|
| Full† | Llama 3.3-70B (G... | 2026.05 | 95 | 91.7 | 72 | 86.2 |
| Full† | GPT-4o-mini (Ach... | 2026.05 | 94.1 | 88.1 | 75.9 | 86 |
| Full† | Gemini 2.5 Flash... | 2026.05 | 88.4 | 90.6 | 97 | 92 |
| SeDT | Llama 3.3-70B (G... | 2026.05 | 85.9 | 71.8 | 61.8 | 73.2 |
| SeDT | Gemini 2.5 Flash... | 2026.05 | 78.1 | 75.7 | 72 | 75.3 |
| Sharded | Llama 3.3-70B (G... | 2026.05 | 74.7 | 62.9 | 51.6 | 63.1 |
| SeDT | GPT-4o-mini (Ach... | 2026.05 | 73 | 74.4 | 58.8 | 68.7 |
| Sharded | GPT-4o-mini (Ach... | 2026.05 | 46.7 | 62.9 | 50.4 | 53.3 |
| Sharded | Gemini 2.5 Flash... | 2026.05 | 40.4 | 65 | 66.4 | 57.3 |