Calendar Scheduling on Natural Plan Calendar Scheduling
[Chart: Success Rate over time. Top result: SMaRT, Success Rate 72 (Oct 20, 2025). Updated 1mo ago.]
Evaluation Results
| Method | Settings | Date | Success Rate |
|---|---|---|---|
| SMaRT | Model=GPT-4, Number of... | 2025.10 | 72 |
| CoT | Model=GPT-4, Number of... | 2025.10 | 70.2 |
| LLM-as-a-Judge | Model=GPT-4, Number of... | 2025.10 | 69.4 |
| Direct | Model=GPT-4, Number of... | 2025.10 | 66.1 |
| SMaRT | Model=Gemini-1.5, Numb... | 2025.10 | 58.5 |
| CoT | Model=Gemini-1.5, Numb... | 2025.10 | 57.7 |
| LLM-as-a-Judge | Model=Gemini-1.5, Numb... | 2025.10 | 57 |
| Direct | Model=Gemini-1.5, Numb... | 2025.10 | 50.2 |