LLM Workflow Optimization on Big-Bench Hard (test)

BBH Overall Accuracy: 78.6 (Trace)

Updated 5mo ago

Evaluation Results

Method	Date	BBH Overall Accuracy	(unlabeled)	(unlabeled)
Trace	2024.06	78.6	75.8	80.6
DSPy-PO	2024.06	71.6	73.9	70.0
DSPy	2024.06	70.4	73.7	68.0
Trace	2024.06	59.5	70.9	51.1
DSPy-PO	2024.06	55.3	69.0	45.2
DSPy	2024.06	41.6	53.8	32.6