Tracking Shuffled Objects (seven objects) on BBH (test)

92.8 Accuracy

TextGrad

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
TextGrad 2026.05		92.8
TextReg 2026.05		92.2
CoT 2026.05		89.1
EvoPrompt(GA)-OPTS(TS) 2025.03		88.33
REVOLVE 2026.05		87.7
EvoPrompt(DE)-OPTS(TS) 2025.03		81.83
EvoPrompt(DE) 2025.03		80.67
TextReg 2026.05		76.6
EvoPrompt(GA) 2025.03		75.67
TextGrad 2026.05		66.7
REVOLVE 2026.05		65.4
CoT 2026.05		48.5
TextReg 2026.05		48.3
REVOLVE 2026.05		41.8
REVOLVE 2026.05		40.2
TextReg 2026.05		39.1
TextGrad 2026.05		38
CoT 2026.05		36.5
CoT 2026.05		33.9
TextGrad 2026.05		33.1