
MalAlgoQA

Benchmarks

Task Name                       | Dataset Name | SOTA Result         | Trend
Causal Variable Identification  | MalAlgoQA    | F1 (X): 84.1        | 7
Outcome Reasoning               | MalAlgoQA    | M' (F1 Mean): 85.1  | 7