Complex Reasoning

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
BBH	DECENTMEM	Accuracy	90.5	99	1mo ago
SCALE MultiChallenge	BRAID	Accuracy	65.1	81	1mo ago
LLaVA Bench (val)	MARS	Perplexity	2.1875	44	4mo ago
BBH (val)	G2IS	Accuracy	65.81	42	4mo ago
Video-TT	OmniJigsaw (CMM)	Accuracy	46.5	39	1mo ago
SciFact (test)	LLM annotation	Macro-F1	76.17	37	1mo ago
VitaminC (test)	EvoPool	Macro-F1	78.4	37	1mo ago
FEVER (test)	EvoPool	Macro F1	85.18	37	1mo ago
BBH	l2-guided Selection	Accuracy (%)	89.14	28	1mo ago
GraphRAG-Bench	HippoRAG 2	Rel	86.02	27	25d ago
vcrbench	VideoLatent-7B (Ours)	Accuracy	53.1	24	1mo ago
GAIA Text		Accuracy	76.4	19	3mo ago
BBH		Acc	83.03	16	3mo ago
LLaVA-Bench Wilder	Qwen3-VL-4B (Base)	Score	74.3	14	1mo ago
Frames	Tongyi DeepResearch 30B	Accuracy	90.6	13	3mo ago
Humanity's Last Exam (HLE)	gemini-2.5-pro	Pass@1 Score	18.4	13	4mo ago
AIME	PRR	Decoding Speedup	1.64	12	25d ago
Arena-Hard 2.0 (test)	TCR	Overall Accuracy	52.9	12	2mo ago
AQuA		Accuracy	28.35	12	2mo ago
GraphRAG-Bench Medical Dataset	HyperSU	Accuracy (ACC)	75.03	9	25d ago
TOMATO	Qwen3-VL-8B + SynRL	Accuracy	38.1	9	4mo ago
GSM8K (test)	Full Repetition	Accuracy	87.3	8	22d ago
MMLU-Pro (test)		Accuracy	28.6	8	22d ago
Retrieval Quality Evaluation (test)	FlowRAG	Recall	91.89	8	1mo ago
G-Bench Medical (val)	MemGraphRAG	Recall	90.42	8	1mo ago

Showing 25 of 39 rows