Loong

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result
Long-context evaluation (Financial)	Loong Fin	Fin Judge Score	58.8	13
Overall	Loong Set 4: 200K–250K Tokens	LLM Score	54.62	12
Chain-of-reasoning	Loong Set 4: 200K–250K Tokens	LLM Score	36.17	12
Clustering	Loong Set 4: 200K–250K Tokens	LLM Score	57.53	12
Comparison	Loong Set 4: 200K–250K Tokens	LLM Score	55.8	12
Spotting	Loong Set 4: 200K–250K Tokens	LLM Score	57.74	12
Overall	Loong Set 3: 100K–200K Tokens	LLM Score	58.86	12
Chain-of-reasoning	Loong Set 3: 100K–200K Tokens	LLM Score	0.5217	12
Clustering	Loong Set 3: 100K–200K Tokens	LLM Score	58.85	12
Comparison	Loong Set 3: 100K–200K Tokens	LLM Score	57.84	12
Spotting	Loong Set 3: 100K–200K Tokens	LLM Score	0.6862	12
Overall	Loong Set 2: 50K–100K Tokens	LLM Score	0.6361	12
Chain-of-reasoning	Loong Set 2: 50K–100K Tokens	LLM Score	58.23	12
Clustering	Loong Set 2: 50K–100K Tokens	LLM Score	61.67	12
Comparison	Loong Set 2: 50K–100K Tokens	LLM Score	64.34	12
Spotting	Loong Set 2: 50K–100K Tokens	LLM Score	69.92	12
Overall	Loong Set 1: 10K–50K Tokens	LLM Score	71	12
Chain-of-reasoning	Loong Set 1: 10K–50K Tokens	LLM Score	70.31	12
Clustering	Loong Set 1: 10K–50K Tokens	LLM Score	0.6536	12
Comparison	Loong Set 1: 10K–50K Tokens	LLM Score	75.65	12
Spotting	Loong Set 1: 10K–50K Tokens	LLM Score	0.766	12
Long-Context Reasoning	LOONG	Accuracy	65.43	11
Long-document Question Answering	Loong	Accuracy	78.57	10
Structured Information Extraction	Loong Finance (test)	Spotlight Locating (AS)	83.97	10
Structured output generation for long-document QA	Loong Finance	Spotlight Locating AS	84.42	9

Showing 25 of 35 rows