
Loong

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-context evaluation (Financial) | Loong Fin | Fin Judge Score | 58.8 | 13 |
| Overall | Loong Set 4: 200K–250K Tokens | LLM Score | 54.62 | 12 |
| Chain-of-reasoning | Loong Set 4: 200K–250K Tokens | LLM Score | 36.17 | 12 |
| Clustering | Loong Set 4: 200K–250K Tokens | LLM Score | 57.53 | 12 |
| Comparison | Loong Set 4: 200K–250K Tokens | LLM Score | 55.8 | 12 |
| Spotting | Loong Set 4: 200K–250K Tokens | LLM Score | 57.74 | 12 |
| Overall | Loong Set 3: 100K–200K Tokens | LLM Score | 58.86 | 12 |
| Chain-of-reasoning | Loong Set 3: 100K–200K Tokens | LLM Score | 0.5217 | 12 |
| Clustering | Loong Set 3: 100K–200K Tokens | LLM Score | 58.85 | 12 |
| Comparison | Loong Set 3: 100K–200K Tokens | LLM Score | 57.84 | 12 |
| Spotting | Loong Set 3: 100K–200K Tokens | LLM Score | 0.6862 | 12 |
| Overall | Loong Set 2: 50K–100K Tokens | LLM Score | 0.6361 | 12 |
| Chain-of-reasoning | Loong Set 2: 50K–100K Tokens | LLM Score | 58.23 | 12 |
| Clustering | Loong Set 2: 50K–100K Tokens | LLM Score | 61.67 | 12 |
| Comparison | Loong Set 2: 50K–100K Tokens | LLM Score | 64.34 | 12 |
| Spotting | Loong Set 2: 50K–100K Tokens | LLM Score | 69.92 | 12 |
| Overall | Loong Set 1: 10K–50K Tokens | LLM Score | 71 | 12 |
| Chain-of-reasoning | Loong Set 1: 10K–50K Tokens | LLM Score | 70.31 | 12 |
| Clustering | Loong Set 1: 10K–50K Tokens | LLM Score | 0.6536 | 12 |
| Comparison | Loong Set 1: 10K–50K Tokens | LLM Score | 75.65 | 12 |
| Spotting | Loong Set 1: 10K–50K Tokens | LLM Score | 0.766 | 12 |
| Long-Context Reasoning | LOONG | Accuracy | 65.43 | 11 |
| Structured Information Extraction | Loong Finance (test) | Spotlight Locating (AS) | 83.97 | 10 |
| Structured output generation for long-document QA | Loong Finance | Spotlight Locating AS | 84.42 | 9 |
| Structured Data Extraction and Reasoning | Loong | Spotlight Locating Accuracy (AS) | 73.95 | 8 |
Showing 25 of 34 rows