
In-domain

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Distributional Alignment | in-domain (test) | JSD 0.11 | 56 |
| Arithmetic Reasoning | In-domain (test) | Accuracy 53.4 | 50 |
| Visual Question Answering | In-Domain Multiple-Choice | OCR-VQA Accuracy 92.9 | 15 |
| Image Retrieval | In-domain (ID) Split | mAP 0.571 | 14 |
| Interactive Segmentation | In-domain (test) | IoU 86.37 | 14 |
| Instruction Following | In-domain | Win Rate 14 | 11 |
| Tool Calling | In-domain (unseen) | Exact Match (EM) 57.24 | 10 |
| Tool Calling | In-domain seen | EM 74.05 | 10 |
| Machine Translation | In-Domain (ID) (val) | BLEU 40.72 | 10 |
| Tool Identification | In-domain unseen | EM 75.86 | 9 |
| Tool Identification | In-domain (seen) | Exact Match (EM) 85.95 | 9 |
| Singing Accompaniment Generation | In-domain | CE Score 7.3543 | 8 |
| LLM Routing | In-domain | Cost (Cost-first) 2.3 | 7 |
| Ambisonics encoding | In-Domain | Coherence 54 | 7 |
| AI-Generated Text Detection | In-domain (test) | OA 100 | 4 |
| Shading Estimation | In-domain | MSE 0.0265 | 3 |
| Albedo Estimation | In-domain | MSE 0.0051 | 3 |
| Depth Estimation | In-domain | REL 0.1072 | 3 |
Showing 18 of 18 rows