CAA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Behavior Selection | CAA (50 random samples) | Accuracy (coordinate-ais, pair): 98 | 22 |
| Hallucination Steering | CAA | Runtime: 19.5 | 13 |
| Concept Alignment | CAA (test) | AIC: 75.18 | 12 |
| Open-ended behavior generation | CAA | CoAIS Score: 5.33 | 10 |
| Hallucination | CAA | Accuracy (pair): 88 | 8 |