
RTL code functionality equivalence checking on DeepRTL2 Benchmark

Top result: 74.5 AP (UniRTL)


Evaluation Results

Method	Date	AP	Metric 2	Metric 3	Metric 4	Metric 5
UniRTL	2026.05	74.5	74.7	75.3	73.4	77.3
GraphCodeBERT	2026.05	73.0	73.3	75.3	61.3	97.3
UniRTL (w/o graph)	2026.05	71.2	66.7	71.7	57.7	94.7
DeepRTL2	2025.05	66.7	-	-	-	-
DeepRTL2-Llama	2026.05	64.6	69.5	73.7	59.7	96.4
DeepRTL2-DeepSeek	2026.05	63.1	64.0	72.9	58.7	96.0
CircuitFusion	2026.05	62.8	66.7	73.6	60.2	94.7
GritLM-7B	2026.05	59.9	64.0	72.4	58.7	94.7
DeepRTL2	2025.05	59.1	-	-	-	-
text-embedding-3-small	2025.05	56.5	-	-	-	-
text-embedding-3-large	2026.05	56.4	58.7	68.7	55.3	90.7
NV-Embed-v2	2026.05	55.4	60.7	66.7	54.7	85.3
text-embedding-3-small	2026.05	54.3	61.3	69.6	54.5	96.0
GritLM	2025.05	54.1	-	-	-	-
DeepRTL2	2025.05	51.8	-	-	-	-
text-embedding-3-large	2025.05	49.8	-	-	-	-
DeepRTL2	2025.05	48.1	-	-	-	-
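Only the first metric column (AP) is labeled on this page. In every row with complete numbers, the third-from-last value matches 2PR/(P+R) computed from the last two values to within one-decimal rounding. That is the relation F1 has to precision and recall, which suggests, but does not confirm, that those three columns are F1, precision and recall. The minimal Python sketch below checks this relation against the table values. The names `harmonic_mean`, `check_f1_consistency`, the `ROWS` list and the 0.15 tolerance are illustrative assumptions, not part of the benchmark.

```python
# Hypothesis being checked (an assumption, not stated by the source page):
# for each complete row, the last three numeric columns are F1, precision, recall.
# Each tuple is (method, date, third-from-last, second-from-last, last) as listed above.
ROWS = [
    ("UniRTL", "2026.05", 75.3, 73.4, 77.3),
    ("GraphCodeBERT", "2026.05", 75.3, 61.3, 97.3),
    ("UniRTL (w/o graph)", "2026.05", 71.7, 57.7, 94.7),
    ("DeepRTL2-Llama", "2026.05", 73.7, 59.7, 96.4),
    ("DeepRTL2-DeepSeek", "2026.05", 72.9, 58.7, 96.0),
    ("CircuitFusion", "2026.05", 73.6, 60.2, 94.7),
    ("GritLM-7B", "2026.05", 72.4, 58.7, 94.7),
    ("text-embedding-3-large", "2026.05", 68.7, 55.3, 90.7),
    ("NV-Embed-v2", "2026.05", 66.7, 54.7, 85.3),
    ("text-embedding-3-small", "2026.05", 69.6, 54.5, 96.0),
]


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two positive numbers: 2ab / (a + b)."""
    return 2 * a * b / (a + b)


def check_f1_consistency(rows, tol: float = 0.15) -> list:
    """Return 'method (date)' for rows whose third-from-last value differs from
    the harmonic mean of the last two values by more than `tol`.

    The tolerance allows for each of the three reported numbers having been
    rounded to one decimal place.
    """
    mismatches = []
    for method, date, f1_like, p_like, r_like in rows:
        if abs(harmonic_mean(p_like, r_like) - f1_like) > tol:
            mismatches.append(f"{method} ({date})")
    return mismatches


if __name__ == "__main__":
    for method, date, f1_like, p_like, r_like in ROWS:
        hm = harmonic_mean(p_like, r_like)
        print(f"{method:24s} {date}  reported={f1_like:5.1f}  2PR/(P+R)={hm:6.2f}")
    print("rows inconsistent with F1 = 2PR/(P+R):", check_f1_consistency(ROWS))
```

An empty mismatch list is consistent with the F1/precision/recall reading but does not identify the remaining unlabeled column (Metric 2).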