
GUI Agent Task Completion on OSWorld 1.0 (test)

82.05 Success Rate (GIMP)

EvoCUA-8B + LEARNWEAK

[Chart: success rate over time, Jan 7, 2026 to May 27, 2026]
Updated 6d ago

Evaluation Results

Method | Links
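2026.05 | 82.05 | - | 41.13 | 50.35 | 55.07 | - | 66.67 | 73.33 | 56.86 | 72.46 | 62.24
2026.05 | 76.29 | - | 51.06 | 52.98 | 65.22 | - | 75 | 60 | 64.65 | 65.22 | 63.8
2026.05 | 74.36 | - | 35.46 | 48.21 | 56.52 | - | 61.11 | 57.78 | 37.25 | 72.73 | 55.43
2026.05 | 73.08 | - | 80.85 | 82.19 | 73.91 | - | 79.17 | 80 | 75.71 | 91.3 | 79.53
2026.05 | 69.23 | - | 74.47 | 70.21 | 86.83 | - | 91.67 | 66.67 | 81.41 | 72.73 | 76.65
2026.01 | 69.2 | 35.5 | 10.6 | 29.8 | 37.7 | 9 | 48.6 | 35.5 | 33.3 | 62.3 | 29.7
2026.01 | 69.2 | 36.9 | 12.8 | 29.8 | 47.8 | 9.7 | 45.8 | 40 | 35.3 | 65.2 | 31.4
2026.05 | 66.15 | - | 28.07 | 37.66 | 50.43 | - | 60.83 | 65.33 | 45.71 | 51.3 | 50.69
2026.01 | 61.5 | 34.8 | 10 | 27.7 | 34.8 | 8 | 41.7 | 40 | 25.5 | 55.1 | 27.3
2026.05 | 57.69 | - | 19.15 | 36.88 | 40.58 | - | 59.42 | 66.67 | 47.06 | 62.32 | 48.72
2026.01 | 55.1 | 40.5 | 13.5 | 30.7 | 39.1 | 9.7 | 52.2 | 44.4 | 25 | 52.8 | 29.7
2026.01 | 51.9 | 22.9 | 11.7 | 29.7 | 39.1 | 3.8 | 34.8 | 26.7 | 34.2 | 63 | 24.5
2026.05 | 48.46 | - | 11.91 | 31.49 | 30.43 | - | 40 | 54.67 | 32.94 | 51.3 | 37.65
2026.01 | 46.2 | 44.4 | 13 | 31.8 | 39.1 | 4.8 | 30.4 | 66.7 | 23.5 | 56.5 | 28.1
2026.01 | 46.2 | 36.9 | 17 | 36.2 | 43.5 | 9.7 | 37.5 | 66.7 | 38.5 | 60.9 | 31.2
2026.05 | 42.3 | - | - | 22.7 | 31.8 | - | - | - | 35.3 | 40.5 | -
2026.05 | 39.74 | - | 22.7 | 43.97 | 52.17 | - | 41.67 | 66.67 | 44.12 | 47.83 | 44.86
2026.01 | 34.6 | 36.9 | 10.6 | 25.4 | 30.4 | 10.8 | 45.8 | 46.7 | 29.4 | 47.8 | 26
2026.01 | 34.6 | 41.2 | 8.5 | 29.7 | 39.1 | 10.8 | 50 | 33.3 | 35.3 | 43.5 | 27.1
2026.05 | 30.8 | - | 44.7 | 42.6 | 34.7 | - | - | - | - | - | -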