Task-specific Risk Detection on Mind2Web-SC (test)

0.99 LPA

GPT-4o

[Chart: score over time; axis ticks 0.5428, 0.6589, 0.775, 0.8911; Feb 17, 2025]
Updated 1mo ago

Evaluation Results

Date      LPA
2025.02   0.99    0.99    0.99    0.99    0.99
2025.02   0.984   0.99    0.98    0.984   0.947
2025.02   0.96    0.969   0.949   0.959   0.78
2025.02   0.943   0.898   1       0.946   0.989
2025.02   0.94    0.914   0.97    0.941   0.958
2025.02   0.933   0.892   1       0.943   0.99
2025.02   0.9     1       0.8     0.89    0.9
2025.02   0.725   0.792   0.61    0.689   0.885
2025.02   0.56    0.93    0.13    0.23    -