Phishing Detection on Mendeley and TR-OP averaged
[Chart: Precision by method. Highlighted point: PhishDebate, 90.57 (Jun 18, 2025). Selectable metrics: Precision, Accuracy, Recall, F1 Score, Average Time (s).]
Updated 1mo ago
Evaluation Results
| Method | Base LLM | Date | Precision | Accuracy | Recall | F1 Score | Average Time (s) |
|---|---|---|---|---|---|---|---|
| PhishDebate | GPT-4o-mini | 2025.06 | 90.57 | 93.9 | 98 | 94.14 | 37.5 |
| CoT | GPT-4o-mini | 2025.06 | 88.61 | 90.7 | 93.4 | 90.94 | 10.5 |
| Single Agent | GPT-4o-mini | 2025.06 | 60.57 | 67 | 97.4 | 74.69 | 4.7 |
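
As a sanity check on the figures above, the F1 Score column can be compared with the harmonic mean of the listed Precision and Recall. The sketch below is illustrative only: the names `results`, `f1_score`, and `f1_gaps` are hypothetical, and the values are copied from the table. Because the scores are averaged over two datasets, the recomputed value is not guaranteed to equal the reported one exactly, so the code reports the gap rather than assuming a match.

```python
# Values copied from the Evaluation Results table (treated as percentages).
results = {
    "PhishDebate": {"precision": 90.57, "recall": 98.0, "f1": 94.14},
    "CoT": {"precision": 88.61, "recall": 93.4, "f1": 90.94},
    "Single Agent": {"precision": 60.57, "recall": 97.4, "f1": 74.69},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def f1_gaps(rows: dict) -> dict:
    """Absolute difference between recomputed and reported F1 per method."""
    return {
        name: abs(f1_score(row["precision"], row["recall"]) - row["f1"])
        for name, row in rows.items()
    }


if __name__ == "__main__":
    for name, gap in f1_gaps(results).items():
        print(f"{name}: reported vs recomputed F1 differ by {gap:.4f}")
```

For the three rows listed, the recomputed values agree with the reported F1 Score to within rounding of the second decimal place.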