
MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

About

The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.

Yongtong Gu, Songze Li, Xia Hu • 2026
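
The abstract reports results as an Attack Success Rate (ASR) without spelling out the formula. A minimal sketch of one common reading follows: ASR is the fraction of AI-generated texts that a detector scores below its decision threshold after rewriting. The function name `attack_success_rate`, the `toy_detector`, and the 0.5 threshold are illustrative assumptions, not the paper's evaluation code, and the paper's exact definition may differ.

```python
from typing import Callable, Sequence


def attack_success_rate(
    texts: Sequence[str],
    detector: Callable[[str], float],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> float:
    """Return the fraction of texts the detector fails to flag as AI-generated.

    `detector` is treated as a black box: it takes a text and returns a
    score in [0, 1], where higher means "more likely AI-generated".
    """
    if not texts:
        raise ValueError("texts must not be empty")
    # A text counts as a successful evasion when its score falls below the threshold.
    evaded = sum(1 for text in texts if detector(text) < threshold)
    return evaded / len(texts)


def toy_detector(text: str) -> float:
    """Hypothetical stand-in detector: flags texts containing the word 'delve'."""
    return 0.9 if "delve" in text.lower() else 0.1


samples = [
    "We delve into the topic.",
    "ok so here's the thing",
    "honestly not sure about that",
    "Let us delve deeper.",
]
asr = attack_success_rate(samples, toy_detector)
print(f"ASR: {asr:.0%}")
```

With a real detector, `detector` would wrap a single query to its scoring interface, which matches the black-box setting the abstract describes: only output scores are needed, not model weights or gradients.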

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Adversarial Evasion Attack | MGTBench Essay | ASR: 95 | 24 |
| Adversarial Evasion Attack | MGTBench Reuters | ASR: 73 | 24 |
| Adversarial Evasion Attack | MGTBench WP | ASR: 90 | 24 |
| Adversarial Evasion Attack | MGT-Academic Humanity | ASR: 87 | 22 |
| Adversarial Evasion Attack | MGT-Academic Social Science | ASR: 98 | 22 |
| Adversarial Evasion Attack | MGT-Academic STEM | ASR: 100 | 22 |
