MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

About

The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, these detectors remain fragile against adversarial evasion. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, which makes them ineffective in practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel style-transfer-based framework for evading black-box detectors. MASH sequentially applies style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distribution of AI-generated texts so that it resembles that of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.
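The second MASH stage is described only as "direct preference optimization". As a minimal sketch, assuming it uses the standard DPO objective, the loss for one preference pair is -log σ(β[(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]). Treating the preferred output y_w as a human-style rewrite and the dispreferred y_l as a more machine-like rewrite is our assumption, not something the abstract states. The function names and the default `beta=0.1` are illustrative only.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not taken from the paper
) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and the frozen reference model. In this hypothetical
    setup, "chosen" would be the human-style text and "rejected" the
    AI-style text.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# When the policy equals the reference, the loss is log 2.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Raising the chosen response's likelihood relative to the reference lowers it.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss falls below log 2 only when the policy raises the likelihood of the preferred text relative to the reference model more than it does for the dispreferred text. This is how preference optimization could push outputs toward a human-like style without a detector in the loop.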

Yongtong Gu, Songze Li, Xia Hu • 2026

Related benchmarks

Task	Dataset	Result
Adversarial Evasion Attack	MGTBench Essay	ASR 95	24
Adversarial Evasion Attack	MGTBench Reuters	ASR 73	24
Adversarial Evasion Attack	MGTBench WP	ASR 90	24
Adversarial Evasion Attack	MGT-Academic Humanity	ASR 87	22
Adversarial Evasion Attack	MGT-Academic Social Science	ASR 98	22
Adversarial Evasion Attack	MGT-Academic STEM	ASR 100	22
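The page does not define how ASR is computed. A common convention, assumed here, is the percentage of AI-generated texts that the detector labels as human-written after the attack. Some evaluations count only texts the detector flagged as AI before the attack. The helper below is hypothetical and supports both variants.

```python
from typing import Optional, Sequence


def attack_success_rate(
    post_attack_flags: Sequence[bool],
    pre_attack_flags: Optional[Sequence[bool]] = None,
) -> float:
    """Return ASR as a percentage (hypothetical helper, assumed definition).

    Each flag is True if the detector classifies the text as AI-generated.
    If pre_attack_flags is given, only texts that were detected before the
    attack are counted; otherwise every attacked text is counted.
    """
    if pre_attack_flags is None:
        indices = list(range(len(post_attack_flags)))
    else:
        if len(pre_attack_flags) != len(post_attack_flags):
            raise ValueError("flag sequences must have the same length")
        indices = [i for i, flagged in enumerate(pre_attack_flags) if flagged]
    if not indices:
        raise ValueError("no texts to evaluate")
    # An attack succeeds when the rewritten text is no longer flagged as AI.
    evaded = sum(1 for i in indices if not post_attack_flags[i])
    return 100.0 * evaded / len(indices)


post = [False, False, True, False]  # detector output after rewriting
pre = [True, True, True, False]     # detector output before rewriting
print(attack_success_rate(post))       # counts all 4 texts
print(attack_success_rate(post, pre))  # counts only the 3 originally detected
```

The two variants can differ noticeably on the same predictions (75.0 versus about 66.7 here), so ASR figures are comparable across papers only when they use the same denominator.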
