Release Strategies and the Social Impacts of Language Models
About
Large language models have a range of beneficial uses: they can assist in prose, poetry, and programming; analyze dataset biases; and more. However, their flexibility and generative capabilities also raise misuse concerns. This report discusses OpenAI's work related to the release of its GPT-2 language model. It covers staged release, which allows time between model releases to conduct risk and benefit analyses as model sizes increase. It also discusses ongoing partnership-based research and provides recommendations for better coordination and responsible publication in AI.
Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, Jasmine Wang • 2019
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Machine-generated text detection | MGT benchmark Essay | AUROC 98.3 | 129 |
| LGT Detection | Fast-DetectGPT PubMed (test) | AUROC 0.884 | 96 |
| LGT Detection | Fast-DetectGPT XSum (test) | AUROC 97.9 | 96 |
| LGT Detection | XSum Fast-DetectGPT benchmark | AUROC 97.9 | 54 |
| LGT Detection | WritingPrompts-small Fast-DetectGPT benchmark | AUROC 97.6 | 54 |
| LGT Detection | WritingPrompts small Fast-DetectGPT benchmark (test) | AUROC 97.6 | 54 |
| LGT Detection | PubMed Fast-DetectGPT benchmark | AUROC 0.878 | 54 |
| LGT Detection | MGTBench WritingPrompts | AUROC 97.3 | 45 |
| Machine-generated text detection | MGT benchmark Reuters | AUROC 99.9 | 45 |
| LGT Detection | Fast-DetectGPT WP-s (test) | AUROC 98.1 | 42 |
Showing 10 of 140 rows
...