Multi-Paragraph Segmentation of Expository Text
About
This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.
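To make the idea of segmenting by "lexical frequency and distribution" concrete, here is a minimal sketch of a block-comparison variant. It is not Hearst's implementation. It assumes the text is split into pseudo-sentences of `w` tokens, scores every gap by the cosine similarity of word counts in the `k` pseudo-sentences on either side, measures how deep each similarity valley is, and keeps valleys deeper than the mean minus half a standard deviation. The function names, the defaults `w=20` and `k=6`, and this cutoff rule are illustrative assumptions. The sketch also skips stemming, stop-word removal, smoothing, and snapping boundaries to paragraph breaks.

```python
import re
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def tokenize(text: str) -> list[str]:
    # Lowercase alphabetic tokens; no stemming or stop-word removal (simplification).
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two word-count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(tokens: list[str], w: int, k: int) -> list[float]:
    # Split the token stream into pseudo-sentences of w tokens each.
    seqs = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    scores = []
    for gap in range(1, len(seqs)):
        # Compare the (up to) k pseudo-sentences before the gap with the k after it.
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth(scores: list[float], i: int) -> float:
    # Climb to the nearest peak on each side and sum the two drops into gap i.
    left = right = scores[i]
    j = i - 1
    while j >= 0 and scores[j] >= left:
        left = scores[j]
        j -= 1
    j = i + 1
    while j < len(scores) and scores[j] >= right:
        right = scores[j]
        j += 1
    return (left - scores[i]) + (right - scores[i])


def texttiling(text: str, w: int = 20, k: int = 6) -> list[int]:
    """Return token offsets of proposed subtopic boundaries (hypothetical helper)."""
    tokens = tokenize(text)
    scores = gap_scores(tokens, w, k)
    n = len(scores)
    # Valleys: gaps strictly lower than every existing neighbouring gap.
    valleys = [
        i for i in range(n)
        if n > 1
        and (i == 0 or scores[i - 1] > scores[i])
        and (i == n - 1 or scores[i + 1] > scores[i])
    ]
    depths = {i: depth(scores, i) for i in valleys}
    if len(depths) < 2:
        chosen = list(depths)
    else:
        vals = list(depths.values())
        cutoff = mean(vals) - stdev(vals) / 2  # assumed thresholding rule
        chosen = [i for i, d in depths.items() if d > cutoff]
    # Gap i follows pseudo-sentence i, i.e. it sits at token offset (i + 1) * w.
    return [(i + 1) * w for i in chosen]
```

For example, `texttiling("apple banana cherry " * 40 + "xray yacht zebra " * 40)` returns `[120]`: the only similarity valley is at the gap where the vocabulary switches completely.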
Marti A. Hearst • 1994
Related benchmarks
| Task | Dataset | Metric | Result (%, lower is better) | Rank |
|---|---|---|---|---|
| Text Segmentation | Choi's Dataset (3-5) | Pk | 44 | 17 |
| Text Segmentation | Choi's Dataset (6-8) | Pk | 37.6 | 17 |
| Text Segmentation | Choi's Dataset (9-11) | Pk | 31.1 | 17 |
| Text Segmentation | Choi's Dataset (3-11) | Pk | 31.7 | 17 |
| Text Segmentation | Wiki-50 | Pk | 38.9 | 15 |
| Text Segmentation | Elements | Pk | 49.6 | 15 |
| Text Segmentation | Wiki-300 | Pk | 38.1 | 14 |
| Text Segmentation | arXiv | Pk | 27.1 | 13 |
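All of the benchmark results above use Pk, the windowed segmentation error metric of Beeferman, Berger and Lafferty. Pk slides a window of width `k` over the text and counts how often the reference and the hypothesis disagree about whether the window's two ends fall in the same segment. The sketch below assumes each segmentation is given as a list of segment labels, one per unit (for example, per sentence), and that `k` defaults to half the average reference segment length, the usual convention. The function name `pk` and this input representation are assumptions for illustration. Multiplying the result by 100 gives the percentage form used in the table.

```python
from typing import Optional


def pk(reference: list[int], hypothesis: list[int], k: Optional[int] = None) -> float:
    """Windowed disagreement rate between two segmentations (lower is better).

    Each list holds one segment label per unit; a label change marks a boundary.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    n = len(reference)
    if k is None:
        # Half the mean reference segment length, rounded, at least 1.
        num_segments = 1 + sum(reference[i] != reference[i + 1] for i in range(n - 1))
        k = max(1, round(n / num_segments / 2))
    if n <= k:
        raise ValueError("text too short for window size k")
    # Count windows where "same segment?" differs between reference and hypothesis.
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)
```

With a reference of two six-unit segments, a hypothesis that places no boundary at all scores 1/3 (k = 3, with 3 of 9 windows in error). A hypothesis whose boundary is off by one unit scores 2/9.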