On Dimensionality of Latent Semantic Indexing for Text Segmentation
| Authors | |
|---|---|
| Year of publication | 2007 |
| Type | Article in Periodical |
| Journal / Source | Proceedings of the International Multiconference on Computer Science and Information Technology |
| MU Faculty or unit | |
| Citation | |
| Web | http://www.papers2007.imcsit.org/ |
| Field | Informatics |
| Keywords | text segmentation; LSI; latent semantic indexing |
| Description | In this paper we propose desirable features of linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high-similarity search of heterogeneous texts. We then describe a robust, purely statistical method, based on exploiting context overlap, that exhibits these features. Ways to automatically determine its internal parameter, the dimensionality of the latent space, are discussed and evaluated on a data set. |
| Related projects | |
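The record does not say how the latent dimensionality is chosen, so the sketch below is only an illustration of the general idea, not the paper's method. It assumes a common heuristic: keep the smallest rank `k` whose singular values retain a given fraction (`energy`) of the squared spectrum of a term-by-block matrix, project the text blocks into that LSI space, and score each gap between adjacent blocks by cosine similarity, where low scores suggest a segment boundary. The function names `choose_rank`, `lsi_block_vectors` and `gap_scores` and the `energy` threshold are hypothetical, and the paper's context-overlap technique is not reproduced here.

```python
import numpy as np


def choose_rank(singular_values, energy=0.9):
    """Hypothetical heuristic: smallest k whose top-k singular values keep
    `energy` of the total squared spectrum."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    k = int(np.searchsorted(cumulative, energy) + 1)
    return min(k, len(s2))  # guard against rounding when energy is close to 1


def lsi_block_vectors(term_block, energy=0.9):
    """Project blocks (columns of a term-by-block count matrix) into a
    k-dimensional latent space; returns (blocks x k matrix, k)."""
    u, s, vt = np.linalg.svd(np.asarray(term_block, dtype=float), full_matrices=False)
    k = choose_rank(s, energy)
    # Column j of diag(s_k) @ vt_k is the latent representation of block j.
    return (np.diag(s[:k]) @ vt[:k]).T, k


def gap_scores(latent):
    """Cosine similarity between each pair of adjacent blocks;
    low values mark candidate segment boundaries."""
    a, b = latent[:-1], latent[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.where(den == 0, 1.0, den)


# Toy example: blocks 0-1 share terms A, B and blocks 2-3 share terms C, D.
counts = np.array([
    [1, 1, 0, 0],  # term A
    [1, 1, 0, 0],  # term B
    [0, 0, 1, 1],  # term C
    [0, 0, 1, 1],  # term D
])
latent, k = lsi_block_vectors(counts)
scores = gap_scores(latent)  # the lowest score falls between blocks 1 and 2
```

A fixed energy threshold is just one simple rule of thumb; the paper discusses and evaluates its own ways of setting the dimensionality automatically, and those may differ.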