Computing Idioms Frequency in Text Corpora

Authors

BUŠTA Jan

Year of publication 2008
Type Article in Proceedings
Conference Proceedings of Recent Advances in Slavonic Natural Language Processing 2008
MU Faculty or unit

Faculty of Informatics

Citation
Web https://nlp.fi.muni.cz/raslan/2008/papers/12.pdf
Field Linguistics
Keywords frequency of idioms; headwords; text corpora; Czech language
Description Idioms are phrases whose meaning is not composed of the meanings of the individual words in the phrase. They are a natural example of a violation of the principle of compositionality, which makes idioms a meaning-mining problem in natural language processing. Counting the frequency of phrases such as idioms in corpora has one main aim: to find out which phrases are used often and which are used less often. This makes it possible to start working with the meaning of whole phrases rather than of each word separately, which improves natural language understanding.
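
As an illustration of what counting phrase frequency in a corpus can look like, here is a minimal Python sketch. It is not the method of the paper: it assumes a plain-text corpus, a hand-supplied idiom list, and naive matching of exact lowercased word sequences. The keywords suggest the paper works with headwords, which this sketch does not attempt. The names `tokenize` and `count_idioms` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into word tokens; \w is Unicode-aware in Python 3,
    # so Czech letters with diacritics are kept inside tokens.
    return re.findall(r"\w+", text.lower())


def count_idioms(corpus_text, idioms):
    # Assumption: an idiom occurrence is an exact, contiguous token match.
    # Inflected or reordered variants are NOT found by this naive approach.
    tokens = tokenize(corpus_text)
    counts = Counter()
    for idiom in idioms:
        phrase = tokenize(idiom)
        n = len(phrase)
        if n == 0:
            continue
        counts[idiom] = sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase
        )
    return counts


# Hypothetical sample data, for illustration only.
corpus = "He kicked the bucket. Nobody kicked the bucket twice, but it rains cats and dogs."
idioms = ["kicked the bucket", "cats and dogs", "spill the beans"]
result = count_idioms(corpus, idioms)
for idiom in idioms:
    print(idiom, result[idiom])
```

For an inflected language such as Czech, exact surface matching would miss most occurrences, so a realistic count would need lemmatized or morphologically annotated corpus data.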