Collocations

The idea of collocations is an important one to many areas of linguistics. Khellmer (1991) has argued that our mental lexicon is made up not only of single words, but also of larger phraseological units, both fixed and more variable. Information about collocationals is important for dictionary writing, natural language processing and language teaching. However, it is not easy to determine which co-occurences are significant collocations, especially if one is not a native speaker of a language or language variety.

Given a text corpus it is possible to empirically determine which pairs of words have a statistically significant amount of "glue" between them. Two formulae are available: mutual information and the Z-score. Both tests provide similar data, comparing the probablities that two words occur together as a joint event (i.e. because they belong together) with the probability that they are simply the result of chance. For example, the words riding and boots may occur as a joint event by reason of their belonging to the same multiword unit (riding boots) while the words formula and borrowed may simply occur because of a one-off juxtaposition and have no special relationship. For each pair of words, a score is given - the higher the score the greater the degree of collocality.

Mutual information and the Z-score are useful in the following ways:

Read about the use of mutual information in parallel aligned corpora in Corpus Linguistics, Chapter 3, page 73.