**collocations**: Collocations are characteristic co-occurrence patterns of words. For example, "Christmas" may collocate with "tree", "angel", and "presents".
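As a rough illustration of how co-occurrence can be counted, the sketch below tallies the words that appear within a fixed window around a node word. The function name `collocates`, the window size, and the toy sentence are all assumptions made for this example; they do not come from any particular corpus tool.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Invented toy sentence, lower-cased and split on whitespace
text = "the christmas tree stood by the window and the angel sat on the christmas tree"
tokens = text.split()
result = collocates(tokens, "christmas", window=2)
print(result.most_common())
```

Raw window counts like these also pick up very frequent function words such as "the", which is why collocation studies usually go on to apply a statistical association measure rather than relying on counts alone.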

**cross-tabulation**: Put simply, this is a table showing the frequency of each variable in each sample. For example, the following table gives a cross-tabulation of modal verbs across four genres of text (labelled A, B, C, and D).

| Modal verb | Genre A | Genre B | Genre C | Genre D |
|------------|---------|---------|---------|---------|
| can        | 210     | 148     | 59      | 89      |
| could      | 120     | 49      | 36      | 23      |
| may        | 100     | 86      | 15      | 46      |
| might      | 24      | 29      | 13      | 4       |
| must       | 43      | 34      | 12      | 28      |
| ought      | 3       | 4       | 0       | 1       |
| shall      | 12      | 4       | 0       | 10      |
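A cross-tabulation of this kind can be built by counting (variable, sample) pairs. The following is a minimal sketch; the helper name `cross_tabulate` and the handful of observations are hypothetical and much smaller than the modal-verb table.

```python
from collections import Counter

def cross_tabulate(observations):
    """Build {row: {column: count}} from an iterable of (row, column) pairs."""
    counts = Counter(observations)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # A Counter returns 0 for pairs that never occurred, so empty cells are filled in
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical data: one (modal verb, genre) pair per occurrence in a toy corpus
observations = [
    ("can", "A"), ("can", "A"), ("can", "B"),
    ("may", "A"), ("may", "C"), ("can", "C"),
]
table = cross_tabulate(observations)
for verb, row in table.items():
    print(verb, row)
```

Each row of the result corresponds to one variable (here a modal verb) and each column to one sample (here a genre), matching the layout of the table above.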

**intercorrelation matrix**: This is calculated from a cross-tabulation (see above) and shows how statistically similar all pairs of variables are in their distributions across the various samples. The table below shows the intercorrelations between *can, could, may, might, must, ought* and *shall* taken from the table above.

Pearson product-moment correlation coefficients:

| Word  | can   | could | may   | might | must  | ought | shall |
|-------|-------|-------|-------|-------|-------|-------|-------|
| can   | 1     | 0.544 | 0.798 | 0.765 | 0.796 | 0.717 | 0.118 |
| could | 0.544 | 1     | 0.186 | 0.782 | 0.807 | 0.528 | 0.026 |
| may   | 0.798 | 0.186 | 1     | 0.521 | 0.637 | 0.554 | 0.601 |
| might | 0.765 | 0.782 | 0.521 | 1     | 0.795 | 0.587 | 0.032 |
| must  | 0.796 | 0.807 | 0.637 | 0.795 | 1     | 0.816 | 0.306 |
| ought | 0.717 | 0.528 | 0.554 | 0.587 | 0.816 | 1     | 0.078 |
| shall | 0.118 | 0.026 | 0.601 | 0.032 | 0.306 | 0.078 | 1     |

The closer the score is to 1, the stronger the correlation between the two variables. The correlation of *can* with itself is 1, as the two distributions are identical. Some variables show a greater similarity in their distributions than others: for instance, *can* shows a greater similarity to *may* (0.798) than it does to *shall* (0.118).
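For readers who want to see the mechanics, here is a minimal sketch of building an intercorrelation matrix with the Pearson product-moment formula. It uses the four-genre counts from the cross-tabulation entry; the names `pearson`, `counts`, and `matrix` are chosen for this example. The figures it yields from just these four columns should not be expected to reproduce the matrix above, which may have been computed from a fuller data set.

```python
from math import sqrt

# Frequencies per genre (A, B, C, D), copied from the cross-tabulation entry
counts = {
    "can":   [210, 148, 59, 89],
    "could": [120, 49, 36, 23],
    "may":   [100, 86, 15, 46],
    "might": [24, 29, 13, 4],
    "must":  [43, 34, 12, 28],
    "ought": [3, 4, 0, 1],
    "shall": [12, 4, 0, 10],
}

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)

# One coefficient for every pair of words, including each word with itself
matrix = {a: {b: pearson(counts[a], counts[b]) for b in counts} for a in counts}

for word, row in matrix.items():
    print(word, [round(r, 3) for r in row.values()])
```

The matrix is symmetric, and every diagonal entry is 1 because each word is compared with itself.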

**non-parametric test**: All statistical tests of significance belong to one of two distinct groups: parametric and non-parametric.

**Parametric** tests make certain assumptions about the data on which the test is performed. First, they assume that the data is drawn from a normal distribution (see below). Second, they assume that the data is measured on an interval scale, i.e. one on which any interval between two measurements is meaningful, such as a person's height in cm. Third, parametric tests make use of parameters such as the mean and standard deviation.

**Non-parametric** tests make no such assumptions about the population from which the data is drawn. Knowledge of parameters is not necessary either. These tests are generally easier to learn and apply.
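To make the contrast concrete, the sketch below computes the statistic for one well-known non-parametric test, the Mann-Whitney U test. The test is chosen here purely as an illustration and is not named in this glossary. It works from the ranks of the observations rather than from their mean or standard deviation. The function name and the sample values are invented for the example.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic for sample_a (ties get average ranks)."""
    pooled = sorted(list(sample_a) + list(sample_b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each receives the mean of those ranks
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_a = sum(ranks[v] for v in sample_a)
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2

# Invented scores for two small groups
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
print(u_a, u_b)
```

Only the ordering of the values matters here: replacing 6 with 600 would leave both statistics unchanged, whereas a test based on the mean would be strongly affected. Turning U into a significance decision additionally requires a table of critical values or an approximation, which this sketch leaves out.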

**normal distribution**: A variable follows a normal distribution if it is continuous and if its frequency graph has the characteristic symmetrical, bell-shaped form in which the mean, median and mode all coincide (see graph on the left).
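These properties can be checked with Python's standard-library `statistics.NormalDist` class. In the sketch below, the mean of 170 and standard deviation of 10 are hypothetical values (for instance, heights in cm) picked only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 170, standard deviation 10
dist = NormalDist(mu=170.0, sigma=10.0)

# Mean, median and mode coincide
print(dist.mean, dist.median, dist.mode)

# Symmetry: the curve has the same height at equal distances either side of the mean
left = dist.pdf(dist.mean - 5)
right = dist.pdf(dist.mean + 5)
print(left, right)

# Proportion of values lying within one standard deviation of the mean
share_within_one_sd = dist.cdf(180.0) - dist.cdf(160.0)
print(round(share_within_one_sd, 3))
```

The last figure is the familiar rule of thumb that roughly two thirds of a normally distributed variable falls within one standard deviation of its mean.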

**Type I and Type II errors**: Although we can be confident that the results of a significance test are accurate, there is always a small chance that the decision made might be wrong. There are two ways that this can occur:

**A Type I error** occurs when we decide that the difference is significant (due to factors other than chance) when in fact it is not. The probability of this happening is the same as the significance level of the test. This is usually regarded as the more serious type of error to make (equivalent to a judge finding an innocent suspect guilty).

**A Type II error** occurs when we decide that the difference is due to chance, when in fact it is not. This is generally regarded as the less serious of the two.
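The link between the Type I error rate and the significance level can be illustrated with a small simulation. The sketch below is an assumption-laden toy: it repeatedly draws two samples from the same normal population, so any "significant" difference is a false alarm, and applies a two-sided z-test with a known standard deviation of 1. The function name, sample size, number of trials, and seed are all arbitrary choices for this example.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, trials=2000, n=30, seed=0):
    """Estimate how often a two-sided z-test rejects when no real difference exists."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    false_alarms = 0
    for _ in range(trials):
        # Both samples come from the same population (mean 0, standard deviation 1)
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # Standard error of the difference in means is sqrt(1/n + 1/n)
        z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)
        if abs(z) > critical:
            false_alarms += 1  # declared significant although it is only chance
    return false_alarms / trials

rate = type_i_error_rate()
print(rate)
```

With a significance level of 0.05, the estimated rate should land close to 0.05, varying a little from run to run if the seed is changed.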