collocations: Collocations are characteristic co-occurrence patterns of words. For example, "Christmas" may collocate with "tree", "angel", and "presents".
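One simple way to look for candidate collocates is to count which words occur within a few tokens of a chosen node word. The sketch below does this for a tiny invented sentence; the function name `collocates`, the window of two tokens and the sample text are all assumptions made for illustration, and raw counts like these would normally be followed by a test of significance.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Hypothetical sample text, invented for this example.
text = "the christmas tree stood by the christmas presents and the christmas angel"
tokens = text.split()

christmas_collocates = collocates(tokens, "christmas")
print(christmas_collocates.most_common(5))
```

In a real corpus, frequent function words such as "the" dominate raw counts of this kind, which is why a statistical measure is usually applied before calling a pair a collocation.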
cross-tabulation: Put simply, a cross-tabulation is a table showing the frequency of each variable in each sample. For example, the following table gives a cross-tabulation of modal verbs across four genres of text (labelled A, B, C, and D).
Modal verb | Genre A | Genre B | Genre C | Genre D |
can | 210 | 148 | 59 | 89 |
could | 120 | 49 | 36 | 23 |
may | 100 | 86 | 15 | 46 |
might | 24 | 29 | 13 | 4 |
must | 43 | 34 | 12 | 28 |
ought | 3 | 4 | 0 | 1 |
shall | 12 | 4 | 0 | 10 |
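A table of this kind can be built by counting (variable, sample) pairs. The following is a minimal sketch in Python, using a handful of invented observations rather than the real corpus data; the names `observations` and `cross_tabulate` are hypothetical.

```python
from collections import Counter

def cross_tabulate(observations, rows, columns):
    """Return {row: [count for each column]} from (row, column) pairs."""
    counts = Counter(observations)
    return {r: [counts[(r, c)] for c in columns] for r in rows}

# Invented observations: each pair is (modal verb, genre of the text it came from).
observations = [
    ("can", "A"), ("can", "A"), ("can", "B"),
    ("could", "A"), ("may", "C"), ("may", "C"), ("shall", "D"),
]
genres = ["A", "B", "C", "D"]
modals = ["can", "could", "may", "shall"]

table = cross_tabulate(observations, modals, genres)

# Print the result in the same row layout as the table above.
print("Modal verb | " + " | ".join(genres) + " |")
for modal in modals:
    print(modal + " | " + " | ".join(str(n) for n in table[modal]) + " |")
```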
intercorrelation matrix: This is calculated from a cross-tabulation (see above) and shows how statistically similar each pair of variables is in its distribution across the various samples. The table below shows the intercorrelations between can, could, may, might, must, ought and shall taken from the table above.
Pearson product-moment correlation coefficients
Word | can | could | may | might | must | ought | shall |
can | 1 | 0.544 | 0.798 | 0.765 | 0.796 | 0.717 | 0.118 |
could | 0.544 | 1 | 0.186 | 0.782 | 0.807 | 0.528 | 0.026 |
may | 0.798 | 0.186 | 1 | 0.521 | 0.637 | 0.554 | 0.601 |
might | 0.765 | 0.782 | 0.521 | 1 | 0.795 | 0.587 | 0.032 |
must | 0.796 | 0.807 | 0.637 | 0.795 | 1 | 0.816 | 0.306 |
ought | 0.717 | 0.528 | 0.554 | 0.587 | 0.816 | 1 | 0.078 |
shall | 0.118 | 0.026 | 0.601 | 0.032 | 0.306 | 0.078 | 1 |
The closer the score is to 1, the stronger the positive correlation between the two variables. The correlation of can with itself is 1, because the two distributions are identical. Some pairs of variables are more similar in their distributions than others: for instance, can is more similar to may (0.798) than it is to shall (0.118).
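The calculation behind such a matrix can be sketched as follows, using the standard Pearson product-moment formula applied to the four genre counts in the cross-tabulation entry. This is an illustration only: the function name `pearson` and the data layout are assumptions, and because the sketch uses only the four genres listed in this glossary, its output should not be expected to reproduce the coefficients printed in the matrix above, which may have been computed from a different or larger set of samples.

```python
import math

# Frequencies copied from the cross-tabulation entry (genres A, B, C, D).
counts = {
    "can":   [210, 148, 59, 89],
    "could": [120, 49, 36, 23],
    "may":   [100, 86, 15, 46],
    "might": [24, 29, 13, 4],
    "must":  [43, 34, 12, 28],
    "ought": [3, 4, 0, 1],
    "shall": [12, 4, 0, 10],
}

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

words = list(counts)
matrix = {w: {v: pearson(counts[w], counts[v]) for v in words} for w in words}

# Print the matrix in the same row layout as the table above.
print("Word | " + " | ".join(words) + " |")
for w in words:
    print(w + " | " + " | ".join(f"{matrix[w][v]:.3f}" for v in words) + " |")
```

The matrix is symmetric, so in practice only the values above (or below) the diagonal need to be computed.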
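non-parametric test: All statistical tests of significance belong to one of two distinct groups: parametric and non-parametric. Parametric tests assume that the data follow a particular distribution, typically the normal distribution; non-parametric tests make no such assumption.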
normal distribution: A variable follows a normal distribution if it is continuous and if its frequency graph has the characteristic symmetrical, bell-shaped form in which the mean, median and mode all coincide (see graph on the left).
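The bell shape can be illustrated with the standard normal density formula. The sketch below uses an arbitrary mean of 100 and standard deviation of 15, both chosen purely for illustration; the function name `normal_pdf` is likewise an assumption. It draws a rough text plot and shows that the curve peaks at the mean (so the mode equals the mean) and is symmetrical about it (so the median equals the mean as well).

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Arbitrary illustrative parameters, not taken from any real data set.
mean, sd = 100.0, 15.0

# Evaluate the density at half-standard-deviation steps either side of the mean.
xs = [mean + step * sd / 2 for step in range(-8, 9)]
densities = [normal_pdf(x, mean, sd) for x in xs]

# The value with the highest density is the mode; for a normal curve it is the mean.
peak_x = xs[densities.index(max(densities))]

# Rough text plot: one row per x value, bar length proportional to density.
for x, d in zip(xs, densities):
    print(f"{x:6.1f} {'#' * round(d * 1500)}")
print("peak at", peak_x)
```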
Type I and Type II errors: Although a significance test lets us state a result with a known level of confidence, there is always a small chance that the decision reached is wrong. There are two ways that this can occur: