Glossary

collocations: Collocations are characteristic co-occurrence patterns of words. For example, "Christmas" may collocate with "tree", "angel", and "presents".
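
As a rough illustration, the sketch below counts the words that appear within a fixed window either side of a chosen word. The function name `collocates`, the window size, and the sample sentence are all made up for this example. Raw counts like these are only a first step: collocation studies normally go on to apply an association measure (such as mutual information) to separate characteristic partners from merely frequent words.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Hypothetical sample text, lower-cased and split on whitespace.
text = ("the christmas tree had an angel on top and "
        "christmas presents under the christmas tree")
tokens = text.split()
neighbours = collocates(tokens, "christmas", window=1)
for word, freq in neighbours.most_common():
    print(word, freq)
```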

cross-tabulation: Put simply, this is a table showing the frequency of each variable in each sample. For example, the following table gives a cross-tabulation of modal verbs across four genres of text (labelled A, B, C, and D).

Modal Verb	Genre A	Genre B	Genre C	Genre D
can	210	148	59	89
could	120	49	36	23
may	100	86	15	46
might	24	29	13	4
must	43	34	12	28
ought	3	4	0	1
shall	12	4	0	10
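
A table of this kind can be built by counting how often each pair of labels occurs. The following is a minimal sketch; the function name `cross_tabulate` and the handful of observations are hypothetical and are not the data behind the table above.

```python
from collections import Counter

def cross_tabulate(observations):
    """Turn (row_label, column_label) pairs into a nested frequency table."""
    counts = Counter(observations)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Counter returns 0 for pairs that never occurred, so every cell is filled.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

# Hypothetical observations: one (modal verb, genre) pair per occurrence found.
observations = [
    ("can", "A"), ("can", "A"), ("can", "B"),
    ("may", "A"), ("ought", "B"),
]
table = cross_tabulate(observations)
for verb, row in table.items():
    print(verb, row)
```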

intercorrelation matrix: This is calculated from a cross-tabulation (see above) and shows how statistically similar each pair of variables is in its distribution across the various samples. The table below shows the intercorrelations between can, could, may, might, must, ought and shall taken from the table above.

Pearson product-moment correlation coefficients
Word	can	could	may	might	must	ought	shall
can	1	0.544	0.798	0.765	0.796	0.717	0.118
could	0.544	1	0.186	0.782	0.807	0.528	0.026
may	0.798	0.186	1	0.521	0.637	0.554	0.601
might	0.765	0.782	0.521	1	0.795	0.587	0.032
must	0.796	0.807	0.637	0.795	1	0.816	0.306
ought	0.717	0.528	0.554	0.587	0.816	1	0.078
shall	0.118	0.026	0.601	0.032	0.306	0.078	1

The Pearson coefficient ranges from -1 to 1. The closer the score is to 1, the stronger the positive correlation between the two variables. The correlation of can with itself is 1, as the two distributions are identical. Some variables show a greater similarity in their distributions than others: for instance, can shows a greater similarity to may (0.798) than it does to shall (0.118).
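
The calculation behind such a matrix can be sketched as follows. The helper name `pearson` is an assumption for illustration, and only three of the modal verbs are included to keep the example short. The values it produces come solely from the four genre counts in the cross-tabulation above, so they need not agree digit for digit with the printed matrix, whose underlying data may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either sequence is constant.
    return covariance / sqrt(spread_x * spread_y)

# Frequencies per genre (A, B, C, D) from the cross-tabulation above.
counts = {
    "can":   [210, 148, 59, 89],
    "could": [120, 49, 36, 23],
    "may":   [100, 86, 15, 46],
}

# Correlate every variable with every other, including itself.
matrix = {a: {b: pearson(counts[a], counts[b]) for b in counts} for a in counts}
for word, row in matrix.items():
    print(word, [round(r, 3) for r in row.values()])
```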

non-parametric test: All statistical tests of significance belong to one of two distinct groups: parametric and non-parametric.

Parametric tests make certain assumptions about the data on which the test is performed. First, they assume that the data is drawn from a normal distribution (see below). Second, they assume that the data is measured on an interval scale (i.e. any interval between two measurements is meaningful, such as the difference between two people's heights in cm). Third, they make use of parameters such as the mean and standard deviation.
Non-parametric tests make no assumptions about the shape of the distribution of the population from which the data is drawn. Knowledge of parameters is not necessary either. These tests are generally easier to learn and apply.
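
One simple non-parametric test is the sign test for paired samples, which looks only at whether each pair went up or down and ignores the size of the change. The sketch below is an illustration under that choice of test; the function name `sign_test` and the scores are hypothetical.

```python
from math import comb

def sign_test(before, after):
    """Two-sided exact sign test for paired samples; tied pairs are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) when rises and falls are equally likely, i.e. Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)

# Hypothetical paired scores, e.g. the same six texts rated on two occasions.
before = [1, 2, 3, 4, 5, 6]
after = [2, 4, 4, 6, 7, 9]
p_value = sign_test(before, after)
print(p_value)
```

No mean or standard deviation is computed anywhere: the test uses only the direction of each difference.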

normal distribution: A variable follows a normal distribution if it is continuous and if its frequency graph has the characteristic symmetrical, bell-shaped form in which the mean, median and mode all coincide (see graph on the left).
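
These properties can be checked with `NormalDist` from Python's standard `statistics` module. The mean of 170 and standard deviation of 10 are arbitrary values chosen for illustration (for instance, heights in cm).

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 170, standard deviation 10.
dist = NormalDist(mu=170, sigma=10)

# Mean, median and mode coincide for a normal distribution.
print(dist.mean, dist.median, dist.mode)

# Symmetry: the curve is equally high at equal distances either side of the mean.
left_height = dist.pdf(160)
right_height = dist.pdf(180)
print(left_height, right_height)
```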

Type I and Type II errors: Although we can usually be confident in the result of a significance test, there is always a small chance that the decision it leads to is wrong. This can happen in two ways:

A Type I error occurs when we decide that the difference is significant (due to factors other than chance) when in fact it is not. When there is no real difference, the probability of this happening is the same as the significance level of the test. This is usually regarded as the more serious of the two errors (equivalent to a judge finding an innocent suspect guilty).
A Type II error occurs when we decide that the difference is due to chance, when in fact it is not. This is usually regarded as the less serious of the two.
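
The link between the significance level and the Type I error rate can be illustrated by simulation. In the sketch below, both samples are always drawn from the same population, so every "significant" result is a Type I error. The function name, the sample size, the number of trials, and the use of a two-sided z-test with a known standard deviation of 1 are all assumptions made for this example.

```python
import random
from statistics import NormalDist, mean

def type_one_error_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of trials in which a z-test wrongly reports a difference."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    false_alarms = 0
    for _ in range(trials):
        # Both samples come from the same population (mean 0, sd 1),
        # so any observed difference between them is due to chance.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # Standard error of the difference of two means with known sd = 1.
        z = (mean(a) - mean(b)) / (2 / n) ** 0.5
        if abs(z) > critical:
            false_alarms += 1
    return false_alarms / trials

rate = type_one_error_rate()
print(rate)
```

With a significance level of 0.05, the printed rate should come out close to 0.05, though the exact figure varies with the random seed.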