- factor analysis
- principal components analysis
- multidimensional scaling
- cluster analysis

Although we will not attempt to explain the complex mathematics behind these techniques, it is worth taking time to understand the stages by which they work. All the techniques begin with a basic cross-tabulation of the variables and samples.
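As a rough illustration of that starting point, the sketch below builds a cross-tabulation of word frequencies (the variables) across a few texts (the samples). The texts, the word list and the function name `cross_tabulate` are invented for this example and are not taken from any real corpus.

```python
from collections import Counter

# Hypothetical toy samples; a real study would use full corpus texts.
samples = {
    "text_a": "the church and the priest prayed",
    "text_b": "the minister and the parliament voted",
    "text_c": "the priest prayed and the church sang",
}

# The variables: words whose frequencies we want to compare.
variables = ["church", "priest", "parliament", "minister"]

def cross_tabulate(samples, variables):
    """Return {variable: {sample: frequency}} for the chosen words."""
    counts = {name: Counter(text.split()) for name, text in samples.items()}
    return {
        word: {name: counts[name][word] for name in samples}
        for word in variables
    }

table = cross_tabulate(samples, variables)
for word, row in table.items():
    print(word, row)
```

Each row of `table` is one variable and each column one sample, which is the layout the techniques below take as their input.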

For **factor analysis**, an intercorrelation matrix is then calculated from the cross-tabulation. This matrix is used to "summarise" the similarities between the variables in terms of a smaller number of reference factors, which the technique extracts. The hypothesis is that the many variables which appear in the original frequency cross-tabulation are in fact masking a smaller number of variables (the factors), which can better explain why the observed frequency differences occur.
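To make the intercorrelation step concrete, here is a minimal sketch that computes a Pearson correlation for every pair of variables in a small cross-tabulation. The frequencies are invented, and Pearson's coefficient is an assumption on our part, since the text does not say which correlation measure is used.

```python
from math import sqrt

# Hypothetical cross-tabulation: each word (variable) has a frequency
# in each of four text samples. The numbers are invented.
freqs = {
    "church":     [12, 1, 9, 2],
    "priest":     [10, 0, 8, 1],
    "parliament": [1, 11, 2, 9],
    "minister":   [2, 9, 1, 10],
}

def pearson(x, y):
    """Pearson correlation between two equal-length frequency lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

words = list(freqs)
# Intercorrelation matrix: one coefficient for every pair of variables.
corr = {w1: {w2: pearson(freqs[w1], freqs[w2]) for w2 in words} for w1 in words}

for w1 in words:
    print(w1.ljust(11), " ".join(f"{corr[w1][w2]:+.2f}" for w2 in words))
```

In this toy data the two "religion" words correlate strongly and positively with each other and negatively with the two "government" words, which is the kind of pattern a factor analysis would go on to summarise.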

Each variable receives a **loading** on each of the extracted factors, signifying its closeness to that factor. For example, in analysing a set of word frequencies across several texts, one might find that words in a certain conceptual field (e.g. religion) received high loadings on one factor, whereas those in another field (e.g. government) loaded highly on another factor.
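The idea of loadings can be sketched with a simplified extraction. The code below is not a full factor analysis: as a stand-in it uses principal-component extraction by power iteration, with no rotation or communality estimation. The correlation values and the function names are invented for illustration.

```python
from math import sqrt

words = ["church", "priest", "parliament", "minister"]

# Hypothetical intercorrelation matrix (invented values): the two
# religion words correlate, the two government words correlate,
# and the two fields are unrelated to each other.
R = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]

def dominant_component(matrix, iterations=1000):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v

def extract_loadings(matrix, n_factors):
    """Extract one component at a time, removing each before the next."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    loadings = []
    for _ in range(n_factors):
        eigenvalue, v = dominant_component(residual)
        # Loading = eigenvector entry scaled by sqrt(eigenvalue).
        loadings.append([sqrt(eigenvalue) * x for x in v])
        residual = [
            [residual[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)
        ]
    return loadings

loadings = extract_loadings(R, n_factors=2)
for k, factor in enumerate(loadings, start=1):
    for word, value in zip(words, factor):
        print(f"factor {k}  {word:<11} {value:+.2f}")
```

With these invented correlations, "church" and "priest" load at roughly 0.97 on the first component and near zero on the second, while "parliament" and "minister" show the reverse pattern (roughly 0.95 on the second).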

Follow this link for an example of factor analysis.

**Correspondence analysis** is similar to factor analysis, but it differs in the basis of its calculations.

**Multidimensional scaling (MDS)** also makes use of an intercorrelation matrix, which is then converted to a matrix in which the correlation coefficients are replaced with rank order values. For example, the highest correlation value receives a rank order of 1, the next highest receives a rank order of 2, and so on. MDS then attempts to plot and arrange these variables so that the more closely related items are plotted closer together than the less closely related items.
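The rank-order conversion can be sketched as follows. Only this first step is shown; the plotting stage of MDS is not attempted here. The correlation values and the function name `rank_pairs` are invented, and ties between equal correlations are not handled.

```python
# Hypothetical correlations for each pair of variables (invented values).
corr = {
    ("church", "priest"): 0.95,
    ("church", "parliament"): -0.60,
    ("church", "minister"): -0.40,
    ("priest", "parliament"): -0.55,
    ("priest", "minister"): -0.35,
    ("parliament", "minister"): 0.90,
}

def rank_pairs(corr):
    """Replace each coefficient with its rank: 1 for the highest
    correlation, 2 for the next highest, and so on."""
    ordered = sorted(corr, key=corr.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}

ranks = rank_pairs(corr)
for pair, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, pair)
```

Here the pair ("church", "priest") receives rank 1 and ("church", "parliament") receives rank 6, so a plot based on these ranks should place the first pair closest together and the last pair furthest apart.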

**Cluster analysis** involves assembling the variables into unique groups or "clusters" of similar items. A matrix is created, in a similar fashion to factor analysis (although this may be a **distance matrix** showing the degree of *difference* rather than similarity between the pairs of variables in the cross-tabulation). The matrix is then used to group the variables contained within it.
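A small sketch of the distance-matrix route is given below. The text does not name a particular clustering method, so the choices here (Euclidean distance, agglomerative merging with single linkage, a fixed number of clusters) are assumptions made for illustration, as are the frequencies.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical word frequencies across four text samples (invented).
freqs = {
    "church":     [12, 1, 9, 2],
    "priest":     [10, 0, 8, 1],
    "parliament": [1, 11, 2, 9],
    "minister":   [2, 9, 1, 10],
}

def distance_matrix(freqs):
    """Distance (degree of difference) between every pair of variables."""
    words = list(freqs)
    return {(a, b): dist(freqs[a], freqs[b]) for a in words for b in words}

def single_linkage(freqs, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    d = distance_matrix(freqs)
    clusters = [{word} for word in freqs]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: gap between the closest members.
                gap = min(d[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

clusters = single_linkage(freqs, n_clusters=2)
print(clusters)
```

On this toy data the two religion words end up in one cluster and the two government words in the other, because their frequency profiles across the samples are the least different.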

*Read more about cluster analysis in Corpus Linguistics, Chapter 3, pages 76, 78 and 79.*