Log-linear Models

Here we consider a different technique, one that deals with the interrelationships of several variables. As linguists, we often want to go beyond the simple description of a phenomenon and explain what causes the data to behave in a particular way. A log-linear analysis allows us to take a standard frequency cross-tabulation and find out which variables seem statistically most likely to be responsible for a particular effect.
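In conventional notation (not used in the original text), a log-linear model expresses the logarithm of each expected cell frequency m in a table cross-classifying three variables A, B and C as a sum of effect terms. The saturated model includes every term and reproduces the table exactly; simpler models drop higher-order interaction terms. How well a model fits is usually judged with the likelihood-ratio statistic G², where n are the observed frequencies and \hat{m} the frequencies fitted by the model:

```latex
\log m_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k}
             + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}
             + \lambda^{ABC}_{ijk}

G^{2} = 2 \sum_{i,j,k} n_{ijk} \log \frac{n_{ijk}}{\hat{m}_{ijk}}
```

Dropping \lambda^{ABC}_{ijk}, for instance, gives the model in which each pair of variables may be associated but the three do not interact jointly.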

For example, let us imagine that we are interested in the factors which influence whether the word 'for' is present in or omitted from phrases of duration such as "She studied [for] three years in Munich". We may hypothesise several factors which could have an effect on this, e.g. the text genre, the semantic category of the main verb, and whether or not the verb is separated from the phrase of duration by an adverb. Any one of these factors might be solely responsible for the omission of 'for', or a combination of factors might be responsible. Finally, all the factors working together could determine whether 'for' is present or omitted. A log-linear analysis provides us with a set of models which correspond to these possibilities.
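The input to such an analysis is simply a frequency count for every combination of factor levels. A minimal Python sketch, assuming each phrase of duration has already been hand-coded for the three hypothesised factors and for the presence or omission of 'for' (the six observations, the category labels such as "stative" and "activity", and the function name cross_tabulate are invented for illustration):

```python
from collections import Counter
from itertools import product

# Hypothetical coded examples: one tuple per phrase of duration.
# Fields: (genre, verb class, adverb separation, 'for' status).
observations = [
    ("fiction", "stative", "adjacent", "present"),
    ("fiction", "activity", "separated", "omitted"),
    ("news", "activity", "adjacent", "omitted"),
    ("news", "activity", "adjacent", "omitted"),
    ("news", "stative", "separated", "present"),
    ("fiction", "activity", "adjacent", "present"),
]

def cross_tabulate(rows):
    """Count every combination of levels, including combinations never observed."""
    counts = Counter(rows)
    # Collect the attested levels of each field, in sorted order.
    levels = [sorted({row[i] for row in rows}) for i in range(len(rows[0]))]
    # Counter returns 0 for unseen keys, so empty cells appear with count 0.
    return {cell: counts[cell] for cell in product(*levels)}, levels

table, levels = cross_tabulate(observations)
for cell, n in table.items():
    if n:
        print(cell, n)
```

Cells with a count of zero are kept in the table, since a log-linear model has to account for combinations that never occur as well as those that do.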

The way we test the models in log-linear analysis is first to test the significance of the associations in the most complex model, that is, the model which assumes that all of the variables are working together. We then remove one variable at a time and check, for each simpler model, whether significance is maintained, continuing until we reach the model with the lowest possible dimensionality. So in the above example, we would start with a model positing three variables (genre, verb class and adverb separation) and test the significance of that three-variable model. Then we would test each of the two-variable models (removing one variable in each case) and finally each of the three one-variable models. The best model is taken to be the one with the fewest variables that still retains statistical significance, in other words the simplest model that still accounts adequately for the observed frequencies.
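This model comparison can be sketched in Python, assuming a small three-way table with hypothetical counts for genre, adverb separation and presence/omission of 'for' (the counts, the 0/1 coding and all function names are invented for illustration). Each model is fitted by iterative proportional fitting, one standard way of fitting hierarchical log-linear models, and compared with the observed table using G². Note that, in this sketch, models are simplified by dropping interaction terms, which is the usual way hierarchical log-linear models are compared. A model is usually considered adequate when its G² is not significant against a chi-square distribution with the printed degrees of freedom (with SciPy installed, `scipy.stats.chi2.sf(g2, df)` gives the p-value), and the difference in G² between two nested models tests the term that was removed.

```python
import itertools
import math

def margin(table, axes):
    """Collapse a table (dict: cell tuple -> count) onto the given axes."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0.0) + value
    return totals

def fit_ipf(observed, generators, max_iter=1000, tol=1e-10):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    `generators` lists the margins the model must reproduce, e.g.
    [(0, 1), (2,)] keeps the 0*1 interaction and the main effect of 2."""
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(max_iter):
        max_change = 0.0
        for axes in generators:
            obs_m = margin(observed, axes)
            fit_m = margin(fitted, axes)
            # Rescale every cell so the fitted margin matches the observed one.
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                new = fitted[cell] * obs_m[key] / fit_m[key] if fit_m[key] > 0 else 0.0
                max_change = max(max_change, abs(new - fitted[cell]))
                fitted[cell] = new
        if max_change < tol:
            break
    return fitted

def g_squared(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)); empty cells add 0."""
    return 2.0 * sum(o * math.log(o / fitted[c]) for c, o in observed.items() if o > 0)

def degrees_of_freedom(levels, generators):
    """Cells minus free parameters, counting every lower-order term the
    generators imply (hierarchy principle), including the grand mean."""
    terms = set()
    for g in generators:
        for r in range(len(g) + 1):
            terms.update(itertools.combinations(sorted(g), r))
    params = sum(math.prod(levels[a] - 1 for a in t) for t in terms)
    return math.prod(levels) - params

# Hypothetical counts, invented for illustration (not corpus data).
# Axis 0: genre (0 = fiction, 1 = news)
# Axis 1: adverb separation (0 = adjacent, 1 = separated)
# Axis 2: 'for' (0 = present, 1 = omitted)
observed = {
    (0, 0, 0): 40, (0, 0, 1): 25, (0, 1, 0): 30, (0, 1, 1): 5,
    (1, 0, 0): 35, (1, 0, 1): 45, (1, 1, 0): 28, (1, 1, 1): 12,
}
levels = [2, 2, 2]

# Models from most to least complex; G = genre, A = adverb, F = 'for'.
models = [
    ("[GAF] (saturated)", [(0, 1, 2)]),
    ("[GA][GF][AF]", [(0, 1), (0, 2), (1, 2)]),
    ("[GA][GF]", [(0, 1), (0, 2)]),
    ("[GA][AF]", [(0, 1), (1, 2)]),
    ("[GF][AF]", [(0, 2), (1, 2)]),
    ("[GA][F]", [(0, 1), (2,)]),
    ("[G][A][F]", [(0,), (1,), (2,)]),
]

for name, gens in models:
    fitted = fit_ipf(observed, gens)
    g2 = g_squared(observed, fitted)
    print(f"{name:20} G2 = {g2:7.3f}  df = {degrees_of_freedom(levels, gens)}")
```

The bracket notation lists the highest-order terms each model keeps; for example, [GA][GF] retains the genre-adverb and genre-'for' associations but drops any adverb-'for' association.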

Read about variable rule analysis and probabilistic language modelling in Corpus Linguistics, Chapter 3, pages 83-84.