Frequency Counts

This is the most straightforward approach to working with quantitative data. Items are classified according to a particular scheme, and an arithmetical count is made of the number of items (or tokens) in the text that belong to each classification (or type) in the scheme.

For instance, we might set up a classification scheme to look at the frequency of the four major parts of speech: noun, verb, adjective and adverb. These four classes would constitute our types. Another example involves the simple one-to-one mapping of form onto classification: we count the number of times each word form appears in the corpus, resulting in a list that might look something like this:

abandon: 5
abandoned: 3
abandons: 2
ability: 5
able: 28
about: 128
etc.
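A word-form frequency list of this kind can be produced with a few lines of Python. The sketch below is illustrative only: the tokenisation rule (lowercase the text and treat runs of letters and apostrophes as words) and the sample sentence are assumptions, not part of any particular corpus tool. Because each distinct form is its own type, `abandon` and `abandoned` are counted separately.

```python
from collections import Counter
import re

def word_frequencies(text):
    # Crude tokenisation (an assumption for this sketch): lowercase the text and
    # treat each run of letters or apostrophes as one token.
    tokens = re.findall(r"[a-z']+", text.lower())
    # One-to-one mapping of form onto type: each distinct form is counted separately.
    return Counter(tokens)

sample = "Able seamen are able to abandon ship; they abandoned it once and abandon it again."
freqs = word_frequencies(sample)

# Print an alphabetical frequency list in the same "word: count" layout as above.
for word in sorted(freqs):
    print(f"{word}: {freqs[word]}")
```

A real tokeniser would need decisions about hyphens, numbers and punctuation, and each such decision changes the counts.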

More often, however, the use of a classification scheme implies a deliberate act of categorisation on the part of the investigator. Even in word frequency analysis, variant forms of the same lexeme may be lemmatised before a frequency count is made. For instance, in the example above, abandon, abandons and abandoned might all be classed as the lexeme ABANDON. Very often the classification scheme will correspond to a type of linguistic annotation that has already been introduced into the corpus at an earlier stage (see Session 2). An example would be an analysis of the incidence of different parts of speech in a corpus that has already been part-of-speech tagged.
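The idea that the classification scheme is chosen by the investigator can be sketched by separating the counting step from the function that assigns each token its type. The example below is a minimal sketch under stated assumptions: the corpus is taken to be already tagged in a `word_TAG` format, the coarse tags (`NOUN`, `VERB`, `ADJ`, `ADV` and so on) are invented for illustration, and `LEMMAS` is a hypothetical toy lemma table standing in for a real lemmatiser. The same tokens are counted once by lemma and once by part of speech.

```python
from collections import Counter

def frequency_counts(tokens, classify):
    """Assign each token a type with classify() and count the tokens per type."""
    return Counter(classify(token) for token in tokens)

# Hypothetical toy lemma table; a real study would use a lemmatiser or lexicon.
LEMMAS = {"abandon": "ABANDON", "abandons": "ABANDON", "abandoned": "ABANDON"}

def lemma(token):
    word, _tag = token.rsplit("_", 1)  # assumes a word_TAG token format
    word = word.lower()
    return LEMMAS.get(word, word)      # forms not in the table fall back to the word form

def pos(token):
    return token.rsplit("_", 1)[1]     # the part-of-speech tag after the final underscore

# The four major word classes used as types in the earlier example.
MAJOR_CLASSES = {"NOUN", "VERB", "ADJ", "ADV"}

# Illustrative pre-tagged text (the tags are invented coarse labels).
tagged = ("The_DET able_ADJ crew_NOUN quickly_ADV abandoned_VERB the_DET ship_NOUN "
          "and_CONJ she_PRON abandons_VERB hope_NOUN").split()

lemma_freqs = frequency_counts(tagged, lemma)   # abandoned + abandons -> ABANDON
pos_freqs = frequency_counts(tagged, pos)       # one count per tag in the annotation
major = {tag: n for tag, n in pos_freqs.items() if tag in MAJOR_CLASSES}
print(lemma_freqs["ABANDON"], major)
```

Passing the classifier in as a function keeps the arithmetic count identical across schemes; only the decision about which type each token belongs to changes.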