Introduction

In this session we'll look at the techniques used to carry out corpus analysis. We'll re-examine Chomsky's argument that corpus linguistics will result in skewed data, and see the procedures used to ensure that a representative sample is obtained. We'll also look at the relationship between quantitative and qualitative research. Although most of this session is concerned with statistical procedures, which can be said to be quantitative, it is important not to overlook qualitative analyses.

Two points should be made about the statistical part of this session.

First, this section is of necessity incomplete. Space precludes coverage of all the techniques that can be used on corpus data.
Second, we do not aim here to provide a "step-by-step" guide to statistics. Many of the techniques used are very complex, and explaining the mathematics in full would require a separate session for each one. Other books, notably Language and Computers and Statistics for Corpus Linguistics (Oakes, M. - forthcoming), present these methods in more detail than we can give here.

However, we will try to answer any questions you have about this session if you mail us using the "mail feedback" facility.

Tony McEnery, Andrew Wilson, Paul Baker.