The non-finite nature of language

All the work of early corpus linguistics was underpinned by two fundamental, yet flawed assumptions: that the sentences of a natural language are finite in number, and that they can therefore be collected and enumerated. The corpus was seen as the sole source of evidence in the formation of linguistic theory: "This was when linguists...regarded the corpus as the sole explicandum of linguistics" (Leech, 1991).

To be fair, not all linguists at the time made such bullish statements. Harris [1951] is probably the most enthusiastic exponent of this point, while Hockett [1948] made weaker claims for the corpus, suggesting that the purpose of the linguist working in the structuralist tradition "is not simply to account for utterances which comprise his corpus" but rather to "account for utterances which are not in his corpus at a given time."

The number of sentences in a natural language is not merely arbitrarily large; it is potentially infinite. One reason is the sheer number of choices, both lexical and syntactic, that are made in the production of a sentence. Another is that sentences can be recursive. Consider the sentence "The man that the cat saw that the dog ate that the man knew that the..." This type of construct is referred to as centre embedding, and because it can be extended indefinitely it can give rise to sentences of unbounded length. (This topic is discussed in further detail in "Corpus Linguistics", Chapter 1, pages 7-8.)

The only way to account for a grammar of a language is therefore by describing its rules, not by enumerating its sentences. It is the syntactic rules of a language that Chomsky considers finite; these rules in turn give rise to an infinite number of sentences.
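
The point that a finite set of rules can yield an unbounded set of sentences can be illustrated with a small sketch. The toy grammar below is purely hypothetical (its vocabulary, the function names noun_phrases and count_phrases, and the depth limit are assumptions made for illustration, not a formalism taken from Chomsky or from the works cited above). It has just two rules for a noun phrase, one of which refers back to itself, and it counts how many distinct phrases exist for a given limit on embedding.

```python
from itertools import product

# Hypothetical toy vocabulary, chosen only for illustration.
NOUNS = ["man", "cat", "dog"]
VERBS = ["saw", "knew"]


def noun_phrases(depth):
    """Yield every noun phrase with at most `depth` embedded clauses.

    Two rules only (an assumed toy grammar):
      NP -> "the" N
      NP -> "the" N "that" NP V      (recursive: NP appears inside NP)
    """
    # Rule 1: a bare noun phrase, no embedding.
    for noun in NOUNS:
        yield f"the {noun}"
    # Rule 2: embed a smaller noun phrase inside a relative clause.
    if depth > 0:
        for noun, verb in product(NOUNS, VERBS):
            for inner in noun_phrases(depth - 1):
                yield f"the {noun} that {inner} {verb}"


def count_phrases(depth):
    """Count the distinct noun phrases available up to `depth`."""
    return len(set(noun_phrases(depth)))


if __name__ == "__main__":
    # The rule set never changes, yet the count grows with every
    # extra level of embedding that is allowed.
    for depth in range(4):
        print(depth, count_phrases(depth))
```

With this vocabulary the count rises from 3 phrases at depth 0 to 21, 129 and 777 at depths 1, 2 and 3, and nothing in the rules themselves imposes an upper limit on depth. A corpus, being a finite collection, could list the phrases for any one depth but never the whole set, whereas the two rules describe all of them.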