The Possibility of Determining Elizabethan Authorship by Statistical Analysis
The Shakespearean Authorship Society, 16, Autumn 1966
PROFESSOR M. G. KENDALL addressed the Society on this subject on Tuesday the 15th March, with Professor L. S. Penrose in the Chair. He said he had long felt that there was a mystery about the authorship of the Shakespearean plays, and it was one in which the statistician might be able to help.
The first step towards the modern methods of investigating problems of authorship was made by Augustus de Morgan, a Professor of Mathematics in London a hundred years ago. He suggested that it might be possible to identify an author by the length of the words he used. The idea lay fallow for many years, but was eventually taken up by an American, Mendenhall, some fifty years ago. Word length was simply defined as the number of letters in the printed word, and a count was made, from the writings of an author, of the relative proportions of words of one, two, three, four and more letters. The proportions could be graphed, and would then form a curve with characteristic features that differed from author to author; thus there were significant differences between the distributional curves of Dickens and of Thackeray. Mendenhall took the very large samples of 400,000 words from Shakespeare and 200,000 words from Bacon; the curves, demonstrated on the blackboard, differed from each other. The peak in Shakespeare's case was at words of four letters, and the curve fell away relatively rapidly to the right; Bacon's curve had a peak at three letters, but sloped away more gradually, with higher proportions of words of seven letters and more. A comparison of Shakespeare's works from two different periods with those of Marlowe showed that Marlowe was more like both the early and the late Shakespeare than the early and the late Shakespeare were like one another.
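The word-length count described above can be sketched in a few lines of Python. This is only an illustration, not Mendenhall's own procedure: the function names, the pooling of very long words into a single class (`max_len`), and the treatment of apostrophes are assumptions made for the example.

```python
import re
from collections import Counter


def word_length_distribution(text, max_len=12):
    """Relative frequency of words of 1..max_len letters.

    Assumptions for this sketch: apostrophes are dropped ("don't" counts
    as four letters), and words longer than max_len are pooled into the
    max_len class.
    """
    words = re.findall(r"[A-Za-z]+", text.replace("'", ""))
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    if total == 0:
        return {n: 0.0 for n in range(1, max_len + 1)}
    return {n: counts.get(n, 0) / total for n in range(1, max_len + 1)}


def peak_length(dist):
    """Word length with the highest relative frequency (the peak of the curve)."""
    return max(dist, key=dist.get)
```

Applied to two large samples, the two dictionaries returned by `word_length_distribution` correspond to the two curves that were compared on the blackboard, and `peak_length` gives the position of each peak.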
The next step was taken by Yule, a distinguished English statistical theorist, who spent the years of his retirement on problems of this kind. He studied not only word length, but also sentence length (which is capable of much wider variation), and working on mediaeval Latin writings, showed that sentence length was very characteristic of the individual. Yule succeeded in arousing interest both in literary and statistical circles, in the latter because the statistical theory he developed was applicable in other fields. A third point for study was later added: the filler words (pronouns, prepositions etc.), since authors differed in the proportions of such words used, and in their placing in the sentence. The Rev. A. Q. Morton aroused real excitement when, through the analysis of the Epistles of St. Paul in the Greek in respect of word length, sentence length and use of filler words, he threw doubt on the authenticity of some of the Epistles containing much cherished dogma. A further advance was made with the work of Professor Mosteller of Harvard on the Federalist papers. About 50 of these were known to be the work of Hamilton, and about 15 the work of Madison, while a few were jointly written; but about the authorship of twelve of the papers there was much argument. In this case early trials were a failure. Both Madison and Hamilton used almost the same vocabulary, in the Johnsonian literary style of the period, with much the same word length and sentence length distributions, and the filler words did not prove helpful. But it was eventually shown that some word forms were characteristic of each of the two possible authors, Hamilton preferring as a rule to write "while" and "upon," Madison "whilst" and "on." The question was settled in favour of Madison.
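The idea of marker words can be illustrated by counting how often each candidate word occurs per thousand words of text. The sketch below is not the analysis actually carried out on the Federalist papers, which was a good deal more elaborate; the function name and the choice of a per-thousand rate are assumptions made for the example.

```python
import re
from collections import Counter

# Marker words mentioned in the talk; any other list could be substituted.
MARKERS = ("while", "whilst", "on", "upon")


def marker_rates(text, markers=MARKERS):
    """Occurrences of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    n = len(words)
    return {m: (1000.0 * counts[m] / n if n else 0.0) for m in markers}
```

Rates computed from papers of known authorship could then be set beside the rates found in a disputed paper; a disputed text whose rates sit close to one author's habitual rates and far from the other's points towards the first.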
If we wish to compare the individual characteristics of, say, Shakespeare and Bacon, we must take into account that what we have of the first is mainly poetry on a dramatic theme, of the second prose on a philosophical theme. One must therefore find some feature which is constant to the author, even when he changes his subject matter and his treatment style. For this purpose Professor Kendall, together with Berners Lee and Jacob Bronowski, had examined 10,000-word samples of both the prose and the poetic writings of three authors distinguished in both verse and prose: Donne, Dryden and Eliot. The analysis was carried out with the aid of a computer, and covered the number of letters per word, the length of the sentence, the most frequently used words, and the position of filler words. Illustrations, shown on the blackboard, indicated that in each of these four features, differences were very much larger between styles used by the same author than between authors using the same treatment style.
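A comparison of this kind needs two ingredients: a way of measuring a feature such as sentence length, and a way of saying how far apart two frequency tables are. The sketch below shows one possible form of each. Both are assumptions made for illustration and are not taken from the study described: sentences are split naively at full stops, exclamation marks and question marks, and the distance used is half the sum of the absolute differences between the two tables.

```python
import re


def sentence_lengths(text):
    """Number of words in each sentence, splitting naively on . ! ?"""
    sentences = re.split(r"[.!?]+", text)
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return [n for n in lengths if n > 0]


def total_variation(p, q):
    """Half the sum of absolute differences between two frequency tables.

    p and q are dicts mapping a category (e.g. a word length) to its
    relative frequency; a category missing from one table counts as 0.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

With such a measure, the distance between one author's verse and his own prose can be set beside the distance between two authors' verse. The finding reported above corresponds to the first distance being the larger.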
Professor Kendall pointed out that the use of such methods to solve problems of Shakespearean authorship faced formidable difficulties. The writings of Bacon and Shakespeare were, using known methods of analysis, not commensurable. In the cases of de Vere and Derby we did not even have any adequate volume of writing to use for comparative purposes. With Elizabethan works, spelling was much more the printer's choice than the author's, and plays were set from manuscript and never edited by the author. However, the situation might look different when a more general study of works of that period had been made. Ph.D. students in the U.S.A. might well become attracted to this field, and would have both the leisure and the computer-power to take our knowledge further. E.T.O.S.