1976, Previously unpublished

The purpose of this essay is to provide numerical data on Shakespeare's vocabulary which, it is hoped, will be helpful in investigating problems of dating and canonicity, on the present occasion only in the Roman plays. The basis of the study is a card index of all the words given in two Shakespeare concordances which appear two to ten times in all, in more than one play, and so link one play with another. They are, then, rare but not unique words; and experience has shown that they have a very general tendency to come into use, with more than average frequency over the whole of the canon, at one phase of Shakespeare's writing (for the vocabulary of the poems is linked in time with that of the plays) and then to be abandoned or to decline in frequency. By this process the individual words come to have a timed maximum of probability of appearance, a "time signature", as it were. This is not an original observation, but emerges from the pioneer work on Shakespeare's vocabulary by Alfred Hart (1, 2).

    The card index was prepared first on the basis of Bartlett's concordance, and then enlarged and in part corrected by comparison with the Harvard concordance edited by Marvin Spevack (second printing, 1974). It was compiled as far as possible without recourse to personal judgments. The principles applied in the Oxford English Dictionary were relied on. Inflexions of a word (declension, conjugation, etc.) were not taken to create a word-difference; but homonyms were taken to be different words if they were different parts of speech. When the question arose whether a single verbal form having different meanings was to be taken as one word or as two or more, recourse was had to the Shorter Oxford English Dictionary or to the "Shakespeare Glossary" by C. T. Onions (Oxford, 1911[1969]). Hyphenated words were taken as single compound words; and if the Bartlett and Spevack concordances departed from one another in hyphenation, the hyphenated form from either source was preferred to the single words. The effect of this was somewhat to enlarge the total number of words available for statistical analysis. This was a general principle followed throughout, and led to a preference for the spellings used in Spevack's concordance to those used in Bartlett's. In the Globe edition, used by Bartlett, spellings were universally and systematically modernised; but by no means so completely in the modern-spelling text of The Riverside Shakespeare (1974), which was the textual basis for Spevack's work. At times difficulties arose owing to differences between the sources used by the two concordances in the reading adopted for a doubtful or corrupt passage. An example is the word "fere", which links Titus with Pericles act 1 in the Bartlett concordance but not in Spevack's, in which it appears only in Titus and in Pericles is replaced by "peer". In such cases it was the writer's practice to consult the Cambridge Shakespeare and to accept the reading there. In this particular case the Cambridge editor, J. C. Maxwell, has "fere" in both plays, but in his notes discusses the two alternatives in Pericles.


    Table 1 shows the distribution of the 5,604 words of the card index in the nine classes of those occurring twice, thrice, etc., up to ten times. For convenience we can refer to these words as "two-words", "three-words", etc. It will be noticed that as these rare words increase in commonness they decrease in number. Unique words and words occurring in one play only were of no use for the study of linking, and so have not been recorded. According to Alfred Hart (2) there are far more of these words than of any commoner ones; and he gives the total of "one-play words" as 6,738, or, if the poems are included, the total number of one-opus words as 7,219.
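
    The selection rule described so far (a word is indexed if it occurs two to ten times in all and in more than one work) and the tally into classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy occurrence lists are hypothetical, and the actual card index was compiled by hand from the concordances.

```python
from collections import Counter

def select_index_words(occurrences):
    """Keep words occurring 2 to 10 times in all and in more than one work.

    `occurrences` maps a word to a list of work names, one entry per
    occurrence (a hypothetical data layout, not the card index itself).
    """
    selected = {}
    for word, works in occurrences.items():
        total = len(works)
        if 2 <= total <= 10 and len(set(works)) > 1:
            selected[word] = total
    return selected

def class_sizes(selected):
    """Count how many selected words fall in each class (2, 3, ..., 10)."""
    return Counter(selected.values())

# Toy data, invented purely for illustration.
toy = {
    "alpha": ["Play A", "Play B"],                # a two-word linking A and B
    "beta": ["Play A", "Play A"],                 # one-play word: excluded
    "gamma": ["Play A", "Play B", "Play C"],      # a three-word
    "delta": ["Play B"],                          # unique word: excluded
    "epsilon": ["Play A"] * 6 + ["Play C"] * 5,   # eleven occurrences: too common
}

selected = select_index_words(toy)
sizes = class_sizes(selected)
print(selected)
print(sizes)
```

    Only "alpha" and "gamma" survive the filter in this toy case, which mirrors the exclusion of unique and one-play words described above.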

    The numbers of two-words, three-words, etc., in the second column of Table 1 diminish by an almost equal proportion from one step to the next and are, in fact, in nearly regular geometric progression. This can be visualised by replacing them with their logarithms, given in column three, and plotting them on graph paper. A straight line can be fitted to these points by the method of least squares, and the calculated line (y = 3.41157 - 0.162627x, in which x varies between 2 and 10; see figure) passes close to the observed points. These are not idle observations; for on a priori grounds one can be fairly sure that the gradient of the line, given by the coefficient -0.162627, is author-specific, and that a similar exercise carried out on the concordances of other authors would show different gradients.
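
    The fitting step can be sketched as follows. The sketch assumes that the logarithms are taken to base 10, which the essay does not state, and it uses invented counts in exact geometric progression rather than the figures of Table 1, which are not reproduced here. The function name is hypothetical.

```python
import math

def fit_log_line(counts_by_class):
    """Least-squares line y = a + b*x through the points (x, log10(count)).

    `counts_by_class` maps the class x (2 for two-words, 3 for three-words,
    and so on) to the number of words in that class.
    Base-10 logarithms are an assumption.
    """
    xs = sorted(counts_by_class)
    ys = [math.log10(counts_by_class[x]) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented counts, NOT Table 1: each class is exactly half the size of
# the one before (1024, 512, ..., 4), so the log-plot is a perfect line.
illustrative_counts = {x: 1024 // 2 ** (x - 2) for x in range(2, 11)}

intercept, slope = fit_log_line(illustrative_counts)
ratio = 10 ** slope  # common ratio of the geometric progression
print(f"y = {intercept:.5f} + ({slope:.5f})x, ratio = {ratio:.3f}")
```

    For these invented counts the gradient is -log10(2), and raising 10 to the power of the gradient recovers the common ratio of one half. The same conversion applied to a fitted gradient from real counts would give the average proportion by which one class is smaller than the preceding one.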

    In Table 2 the words of the card index are taken together into two groups, the rarer words occurring two to six times and the less rare words occurring seven to ten times. The second and seventh columns show the number of these words in each of the plays of the canon; and the third to sixth columns (for the rarer words) and the eighth to eleventh columns (for the less rare words) show the number of their corresponding appearances in the four Roman plays. These are primary observations and can be checked by the independent observer. (A copy of all the words in the card index can be made available on request.)

    From the numbers in Table 2 the numbers in Table 3 can be calculated. This table shows the proportionate relationship between the number of index words of one play of the canon appearing in one of the Roman plays and the number expected on a chance basis. For instance, there are 420 words in Titus in the two- to six-word classes which occur in 992 places elsewhere in the canon. In the first two acts of Pericles there are 119 words of these classes occurring elsewhere. There are 18 words common to the two lists, which, compared with the expected 7.95, amounts to 226 per cent of expectation. The statistical significance of such observations in excess is conveniently tested by χ². The deviations reaching a significance beyond the 0.05 limit of chance are exhibited in bold numbers, the limit of significance surpassed being indicated by a superscript. The logical basis of the χ² tests is briefly explained in Appendix 1.
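
    The arithmetic of the Titus and Pericles example can be checked with a short sketch. The observed and expected figures (18 and 7.95) are taken from the text; how the expected figure is derived, and the exact form of the χ² test, are given in Appendix 1 and are not reconstructed here. The single-cell term (O - E)²/E and the comparison with the 0.05 critical value for one degree of freedom are therefore assumptions made for illustration only, and the function names are hypothetical.

```python
def percent_of_expected(observed, expected):
    """Observed count expressed as a percentage of the expected count."""
    return 100.0 * observed / expected

def chi_square_term(observed, expected):
    """Single-cell contribution (O - E)^2 / E.

    Assumption: this is a simplified stand-in, not necessarily the
    procedure of Appendix 1.
    """
    return (observed - expected) ** 2 / expected

# Critical value of chi-square at the 0.05 level, one degree of freedom.
# Treating the comparison as having one degree of freedom is an assumption.
CRITICAL_0_05_1DF = 3.841

observed, expected = 18, 7.95  # figures quoted in the text
ratio = percent_of_expected(observed, expected)
term = chi_square_term(observed, expected)

print(f"{ratio:.0f} per cent of expectation")
print(f"chi-square term {term:.2f}, beyond 0.05 limit: {term > CRITICAL_0_05_1DF}")
```

    On these assumptions the 18 shared words stand at about 226 per cent of the expected figure, and the single-cell term of about 12.7 lies well beyond the 0.05 limit.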

Titus Andronicus                   

The analysis shows that the vocabulary of Titus, insofar as the rare words are concerned, is intimately linked with all the early plays of the canon down to Richard II, with the exception of The Comedy of Errors and Love's Labour's Lost. Why there is no indication of linkage with these two plays (with the latter, indeed, there are signs of a negative association) poses a problem which deserves further examination. Beyond pointing to an anomaly the present data give us no further help. But the strong linkage with no fewer than eight early plays is a massive statistical fact which it would be very hard to reconcile with non-Shakespearean authorship of Titus.

    Titus shows no sign of linkage with any other of the Roman plays. It would seem that by the time Shakespeare came to write Caesar, Antony and Coriolanus the specificities of his early vocabulary were left far behind and were not to be reawakened by any similarity in the setting.

    A feature of the Titus vocabulary which must arouse surprise is the kinship with the first two acts (but not the last three acts) of Pericles. The implication is, of course, that there is some early Shakespearean work in the first two acts of Pericles which has survived all the processes of erosion and corruption to which those two acts may have been subjected. It is not the generally accepted view that Shakespeare had anything at all to do with the first two acts of Pericles; but there are some good judges who believe it possible. Philip Edwards (3) attributed the very great differences in merit and style between the first two and the last three acts to the greater and lesser damage inflicted on the text by two different reporters. He advanced the hypothesis that the original play of Pericles could have been all of one standard, all by one author. And he concluded "but we can say that it would be a strange coincidence if the areas of the play covered by the two reporters exactly corresponded to the work of two distinct dramatists." This suggestion has found a more recent echo. James O. Wood (4) traces parallels to images in Pericles in other Shakespearean works, namely Titus, Venus, Lucrece, Macbeth, Richard II and John. In conclusion he writes "much of the vocabulary and imagery of the first part of Pericles is indigenous to Shakespeare's early work. Since passages that have been adduced as showing corruption, reporting and signs of an alien hand, appear on further inspection to be Shakespearean, sometimes distinctively so, it seems worth inquiring again whether the whole play cannot be an apprentice work to which the poet later, at the height of his powers, added touches in the last three acts."

    The statistical evidence reported here certainly supports the view that the first two acts of Pericles might be apprentice work by Shakespeare, especially as their close relative Titus is also primitive stuff. But the evidence of the vocabulary is not at all in favour of the hypothesis that the last three acts of Pericles were written out of the same authorial word treasury as the first two. Unpublished work by the present writer indicates that the last three acts of Pericles came out of a much later vocabulary, more closely related to The Tempest than to any other play. However, conclusions suggested by statistical evidence have to find support from other areas to prove acceptable in the long run. The words linking Titus with Pericles acts 1 and 2 are given in Appendix 2.

Julius Caesar

Tables 2 and 3 exhibit the surprising fact that the rare-word vocabulary of Julius Caesar shows very little linking with Antony and Cleopatra but a strong linkage with Coriolanus, these last two plays being strongly linked with one another. The fact that the links between Caesar and Antony are not only not in excess but actually in deficiency, at no more than about 80 per cent of the expected figure, is very surprising. Julius Caesar is remarkably barren of links with other plays altogether; the only other one with which it shows a statistically significant degree of linking in the two- to six-word class is the second part of Henry IV.

    Julius Caesar is recognised as anomalous in the Shakespeare canon in several respects. It is exceptionally free from bawdy. Its poverty of imagery was pointed out by Caroline Spurgeon (5): by her count it has fewer than half the number of images in Coriolanus, and fewer than one third of the number in Antony and Cleopatra. There is also a striking paucity of the image clusters recognised as characteristic of Shakespeare. The present writer has found only one of them, the flattery-cur cluster, in two places, and even then lacking its third distinctive member, sweets.

    The counts carried out by Hart (1, 2) show Julius Caesar to be poverty-stricken compared with other plays. In richness of vocabulary, as measured by the number of different words per 100 lines, Julius Caesar is bottom but one of the list (see appendix), with 91 words per 100 lines, surpassing only Richard III with 90. Julius Caesar is not only lacking in richness of vocabulary but also poor in unique or what Hart calls peculiar words, with only 70, fewer than any other play (appendix 3).

    It may be that the uncharacteristic qualities of Shakespeare's vocabulary in Julius Caesar have an easy and acceptable explanation (for instance, that he largely confined himself to the vocabulary of his sources).

Antony and Cleopatra

The rare-word vocabulary of Antony shows positive associations with all the later plays from Hamlet onwards, with the exception of Measure for Measure and Lear. The strongest and statistically most significant linking is with The Tempest, then with Cymbeline, then with Macbeth and Coriolanus. The close kinship of the rare-word vocabulary of Antony with that of The Tempest may be thought surprising. The linking words are specified in Appendix 2.


Coriolanus

Coriolanus shows strong resemblances in its affiliations to Antony, with which it is itself strongly linked. Its links with Cymbeline are even stronger; and, like Antony, it has significant connections with All's Well, Macbeth and The Winter's Tale. Unlike Antony it is connected with Julius Caesar, but not to a significant degree with The Tempest.


    The results of this enquiry have been in part very much what might have been expected, in showing an abundance of links between Antony and Coriolanus and also between Julius Caesar and Coriolanus. But it is very surprising that links between Julius Caesar and Antony are in deficiency rather than statistical excess. The rare-word vocabulary of Julius Caesar is anomalous in other ways, namely in its relative poverty and its lack of connecting links with nearly all other plays of the canon. The extensive linking of Titus fortifies its now generally accepted canonicity, and indeed argues against any extensive admixture of non-Shakespearean work. The real surprises disclosed by this study are the statistically highly significant links between Titus and the first two acts of Pericles and, to a lesser degree, the linking between Antony and The Tempest. Both of these findings suggest a need for further exploration.


