Word links with 'The Tempest', 'Pericles' and 'Cymbeline'

August 1975, Previously unpublished

    Alfred Hart, from his enumeration (1) of the total number of different words in Shakespeare's plays and poems, concluded that Shakespeare's vocabulary changed gradually over the course of his working life, new words being adopted or invented with every new work, while others were dropped, for the time being or altogether. This suggests that the degree of contemporaneity of two plays might be measured by the degree of community of their vocabularies, a view that seems to be widely accepted and provides one basis for research into parallels of diction. This hypothesis was the basis of two previous communications by the present writer, one (2) relating the poems to the plays, and the other (3) specifically relating the vocabulary of The Merry Wives of Windsor to that of other plays. The result here was to support the view of Mr. H. J. Oliver, the Arden editor, that this play was written at the same time as 2 Henry IV, a placing that is adopted in the paper that follows.

    It is generally considered that The Tempest is the last of Shakespeare’s plays written wholly by him, so that it provides a suitable point from which to start with the aim of working backwards in time, to the last play but one in time of composition, and the next in the series to that, etc. It is hoped in this way to throw an independent sidelight on the chronology of the plays, whose main outline we owe to Chambers.

    Bartlett's Concordance was examined for words occurring in The Tempest and providing ten or fewer citations in other plays. The statistics of Shakespeare's vocabulary show that in the time-related sequence of plays the more rarely used words tend to cluster more densely than the commoner ones, and so are more useful for dating. The statistical enumeration was in terms of citations, and if in one citation the index word occurred more than once this was ignored. In deciding whether two words were the same or not, Hart followed the usage of the Oxford English Dictionary, which is also followed here. The fact that Bartlett did not use the same model does not add seriously to the labours of the enumeration, though it proves tiresome. In essence, different parts of speech are taken to be different words even though identically spelt; different inflexions of a word do not constitute a difference; homonyms of quite distinct meaning are taken to be different words. No words with an initial capital were included in the count.



    As is shown in Table I, words giving rise to 1, 2, 3 ... 10 citations were handled separately. These are the raw data, which can be used by anyone who wishes to check the statistical analysis that follows. The first column shows the abbreviated names of the plays and the one next to it shows the number of different words in the play according to Hart's enumeration given in his Table IV, p. 249, in the second of his two papers of 1943. Succeeding columns show the distribution of the citations. Hart gives 2442 as the number of different words in Pericles, but as we wish to distinguish the first two acts, which are generally considered not to be the work of Shakespeare, from the last three acts, which are generally thought to be authentic, we must allow each of these two parts of the play its own number of different words. We can estimate these two numbers from the relative length of the two parts. There are 2368 lines in Pericles (Cambridge edition), 1017 in Acts I and II, and 1351 in Acts III-V. Dividing the total of 2442 different words of the play proportionately, we allow 1049 words to Acts I and II, and 1393 to Acts III-V.
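The proportional allotment can be checked with a few lines of Python. This is only a sketch of the arithmetic described above: the function name `allot_words` is an illustrative assumption, and the line count for Acts I and II is taken here as the remainder of the play's 2368 lines once the 1351 lines of Acts III-V are set aside.

```python
def allot_words(total_words, part_lines, total_lines):
    """Different words allotted to one part of a play, in proportion to its length in lines."""
    return round(total_words * part_lines / total_lines)


TOTAL_WORDS = 2442  # Hart's count of different words in Pericles
TOTAL_LINES = 2368  # lines in Pericles (Cambridge edition)
LINES_III_V = 1351  # lines in Acts III-V
# Assumption: Acts I and II account for all the remaining lines.
LINES_I_II = TOTAL_LINES - LINES_III_V

words_i_ii = allot_words(TOTAL_WORDS, LINES_I_II, TOTAL_LINES)
words_iii_v = allot_words(TOTAL_WORDS, LINES_III_V, TOTAL_LINES)

print("Acts I-II:", words_i_ii)
print("Acts III-V:", words_iii_v)
```

On these figures the two allotments come to 1049 and 1393, which together restore Hart's total of 2442.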

    According to the null hypothesis, to be disproved, the citations of any particular word should be distributed at random, the chance of any individual play containing a citation being proportionate to the total number of different words in the play. Thus, excluding the number of different words in The Tempest (2562), the numbers for the other plays total up to 101031. From Table I we see that 81 words appeared again once only in other plays than The Tempest, 59 appeared twice, 61 three times and 44 four times, providing us in all with 81 + 118 + 183 + 176 = 558 citations, taking all the first four numbered columns of Table I together. We calculate, then, that the total number of citations to be expected from Cymbeline is 558 × 3260 ÷ 101031 = 18.005. The number actually found is 1 + 6 + 11 + 14 = 32, which is 178 per cent of the expected number. We test the statistical significance of the deviation by calculating χ² = (32 - 18.005)²/18.005 = 10.87; and statistical tables inform us that the probability of reaching so large a measure of the deviation of observation from expectation is less than 0.001. The figures of Table II were reached in this way. In order not to overburden this Table with irrelevant information, only those plays are included in it where observation consistently exceeded expectation, whether fewer or more of the separate columns of Table I were taken together.
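The Cymbeline calculation can be retraced as follows. This is a minimal sketch using only the figures quoted in the paragraph above; the function names are illustrative, and the use of a single degree of freedom (critical value 10.828 at the 0.001 level) is an assumption about which row of the statistical tables was consulted.

```python
def expected_citations(total_citations, play_words, other_plays_words):
    """Citations expected in one play if they fall at random in proportion to vocabulary size."""
    return total_citations * play_words / other_plays_words


def chi_square(observed, expected):
    """Single-cell chi-square measure of the deviation of observation from expectation."""
    return (observed - expected) ** 2 / expected


# Tempest words recurring 1, 2, 3 and 4 times in other plays (first four columns of Table I).
words_by_frequency = {1: 81, 2: 59, 3: 61, 4: 44}
total_citations = sum(freq * n for freq, n in words_by_frequency.items())

OTHER_PLAYS_WORDS = 101031  # different words summed over all plays except The Tempest
CYMBELINE_WORDS = 3260      # different words in Cymbeline
observed = 1 + 6 + 11 + 14  # Cymbeline citations found in the same four columns

expected = expected_citations(total_citations, CYMBELINE_WORDS, OTHER_PLAYS_WORDS)
percent_of_expected = 100 * observed / expected
chi2 = chi_square(observed, expected)

# Assumption: one degree of freedom, for which the 0.001 critical value is 10.828.
CRITICAL_0_001_DF1 = 10.828

print(f"citations={total_citations} expected={expected:.3f}")
print(f"observed is {percent_of_expected:.0f} per cent of expected")
print(f"chi-square={chi2:.2f} significant at 0.001: {chi2 > CRITICAL_0_001_DF1}")
```

Run at full floating-point precision, the chi-square value agrees with the figure given in the text to within rounding in the second decimal place.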

    From Table II we can see that, whether we take a narrow or a wide span of word frequencies, there are very many statistically highly significant communalities of vocabulary between The Tempest and nearly all the plays of the latter third of the canon, back as far as Troilus, but not including Measure for Measure, Othello, Macbeth and Timon, nor, not unexpectedly, Henry VIII.

    It is desirable to form some notion of a rating which would place the plays showing a positive connection with The Tempest, as shown in Table II, in order of their degree of resemblance in vocabulary. For this purpose we can take an average of the seven different measures shown in the successive columns of Table II. Taking an average gives a differential weight to the words from which these numbers were derived, the greatest weight going to the words of least frequent occurrence, as indeed seems desirable. Taking the averages, then, the plays arrange themselves in the following order: Pericles Acts III-V, 179; Cymbeline 154; Antony 152; Winter's Tale 144; Troilus 143; Lear 134; Coriolanus 133; Midsummer Night's Dream 132; All's Well 123; Henry V 123. The picture presented has its surprises, particularly in the relatively high place taken by Troilus, not usually thought near in time of composition to The Tempest, and still more in the actual appearance at all in this Table of such distant plays as Midsummer Night's Dream and Henry V. These are anomalies which we can note without being compelled to explain, putting them aside until further work on vocabulary kinship provides a wider context.

    For the time being we content ourselves with the first step. Pericles III-V has nearly double its expected number of word links [1] with The Tempest and shows a larger and more consistently expressed excess of linkage than any other of the plays, outdistancing its runner-up, Cymbeline, by a wider margin than appears further down the list. Our next step is, then, to see where we get if we repeat, with these three acts of Pericles, the exercise undertaken with The Tempest.

    The primary data obtained from these counts are shown in Table III, in which the index words obtained from the non-Gower parts have been kept separate from the index words in the Gower parts. Certain index words ('barge', 'din', 'travail', 'weave') occur in both Gower and non-Gower parts, and these words have accordingly been used twice over. The fact that there are only five of them in itself suggests that the vocabularies of Gower and non-Gower are to some extent different, and might show different kinships. What these kinships are is shown in Table IV.

    For this table expected numbers have been calculated in the same way as for Table II, but with a changed denominator, i.e. 103593 - 2442 = 101151. In this analysis both the Gower and the non-Gower parts show a high degree of kinship with The Tempest, observed numbers being consistently more than double the expected ones; but in other respects they differ markedly.

    We may first consider the relationships with The Tempest. As the count taken from The Tempest led us to Pericles III-V, it is hardly surprising that there is also a clear path in the opposite direction. But it must be borne in mind that the Tempest-Pericles relationship, as it has been estimated here, is not symmetrical. For instance the word occurs four times in The Tempest and once only in Pericles. This means that it gave only one Pericles link for The Tempest in Table I, but four Tempest links for Pericles in Table III. Furthermore, had there been, say, 8 citations of this word in other plays, it would have been excluded from the Pericles index words, for having more than 10 citations, while keeping its place in the Tempest index words, where it would have provided only a total of 9. This is, of course, an arbitrary difference in procedure; but statistical techniques depend on the impartial application of defined operations, however arbitrary they may seem, the essential requirement being that subjective judgments are excluded as far as possible. There is yet a further asymmetry in the Tempest-Pericles relationship. In calculating the number of links with Pericles which we could expect from Tempest index words, we had to estimate the number of different words in the first two and in the last three acts. The estimate was based on Hart's count of 2442 words in the whole play, but did not itself involve a direct count. If the total number of different words in Acts III-V were in fact larger than our estimate of 1393, the numbers of expected links would have been larger, and the excesses of observation over expectation would have been smaller in the Pericles row of Table II. However, when we count the number of Tempest links with Pericles index words, it is the total number of different words in The Tempest, for which we have Hart's direct enumeration, that comes into the calculation, and not the number of different words in Pericles III-V, which is irrelevant. So when we see a great excess of Pericles links with The Tempest and also a great excess of Tempest links for Pericles, the two observations are to some extent independent, and therefore confirm one another, each separately supporting the evidence of kinship.
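The asymmetry of the counting rule can be made concrete with a small sketch. The helper names and the dictionary layout are illustrative assumptions, not the procedure actually followed with the Concordance; the figures are those of the example in the paragraph above (four citations in The Tempest, one in Pericles III-V, and a supposed eight elsewhere).

```python
MAX_OTHER_CITATIONS = 10  # an index word may have at most ten citations outside its own play


def citations_elsewhere(citations_by_play, source_play):
    """Total citations of a word in every play other than the source play."""
    return sum(n for play, n in citations_by_play.items() if play != source_play)


def is_index_word(citations_by_play, source_play):
    """True if the word counts as an index word when working from source_play."""
    return citations_elsewhere(citations_by_play, source_play) <= MAX_OTHER_CITATIONS


def links(citations_by_play, source_play, target_play):
    """Links the word gives to target_play when the count is taken from source_play."""
    if not is_index_word(citations_by_play, source_play):
        return 0
    return citations_by_play.get(target_play, 0)


# The word of the example: four citations in The Tempest, one in Pericles III-V.
word = {"Tempest": 4, "Pericles III-V": 1}
pericles_links_for_tempest = links(word, "Tempest", "Pericles III-V")
tempest_links_for_pericles = links(word, "Pericles III-V", "Tempest")

# The same word with a supposed eight further citations in other plays.
word_plus_eight = {"Tempest": 4, "Pericles III-V": 1, "other plays": 8}
kept_for_tempest = is_index_word(word_plus_eight, "Tempest")
kept_for_pericles = is_index_word(word_plus_eight, "Pericles III-V")

print(pericles_links_for_tempest, tempest_links_for_pericles)
print(kept_for_tempest, kept_for_pericles)
```

Seen from The Tempest the enlarged word has 1 + 8 = 9 citations elsewhere and is kept; seen from Pericles it has 4 + 8 = 12 and is dropped, which is the difference of procedure noted above.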

    Returning to Table IV, we note that the great richness of statistically significant positive deviations seen in Table II no longer appears. There are 42 of them in Table II, only 17 in Table IV. This is in part a necessary consequence of the smaller amount of material available, 1464 citations for Pericles III-V, as against 2086 for The Tempest. But the difference seems too marked to be wholly accounted for in that way. The relative badness of the Pericles text may well have brought in sources of random error to dilute and weaken the information. If, for instance, a reporter has used a word of his own instead of a Shakespearean one, an item of information is thereby lost.

    Comparing Table IV with Table II, some of the surprising features of Table II are no longer showing. Henry V, Troilus, Lear, Antony, Coriolanus, and also The Winter's Tale no longer provide any statistically significant excess of linkage. The association with A Midsummer Night's Dream is lost from the non-Gower parts of Pericles III-V but still shows up highly significantly in the Gower part. However, the association with Cymbeline is maintained, as if this were nearer to the last plays than any other of the late plays; and in due course we shall take Cymbeline as the next stepping-off point after Pericles.

    Table IV has other anomalous features. Not only is there a deficiency of linkage with important members of the late plays, there is also the intrusive appearance of earlier ones: All's Well and, very surprisingly, Romeo. [2] These can be considered in connection with the unexpected appearances in the Gower column of Table IV, i.e. the kinship shown with The Taming of the Shrew, Love's Labour's Lost and A Midsummer Night's Dream. [3] These are all plays of the second sixth of Shakespeare's output in the Chambers chronology, and therefore provide a suggestion that the Gower parts of Pericles III-V, and conceivably therefore also some of the non-Gower parts, were written at that epoch. The connection with The Tempest, for Gower as well as for non-Gower, in conformity with scholarly opinion, falls in with a much later epoch. From these associations taken together there emerges a suggestion that Shakespeare's contribution to these three acts might have been made at different times, i.e. that he worked on the play for a time, dropped it, and then took it up again much later. This is of course a speculative explanation of the statistical data, which would have to be considered, if at all, in the context of a great amount of other evidence relating to the play which will be in the mind of scholars. It is very likely that other explanations will have to be taken into account; and it is possible that some of these deviations from chance expectation may merely be unlikely events that have happened to occur.

    The anomalies showing up in Pericles III-V suggest the desirability of taking a look also at the first two acts of the play. The primary count is shown in Table V, and the statistical analysis in Table VI. The remarkable result here is to show a general positive association of the vocabulary of these two acts with that of the earliest Shakespeare plays, and, particularly, consistent statistically significant above-expectation linkage with the vocabulary of Titus Andronicus [4] over the entire range of word frequency. There are also statistically significant associations with All's Well, which remind one that in their time scholars of a past era identified All's Well, or a first draft of that play, with the early and lost Love's Labour's Won.

    The authorship of Pericles, both in whole and in part, the possible existence of an earlier play now lost, and the degree of corruption of the text which can be attributed to reporters have been the subject of learned discussion on which it would not be appropriate for the present writer to comment. The balance of informed opinion has gone against any substantial contribution by Shakespeare to the first two acts at any stage, even before the impact of sources of corruption. Thus Mr. Hoeniger (4) writes: "The majority of present-day scholars believe that Shakespeare had little or nothing to do with Acts I and II, but that he wrote either completely or in large part Acts III-V." And Mr. Maxwell (5): "Shakespeare's hand is present and predominant in Acts 3-5, but scarcely if at all detectable in Acts 1-2."

    These opinions are expressed with some caution; and there are also some cautious dissenters. Professor Philip Edwards (6) attributed the very great differences in merit between the first two and the last three acts to the greater and the lesser damage inflicted on the text by two different reporters. He advanced the hypothesis that the original play of Pericles could have been all of one standard, all by one author. And he concluded, "we can say that it would be a strange coincidence if the areas of the play covered by the two reporters exactly corresponded to the work of two distinct dramatists." This suggestion has found a more recent echo. James O. Wood (7) traces parallels to images in Pericles in other Shakespearean works, namely Titus, Venus, Lucrece, Macbeth, Richard II and John. In conclusion he writes: "much of the vocabulary and imagery of the first part of Pericles is indigenous to Shakespeare's early work. Since passages that have been adduced as showing corruption, reporting, and signs of an alien hand, appear upon further inspection to be Shakespearean, sometimes distinctively so, it seems worth inquiring again whether the whole play cannot be an apprentice work to which the poet later, at the height of his powers, added touches in the last three acts." To these opinions the statistical evidence reported here would seem to provide some support.

    We proceed now from Pericles to Cymbeline. The primary data are shown in Table VII and the results of the analysis in Table VIII. Cymbeline has a rich vocabulary (see Table I), being exceeded in this respect only by Hamlet, Troilus and Lear. Accordingly it provides a large number of links with other plays and many statistically significant associations; there are 46 of them in Table VIII, extending in a substantial degree as far back as Henry V. Nevertheless, among the plays succeeding Hamlet both Measure for Measure and Othello are poorly represented, and Timon is absent. These are unexpected anomalies, no less than the high place taken by Hamlet in degree of association. Why should its vocabulary have so much in common with that of Cymbeline, when it must be more distant than several of the lower ranking plays? To such questions an answer can only be obtained from further research.

    There are eleven plays which show above‑expectation linkage with Cymbeline consistently over the entire range of word frequency, i.e. in all the nine columns of Table VIII. If we take averages for them, they arrange themselves in the following order: Winter's Tale 144, Hamlet 143, Coriolanus 140, Lear 139, Antony 135, Pericles III-V 135, Tempest 133, Troilus 131, All’s Well 123, Henry V 122, Macbeth 117.

    Of all these eleven plays The Winter's Tale [5] comes closest to Cymbeline in vocabulary, and would be chosen as the next target, if this system of attack were to be taken another stage.


[1] The words providing these links were: annual adj, belch 3, berry, billow n 3, boatswain, bourn, brine, celebrate, deity, diligence, din n 2, dive, dolt, duck n, filth, harpy, hatches n.pl, hearken, interrupt, island, mariner, mast, morsel, mute adj, nostril, overboard 3, painful, paragon 2, peerless, prime adj, provision 2, pulse (arterial) n, refresh, relation, rope, shelter n, shroud v, south-west, sulphurous, totter, treble v, untie, unwholesome, vast n, whistle n, wooden.

[2] Links with Romeo are given by: afore 2, baggage 2, beautify, brine, conveniently, cricket, dignify, discern, distemperature, expire, fearfully, household, interrupt, marriage-day, morsel, parentage, pilot n 2, 'pothecary, prest, prorogue 2, provision n, pulse n (arterial), pupil (scholar), receptacle, repetition, rope, rosemary 3, sale, shaft, slack v, spice n, startle, steerage, stint v 2, thwart v, unhallowed, vestal adj 2, visor 2, warmth, whistle v, wrench v.

[3] The index words, with the numbers of links they provide with these three plays respectively in the order given, are: annual 010, cherry 002, din 200, dole n 100, duck n 001, espy 002, eyne 114, hight 021, inkle 010, lame adj 010, marriage-feast 010, mute adj 110, needle 201, nill 100, painful 110, pupil (scholar) 120, relate 010, stead v 100, vail 110, vie 100, votaress 002.

[4] The linking words are: adorn, archer, aspire, blithe, boot v, building n, chariot 2, countless, delightful, detect, embracement, environ v, erst 2, favourer, fere, foster, gloze v, gnat, gratify 2, happily (=haply), interpret, jet v, lading n, lop 2, net n, overflow v 2, shaft, shield n, shipwreck, slumber v 2, suffrage 2, vast adj 2.

[5] Words linking with Winter's Tale are: admiration, adultery, advocate n 4, affront v, basilisk, behove 2, benediction, bracelet 2, brand n, browse, bug, certainty, chapel 2, churl, confirmation, cordial adj, counter n, cricket, curious, declare, deem, discern, disobedience, distaff n, distemper, dolour, dominion, dotard, earnest n, embracement, enclose, evidence, evident, fail n 2, fan v, forfend, foundation 2, franklin, fraught, gaoler, goddess-like, graze, ha excl 2, handfast, importance 2, incur, infectious, jollity, lame v, languish, lawyer, lid, longing n, lout, mannerly adv, mart v, material adj, mature adj, mute adj, neat adj 2, neat-herd, needless, nostril, pantler, paragon, perpetuity, physic v, playfellow 2, pond, posture, preferment, preserver, pretence, primrose, prone, puppy, rarely, recoil v 2, recreation, refer, relation 3, reflect, rustic adj 2, sear, second n, 'shrew v, singular, sir n 3, sleepy, slippery, sour v, sprightly, stomacher, stupify, surmise n, swerve, tenth, twinned, uncertain, unfledged, utterance, valley 2, vexation, visible, wages, wet adj, wildly 2, winner, yoke v.


