Shakespeare: Word Links Between Poems and Plays

Notes & Queries 22/4 (220), April 1975, pp. 157-163


An attempt [1] to find statistical evidence bearing on the authenticity of A Lover's Complaint as a work of Shakespeare led to inconclusive results. It was shown that the vocabulary of unusual words in Complaint was beyond the range of Venus and Adonis and The Rape of Lucrece, and that there was a deficiency of word links between these two poems and Complaint. Nevertheless, while the findings suggested a difference of authorship, this might be a difference between early Shakespeare and later Shakespeare rather than a difference between Shakespeare and another writer. Mr. J. C. Maxwell [2] has noted that the vocabulary belongs to the Shakespeare of the 1600s, "and the play to which it seems closest is Troilus and Cressida".

    The important work by Mr. MacD. P. Jackson [3] is mentioned by Mr. Maxwell in his Introduction, p. xxxvi. In this study, Mr. Jackson reviews earlier work, including the comprehensive summary by H. E. Rollins in the New Variorum edition of Shakespeare's Poems (1938), and a later review by John Munro (1958). The main weight of scholarly opinion seems to have been that the poem was non‑Shakespearian; but Mr. Jackson notes that both the poem and the problem it presents have been little noticed in recent years.

    Mr. Jackson deals in detail with the arguments of J. W. Mackail (1922) against the poem's authenticity, and himself advances arguments in its favour:

    (1) With the aid of Bartlett's Concordance and Schmidt's Lexicon he drew up a list of every word in the poem used elsewhere by Shakespeare only five or fewer times. This list, which he produces, contains forty‑nine words not used elsewhere by Shakespeare, i.e. one for every seven lines. "This proportion of once‑used words is what we should expect to find in a Shakespearian poem written at about the time of Hamlet or King Lear." The list provided eighty links in diction with the first nineteen works and two hundred and two links with the other twenty (omitting the Sonnets as having been written over a long period). "Best represented are, in order: Cymbeline, Hamlet, Troilus, Coriolanus, All's Well, Timon and Lear."

    (2) Complaint contains a very large number of new coinages. These follow the pattern shown by Hart to hold for Shakespeare's later diction, e.g. the use of nouns as verbs, perhaps by adding ‑ed, words beginning with un‑, etc.

    (3) Parallel phrases can be found in the plays for a number of idiosyncratic expressions in Complaint.

    (4) The imagery is Shakespearian, e.g. in the use of legal metaphors and especially the fusing of diverse images by concealed puns.

    (5) The appearance of stylistic devices such as Shakespearian "doublets", e.g. "the grounds and motives of her woe", the inversion of the negative, etc.

    (6) Instances are given, e.g. "O father, what a hell of witchcraft lies/In the small orb of one particular tear!" (288‑9), of a distinctively Shakespearian beauty. The abundance of poor poetry in the poem is not held to argue against Shakespeare's hand.

    (7) Finally, the subject of the poem is regarded as taking its place in the poet's oeuvre, especially as part of his concern with appearance and reality and man's subjection to illusion.

    Mr. Jackson concludes: "In fact, everything about the poem – vocabulary, phraseology, imagery, stylistic mannerisms, subject matter – confirms the correctness of Thorpe's attribution. It deserves to be taken seriously as a product of the latter half of Shakespeare's career."

    In a recently reprinted discussion, originally published just before Mr. Jackson's, [4] Professor Kenneth Muir came to very similar conclusions, basing them on much the same arguments: (1) the number of words not found elsewhere in Shakespeare's works is higher than might have been expected, but does not in itself constitute a cogent argument against his authorship; (2) with regard to style and quality, it is easier to believe that Shakespeare wrote a poem with a number of feeble lines than that some other poet, having steeped himself in Shakespeare's poetry, would have succeeded in equalling his models; (3) significant parallels are found with passages in Lear, Antony and Cleopatra, and a number of the most interesting ones with Hamlet; (4) the poem was published with the Sonnets, and since these made a substantial volume in themselves, the publisher could have felt no need to pad it out with another man's work. Mr. Muir finds some links with the Sonnets and more significant ones with plays written after 1600 than with those written before. He suggests that the poem was written about the turn of the century.

The present inquiry

It is the purpose of this paper to enlarge the evidence sketched above and to add precision by examining the vocabulary of A Lover's Complaint with the aid of statistical tests; and further to compare the results with the results of similar tests carried out on Venus and Adonis, The Rape of Lucrece and the Sonnets. Bartlett's Concordance was used. All words making a single or once repeated appearance in one or two of the poems were listed. No proper names or words with an initial capital were included. Different parts of speech (noun, verb, adjective, adverb, etc.) were taken to be different words even though identically spelt. Different inflexions of a word, e.g. declensions of nouns, conjugation of verbs and comparison of adjectives, were not taken to constitute a difference. Words hyphenated by Bartlett were taken as one word. Without entering into subtleties, homonyms with radically different meanings were taken to be different words, e.g. bark (of dog, Venus 240), bark (of tree, Lucrece 1167, 1169), and bark (=ship, Sonnets 80, 7; 116, 7) would be three different words. These are the principles used by Hart in his enumerations of the number of different words in Shakespeare's vocabulary; and he was following the principles adopted by the editors of the Oxford English Dictionary.

Links between the poems

The first analysis was in terms of the links between each of the four major poems, taken in pairs, i.e. the number of words appearing once and once only in each of the pair, and not otherwise in any of the poems covered by the Concordance. Set against the numbers expected on the hypothesis of chance connection only (i.e. that the numbers of links between two poems would be proportionate to the product of the totals available), two trends emerged. There was an excess of links between Venus and Lucrece (107% of expectation), and between Sonnets and Complaint (136% of expectation), other links being in deficiency. There is perhaps some support here for the view that Complaint is a genuine Shakespearian poem but relatively late, and so closer in time to the Sonnets than to either Venus or Lucrece. But owing to the smallness of numbers, only three hundred and three links altogether, the deviations from chance expectation are not statistically significant, and so carry no great weight.

Links between poems and plays

From this point onwards in the inquiry, only words occurring once or twice (and no more) in a single poem have been used. Against each of the words in this list was entered the number of citations given by Bartlett for the plays. The enumeration is in terms of citations, and a citation in which the index word occurs several times still ranks as a unit. With words for which there were five or fewer citations, the names of the plays from which they came were also recorded. It was thought that these relatively rare words might be more likely than commoner ones to be associated with a particular stage in Shakespeare's writing.

    The distribution of numbers of citations for each of the index words is shown in Table I. It will be seen that there is a steady fall off in numbers of link‑words from none to few to many citations. For each of the poems there is a precipitous drop of about 50% from the 0 to the 1 class; but after that, for each of the poems, the numbers are roughly in descending geometric progression. The rate of decrement from 1 to 2, 2 to 3 citations, etc., expressed as a percentage in Table I, is estimated with sufficient accuracy by dividing the number in the first class (e.g. Complaint 26) by the sum (2019). It will be seen that these rates vary a good deal, being very low for Pilgrim, high for Complaint but still higher for Venus. Complaint therefore in this respect falls within the Shakespearian span.


    Some interest attaches to the unique words which have no mention at all in the plays. Their distribution is shown in Table II. It will be seen that such words are more than twice as frequent in Complaint as in any other of the three major works. But they are still commoner in The Phoenix and the Turtle. Unique words are relatively rare in what are believed to be the non‑Shakespearian parts of The Passionate Pilgrim (excluding therefore the items numbered 1, 2, 3, 5, 16 in the Cambridge edition). In the prevalence of unique words, the difference between Complaint and the other major poems is statistically significant.


    Attention was now focused on words of rare occurrence providing one to five citations. The distribution of the plays providing them could then be assigned to the index words of each of the poems separately. These counts are shown in Table III. The observed numbers could then be compared with expectations based on the null hypothesis that the chance of finding a given index word in a given play depends on the total number of different words in that play and not on any of its other features. The numbers of different words in each of the plays, as shown in Table III, have been taken from the enumeration by Alfred Hart. [5] Thus, referring to Table I, we note that Venus provided a total of 62 × 1 + 39 × 2 + 32 × 3 + 36 × 4 + 23 × 5 = 495 citations. Distributed evenly over the total of 103,593 words in the summed vocabularies, one might have expected to find 495 × 3146 ÷ 103,593, i.e. 15.03, in the text of 2 Henry VI. The number actually found, eighteen, is 120% of expectation, but the excess is not statistically significant. Apart from the empirical counts, Table III lists all excesses of observation over expectation, those in bold type being the statistically significant ones, having a less than 0.05 probability of appearing as a random event. If the play is a short one, so that numbers are small, observation may have to exceed expectation by a large margin to attain statistical significance. All places where observation falls below expectation are left blank in the Table, so that the positive associations may spring to the eye.
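
The arithmetic of this worked example can be retraced in a few lines of Python. The sketch below is illustrative only: the paper does not say which significance test was applied, so the one-sided binomial tail used here is an assumption, and the function names are hypothetical. The figures themselves (the Table I class counts for Venus, 3146 different words in 2 Henry VI, 103,593 in the summed vocabularies, eighteen citations observed) are those quoted in the text.

```python
from math import comb


def expected_citations(total_citations, play_vocab, summed_vocab):
    """Expected citations in one play if they fall in proportion to vocabulary size."""
    return total_citations * play_vocab / summed_vocab


def binomial_upper_tail(observed, n, p):
    """P(X >= observed) for X ~ Binomial(n, p), via the complement of the lower tail.

    Assumption: the paper does not name its test; this is one plausible choice.
    """
    lower = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed))
    return 1.0 - lower


# Venus: number of index words yielding 1, 2, 3, 4 and 5 citations (from the text).
class_counts = {1: 62, 2: 39, 3: 32, 4: 36, 5: 23}
total = sum(citations * words for citations, words in class_counts.items())

PLAY_VOCAB = 3146        # different words in 2 Henry VI, as quoted
SUMMED_VOCAB = 103_593   # summed vocabularies of all the plays, as quoted
OBSERVED = 18            # citations actually found in 2 Henry VI

expected = expected_citations(total, PLAY_VOCAB, SUMMED_VOCAB)
ratio_percent = 100 * OBSERVED / expected
p_value = binomial_upper_tail(OBSERVED, total, PLAY_VOCAB / SUMMED_VOCAB)

print(f"total citations: {total}")
print(f"expected: {expected:.2f}, observed/expected: {ratio_percent:.0f}%")
print(f"one-sided tail probability: {p_value:.3f}")
```

Under this assumed test the tail probability comes out well above 0.05, which agrees with the statement that the excess for 2 Henry VI is not statistically significant.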

    From the Table it will be seen that both Venus and Lucrece are associated by these word links with the earliest plays (in the Chambers chronology), from 2 Henry VI down to King John. Venus has highly significant associations with Titus Andronicus, Two Gentlemen of Verona and A Midsummer Night's Dream, and Lucrece with 2 Henry VI, 1 Henry VI, Richard III, Titus and Romeo and Juliet. There is a good deal of similarity between both distributions, the main point of difference being that Venus has no above‑expectation association with Romeo. If we take both poems together, we find them statistically significantly associated with a whole string of the earlier plays; i.e., with percent of expectation in brackets: 2H6 (134), 1H6 (159), R3 (134), Tit (195), TGV (141), R2 (135), MND (155). The strongest association of all is with Titus, which Venus and Lucrece bracket in the Chambers chronology. This strong link with both poems may perhaps be held to argue against those inclined to doubt the Shakespearian authorship of Titus. Errors is positively associated, not significantly, with Venus only and not with the combined poems (85%); Shrew has no positive association with Lucrece or the combined poems, and only minimally with Venus.


    Turning to the Sonnets, we find a predominantly negative balance of associations until we get to Love's Labour's Lost. The statistically significant ones are with that play and with Henry V. In the cluster of four plays, LLL, Romeo, Richard II and MND we have 93 citations instead of the 67.2 expected, a ratio of 138%, constituting a highly significant excess. If one wishes to be speculative, one can see the Sonnets associating with two series, an earlier one from Love's Labour's Lost to Midsummer Night's Dream and a later one with Much Ado and Henry V, with a gap between. This might be brought speculatively into relation with the three years of acquaintance with the Friend insisted on in S. 104, which also seems to imply some intermission in sonneteering. However that may be, the association of the Sonnets with plays of the second quarter of the whole series is highly significant.

A Lover's Complaint

Taking first the words yielding one to five citations, statistically significant associations were found with Hamlet, Troilus, All's Well and Cymbeline. These are relatively late plays; and the findings fit in with Mr. Maxwell's suggestion, quoted earlier. As expected, there is a good fit with the rather different count made by Mr. Jackson. It was thought desirable to take the test somewhat further; and in successive analyses progressively more link words were added, to include (as shown in the column headings of Table III) words giving 1‑5, 1‑15, 1‑29, etc., citations. These successive tests in turn confirm the earlier ones, to leave us in the end with four plays firmly linked with Complaint: Hamlet, Troilus, All's Well and Cymbeline, with Lear as a rather probable addition.

    It was a matter of surprise to the writer to find quite commonly used words still maintaining the stamp of association with a particular stage in Shakespeare's dramatic writing. To take a rather striking instance, the word "extend" yields twenty‑two citations, but none earlier than John, and eighteen of them in the later plays from Troilus to Cymbeline. A convenient measure of the degree to which such a distribution is early or late is to take its middle member (or middle pair of members), i.e. the median. Citations scattered indifferently over the whole range of plays would be expected to have their median in the middle. In the Chambers chronology, Julius Caesar, the nineteenth, stands in the middle of the thirty‑seven plays; but if, following Mr. Oliver's views, [6] we place The Merry Wives of Windsor contemporaneously or immediately after 2 Henry IV, then the nineteenth play is Henry V. Alfred Hart's enumeration of the total number of lines in each of Shakespeare's plays provides a grand total of 101,798 lines, with the 50,899th and 50,900th lines being found towards the end of Henry V, or just about halfway between the beginning of Henry V and the end of Julius Caesar. On this basis, which is more precise than merely taking the nineteenth play, we can take these two plays as being at the middle of the canon.
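
The median measure can be sketched briefly in Python. This is an illustration under stated assumptions, not the procedure actually followed: plays are represented only by their ordinal position (1 to 37) in a chronology, the list `toy_positions` is invented for the example and is not the distribution of any real word, and the function name is hypothetical. The figures of 37 plays and 101,798 lines are those given above.

```python
from statistics import median

N_PLAYS = 37
MIDDLE_PLAY = (N_PLAYS + 1) // 2   # the nineteenth play stands in the middle

TOTAL_LINES = 101_798              # Hart's grand total, as quoted in the text
# With an even total, the two middle lines are numbers n/2 and n/2 + 1.
middle_lines = (TOTAL_LINES // 2, TOTAL_LINES // 2 + 1)


def classify_by_median(positions, middle=MIDDLE_PLAY):
    """Return the median chronological position of a word's citations and a label.

    positions: one entry per citation, giving the ordinal position of the
    play in which the citation occurs (1 = earliest, 37 = latest).
    """
    m = median(positions)
    if m > middle:
        label = "late"
    elif m < middle:
        label = "early"
    else:
        label = "neutral"
    return m, label


# Invented positions for illustration only (hypothetical data).
toy_positions = [13, 22, 24, 30, 35]
toy_median, toy_label = classify_by_median(toy_positions)

print(f"middle play: {MIDDLE_PLAY}; middle lines: {middle_lines}")
print(f"toy median position: {toy_median} ({toy_label})")
```

Weighting by lines rather than by play number, as the text does with Hart's totals, would mean replacing the ordinal positions with cumulative line counts; those per-play totals are not reproduced here.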

    The link words in Complaint yielding twenty or more citations are listed opposite:

    One is struck by the degree to which quite commonplace and frequently used words carry a value for dating. As "late" words so heavily predominate over "early" ones, one can understand why successive tests, including more and more commonly used words, confirm the trend of earlier tests restricted to relatively rare words. Nevertheless, the dating value of very common words seems to be less than that of the more rarely used ones. For instance, in Table III, the association of Complaint with Lear shows up strongly with the more rarely used words, but becomes attenuated as the commoner ones are added.

    Accepting Mr. Oliver's emendation of the Chambers chronology, the three plays Hamlet, Troilus and All's Well, coming together, constitute a period of creation with which A Lover's Complaint is also associated. The association is very secure, statistically speaking. For instance, to take only the rare words with one to five citations, thirty‑four of them are found in these three plays when only 17.09 could have been expected, i.e. very nearly double as many, with a probability as a chance finding of less than 0.0005.
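
As a rough check on the order of magnitude of this probability, the observed and expected counts can be put through a Poisson upper-tail calculation. This is a sketch under an assumption: the test actually used is not stated in the paper, and the Poisson model and the function name below are choices made for illustration. Only the two figures 34 and 17.09 come from the text.

```python
from math import exp


def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean), via the complement of the lower tail.

    Assumption: a Poisson model for the citation count; the paper does not
    specify its test.
    """
    term = exp(-mean)          # P(X = 0)
    lower = 0.0
    for k in range(observed):  # accumulate P(X = 0) ... P(X = observed - 1)
        lower += term
        term *= mean / (k + 1)
    return 1.0 - lower


OBSERVED = 34      # rare-word citations found in Hamlet, Troilus and All's Well
EXPECTED = 17.09   # number expected on the null hypothesis

ratio = OBSERVED / EXPECTED
p_value = poisson_upper_tail(OBSERVED, EXPECTED)

print(f"observed/expected: {ratio:.2f}")
print(f"one-sided Poisson tail probability: {p_value:.6f}")
```

On this assumed model the tail probability falls below the 0.0005 bound quoted above, and the ratio of observation to expectation is just under two.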

    Perhaps a surprising feature of Table III is the existence of linking not only with the three plays mentioned but also with Cymbeline, generally thought to be a late play. Of the intervening plays in the Chambers chronology, some, like Measure for Measure, Othello, Lear, Coriolanus, show signs of linking in greater or lesser degree: but Macbeth, Antony and Cleopatra and Timon show no sign of it at all. As with the Sonnets, it is perhaps possible that we have to do with interruption of the period of creation. Whatever the explanation, the association with Cymbeline is fully assured statistically.



The following conclusions seem to be justified.

    (1) The method of counting word‑links between poems and plays leads to results which accord with existing informed opinion. Venus and Lucrece show themselves as associated with the vocabulary of the plays of the first third of the canon; and the Sonnets are shown to be related to plays of the second quarter.

    (2) This method, applied to A Lover's Complaint, shows statistically highly significant association with the vocabulary of the third quarter (Hamlet, Troilus, All's Well and possibly Lear), and also very definitely with Cymbeline.

    (3) The results appear to provide weighty evidence that A Lover's Complaint is an authentic work of Shakespeare. As a by‑product of the inquiry, the same can be said for the authenticity of Titus Andronicus.


[1] N. & Q., ccxviii (1973), 138‑40.

[2] The Works of Shakespeare. The Poems, edited by J. C. Maxwell. Cambridge University Press, paper‑back edition 1969, p. xxxv.

[3] Shakespeare's A Lover's Complaint: its Date and Authenticity, by MacD. P. Jackson. University of Auckland Bulletin 72, English Series 13, 1965.

[4] "'A Lover's Complaint': A reconsideration" in Shakespeare the Professional and Related Studies, by Kenneth Muir, London, 1973.

[5] Alfred Hart: "Vocabularies of Shakespeare's plays," RES, xix (1943), 128‑140 and "The Growth of Shakespeare's vocabulary," ibid., xix (1943), 242‑254. In this paper the figures given in Hart's Table IV, p. 249, have been used. They differ to some extent from the numbers given in his Table I, p. 132. For 30 of the works (including poems and sonnets) the numbers in the two tables are identical; but there are the following differences between Table I (given first here) and Table IV (second): R3 3224, 3218; Lucrece 2826, 2812; Shr. 2462, 2463; R2 2832, 2833; H5 3147, 3162; AYL 2623, 2578; TN 2524, 2534; Tro. 3260, 3360; Oth. 3075, 3015; AWW 2697, 2705. In two cases, Shr. and R2, the difference is by a single unit; in five cases, R3, Lucr., H5, TN, AWW, the difference is less than 1% of either total. But in two cases it is more substantial, 1.7% with AYL and 3.1% with Tro. It is possible that some of these disagreements are due to inadequate proof‑reading. There is an editorial footnote to p. 128: "owing to the long delays in mails between Australia and this country the author has not been able to read proofs". In the circumstances it is not possible to say with any certainty which of the two sets of figures is to be preferred, though perhaps there is a presumption in favour of the figures of Table IV, as being of a later date (unless the editor received both MSS at the same time, but published them consecutively). It is to be noted that the number of different words shows a suspect identity between two plays in a total of three pairs: 2H4 and Cor. each 3130 in both tables, Tro. and Cym. each 3260 in Table I, and Tit. and AYL each 2578 in Table IV. Perhaps one coincidence should suffice, and to dispose of the other two we should allow AYL 2623, as in Table I, and Tro. 3360, as in Table IV. However, none of these disagreements between the two tables affects the estimates of statistical significance of the associations noted in this paper.

[6] The Merry Wives of Windsor. Arden Edition. Edited by H. J. Oliver. London, Methuen. 1971. In a paper published later in this issue, the present writer shows that word‑links between The Merry Wives and 2 Henry IV are in highly significant excess. Mr. Oliver's placing is therefore adopted in the present paper.