A Statistical Note on A Lover's Complaint

Notes and Queries, 20/4 (218), April 1973, pp. 138-40

There is no general agreement on whether Shakespeare wrote this poem. In the Cambridge edition of the Poems [1], which is used throughout this note, the editor, J. C. Maxwell, writes:

The case against Shakespearian authorship is, basically, the general impression that the poem is neither good enough nor characteristic enough. The number of words that do not occur elsewhere in Shakespeare (or, occasionally, anywhere else at all), is not in itself particularly striking (see Muir).

    Mr. Maxwell considers a variety of evidence bearing on the point at issue, but this note is concerned only with the poem's vocabulary. On this aspect Kenneth Muir [2] wrote in 1964 that "the parade of learned words is no more obvious than in Troilus and Cressida". He noted that more than fifty words in the poem are not found elsewhere in Shakespeare's works, but considered this inconclusive, since these words are similar to words Shakespeare does use. In the later plays, he said, Shakespeare uses a previously unused word every ten to fourteen lines, and a word new to our literature every eighteen to twenty-eight lines. There are 329 lines in A Lover's Complaint, so that one "would expect to find in it at least 23 and possibly as many as 33 words not previously used by Shakespeare, and about 15 words which are not known to have appeared in print before. In fact, of the first there are considerably more than 33, but of the second only about 14. As the poem has been dated as early as 1585, and as late as 1603, precision is impossible."
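Muir's expected figures follow from dividing the poem's 329 lines by his two rates. The minimal Python sketch below reproduces that arithmetic. The function name `expected_range` is only an illustration and is not Muir's own.

```python
LINES = 329  # lines in A Lover's Complaint, as given above


def expected_range(lines, every_min, every_max):
    """Expected count of new words if one appears every `every_min`
    to `every_max` lines: returns (low estimate, high estimate)."""
    return lines / every_max, lines / every_min


# Words previously unused by Shakespeare: one every 10 to 14 lines.
new_to_shakespeare = expected_range(LINES, 10, 14)
# Words new to print: one every 18 to 28 lines.
new_to_print = expected_range(LINES, 18, 28)

for label, (low, high) in [("new to Shakespeare", new_to_shakespeare),
                           ("new to print", new_to_print)]:
    print(f"{label}: {low:.1f} to {high:.1f} (midpoint {(low + high) / 2:.1f})")
```

The first range, 23.5 to 32.9, gives Muir's "at least 23 and possibly as many as 33". The second runs from about 11.8 to 18.3, and its midpoint near 15 gives his "about 15".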

    If A Lover's Complaint is to be compared fairly with the other poems, the Glossary of the Cambridge edition can be used to carry out some simple statistical tests. The words in the Glossary (we can call them G words) are included there for a variety of reasons. For example, they may be obsolete, have a meaning now forgotten, or have a technical meaning. The reasons for their inclusion do not concern us, however, since we must assume that the same criteria for inclusion were applied consistently throughout, whichever poem a word came from.

    The first test is of the frequency of these words in each of the five poems (omitting the two Dedications and the Argument to Lucrece). Because the lines are shorter in parts of The Passionate Pilgrim and in The Phoenix and the Turtle, their 409 and 67 lines have been scaled down to 376 and 47 respectively, so that their average number of words per line matches that of the other poems [3]. Large parts of the Pilgrim are known not to be by Shakespeare, so including it lets us compare our doubtful material two ways: with what is certainly authentic and with what is mainly inauthentic. We get the following counts:


There is a statistically significant deficiency of G words in the Pilgrim. At the same frequency as in the other poems, over 130 G words should have appeared, instead of the 73 found. That the Pilgrim departs from the Shakespearean norm, as it was expected to, tends to validate the test.
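The note does not say which significance test it used. One plausible check treats the Pilgrim's G-word count as a Poisson variable whose mean is the expected count, and this model is an assumption rather than the author's stated method. The note gives the expectation only as "over 130", so taking exactly 130 is conservative: a larger mean would make 73 even less likely. A Python sketch:

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_lower_tail(x, mu):
    """P(X <= x) for X ~ Poisson(mu)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


OBSERVED = 73   # G words found in the Pilgrim
EXPECTED = 130  # lower bound on the expected count stated in the note

p = poisson_lower_tail(OBSERVED, EXPECTED)
print(f"P(X <= {OBSERVED}) under a mean of {EXPECTED}: {p:.1e}")
```

The resulting probability lies far below the conventional 1 per cent level, which agrees with the note's description of the deficiency as significant.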

    The frequency of G words does not differentiate Complaint from Venus. A different result emerges, however, when we take only those G words that appear in one poem alone, and only once even there, instead of all G words. The counts now become:


Venus, Lucrece and Phoenix still run close to one another in their frequency of unique G words. Pilgrim again shows a marked deficiency, and Complaint now shows a large excess: 101 against the expected 66. Both the deficiency in Pilgrim and the excess in Complaint are highly significant statistically. If we repeat this test on a different criterion, counting the listed meanings of words instead of the words themselves, the difference still holds. Pilgrim has only 45 per cent of the Glossary entries that might have been expected, and Complaint has a 32 per cent excess over expectation. Statistically speaking, this is strong evidence of heterogeneity. Judged by the standard of the other poems, the vocabulary of Pilgrim is unoriginal, while that of Complaint deviates towards eccentricity.
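The second test keeps only the "unique" G words. These are words that occur once in the whole corpus, and so appear in one poem only and only once there. That selection can be sketched in Python with a counter over occurrences. The poem-to-words mapping and the placeholder tokens below are hypothetical toy data, not entries from the Glossary:

```python
from collections import Counter


def unique_g_words(occurrences):
    """Map each poem to the G words that occur exactly once in the whole
    corpus. `occurrences` maps poem name -> list of G words, one entry per
    occurrence, so a word used twice in one poem is listed twice."""
    totals = Counter(word for words in occurrences.values() for word in words)
    return {poem: sorted(w for w in set(words) if totals[w] == 1)
            for poem, words in occurrences.items()}


# Hypothetical toy data: placeholder tokens, not real Glossary entries.
toy = {
    "Venus": ["w1", "w2", "w2"],   # w2 is used twice in one poem: excluded
    "Lucrece": ["w1", "w3"],       # w1 also occurs in Venus: excluded
    "Complaint": ["w4", "w5"],
}
print(unique_g_words(toy))

# The note's own figures for Complaint: 101 observed against 66 expected.
print(f"Complaint observed/expected: {101 / 66:.2f}")
```

On the note's figures, Complaint's 101 unique G words come to about 1.53 times the expected 66.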

    We can go further and test how closely the vocabularies of the five poems are related by counting the G words that appear in two poems. For any pair of poems, the G words fall into three classes: those that appear in the first poem but not the second, those that appear in the second but not the first, and those that appear in both. Thus v (= 293) words appear in Venus but not in Lucrece, l (= 492) appear in Lucrece but not in Venus, and b (= 159) appear in both. Suppose that the number of words shared by Venus and Lucrece is determined by chance. We can then use a well-known statistical principle to estimate the total vocabulary of such words in the pool from which they were drawn. This is v + l + b + vl/b (= 1851). Omitting Phoenix, which is too short to be useful here, we have the following counts of shared words, and the corresponding estimates of the word pool from which they derive:
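The estimate v + l + b + vl/b is algebraically the same as (v + b)(l + b)/b. That is the Lincoln-Petersen capture-recapture estimator, with each poem's G words treated as one "sample" and the b shared words as the recaptures. This identification is ours, since the note speaks only of "a well-known statistical principle". A Python check on the note's figures:

```python
def pool_estimate(v, l, b):
    """The note's formula: v + l + b + v*l/b."""
    return v + l + b + v * l / b


def lincoln_petersen(n1, n2, m):
    """Capture-recapture estimate of a population from two samples of
    sizes n1 and n2 sharing m members: n1 * n2 / m."""
    return n1 * n2 / m


v, l, b = 293, 492, 159  # Venus only, Lucrece only, both (from the note)

note_form = pool_estimate(v, l, b)
lp_form = lincoln_petersen(v + b, l + b, b)  # 452 Venus words, 651 Lucrece words
print(round(note_form), round(lp_form))  # both 1851
```

Read this way, the estimate assumes that every G word in the pool had an equal and independent chance of appearing in either poem. When two poems share fewer words than that assumption predicts, the estimated pool is inflated.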


The two estimates involving Complaint are from two and a half to four times as great as the Venus-Lucrece estimate. To put it another way, suppose the Complaint vocabulary were one with the Venus and Lucrece vocabularies. We should then expect to find 31 G words in both Venus and Complaint, instead of the 12 observed, and 44 G words in both Lucrece and Complaint, instead of the 11 observed. Both deviations from expectation are highly significant statistically.
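The expected overlaps reverse the pool estimator. Suppose Complaint drew on the same pool N as Venus and Lucrece. A poem with n1 G words and Complaint with nC G words should then share about n1 * nC / N of them. The table giving Complaint's counts is not reproduced here, so the nC = 126 below is an assumption, back-calculated from the note's expected figures of 31 and 44. The single-cell chi-square comparison is likewise only a rough check and not necessarily the author's test. A Python sketch:

```python
POOL = 293 + 492 + 159 + 293 * 492 / 159  # Venus-Lucrece pool estimate, about 1851
N_VENUS = 293 + 159    # G words in Venus (v + b)
N_LUCRECE = 492 + 159  # G words in Lucrece (l + b)
N_COMPLAINT = 126      # ASSUMPTION: back-calculated, not given in the text

CHI2_1DF_0_1_PERCENT = 10.828  # chi-square critical value, 1 d.f., 0.1% level


def expected_overlap(n1, n2, pool):
    """Shared words expected if both poems draw at random from one pool."""
    return n1 * n2 / pool


def chi_square_cell(observed, expected):
    """Pearson contribution (O - E)^2 / E for a single count."""
    return (observed - expected) ** 2 / expected


for name, n_other, observed in [("Venus", N_VENUS, 12), ("Lucrece", N_LUCRECE, 11)]:
    expected = expected_overlap(n_other, N_COMPLAINT, POOL)
    stat = chi_square_cell(observed, expected)
    print(f"{name} and Complaint: expected {expected:.0f}, observed {observed}, "
          f"chi-square {stat:.1f}, beyond 0.1% point: {stat > CHI2_1DF_0_1_PERCENT}")
```

Under these assumptions both statistics exceed the 0.1 per cent critical value. This is consistent with the note's description of the deviations as highly significant.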

    To conclude, these tests do not support Muir's refutation of the view that the vocabulary of A Lover's Complaint is not Shakespearian. All one can say, however, is this: the vocabularies of Venus and Lucrece seem to be homogeneous (as far as these tests go), but there appears to be a real difference between their vocabulary and that of Complaint. This might be due to different authorship, but not necessarily. Venus was first published in 1593 and Lucrece in 1594, while Maxwell suggests 1601 or 1602 for Complaint. The difference in date might account for the difference in vocabulary.


[1] The New Shakespeare. The Poems, edited by J. C. Maxwell. Cambridge University Press, paperback edition 1969, p. xxxiv.

[2] "A Lover's Complaint: A Reconsideration", by Kenneth Muir, in Shakespeare 1564-1964, ed. Edward A. Bloom. Brown University Press, Providence, RI, 1964.

[3] Expected numbers are calculated on the assumption that the total number of G words counted is distributed over all five poems in proportion to the number of lines in each.