A Thesaurus of Old English Online
Introduction
A Thesaurus of Old English (TOE) was first published by King's College London in its Medieval Studies Series in 1995. It is a substantial work of 1562 pages in two volumes, the first a conceptually arranged thesaurus of the extant vocabulary of Old English, the second an alphabetically arranged index of Old English words. When this edition sold out, paper publication was taken over by Rodopi, which produced a second impression in 2000. [1] Although TOE has been widely used in Old English lexical studies, it has all the drawbacks of a paper dictionary in that it can only be searched within the limitations of its format. The next logical step was therefore to unlock its resources by producing a fully searchable electronic version. A grant from the British Academy [2] enabled us to undertake this step, and TOE Online is now available free of charge at http://libra.englang.arts.gla.ac.uk/oethesaurus/.
The project was carried out by the original TOE editors, Jane Roberts and Christian Kay, and two members of the Glasgow Historical Thesaurus team, Irené Wotherspoon and Flora Edmonds, who supplied technical expertise. [3] Roberts took the opportunity to upgrade the comments field which she and Kay had used for discussion during editing of the original volumes. She also made corrections deriving from new knowledge, e.g. changing a flag if new evidence had been found and amending sources. This information came largely from completed sections of the Toronto Dictionary of Old English, which also yielded some words not in the original TOE. These are now being added, and, thanks to the flexibility of web publication, more can be added in future.
Electronic Issues
In making an electronic version, we were not starting from scratch since TOE was already held in an MS Access database. This had two serious disadvantages. The first was Microsoft's habit of continually upgrading to new and incompatible versions. The second was that the program is not robust enough to operate over a network and has to be installed on individual machines. We therefore decided to set up a database-driven website using MySQL, an open source relational database management system, and PHP, a widely used general scripting language especially suited to web development. These enabled us to make TOE available over the internet.
Simply making TOE available for electronic searching was, we felt, a step forward. Experience of other electronic dictionaries, such as the Oxford English Dictionary (OED), Middle English Dictionary (MED), Dictionary of the Scots Language (DSL) and Dictionary of Old English A-F on CD, has shown that such improved accessibility can open up whole new areas of research. [4] Problems specific to TOE could also be tackled. Experienced users can become familiar enough with TOE's structure to find the words they want in the semantic classification, and to track down cross references to other places where a word appears. However, many, if not most, users approach the classification via the alphabetical index. For a polysemous word, such as bōsm, this can be a laborious process, involving looking up six possible categories:
- bōsm 02.04.03.03.05 Chest, breast, bosom
- 04.06.03.01 Cavities of internal organs
- 04.06.04.02 Female reproductive organs
- 10.05.03.04 Depth, deepness
- 12.01.09.03.01.03 Part of ship
- 01 Heart, spirit, mood, disposition
- ..A day's pay: dægwine
- ..Pay for haymaking: mǣdmēd°
- ..Payment for prayers: gebedbigen°
- ..Payment in meal/corn: corngesceot°, melugescot°
- ..Payment in ale: mealtgesceot°
- ..Log of wood as bonus: wægntrēow°
- ..Money for buying clothes: scrūdfeoh°
Now that a single click on a word form produces a list of all possible locations, together with instant access to them, retrievability is much improved.
For Old English characters we decided to use the same conventions as the Middle English Compendium and the Toronto Dictionary of Old English corpus, where the user enters an uppercase A for ash and an uppercase T for thorn when querying the database in a word search. The result set then displays the data with the Old English symbols. Selecting options from a menu, as in Peter Baker's Unicode-based font Junicode, is a desirable alternative.
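The input convention can be illustrated with a minimal Python sketch. It is not the PHP code behind TOE Online; the function name and the mapping table are assumptions made for illustration, and only the two substitutions named above are included.

```python
# Stand-in letters typed by the user, following the convention described above.
# Only the two substitutions named in the text are included (assumed table).
STAND_INS = {"A": "æ", "T": "þ"}


def decode_query(typed: str) -> str:
    """Replace uppercase A and T in a typed query with ash and thorn."""
    return "".join(STAND_INS.get(ch, ch) for ch in typed)


print(decode_query("Asc"))   # æsc
print(decode_query("Torn"))  # þorn
```

A consequence of this design is that uppercase A and T can no longer stand for themselves, which is acceptable when headwords are stored in lower case.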
Some issues of spelling were also sorted out. Basically, the TOE editors followed the headword spellings in Clark Hall's dictionary, but sometimes in the case of polysemous words they gave the form which appeared in the context. [5] Unique occurrences were sometimes given in non-normalized form. Thus Clark Hall has a single entry +/-setnes, which also covers +setednes and +setenes, where + stands for the ge- prefix. TOE splits these up, giving (ge)set(ed)nes, gesetednes, geset(ed)nes, (ge)setnes and gesetnes as separate headwords. These are now retrieved together.
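One way of retrieving such variants together is to expand each bracketed element into the spellings it stands for and to group headwords that share a spelling. The following Python sketch illustrates the idea only; it is an assumption about how the grouping could be done, not a description of the TOE Online code, and the function names are hypothetical.

```python
import itertools
import re


def expand_optional(headword: str) -> set[str]:
    """Return every spelling obtained by keeping or dropping each (...) group."""
    # Splitting on a capturing group leaves the optional pieces at odd indexes.
    parts = re.split(r"\(([^()]*)\)", headword)
    optional_count = len(parts) // 2
    forms = set()
    for keep in itertools.product([True, False], repeat=optional_count):
        pieces = []
        for i, part in enumerate(parts):
            if i % 2 == 0 or keep[i // 2]:
                pieces.append(part)  # fixed text, or optional text that is kept
        forms.add("".join(pieces))
    return forms


def retrieved_together(a: str, b: str) -> bool:
    """Two headwords are grouped if they share at least one expanded spelling."""
    return bool(expand_optional(a) & expand_optional(b))


print(sorted(expand_optional("(ge)set(ed)nes")))
# ['gesetednes', 'gesetnes', 'setednes', 'setnes']
```

On this approach each of the other four headwords listed above shares a spelling with (ge)set(ed)nes, so all five fall into one group.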
Range of searches
Kay and Wotherspoon gave a paper at the 13th International Conference on English Historical Linguistics in Vienna in August 2004 and solicited opinions from likely users of the project during the subsequent discussion and by questionnaire. The results suggested that everyone would welcome an electronic TOE and that its main use would be in research. Five search areas and other improvements were then identified and implemented as described below.
1. Old English Word Search. This lists the Old English word under the semantic heading or headings where it appears in the thesaurus. Part of speech is displayed rather than deduced from the headings; this was one of the improvements on the parent volumes suggested by users. Thanks to the ingenuity of the computer (or the programmer), words can be entered without length marks: forms both with and without them are retrieved, and a choice of display is offered. There are also wildcard searches on the beginnings, middles or ends of words, useful to those with an interest in word structure, especially affixation or compounding.
2. Modern English Word Search. This finds any word which is used in the TOE category headings and gets round a complaint about the paper version, which was its lack of a Modern English index. This search is obviously limited by the metalanguage of the headings, which is influenced by the source dictionaries. [6] If the user does not get a response, s/he should think of a synonym in Modern English and try again.
3. Browsing searches. These allow the user to gain an overview of a semantic field or subfield and to study the structure and taxonomic levels of the classification. TOE is structured in hierarchical categories, with up to 12 degrees of semantic subordination available, represented either by number strings or, at lower levels, by dots. The degree of hierarchy across sections of different kinds is of interest, both for Old English research and for semantics generally.
4. Flags indicating restricted occurrence. The flags for words which are poetic or rare can be searched, either overall or in particular fields, and a numerical total is returned. There are no searches on the 'q' field for doubtful forms, as this was felt not to be of semantic interest.
5. Old English Phrases. Multiword forms, including phrasal or prepositional verbs, can be requested. These are found by identifying forms where two elements are separated by a space without punctuation. Where there are more than two elements separated by spaces, duplicate results occur, but are easily eliminated.
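The retrieval of forms regardless of length marks, mentioned under the Old English Word Search, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the TOE Online implementation: the function names are hypothetical, and the length mark is assumed to be stored as a Unicode macron.

```python
import unicodedata

MACRON = "\u0304"  # combining macron, assumed to encode the length mark


def strip_length_marks(form: str) -> str:
    """Remove macrons so that forms with and without length marks compare equal."""
    decomposed = unicodedata.normalize("NFD", form)
    unmarked = "".join(ch for ch in decomposed if ch != MACRON)
    return unicodedata.normalize("NFC", unmarked)


def find_forms(query: str, headwords: list[str]) -> list[str]:
    """Return the headwords that match the query once length marks are ignored."""
    wanted = strip_length_marks(query)
    return [h for h in headwords if strip_length_marks(h) == wanted]


print(find_forms("bosm", ["bōsm", "hēahmōd"]))  # ['bōsm']
print(strip_length_marks("mǣdmēd"))             # mædmed
```

Because both the query and the stored form are stripped before comparison, the user may type the word with or without length marks and obtain the same result.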
Searches on flags
During the original editing, Roberts spent considerable time and energy on attaching superscript flags to particular kinds of OE word forms. These are: o indicating infrequent use, p for poetic register, q for doubtful forms and g for words occurring only in glossed texts or glossaries. These flags are held in separate fields in the database and so can be searched either individually or in combination. [7] Their inclusion acknowledges the fact that the surviving OE lexicon is incomplete and skewed by the types of texts which our ancestors thought worth preserving, or, indeed, writing in the first place. They are often difficult to assign: a particularly problematic case is glosses occurring in several manuscripts deriving from a single source. These are undoubtedly 'g' but should they also be marked 'o'? On the whole in TOE we did not attempt to reconcile manuscript sources (although the OE versions of Bede were an exception).
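Holding each flag in its own field makes combined searches straightforward. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table layout, column names and function are assumptions, the dotted category codes are an assumed format, and the rows named "placeholder" are invented purely for illustration. The q flag is stored but not searchable, in line with the decision reported under search area 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE headword (form TEXT, category TEXT, "
    "flag_o INTEGER, flag_p INTEGER, flag_q INTEGER, flag_g INTEGER)"
)
conn.executemany(
    "INSERT INTO headword VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("mǣdmēd", "15.02.04", 1, 0, 0, 0),
        ("scrūdfeoh", "15.02.04", 1, 0, 0, 0),
        ("dægwine", "15.02.04", 0, 0, 0, 0),
        ("placeholder-op", "13.01", 1, 1, 0, 0),  # invented row
        ("placeholder-g", "04.01", 0, 0, 0, 1),   # invented row
    ],
)

SEARCHABLE_FLAGS = {"o", "p", "g"}  # q is deliberately not offered


def count_flagged(db, flags, category_prefix=""):
    """Count headwords carrying every flag in `flags`, optionally in one section."""
    if not set(flags) <= SEARCHABLE_FLAGS:
        raise ValueError("only the o, p and g flags can be searched")
    # Column names come from the validated set above, never from raw user input.
    conditions = [f"flag_{f} = 1" for f in flags] + ["category LIKE ?"]
    sql = "SELECT COUNT(*) FROM headword WHERE " + " AND ".join(conditions)
    return db.execute(sql, (category_prefix + "%",)).fetchone()[0]


print(count_flagged(conn, "o"))              # 3
print(count_flagged(conn, "op"))             # 1
print(count_flagged(conn, "o", "15.02.04"))  # 2
```

Note that this sketch counts headwords carrying at least the requested flags; a count of headwords with exactly one flag would need the remaining flag columns to be tested for 0.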
We felt some trepidation about offering the facility for searching on the flags, but this was outweighed by the amount of potentially interesting information such searches might produce, with the proviso that the calculations should be treated as broad generalisations rather than an exact science. As Philip Durkin has pointed out, the results of any sort of computer-driven lexical research have to be treated with caution, if not scepticism. [8] Nevertheless, computer searching is certainly faster and probably more accurate in the long run than assembling data manually even from a relatively small corpus like TOE.
The database as a whole contains 50,706 headwords, that is, different senses as opposed to different forms. These meanings come from 33,976 lemmata (or 32,494 if the ge- prefix is discarded). Of the 50,706, 9293 or 18.3% have single flags attached to them while 20,851 or 41.1% have one or more, which is an indicator of the peculiar nature of the extant OE vocabulary. [9] Now that we are able to make changes in the online version, these figures will, of course, be subject to rolling revision.
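The percentages above follow directly from the counts, as a short Python check shows. The variable and function names are illustrative only; the figures are those quoted in the preceding paragraph.

```python
TOTAL_HEADWORDS = 50_706


def share(count: int, total: int = TOTAL_HEADWORDS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(share(9_293))     # 18.3 (headwords with a single flag)
print(share(20_851))    # 41.1 (headwords with one or more flags)
print(33_976 - 32_494)  # 1482 (lemmata merged when the ge- prefix is discarded)
```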
More interesting perhaps than these global figures is the correlation between flags and semantic domains. The poetic flag p occurs strongly in some fairly predictable areas, such as 08 Emotions, and perhaps more surprisingly in others such as 01 The Physical Universe. Its heaviest occurrence is in section 13 Warfare, where 466 out of 1450 headwords (32.1%) are marked p or op. The specialized nature of this area of Old English is thus confirmed. Also interesting is the break-down by part of speech. Nouns at 50.3% predominate in TOE as a whole, followed by verbs (24%) and adjectives (18.7%). [10] However, when it comes to parts of speech accompanied by p flags, nouns (11.7%) are closely followed by adjectives (10%), with verbs a poor third (2.7%). These figures are, we think, striking enough to allow us to draw conclusions from them, if only as regards the general importance of adjectives in poetic writing.
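The o flag by itself often occurs in areas of very specific meaning, such as some finely differentiated words for specific payments in 150204 Spending, disbursement, for example mǣdmēd° 'pay for haymaking', gebedbigen° 'payment for prayers' and scrūdfeoh° 'money for buying clothes'.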
Clusters of g flags occur in both predictable and less predictable places. It is no surprise to find them attached to vocabulary dealing with other cultures in domains such as government or astrology or theatricals, or in detailed lists of names of plants, birds and so on. More surprising, perhaps, is to find them referring to such everyday areas as ploughing, weaving and jewellery. More work is needed to relate the terms back to the original texts, but overall our preliminary investigation of the flags suggests that they have the potential for further research.
Compounds and phrases
Another area which can be investigated is the occurrence of compounds and phrasal forms. If one knows the elements involved, this can be done by inserting * for wildcard searches. Thus:
hēah* finds hēahmōd (courage), hēahmōr (a high moor), hēahnes (nobility), etc., while *mōd finds hēahmōd, unmōdnes, etc.
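A wildcard search of this kind can be sketched in Python with the standard fnmatch module. The function name is hypothetical and the word list contains only forms quoted above. In this sketch a pattern is anchored at both ends of the word, so unmōdnes is matched by *mōd* rather than by *mōd; the online search may treat the asterisk differently.

```python
import fnmatch
import unicodedata

SAMPLE_FORMS = ["hēahmōd", "hēahmōr", "hēahnes", "unmōdnes", "bōsm"]


def wildcard_search(pattern: str, forms: list[str]) -> list[str]:
    """Return the forms matching a pattern in which * stands for any characters."""
    # Normalise both sides so that composed and decomposed macrons compare equal.
    pattern = unicodedata.normalize("NFC", pattern)
    return [
        form
        for form in forms
        if fnmatch.fnmatchcase(unicodedata.normalize("NFC", form), pattern)
    ]


print(wildcard_search("hēah*", SAMPLE_FORMS))  # ['hēahmōd', 'hēahmōr', 'hēahnes']
print(wildcard_search("*mōd", SAMPLE_FORMS))   # ['hēahmōd']
print(wildcard_search("*mōd*", SAMPLE_FORMS))  # ['hēahmōd', 'unmōdnes']
```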
More refined searches to eliminate irrelevant results are achieved by restricting the area of search by sense. A search for the ASPNS [11] project required words to be found which meant a leah (probably 'open woodland') covered with specified plants. Restricting the search to 010102 Land eliminated results such as eagfleah (disorder of the eye). We have not yet found any other way of eliminating irrelevant results such as mōdor and its associated forms. Overall, there is interest in determining what level of category compounds generally occur in and the ratio of simplexes to compounds in particular semantic fields.
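Restriction by sense amounts to keeping only those matches whose category code begins with the code of the chosen section. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the forms are spelled as in the paragraph above, the placing of leah directly under 010102 is assumed, and the code given for eagfleah is a placeholder rather than its real TOE category.

```python
import fnmatch

# (form, category code) pairs; both codes are assumptions for illustration.
ENTRIES = [
    ("leah", "01.01.02"),
    ("eagfleah", "02.00.00"),  # placeholder code, not the real category
]


def in_section(category: str, section: str) -> bool:
    """True if the category lies within the section, ignoring dots in the codes."""
    return category.replace(".", "").startswith(section.replace(".", ""))


def restricted_search(pattern, entries, section=""):
    """Wildcard search over forms, optionally limited to one section."""
    return [
        form
        for form, category in entries
        if fnmatch.fnmatchcase(form, pattern) and in_section(category, section)
    ]


print(restricted_search("*leah", ENTRIES))            # ['leah', 'eagfleah']
print(restricted_search("*leah", ENTRIES, "010102"))  # ['leah']
```

Stripping the dots lets a section be given either as 010102 or as 01.01.02.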
Polysemy and related matters
Defining polysemy is a problem for semantics generally, since we may wish to distinguish a polysemous form which has developed two or more meanings from forms which are homonyms, i.e. words which are physically the same but have no etymological connection. An alphabetical dictionary generally deals with polysemy by clustering meanings under a single headword, often in a sequence which shows the relationship of one meaning to the next. Homonyms, on the other hand, appear as separate headwords. In a thesaurus, as a glance at the TOE index will show, the number of meanings a form has is obvious from the number of semantic categories it appears in, but polysemy and homonymy are not distinguished. The general impression from such a search is that OE simplexes work hard—many forms have several meanings—and that this is a result of polysemy rather than homonymy. There seem to be very few homonyms in OE, both because its vocabulary derives largely from a single source and because it is an inflected language, less hospitable to borrowed forms.
Many polysemous meanings are linked by metaphor, where an original concrete concept such as 'fire' or 'heat' develops a metaphorical meaning such as 'passion' or 'anger'. One can explore this phenomenon by taking a category where there are many metaphors, and checking whether the forms also occur in unrelated sections of the thesaurus. In category 16 Religion, for example, the 3395 headwords appear under 964 headings within Religion and also in 2149 categories elsewhere. Similarly, the 940 entries in 02.04 Body appear in 247 categories there and 673 elsewhere. Another approach to identifying possible metaphorical transference is to identify recurrent words in the modern English category headings, since many common metaphors are still in use. [12]
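Counts of this kind can be produced by collecting the forms found in one section and tallying the distinct categories in which they occur inside and outside it. The Python sketch below is an illustration only, with a hypothetical function name; the sample pairs are taken from the bōsm index extract and the payments example quoted earlier, in dotted form.

```python
# (form, category) pairs taken from examples quoted earlier in this article.
ENTRIES = [
    ("bōsm", "02.04.03.03.05"),     # Chest, breast, bosom
    ("bōsm", "10.05.03.04"),        # Depth, deepness
    ("bōsm", "12.01.09.03.01.03"),  # Part of ship
    ("dægwine", "15.02.04"),        # Spending, disbursement
]


def category_spread(entries, section):
    """Count distinct categories, inside and outside a section, used by its forms."""
    forms_in_section = {form for form, cat in entries if cat.startswith(section)}
    inside, outside = set(), set()
    for form, cat in entries:
        if form in forms_in_section:
            (inside if cat.startswith(section) else outside).add(cat)
    return len(inside), len(outside)


print(category_spread(ENTRIES, "02.04"))  # (1, 2)
```

Here bōsm occupies one category within 02.04 Body and two elsewhere, while dægwine is ignored because it never occurs in that section.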
Future plans
We hope to continue to update TOE online and to implement suggestions made by users. Several people have requested that we return search results alphabetically as well as by thesaurus category. This is an excellent idea which will be implemented as soon as time allows. We expect other new uses and requests to emerge as the resource becomes more widely known. We have received a grant from the U. K. English Subject Centre to make a teaching version of TOE Online for students of Old English and the History of English by selecting relevant subsets and adding notes, tasks, background information, etc., and hope to complete this in 2006. One very tempting idea that we have done some work on is a special kind of reverse dictionary, which can be made by generating a list of words with their sets of headings as preliminary (and partial) definitions. This would be a version of the paper index, but would exploit the full taxonomy and display the kind of definition which the user can currently construct by tracking back from the most specific to the most general level of the classification. Any such plans will, however, have to go on hold until we complete the much larger and more complex Historical Thesaurus of English (HTE) project, which covers the vocabulary from Old English to the present day. Paper publication of HTE by Oxford University Press is scheduled for 2007 and an online version will follow.
NOTES
[1] Jane Roberts and Christian Kay with Lynne Grundy, A Thesaurus of Old English (TOE), London: King's College London Medieval Studies XI, 1995, 2 vols. Second impression, Amsterdam: Rodopi, 2000.
[2] TOE Online was supported by British Academy Grant LRG-37362. We are very grateful to the Academy for their assistance.
[3] The late Lynne Grundy, who contributed so much to the first edition, was sadly missed.
[4] See DSL http://www.dsl.ac.uk/dsl/; MED in the Middle English Compendium http://ets.umdl.umich.edu/m/mec; OED http://www.oed.com; DOE http://www.doe.utoronto.ca/
[5] J. R. Clark Hall, with a supplement by Herbert D. Meritt, A Concise Anglo-Saxon Dictionary, 4th edn. Toronto: University of Toronto Press, 1960.
[6] For some of the problems encountered in working with the traditional Anglo-Saxon dictionaries, see Christian Kay and Jane Roberts, "Definitions for a New Age", Poetica 62 (2004), 53-68.
[7] The way in which they are applied is fully described in TOE Introduction xxi ff.
[8] Philip Durkin, "Loanword etymologies in the third edition of the OED: Some questions of classification," in Christian Kay, Carole Hough and Irené Wotherspoon, eds., New Perspectives On English Historical Linguistics, Volume 2: Lexis and Transmission (Amsterdam: John Benjamins, 2004), 79-90.
[9] For lovers of figures: o alone = 4327; o + op + og = 10,106; p alone = 1632; p + op = 4335; g alone = 2593; g + og = 5669.
[10] Although not directly comparable, it is interesting to note OED figures of 50% for nouns, 25% for adjectives and 14% for verbs. OED http://www.oed.com.
[11] ASPNS = Anglo-Saxon Plant-name Survey, directed by Carole Biggam. http://www.arts.gla.ac.uk/SESLL/EngLang/ihsl/projects/plants.htm.
[12] On hunting for metaphors, see Christian Kay, "Metaphors We Lived By: Pathways between Old and Modern English," in J. Nelson and J. Roberts, eds., Essays on Anglo-Saxon and Related Themes in Memory of Dr Lynne Grundy (London: King's College London Medieval Studies, 2000), 273-285.