Monthly Archives: March 2018

COBUILD: The early years. Part 2 – A dictionary from a corpus

By the time I arrived at COBUILD as part of the 1993 intake recruited to work on the second edition of the dictionary, the whole project had been fully computerised for several years. This meant working on screen at terminals linked to mainframe computers that hummed away in a separate room, still with the green text on a black background, as described by Andrew Delahunty in Part 1. The mainframe computers were named after Shakespeare characters – Titania was one – and would occasionally overheat and need time to recover, giving us the afternoon off.

A mainframe computer, similar to those used at the University of Birmingham in the 1990s

There was a pleasing contrast between the high-tech, cutting-edge nature of the project and the elegant Victorian building where we worked, with its large sash windows overlooking a beautiful garden where we would sometimes eat our lunch in the summer. It was also a great place for seminars and parties, both of which would bring in members of the University of Birmingham’s English department, to which COBUILD was attached, and of the wider university.

Compiling on screen using a purpose-built text editor required the acquisition of a whole new set of skills, since I had only ever worked on paper; but what really blew my mind was the corpus. Previously I had only seen concordances – the output of a corpus – on paper, since on my previous project we were able to request a printed sample of lines for particularly tricky entries. Engaging at close quarters with the corpus was a revelation. I was almost paralysed for several weeks, overwhelmed by the quantity and quality of the data I was expected to process. This corpus – soon to be rebranded as The Bank of English – was tiny by today’s standards, but the insights it provided into the behaviour of English were like nothing I had ever come across before.

Concordance lines for chair, generated by the corpus

At COBUILD we worked with the corpus in a way I have never known it to be used anywhere else. Using specially developed software, we lexicographers (and grammarians) would analyse the evidence for the word we were compiling. We would then base our revisions of existing entries from the first edition, as well as all the new entries and senses we were adding, on that evidence. We were a large team and there was always a colleague available to discuss problematic entries or tricky decisions on how to divide up senses, but the evidence provided by the corpus was the basis of everything we did. I don’t think we ever looked at another learner’s dictionary. It sounds horribly arrogant, but we had no need to; we had all the material we needed right there in front of us.

I have worked on many corpus-based dictionaries and other projects since, and I rarely work on a dictionary that does not use corpus evidence to some degree. A corpus is always my first port of call when I encounter a new word or meaning. However, I think the COBUILD dictionary remains unique in being based so directly and completely on what only a corpus can give, which is evidence of how the language actually works.


This blog post has been written by Liz Potter, who is a freelance lexicographer, editor and translator.

Find out more about our new editions of the Collins COBUILD dictionaries and other COBUILD materials here.

COBUILD: Design and Layout – Changes over the last 30 years

Where were you 30 years ago? I was in the middle of my university studies, still to embark on my ELT career, and as such, a smidgin too late to be part of the intrepid and free-spirited COBUILD dictionary team. Led by the late John Sinclair, this large young team was involved in bringing to life his vision: to create a dictionary for learners that was based on a large digital language database – or a corpus. The corpus would be used for analysing word frequencies, for identifying new uses, collocations, colligation, connotation, and typical contexts for words and phrases. Definitions would be written in full sentences in the type of everyday English a teacher might use to explain a word to a learner, with the added advantage that users would see how the word would work in a sentence.

Looking back at the pages of that first edition, you might be struck by the density of the page design. It seems that we now need our text to be broken up with white space, boxes and varying fonts and colours: our modern brains seem to need a bit of a break between lines and entries. Was my intrepid and free-spirited self really so much better at reading tiny words all squished together on a page? Well, the answer is probably yes, as I remember my first encounters with COBUILD dictionaries were ones of delight; I don’t recall thinking ‘What? How do you expect me to wade through all of that?’

A page from the first edition of COBUILD Advanced Learner’s Dictionary

The other feature that jumps out at us from the pages of the first edition is the ‘extra column’. This was a narrow column down the right-hand side of each main column of dictionary text. It provided information on parts of speech and typical syntactical patterns, such as ‘V + O’ (= verb plus object) for transitive verbs, so that students didn’t have to search through the denser dictionary text for this type of information. Parts of speech were very specific; for example, adjectives might be ADJ CLASSIF: ATTRIB (a classifying adjective that occurs in attributive position) or ADJ QUALIT (a qualitative adjective), and verbs could be V ERG (ergative verb), v-link (linking verb) or V + O (transitive verb). The user can see the examples of use in the main dictionary text next to this information.

See the ‘extra column’ information for accusation below:

An extract from the entry accusation from the first edition

The ‘extra column’ information here means:

Accusation is a countable noun. If it’s followed by a preposition, then that preposition is of or against (e.g. accusations of cheating). It can also be followed by a reporting clause, as in The accusation against us was that we were biased.

COBUILD’s ‘extra column’ was something of a showcase for the incredible amount of hard work that lexicographers and grammarians put into analysing the newly-built corpus. It told us all sorts of previously undocumented facts about how the English language works.

Sadly, though, the extra column was not to survive. Market research told us that most learners did not read or even understand the vast majority of the information in the extra column, and in 2008 it was quietly put out to grass. The information in the extra column was re-worked with the modern learner in mind. The reintegration of much of the material into the main text meant that the main columns could be widened and more words and meanings could be covered in the same number of pages.

So, what does our mature 30-year-old dictionary look like now? Well, it has grown into an incredibly user-friendly go-to treasure trove of the English language, thanks to its sophisticated font design, useful information boxes, colourful images, and plenty of restful white space. It has a hugely popular online sibling, available at www.collinsdictionaries.com, and has inspired learners and lexicographers alike to use corpora to continue to learn ever more fascinating facts about our language. Happy 30th birthday, COBUILD!

A page from the ninth edition, published in 2018


This blog post has been written by Penny Hands, who is an ELT lexicographer and materials editor.

COBUILD: The early years. Part 1 – Where it all began

I have always counted myself as incredibly fortunate to have worked as part of the COBUILD team at the time that I did, between October 1983 and the end of 1986. I was not quite 24 when I arrived in Birmingham, not knowing one end of a dictionary definition from another. By the time I left I was pretty sure lexicography was going to be my career and, over 30 years later, I’m still doing it.

My three years at COBUILD spanned the move in compiling practice from paper to computer. For the first year or so we were writing out individual dictionary entries on slips of paper, in much the same way that Samuel Johnson, James Murray, and all our other illustrious predecessors had done before us.

There was a pink slip for each sense of a headword, on which we’d write the definition together with accompanying syntactic (and other) information. White slips were for individual example sentences selected from the corpus, all 7.3 million words of it, together with any example-specific information that needed recording. I can remember laying out on the floor a mosaic of hundreds of slips for a long, complicated word like live or way, shuffling all these meanings around into various groupings in an effort to settle on the best arrangement. An academic paper could be written on the role of the floor in lexicography. Once compiled, entries were then typed up into the dictionary database.

The slip of paper used for compiling the entry veritable

The equivalent entry for veritable once typed up into the dictionary database

The corpus, the primary evidence for all our observations about the language, may have been created computationally, but initially we consulted that on paper too, in the form of printed-out concordances. In the early days of compiling we’d often highlight with coloured felt-tip pens individual concordance lines that illustrated different meanings of a word.

Concordances for veritable from the 7.3 million word corpus

Within a year or so, lexicographers were compiling and editing text directly into the dictionary database on newly-installed computer terminals, displaying green text on a black background. Compiling a dictionary on a computer was hugely innovative at the time but within a few years this would become the norm. So, I have a real sense of having been present at a moment of transition as one great lexicographical tradition was coming to an end and another was taking its first steps.

We were a fairly young team and for many of us this was our first experience of lexicography. So, I didn’t then have much to compare it with, in terms of methods and approach. But there was certainly a palpable buzz about the place. We knew we were doing something new. In some ways, it was only once the dictionary was published that I began to appreciate quite how radical and groundbreaking the COBUILD project was.

Look out for Part 2 coming soon…


This blog post has been written by Andrew Delahunty, who is a freelance lexicographer, dictionary editor and reference book author with almost 35 years’ experience.

New editions of Collins COBUILD Dictionaries – Out now!

In celebration of COBUILD’s 30th anniversary, Collins is proud to launch new editions of its three most popular Collins COBUILD dictionaries, out today:

  • Primary Learner’s Dictionary
  • Intermediate Learner’s Dictionary
  • Advanced Learner’s Dictionary

To find out more about these dictionaries, take a look at our new COBUILD Dictionaries leaflet.

So what’s new?

  • Up-to-date coverage of today’s English – based on the constantly updated 4.5-billion-word database of today’s English language, the Collins Corpus
  • Authentic examples – real-life examples of English from the Collins Corpus show how words are used in everyday language
  • Vocabulary-building features – brand-new features on collocation, word history, usage and synonyms to help learners use English with accuracy and confidence
  • New supplements – offer guidance on effective communication in English
  • Full sentence definitions – all words and phrases are covered in depth and explained in full sentences to show words in context
  • Frequency – the most important words are clearly highlighted to indicate which to learn first

Authentic English at your fingertips

www.collinselt.com/cobuild