Extracting collocations

Monco's collocations module can be used to extract and aggregate word co-occurrences from spans matching corpus queries. The module has three distinctive features. Firstly, you can use any query supported by Monco as the "node" of your collocation query. It can be a single word, but it can also be a phrase or a grammatical pattern. Secondly, for every result you can see all the distinct word spans bounded by the node and the co-occurring word. Finally, you can access all the results contributing to the total observed frequency of a particular result. We illustrate these features in the examples below.

Let's say you want to extract a list of nouns frequently pre-modified by the adjective 'religious'. You could set the collocation extraction options as follows.

The first two options are used to define the 'seed query'. We want collocates of a single word, so we leave them as they are. Let's say that 5000 sentences in which the word 'religious' occurs are enough for our purposes. The part of speech of the collocate is set to 'noun', which is shorthand for any common noun. If you do not trust our tags, you can pick the option 'any' instead. The positions of the collocate are set to 1, 2, 3, 4 and 5, so the collocate may occur anywhere within the five words following the node. Negative position values can be used to specify collocates that should precede the node.
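To make the effect of these options concrete, here is a minimal Python sketch of window-based collocate counting. It is not Monco's implementation: the (word, lemma, tag) sentence format, the function name count_collocates and the Penn Treebank tags NN and NNS used for "common noun" are all assumptions chosen for illustration. Positions 1 to 5 become range(1, 6), and tags=None plays the role of the 'any' option.

```python
from collections import Counter

# Hypothetical tag set: Penn Treebank tags, where NN and NNS mark common nouns.
# Monco's own tag set may differ.
COMMON_NOUN_TAGS = {"NN", "NNS"}


def count_collocates(sentences, node, positions, tags=COMMON_NOUN_TAGS):
    """Count collocate lemmas at the given positions relative to a one-word node.

    Each sentence is a list of (word, lemma, tag) triples. Positive positions
    follow the node and negative positions precede it. Pass tags=None to accept
    any part of speech (like the 'any' option).
    """
    counts = Counter()
    for sentence in sentences:
        for i, (word, _lemma, _tag) in enumerate(sentence):
            if word.lower() != node:
                continue
            for pos in positions:
                j = i + pos
                # Skip position 0 (the node itself) and positions outside the sentence.
                if pos != 0 and 0 <= j < len(sentence):
                    _word, lemma, tag = sentence[j]
                    if tags is None or tag in tags:
                        counts[lemma] += 1
    return counts


# Toy tagged sample standing in for the sentences containing 'religious'.
sentences = [
    [("religious", "religious", "JJ"), ("groups", "group", "NNS"), ("met", "meet", "VBD")],
    [("a", "a", "DT"), ("religious", "religious", "JJ"), ("leader", "leader", "NN"),
     ("of", "of", "IN"), ("religious", "religious", "JJ"), ("groups", "group", "NNS")],
]

# Positions 1 to 5: up to five words after the node.
print(count_collocates(sentences, "religious", range(1, 6)).most_common())
# → [('group', 3), ('leader', 1)]
```

Note that 'group' is counted three times although 'groups' occurs only twice: in the second sentence it falls within the windows of both occurrences of 'religious'. In this sketch, counts are taken per node occurrence.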

With the options set, we can click the search button. After a few moments, the first results screen should appear, looking similar to the one below.

Clicking on any row of this table should bring up a separate dialog with detailed information about the collocation, similar to the one below. Here we have direct links to all the phrases bounded by the node and the collocate. By clicking on any of these links we can view the corpus results in which that phrase occurs. There is also a bar plot showing how often the collocate's lemma occurs at each position relative to the node.
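The two kinds of information in this dialog, the distinct bounded phrases and the position distribution, can be sketched in a few lines of Python. This is a hypothetical illustration rather than Monco's code: the function spans_and_positions and the (word, lemma, tag) input format are assumptions, and the bar plot is replaced by a simple text rendering.

```python
from collections import Counter


def spans_and_positions(sentences, node, collocate, positions):
    """For one node/collocate pair, collect the distinct word spans bounded by the
    node and the collocate, plus the collocate's frequency at each position.

    Each sentence is a list of (word, lemma, tag) triples; `collocate` is a lemma.
    """
    spans = Counter()
    by_position = Counter()
    for sentence in sentences:
        words = [word for word, _lemma, _tag in sentence]
        for i, word in enumerate(words):
            if word.lower() != node:
                continue
            for pos in positions:
                j = i + pos
                if pos != 0 and 0 <= j < len(sentence) and sentence[j][1] == collocate:
                    # The span runs from the node to the collocate, inclusive,
                    # whichever of the two comes first.
                    start, end = min(i, j), max(i, j)
                    spans[" ".join(words[start:end + 1])] += 1
                    by_position[pos] += 1
    return spans, by_position


sentences = [
    [("religious", "religious", "JJ"), ("groups", "group", "NNS"), ("met", "meet", "VBD")],
    [("a", "a", "DT"), ("religious", "religious", "JJ"), ("leader", "leader", "NN"),
     ("of", "of", "IN"), ("religious", "religious", "JJ"), ("groups", "group", "NNS")],
]

spans, by_position = spans_and_positions(sentences, "religious", "group", range(1, 6))
for span, freq in spans.most_common():
    print(freq, span)
# Text version of the position bar plot: one '#' per occurrence.
for pos in sorted(by_position):
    print(f"{pos:+d} {'#' * by_position[pos]}")
```

On this toy sample the span frequencies (2 for "religious groups", 1 for "religious leader of religious groups") add up to the pair's total frequency of 3, so in this sketch each distinct phrase accounts for one part of the results behind the total.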

Here are a few more examples of collocation queries with direct links to their results.

  1. Nouns following the phrase 'avoid unnecessary' or 'avoid unwanted'
  2. Nominal collocates of 'flimsy'
  3. Adjectival collocates of 'stay' as a noun
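The first example uses a phrase rather than a single word as the node. The Python sketch below shows one way such a query could be counted. It is only an illustration under stated assumptions: the function phrase_collocates, the (word, lemma, tag) input format and the Penn Treebank noun tags are hypothetical, and so is the convention that positive positions are counted from the last word of the matched phrase and negative positions from its first word. Monco may measure positions differently.

```python
from collections import Counter

# Hypothetical Penn Treebank common-noun tags; Monco's tag set may differ.
COMMON_NOUN_TAGS = {"NN", "NNS"}


def phrase_collocates(sentences, alternatives, positions, tags=COMMON_NOUN_TAGS):
    """Count collocate lemmas around a multi-word node.

    `alternatives` lists the word sequences the node may match, e.g.
    'avoid unnecessary' or 'avoid unwanted'. Assumption: positive positions are
    counted from the last word of the match, negative ones from the first word.
    """
    counts = Counter()
    for sentence in sentences:
        words = [word.lower() for word, _lemma, _tag in sentence]
        for i in range(len(words)):
            for phrase in alternatives:
                if words[i:i + len(phrase)] != list(phrase):
                    continue
                first, last = i, i + len(phrase) - 1
                for pos in positions:
                    if pos == 0:
                        continue
                    j = last + pos if pos > 0 else first + pos
                    if 0 <= j < len(sentence):
                        _word, lemma, tag = sentence[j]
                        if tags is None or tag in tags:
                            counts[lemma] += 1
    return counts


sentences = [
    [("We", "we", "PRP"), ("avoid", "avoid", "VBP"), ("unnecessary", "unnecessary", "JJ"),
     ("delays", "delay", "NNS")],
    [("to", "to", "TO"), ("avoid", "avoid", "VB"), ("unwanted", "unwanted", "JJ"),
     ("side", "side", "NN"), ("effects", "effect", "NNS")],
]
node = [("avoid", "unnecessary"), ("avoid", "unwanted")]
print(sorted(phrase_collocates(sentences, node, range(1, 6)).items()))
# → [('delay', 1), ('effect', 1), ('side', 1)]
```

Measuring positions from the edges of the match keeps position 1 meaning "the word right after the node", just as it does for a single-word node.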