Community-source structured metadata of English place-names – Stuart Dunn

Thursday 16th February

Speaker: Stuart Dunn

Theme: Spectrum of Citizen Science

The DEEP project started only in November, so it is very recent, but we can already talk about what it will be and where it comes from. The historical development of place-names has been systematically surveyed since 1922 by the specialists of the English Place-Name Society (EPNS), who produced the Survey of English Place-names (SEPN), a complex and fascinating document. It is a true community effort, comprising 86 volumes compiled by different place-name scholars over the years.

A pilot project called CHALICE was conducted in 2010 with the aim of encoding the Survey of English Place-names in XML. It had no particular focus on normalizing or structuring those documents; the goal was mainly to make them machine-readable.

The current project is more ambitious in terms of data structuring. The hierarchy of places, for example, is one aspect that will be considered. Place-names are much more complicated than we might think: they are dynamic, they change over time, they are incredibly diverse, and they appear in sources whose interpretation can be challenging. Digitizing the SEPN poses a number of challenges. For example, the structure of the surveys differs from county to county. In terms of quantity, 32 English counties collected in 86 volumes means more than 30,000 pages, or around 4 million individual place-names! When it comes to pre-OS sources, there is very little data on the geographic association of place-names. This might seem a minor problem, but we must keep in mind that administrative geographies change over time. And across centuries, even natural features can be misleading. The modern course of the River Irthing, for example, has moved away from where the Roman bridgehead was located.
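To make the structuring problem concrete, here is a minimal Python sketch of how a place, its position in a hierarchy, and its dated historical forms might be represented. It is only an illustration under assumptions: the class names, field names, and sample values are invented for this example and do not come from DEEP, CHALICE, or the Survey itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attestation:
    """One historical spelling of a name, as recorded in a source (hypothetical model)."""
    form: str    # spelling as it appears in the source
    year: int    # date of the source (in practice often approximate)
    source: str  # abbreviated source reference


@dataclass
class Place:
    """A named place with a link to its parent in the hierarchy (hypothetical model)."""
    name: str
    level: str                       # e.g. "county", "parish", "field"
    parent: Optional["Place"] = None
    attestations: List[Attestation] = field(default_factory=list)

    def path(self) -> List[str]:
        """Names from the top of the hierarchy down to this place."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def form_in(self, year: int) -> Optional[str]:
        """Latest attested form dated at or before `year`, or None if there is none."""
        earlier = [a for a in self.attestations if a.year <= year]
        return max(earlier, key=lambda a: a.year).form if earlier else None


# Invented placeholder data, not taken from the Survey.
county = Place("Exampleshire", "county")
parish = Place("Exampleton", "parish", parent=county, attestations=[
    Attestation("Exempletun", 1200, "hypothetical source A"),
    Attestation("Exampleton", 1600, "hypothetical source B"),
])

print(parish.path())         # hierarchy from county down to parish
print(parish.form_in(1300))  # form in use according to the latest earlier source
```

Keeping the parent link on each place, rather than baking the county into the name, is one way to allow for administrative boundaries that change over time; a fuller model would presumably need dated parent links as well.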

What is the value of crowdsourcing for this project?

• Correct errors/omissions in the Natural Language Processing

• Validate our output with local knowledge

• Add geographic data where it is lacking (e.g. field names)

• Identify crossovers with users of other sources

• Enrich our point data with raster and string data

DEEP is an AHRC-funded project. More information is available on the JISC website.
