Identifying colonial traces in early modern travelogues
Challenge
Zentralbibliothek Zürich provides a text corpus of printed travelogues from the 16th to the 19th century. Can you identify and extract colonial traces in these French and German texts? For instance, you could try to describe the gaze on the "other" by using NLP methods to track down mentions of geographic regions, different languages, certain ethnicities or one of the following semantic fields:
- the concept of the "noble savage"
- slavery
- representation of dominance
- hierarchies and structures of control
The last field could be examined from an economic, military, cultural or political perspective. From a military perspective, for example, it could be interesting to search the German texts for word clusters like "Truppe – Fortifikation – Kriegszug – verfolgt – Beute – Treueeid – schwören – Vasall – kniend – Tribut".
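A simple way to start such a search is to count, per page, how many tokens begin with one of the cluster's stems. The sketch below is a minimal illustration under our own assumptions: the stem list `MILITARY_STEMS` and the helper `cluster_hits` are hypothetical, prefix matching is a crude stand-in for proper lemmatization, and the only OCR normalization applied is replacing the long s (ſ) with s.

```python
import re
from collections import Counter

# Illustrative stems for the military word cluster (an assumption, not a curated list).
# Using stems lets inflected forms such as "Truppen" or "verfolgten" match as well.
MILITARY_STEMS = [
    "truppe", "fortifikation", "fortification", "kriegszug", "kriegszüg",
    "verfolg", "beute", "treueeid", "schwör", "schwur", "vasall", "kniend", "tribut",
]


def cluster_hits(text: str, stems: list[str]) -> Counter:
    """Count tokens in `text` that start with one of the given stems."""
    # Lowercase and replace the long s, which is frequent in early modern prints.
    normalized = text.lower().replace("ſ", "s")
    hits = Counter()
    for token in re.findall(r"\w+", normalized):
        for stem in stems:
            if token.startswith(stem):
                hits[stem] += 1
                break  # count each token at most once
    return hits


if __name__ == "__main__":
    sample = "Die Truppen verfolgten den Feind; der Vasall schwur kniend den Treueeid."
    print(cluster_hits(sample, MILITARY_STEMS))
```

Pages with hits from several different stems are better candidates for close reading than pages with a single hit for a generic stem such as "beute".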
Postcolonial reading of travelogue data
What perspectives on colonialism, normally not considered in travel narratives, can be gained from the extracted entities?
Listing the names of persons
- Shift the focus away from the positioning and the role of the author in the colonial project
- Undertake a schematic grouping: identify intermediaries, rank-and-file employees or marginalized voices
Listing the names of organisations
- Map the colonial infrastructure in an arena
Listing the names of historical places
- Compare the different names used for places over time
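Once entities are stored per page, these listings reduce to counting entity strings by type. In this sketch the page records, their `year` and `entities` fields and the labels `PER`, `LOC` and `ORG` are assumptions for illustration, and `SAMPLE_PAGES` is made-up toy data, not taken from the corpus.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: one dict per page with the printing year and the
# entities found on it. Field names and labels (PER, LOC, ORG) are assumptions.
SAMPLE_PAGES = [
    {"year": 1598, "entities": [{"text": "Calecut", "type": "LOC"},
                                {"text": "Vasco da Gama", "type": "PER"}]},
    {"year": 1702, "entities": [{"text": "Batavia", "type": "LOC"},
                                {"text": "Compagnie", "type": "ORG"},
                                {"text": "Batavia", "type": "LOC"}]},
    {"year": 1805, "entities": [{"text": "Calicut", "type": "LOC"}]},
]


def list_entities(pages, entity_type):
    """Count how often each entity string of one type occurs across pages."""
    counts = Counter()
    for page in pages:
        for ent in page.get("entities", []):
            if ent["type"] == entity_type:
                counts[ent["text"]] += 1
    return counts


def place_names_by_century(pages):
    """Group place-name counts by the century in which the book was printed."""
    by_century = defaultdict(Counter)
    for page in pages:
        century = (page["year"] - 1) // 100 + 1  # e.g. 1702 -> 18
        for ent in page.get("entities", []):
            if ent["type"] == "LOC":
                by_century[century][ent["text"]] += 1
    return dict(by_century)


if __name__ == "__main__":
    print(list_entities(SAMPLE_PAGES, "PER").most_common())
    for century, names in sorted(place_names_by_century(SAMPLE_PAGES).items()):
        print(f"{century}th century:", names.most_common())
```

In the toy data, "Calecut" and "Calicut" show how two spellings of one place end up as separate strings in different centuries; grouping such variants is left to a later linking step.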
Process
We used the SBB-NER-Tagger, developed by the Staatsbibliothek zu Berlin (SBB), for named entity recognition in the travelogues, and were able to extract person, organization and place entities from the OCR texts. The SBB-NER-Tagger contains a BERT-based model that has been trained on the SBB collections of early modern prints in German, French and English and, at first sight, seems to produce decent results.
We created one JSON file per text page, starting from the original JSON file of the dataset and enriching each page object with the identified entity strings and types (person, place, organization).
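A token-level tagger's output has to be merged into entity strings before it can be listed. Assuming word-level tokens with tags in the common BIO scheme (`B-PER`, `I-PER`, `B-LOC`, `I-ORG`, `O`, ...), a merge step could look like the following. This is our own sketch, not code from the SBB-NER-Tagger, and the example sentence is invented.

```python
def bio_to_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags like "B-PER", "I-PER", "B-LOC", "I-ORG" and "O".
    """
    entities = []
    current_tokens = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; an I- tag without a matching predecessor
            # is treated like a B- tag.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": outside any entity, so close the open one, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


if __name__ == "__main__":
    tokens = ["Die", "Ostindische", "Compagnie", "sandte", "Jan", "van", "Riebeeck", "an", "das", "Cap"]
    tags = ["O", "B-ORG", "I-ORG", "O", "B-PER", "I-PER", "I-PER", "O", "O", "B-LOC"]
    print(bio_to_entities(tokens, tags))
```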
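The per-page enrichment can be sketched as below. The input structure (a book dict with an `id` and a `pages` list of objects holding `text`), the file naming scheme and the `toy_extractor` stand-in for the NER step are all hypothetical; the actual dataset fields may differ.

```python
import json
import tempfile
from pathlib import Path


def toy_extractor(text):
    """Hypothetical stand-in for the NER step: a tiny gazetteer lookup."""
    gazetteer = {"Batavia": "LOC", "Compagnie": "ORG"}
    return [(word, gazetteer[word]) for word in text.split() if word in gazetteer]


def write_page_files(book, extract_entities, out_dir):
    """Write one JSON file per page, each enriched with an "entities" list."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, page in enumerate(book["pages"], start=1):
        enriched = dict(page)  # shallow copy keeps the original page object unchanged
        enriched["entities"] = [
            {"text": text, "type": etype}
            for text, etype in extract_entities(page["text"])
        ]
        path = out_dir / f"{book['id']}_page{number:04d}.json"
        # ensure_ascii=False keeps umlauts and the long s readable in the output
        path.write_text(json.dumps(enriched, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    book = {"id": "demo", "pages": [{"text": "Ankunft in Batavia"}, {"text": "Ohne Namen"}]}
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_page_files(book, toy_extractor, tmp):
            print(path.name, path.read_text(encoding="utf-8"))
```

Passing the extractor in as a function keeps the file handling independent of the NER model, so the same loop works with the real tagger or with a stub during testing.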
Result
We have a prototypical frontend displaying the entities present per book page: https://luminous-speculoos-e2e07f.netlify.app/
For a description of the entity categories used, please see the BERT documentation.
A possible next step would be to use SBB-NED, the named entity linking tool from SBB, to link the entities found in the texts to Wikidata objects.
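Until a proper linking step is in place, a rough way to see which Wikidata items an entity string might refer to is the public `wbsearchentities` action of the Wikidata API. This is a swapped-in technique for illustration, not SBB-NED: it only matches labels and aliases and does not disambiguate by context, so historical spellings will often return no or wrong candidates. The helper names and the User-Agent string are our own assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(label, language="de", limit=5):
    """Build a wbsearchentities query URL for an entity label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def wikidata_candidates(label, language="de", limit=5):
    """Return (QID, label, description) candidates for a label (needs network access)."""
    request = Request(
        search_url(label, language, limit),
        headers={"User-Agent": "travelogue-ner-sketch/0.1"},  # hypothetical client name
    )
    with urlopen(request, timeout=10) as response:
        data = json.load(response)
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in data.get("search", [])
    ]


if __name__ == "__main__":
    print(search_url("Batavia"))
```

The candidates are only a starting point for manual checking; choosing the right item still depends on the surrounding text and the date of the print.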
Project
Event finish
Update README.md (@annalauraw)
Update README (@annalauraw)
Project
Update README (@annalauraw)
Requirements (@annalauraw)
Project
Joined the team
Additional JSON files with entities (@annalauraw)
Delete old data structure (@annalauraw)
Script to produce JSON files containing entity info per page (@annalauraw)
Project
4 title files with entities (@annalauraw)
Joined the team
Project
Entities per page (@annalauraw)
Example file with entities (@annalauraw)
Joined the team
Project
Raw text file per title (@annalauraw)
Initial commit (@annalauraw)
Start
Joined the team
Repository updated
Challenge shared