Identifying colonial traces in early modern travelogues

Identifying colonial traces in early modern travelogues (GLAMhack23 challenge)

Description

Zentralbibliothek Zürich provides a text corpus of printed travelogues from the 16th to the 19th century. Can you identify and extract colonial traces in these French and German texts? For instance, you could try to describe the gaze on the "other" by using NLP methods to track down mentions of geographic regions, different languages, certain ethnicities or one of the following semantic fields:

  • the concept of the "noble savage"
  • slavery
  • representation of dominance
  • hierarchies and structures of control

The last field, for example, could be examined from an economic, military, cultural or political perspective. From a military perspective, word clusters such as "Truppe, Fortifikation, Kriegszug, verfolgt, Beute, Treueeid, schwören, Vasall, kniend, Tribut" could be worth searching for in the German texts.
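A word-cluster search of this kind could be sketched roughly as follows. This is a minimal illustration, not part of the challenge material: the function name `count_cluster_terms` and the sample sentence are made up, and simple prefix matching is an assumption that will miss historical spelling variants and OCR errors.

```python
import re
from collections import Counter

# Military word cluster quoted in the challenge description.
MILITARY_CLUSTER = [
    "Truppe", "Fortifikation", "Kriegszug", "verfolgt", "Beute",
    "Treueeid", "schwören", "Vasall", "kniend", "Tribut",
]

def count_cluster_terms(text, terms=MILITARY_CLUSTER):
    """Count words in `text` that start with each cluster term (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for term in terms:
        needle = term.lower()
        # Prefix matching catches simple inflections such as "Truppen".
        counts[term] = sum(1 for word in words if word.startswith(needle))
    return counts

# Made-up sample sentence, not taken from the corpus.
sample_page = "Die Truppen zogen mit reicher Beute ab, und der Vasall musste Tribut zahlen."
hits = count_cluster_terms(sample_page)
print({term: n for term, n in hits.items() if n})
```

Running the same count over every page would give a rough picture of where a semantic field is concentrated, which could then be checked by close reading.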

Postcolonial reading of travelogue data

What perspectives on colonialism that are normally not considered in travel narratives can be gained from the extracted entities?


Listing names of persons

- Shift the focus away from the positioning and the role of the author in the colonial project

- Undertake a schematic grouping: identify intermediaries, rank-and-file employees or marginalized voices


Listing the names of organisations

- Map the colonial infrastructure in an arena


Listing the names of historical places

- Compare how the names used for places differ over time




Process

We used the SBB-NER-Tagger, developed by the Staatsbibliothek zu Berlin (SBB), for Named Entity Recognition (NER) in the travelogues, and were able to extract person, organization and place entities from the OCR texts. The SBB-NER-Tagger contains a BERT-based model that has been trained on the SBB collections of early modern prints in German, French and English. At first sight, it seems to produce decent results.
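To turn token-level tagger output into entity strings, the tagged tokens have to be merged into spans. The sketch below assumes the output is a list of tokens with a `word` and a BIO-style `prediction` (for example `B-PER`, `I-PER`, `O`); that input shape and the helper name `group_entities` are assumptions for illustration, not a description of the script we actually used.

```python
def group_entities(tokens):
    """Merge BIO-tagged tokens into (entity_text, entity_type) tuples."""
    entities = []
    current_words, current_type = [], None
    for token in tokens:
        tag = token["prediction"]
        # Split "B-PER" into ("B", "PER"); "O" yields ("O", "").
        prefix, _, etype = tag.partition("-")
        # An I- tag of the same type continues the running entity.
        if prefix == "I" and current_words and etype == current_type:
            current_words.append(token["word"])
            continue
        # Anything else closes the running entity, if there is one.
        if current_words:
            entities.append((" ".join(current_words), current_type))
        if prefix in ("B", "I"):
            current_words, current_type = [token["word"]], etype
        else:
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities

# Made-up example tokens in the assumed format.
tokens = [
    {"word": "Herr", "prediction": "O"},
    {"word": "Johann", "prediction": "B-PER"},
    {"word": "Müller", "prediction": "I-PER"},
    {"word": "reiste", "prediction": "O"},
    {"word": "nach", "prediction": "O"},
    {"word": "Batavia", "prediction": "B-LOC"},
]
entities = group_entities(tokens)
print(entities)
```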

We created one JSON file per text page, starting from the original JSON file of the dataset, and enhancing each page object with the identified entity strings and types (person, place, organization).
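The per-page enrichment could look roughly like the following sketch. The field names (`page`, `text`, `entities`), the file naming pattern and the helper `write_page_files` are hypothetical and chosen only for illustration; the structure of the actual dataset and of our output files may differ.

```python
import json
import tempfile
from pathlib import Path

def write_page_files(pages, entities_by_page, out_dir):
    """Write one JSON file per page, adding that page's entities to the page object."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        enriched = dict(page)  # copy so the original page object stays untouched
        enriched["entities"] = [
            {"text": text, "type": etype}
            for text, etype in entities_by_page.get(page["page"], [])
        ]
        path = out_dir / f"page_{page['page']:04d}.json"
        path.write_text(
            json.dumps(enriched, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(path)
    return written

# Made-up page objects and entities in the assumed structure.
pages = [
    {"page": 1, "text": "Johann Müller reiste nach Batavia."},
    {"page": 2, "text": "Am Morgen regnete es."},
]
entities_by_page = {1: [("Johann Müller", "PER"), ("Batavia", "LOC")]}

out_dir = tempfile.mkdtemp()
written = write_page_files(pages, entities_by_page, out_dir)
print([path.name for path in written])
```

Keeping the original page fields and only adding an `entities` list means a frontend can read either the original or the enriched files without special handling.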

Result

We built a prototype frontend that displays the entities found on each book page: https://luminous-speculoos-e2e07f.netlify.app/

For a reference on the entity categories used, please see the BERT documentation.

A possible next step would be to use SBB-NED, the Named Entity Linking tool from SBB, to link the entities found in the texts to Wikidata items.

Event finished

30.09.2023 15:30

Joined the team

30.09.2023 12:25 ~ ibrahim_halil_kuray

Joined the team

30.09.2023 08:04 ~ MauriceBonvin

Joined the team

30.09.2023 07:28 ~ Basil

Event started

29.09.2023 09:00

Joined the team

25.09.2023 07:46 ~ annabellewiegart

Repository updated

22.09.2023 07:15 ~ annabellewiegart

Challenge posted

22.09.2023 07:15 ~ annabellewiegart
 
Contributed 2 months ago by annabellewiegart for GLAMhack 2023

The contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.
