Metadata Enhancement through Deep Learning
Create an app to semi-automatically add “depicts” statements to photographs on Wikimedia Commons.
“depicts” statements are used on Wikimedia Commons to describe the content of images by linking them to corresponding Wikidata items.
By a semi-automatic process, we mean combining machine learning and crowdsourcing to maximize both accuracy and efficiency, so that large numbers of images can be tagged in a relatively short time.
What has already been done:
Three pretrained, out-of-the-box object recognition services (IBM Watson, Clarifai, Microsoft Azure) have been tested on photographs from a collection by Leo Wehrli and on a second collection. The best results were achieved by combining the three services, but the tagging is not reliable enough to allow for fully automatic processing. To make it easier for a human to double-check the tags, an appropriate user interface should be created. The human feedback could also be used to further train the algorithm. This is not possible with the three out-of-the-box services, however, so an algorithm would have to be trained from scratch instead. Eventually, human feedback could be gathered through crowdsourcing and the algorithm improved over time, maximizing both the accuracy and the efficiency of the tagging.
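The source does not say how the results of the three services were combined. As a minimal sketch, assuming each service returns a list of (label, confidence) pairs, a simple agreement count could be used to order the tags for human review. The function name combine_tags, the input layout, and the confidence threshold are hypothetical and only serve as illustration.

```python
from collections import defaultdict


def combine_tags(service_tags, min_confidence=0.5):
    """Merge tags from several services into one review queue.

    service_tags maps a service name to a list of (label, confidence)
    pairs. This layout is an assumption, not the format returned by
    the actual service APIs.
    """
    votes = defaultdict(dict)
    for service, tags in service_tags.items():
        for label, confidence in tags:
            key = label.strip().lower()  # normalize so "Sky" and "sky" match
            # Keep the highest confidence a service reports for a label.
            votes[key][service] = max(confidence, votes[key].get(service, 0.0))

    combined = []
    for label, per_service in votes.items():
        mean_confidence = sum(per_service.values()) / len(per_service)
        if mean_confidence < min_confidence:
            continue  # too uncertain to show to a reviewer
        combined.append({
            "label": label,
            "services": sorted(per_service),
            "confidence": round(mean_confidence, 3),
        })

    # Tags confirmed by more services come first in the review queue.
    combined.sort(key=lambda t: (-len(t["services"]), -t["confidence"], t["label"]))
    return combined
```

Every tag in the returned list would still be shown to a human; the ordering only puts the tags on which the services agree at the top.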
Most objects recognized by the three services have been successfully mapped to Wikidata items, and the metadata entries on Wikimedia Commons have been updated accordingly. The mapping between the recognized objects and Wikidata can be done automatically in part (a Python script based on the SPARQL endpoint was used for this); for the rest, the corresponding Wikidata items need to be looked up manually. For most objects there is a corresponding item on Wikidata; in very rare cases, items on Wikidata need some cleanup. Some tags extracted by the services concern colors rather than objects; these were not included in the data ingested into Wikimedia Commons. The QuickStatements tool was used for the data ingest. A future app should allow users to write the verified data directly to Wikimedia Commons.
For further information on what has already been done, see the presentation "Does AI perform better? - Metadata Enhancement through Deep Learning", scheduled for 2 June 2020.
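The mapping and ingest steps could look roughly like the following sketch. It is not the script used in the project. It assumes an exact-label lookup against the public Wikidata SPARQL endpoint, the "depicts" property P180, and QuickStatements commands in tab-separated form addressed to a Commons media ID (M plus the page ID). The color list, the function names, and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"
DEPICTS = "P180"  # Wikidata property "depicts"
# Illustrative only; the project's actual list of excluded tags is not given.
COLOR_TAGS = {"black", "blue", "brown", "green", "grey", "red", "white", "yellow"}


def drop_color_tags(labels):
    """Remove tags that name a color rather than an object."""
    return [label for label in labels if label.strip().lower() not in COLOR_TAGS]


def build_label_query(label, lang="en", limit=5):
    """Build a SPARQL query for items whose label matches exactly."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE { "
        f'?item rdfs:label "{escaped}"@{lang} . '
        "} "
        f"LIMIT {int(limit)}"
    )


def lookup_items(label, lang="en"):
    """Query the endpoint and return candidate item IDs (network call)."""
    params = urllib.parse.urlencode(
        {"query": build_label_query(label, lang), "format": "json"}
    )
    request = urllib.request.Request(
        WIKIDATA_SPARQL + "?" + params,
        headers={"User-Agent": "depicts-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    # Item URIs end in the item ID; keep only that last path segment.
    return [b["item"]["value"].rsplit("/", 1)[-1] for b in data["results"]["bindings"]]


def quickstatements_lines(media_id, item_ids):
    """Format one 'depicts' command per item for a Commons media ID."""
    return [f"{media_id}\t{DEPICTS}\t{item_id}" for item_id in item_ids]
```

A label lookup can return several candidates or none, which is where the manual step described above comes in: a person picks the right item before lines such as quickstatements_lines("M123", ["Q144"]) are generated (both IDs here are placeholders).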
What could be done during the hackathon:
Brainstorm which use cases would be of interest for the enriched linked data of such image collections (e.g. "Show me all nature images in Bern 1900").
Train a custom model (see "instructions.pdf" in "Explore") to generate the "depicts" statements for Wikimedia Commons.
Develop a prototype to show a practical application of the whole process (include crowdsourcing for the tasks where it is reasonable). See "imagetaggingV04.py" in "Explore" for an example Python script showing how to get the tags from the pretrained models through the APIs of the services used.
Link to video recording of the project presentation during the GLAMhack side programme: https://fhgr.webex.com/recordingservice/sites/fhgr/recording/playback/c3098c0cf14845ddb6f9338f400d4be7 (password: GLAMhack2020)