
Audio Analysis Challenge

(07) Retrieve as much information as possible from an audio collection, through various Machine Learning/Natural Language Processing methods

🌘🌗🌖🌕🌔🌓🌒 Team 7 is retrieving information from the audio collection "Radio Pleine Lune" through various machine learning & natural language processing methods > @AContestataires @memoriav_ch #GLAMhack2022 @supsi_ch // https://t.co/tGb1S7P3cB https://t.co/mKbJ34Pa8K pic.twitter.com/yh6JM1rItk
— Opendata.ch (@OpendataCH@mastodon.social) (@OpendataCH) November 5, 2022

Challenge

Retrieve as much information as possible from an audio collection, through various Machine Learning/Natural Language Processing methods:

speech-to-text
speech emotion recognition / sentiment analysis (from the transcribed text or, if feasible, directly on the audio): classify and tag the sentiment of each speech segment or speaker by polarity (positive, negative, or neutral) or, going further, by distinct emotions
possibly, data visualizations based on the results (e.g., https://50-jahre-hitparade.ch/analysis/, from which the chart above is taken)
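
As a rough illustration of the polarity-tagging step listed above, here is a minimal sketch. It assumes the recordings have already been transcribed into text segments, and it uses a tiny hand-made word list (POSITIVE and NEGATIVE are hypothetical and for illustration only). It is not the team's method; a real system would more likely rely on a trained model or a curated French sentiment lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only (not from the project).
POSITIVE = {"liberté", "solidarité", "joie", "victoire"}
NEGATIVE = {"violence", "peur", "injustice", "colère"}


def tag_polarity(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for one transcript segment."""
    # \w matches accented letters in Python 3, so French words stay intact.
    words = re.findall(r"\w+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(segments: list[str]) -> Counter:
    """Count polarity labels across segments, e.g. as input for a chart."""
    return Counter(tag_polarity(segment) for segment in segments)
```

The counts returned by summarize could feed a simple bar chart per broadcast. A word-list approach like this ignores negation and context, which is one reason a model-based classifier may be preferable for a corpus with live, multi-voice speech.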

Dataset

Collection “Radio Pleine Lune”: Radio Pleine Lune was a feminist radio program in the Geneva region that started with pirate broadcasts in 1979. The collection has been deposited in the Archives contestataires in Geneva, which collects, preserves, and promotes documents from social movements of the second half of the 20th century. The program existed from 1980 to 1999. It is of particular importance for the Archives contestataires because it documents the various media forms used by protest movements in the second half of the 20th century. The materials consist of broadcasts, i.e., recordings made directly in the studio, as well as some rushes (unedited material), mostly interviews.

Information about the collection:

http://inventaires.archivescontestataires.ch/index.php/fonds-radio-pleine-lune
https://memobase.ch/fr/recordSet/acc-001

Metadata:

https://api.memobase.ch/record/advancedSearch?q=isPartOf:mbrs:acc-001

The metadata are in French. The most relevant fields are the title, the abstract, and the keywords (hasSubject).
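
The query URL above can be assembled programmatically, and the three fields singled out as most relevant can be picked from a record once it is loaded. The sketch below is an assumption-laden illustration: the helper names build_search_url and pick_relevant_fields are hypothetical, and the code assumes each record is available as a flat dictionary keyed by field name, which may not match the actual response format of the Memobase API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.memobase.ch/record/advancedSearch"
# Field names taken from the description above.
RELEVANT_FIELDS = ("title", "abstract", "hasSubject")


def build_search_url(record_set_id: str) -> str:
    """Build the advanced-search URL for one record set (e.g. 'acc-001')."""
    # safe=":" keeps the colons readable, matching the URL given above.
    query = urlencode({"q": f"isPartOf:mbrs:{record_set_id}"}, safe=":")
    return f"{BASE_URL}?{query}"


def pick_relevant_fields(record: dict) -> dict:
    """Keep only the relevant fields that are present in a record.

    Assumes a flat dict per record; the real API response may be nested.
    """
    return {key: record[key] for key in RELEVANT_FIELDS if key in record}
```

For example, build_search_url("acc-001") reproduces the URL listed above, and pick_relevant_fields applied to a hypothetical record such as {"title": "Émission", "duration": "00:30:00"} would keep only the title.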

Data: 443 audio recordings.

Possible issues:

not enough training data
chaotic corpus (multiple voices, live speaking)

Needs: developers with experience in audio analysis algorithms; possibly, web designers.



The contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.
