ALOES 2024 pre-conference workshop

Pre-conference Workshop on Internet Spoken Corpora of English

Thursday 28 March 2024

 
Programme
 
14h 00  Opening
 
Session 1. YouTube scraping: three methods

14h 15 Adrien Méli: The PEASYV pipeline

14h 45 Peter Uhrig (Erlangen): A pipeline for the creation of multimodal corpora from YouTube (Zoom presentation)

 

15h 15 Coffee break

Session 2. Hands-on session: Using Python notebooks to stream your data
15h 45 Convenor: Steven Coats (University of Oulu)

Participants need to bring their own laptops and have a Google account, as we will use a Google Colaboratory notebook.

 

Session 3. Post-processing and corpus curation

17h 00 Steven Coats: CoANZSE/CoANZSE Audio: The Corpus of Australian and New Zealand Spoken English on CLARIN

 

 

Session 4. Round table and conclusion

17h 30 Richard Wright: Some extra requirements for corpora: the ATAROS corpus
Discussants: Sylvain Navarro (UPCité), Rory Turnbull (Newcastle) & Richard Wright (Seattle)

 
 
 
 
 
For inquiries please contact nicolas.ballier@u-paris.fr and sylvain.navarro@u-paris.f
 

Participation in the workshop is free of charge, but participants must register: https://forms.gle/wXBpjiJ8dHatCcdYA

Getting here: Bâtiment Olympe de Gouges, Place Paul Ricoeur, 75013 Paris (building 10 on the map)

Room 720 (7th floor)

Ask for a badge at the information desk (“accueil”)

 

Zoom link: https://u-paris.zoom.us/j/84241112950?pwd=c0syNGhFTk9BZC9iNGx2MFFrZW1hQT09

 

 

 

Access to the Olympe de Gouges building

 

 

Main pointers:
  • Steven Coats. 2023b. A new corpus of geolocated ASR transcripts from Germany. Language Resources and Evaluation. https://doi.org/10.1007/s10579-023-09686-9
  • Steven Coats. 2023c. A pipeline for the large-scale acoustic analysis of streamed content. In Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2023), pages 51–54. Mannheim: Leibniz-Institut für Deutsche Sprache.
  • Adrien Méli, Steven Coats and Nicolas Ballier. 2023. Methods for phonetic scraping of Youtube videos. In Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), pages 244–249. https://aclanthology.org/volumes/2023.icnlsp-1/
  • Adrien Méli and Nicolas Ballier. 2023. PEASYV: A procedure to obtain phonetic data from subtitled videos. In Proceedings of the International Congress of Phonetic Sciences, pages 3211–3215. https://hal.science/hal-04319467/document
  • Adrien’s presentation: https://adrienmeli.xyz/aloes/#/title-slide

 

  • Dykes, N., Wilson, A., & Uhrig, P. (2023, September). A Pipeline for the Creation of Multimodal Corpora from YouTube Videos. In Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing (pp. 1-5). https://aclanthology.org/2023.limo-1.1.pdf
 

 

 

Contact person: Nicolas Ballier, nicolas.ballier@u-paris.fr
