Background / bio-blurb
I have taught at the Université de Rouen, Paris 13 and Université Paris Diderot, now called ‘Université Paris Cité’. (tbc)
Contact details / personal data
nicolas.ballier AT u-paris DOT fr
Office 712 (7th floor)
phone: +33 (0)1 57 27 58 74
Bâtiment Olympe de Gouges
Place Paul Ricoeur 75013 Paris
How to get there
Snail mail address:
Université Paris Cité
Nicolas Ballier – UFR EA
Service courrier – case 7046
Bâtiment Olympe de Gouges
27 rue Jean Antoine de Baif
75025 Paris cedex 13
Overview
Research areas
corpus prosody
neural machine translation
automatic analysis of learner corpora
digital humanities
epistemology of linguistics (third revolution of grammatisation)
My background (PhD) is in the epistemology of linguistics. I have a strong interest in eighteenth-century representations of the phonetics of English and in digital humanities. My research investigates the third revolution of grammatisation: what computers do to linguistic data. I have worked on computational analyses of spoken and written learner data. My recent research focuses on the interpretability of neural machine translation and on the representation of speech in Whisper, an audio large language model.
Funded European Projects I have been involved in:
DOKTORAND (2012-2016)
Project title – PHD STUDENT Project “Modernization of PhD Study Programmes in Natural Sciences and Humanities at UPJŠ (Doctoral Student)”, Agreement No. 001/2010/1.2/OPV, ITMS code: 26110230013, European Social Fund
I coordinated the French team (University Paris Diderot at the time) for a project led by P. J. Safarik University in Kosice (other partners: University of Utrecht, University of Jaén and University of Ostrava).
– Main goal: preparation of joint PhD study programmes with partner universities that will allow double PhD diplomas to be awarded to students from partner institutions.
– Expected result: agreements for joint PhD study programme and for double diplomas signed by all partner institutions.
Two jointly supervised PhDs were successfully defended in Kosice as an outcome of this project.
KVARK project (2014-2026): quality education and skills development for doctoral and post-doctoral students of Pavol Jozef Šafárik University in Košice, Contract number 020/2012/1.2/OPV, ITMS code: 26110230084, European Social Fund
– Main goal: preparation of joint PhD study programmes with partner universities that will allow double PhD diplomas to be awarded to students from partner institutions.
– Expected result: agreements for joint PhD study programme and for double diplomas signed by all partner institutions.
The PhD student selected for our jointly supervised PhD became an actor and did not finish his PhD.
PHC Ulysses (2019): Investigating criterial features of learner English and AI-driven automatic language level assessment (ref 43121RJ)
I was PI on the French side for this PHC Ulysses project with NUI Galway (University of Galway), which involved two PhD students. The project aimed to investigate criterial features in learner English and to build a proof-of-concept system for language level assessment in English. Our research focus was to identify linguistic features and to integrate them into a system based on Artificial Intelligence (AI). The purpose was to create a system that analyses learner essays in English and maps them to specific levels of the Common European Framework of Reference for Languages (CEFRL).
https://clillac-arp.u-paris.fr/phc-ulysse-2019/
multitraiNMT (2021) (coordinated by UAB)
I joined the ErasmusPlus multitraiNMT project near its completion; the project resulted in the Lang Sci Press volume MT for Everyone.
I tested the material during the April summer school and gave a talk on my research during the Autran summer school. We wrote a joint chapter describing our experiments with the MultiTraiNMT project in French universities after our joint presentation at the TRALOGY III conference.
LT-LIder project (2024-Nov 2026) (coordinated by UAB)
We have teamed up with Grenoble as the French partners for the ErasmusPlus LT-LIder project.
We are to write a chapter (Compiling and using linguistic data as a resource for translation learning) for the volume dedicated to Machine Translation Literacy.
Recent (co-)supervised funded projects
PAPTAN (co-PI with Maria Zimina-Poirot)
Our platform for neural machine translation and NLP/AI experiments.
MAKE-NMTVIZ: led by Grenoble/Swansea (2023-2024)
Work package leader for Université Paris Cité. This project investigates machine translation interpretability using visualisation tools.
See for example our paper published in the TAL journal or the project paper we wrote for the EAMT conference.
DLLA Deep Learning for Language Assessment (2022-2023)
Project lead, with Helen Yannakoudakis for KCL. A joint project designed to investigate CEFR levels with keylogging data. See our LREC paper, which describes our KUPA-KEYS dataset published on Hugging Face.
Neuroviz (2021-2022)
Principal investigator: Guillaume Wisniewski. A project designed to probe information flux neura
SPECTRANS (2020-2022)
Principal investigator of this interdisciplinary project on specialised neural machine translation (SPECTRANS)
Project GitHub repository with the data
Latest publications
# Accepted papers for 2024
Marco Dinarelli, Dimitra Niaouri, Fabien Lopez, Gabriela Gonzalez-Saez, Mariam Nakhlé, Emmanuelle Esperança-Rodier, Caroline Rossi, Didier Schwab and Nicolas Ballier (2024) Context-Aware Neural Machine Translation Models Analysis And Evaluation Through Attention, TAL, n° 64, vol. 3, 67-91. Special issue on interpretability. PDF
Velentzas, G., Caines, A., Borgo, R., Pacquetet, E., Hamilton, C., Arnold, T., Nicholls, D., Gaillat, T., Ballier N. and Yannakoudakis, H. (2024). Logging Keystrokes in Writing by English Learners. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 10725-10746).
Nicolas Ballier, Taylor Arnold, Adrien Méli, Tori Thurston, Jean-Baptiste Yunès (accepted) Whisper for L2 speech scoring, to appear in The International Journal of Speech Technology, Springer.
Ballier, N. & Méli, A. (accepted) Investigating Acoustic Correlates of Whisper Scoring for L2 Speech Using Forced Alignment with the Italian Component of the ISLE corpus, NLP4CALL 2024, Rennes, 24-25 Oct 2024, sciencesconf.org:nlp4call2024:569611, to be published in the ACL Anthology.
Ballier, N., Burin, L., Namdarzadeh, B., Ng, S., Wright, R. and Yunès, J.-B. (accepted) Probing Whisper Predictions for French, English and Persian Transcriptions, 7th International Conference on Natural Language and Speech Processing, October 19-20, 2024, Trento, Italy, to be published in the ACL Anthology.
# conference talks
Ballier, N. & Yannakoudakis, H. (2022) Towards crowdsourcing research for learner keylogging data, LCR 2022, Padova, 22-24 Sept.
Chamoun, J. & Ballier, N. (2022) Automatic Analysis of Learner Essays based on Complexity Metrics using Machine Learning Algorithms, LCR 2022, Padova, 22-24 Sept.
Ballier, N. (2022) Faut-il former à ce que voit le réseau de neurones pour l’entraînement de la traduction ?, conference at Université libre de Bruxelles, Enseigner la traduction et l’interprétation à l’heure neuronale, 28-29 September 2022
Namdarzadeh, B. & Ballier, N. (2022) What Does Neural Machine Translation Learn? A Snapshot from Google Translate & DeepL (2021-February 2022), conference at Université libre de Bruxelles, Enseigner la traduction et l’interprétation à l’heure neuronale, 28-29 September 2022. https://tradital.ltc.ulb.be/medias/fichier/2022-colloque-tradital-programme-online_1660741236130- pdf
Ballier, Nicolas (2022), Traduire les dislocations de l’oral avec la traduction neuronale. Le cas des dislocations à gauche dans le Corpus de Français Parlé Parisien (CFPP) des années 2000, conference TROL – Traduire l’oralité à l’ère de l’IA, Université de Turin, 5-6 December 2022
# conference papers
Cyriel Mallart, Andrew J. Simpkin, Rémi Venant, Nicolas Ballier, Bernardo Stearns, Jen-Yu Li, Thomas Gaillat (2023) Exploring a New Grammatico-functional Type of Measure as Part of a Language Learning Expert System. BEA@ACL 2023: 466-476
Adrien Méli, Steven Coats, Nicolas Ballier (2023) Methods for Phonetic Scraping of Youtube Videos. ICNLSP 2023: 244-249
Nicolas Ballier, Adrien Méli, Maelle Amand, Jean-Baptiste Yunès (2023) Using Whisper LLM for Automatic Phonetic Diagnosis of L2 Speech, a Case Study with French Learners of English. ICNLSP 2023: 282-292
Lichao Zhu, Maria Zimina, Maud Bénard, Behnoosh Namdarzadeh, Nicolas Ballier, Guillaume Wisniewski, Jean-Baptiste Yunès (2023) Investigating Techniques for a Deeper Understanding of Neural Machine Translation (NMT) Systems through Data Filtering and Fine-tuning Strategies. WMT 2023: 275-281
Fabien Lopez, Gabriela González Sáez, Damien Hansen, Mariam Nakhlé, Behnoosh Namdarzadeh, Nicolas Ballier, Marco Dinarelli, Emmanuelle Esperança-Rodier, Sui He, Sadaf Mohseni, Caroline Rossi, Didier Schwab, Jun Yang, Jean-Baptiste Yunès, Lichao Zhu (2023) The MAKE-NMTVIZ System Description for the WMT23 Literary Task. WMT 2023: 287-295
Namdarzadeh, B. & Ballier, N. (2022a) The Neural Machine Translation of Dislocations, Antonis Botinis (ed.) Proceedings of 13th International Conference of Experimental Linguistics (EXLING), Université Paris Cité, 17-19 October 2022, 121-125.
Namdarzadeh, B., Ballier, N., Zhu, L., Wisniewski, G., and Yunès, J.-B. (2022b) Toward a Test Set of Dislocations in Persian for Neural Machine Translation, NSUR Proceedings, ACL
Wisniewski, G., Zhu, L., Yunès, J.-B. & Ballier, N. (2022) La robustesse de la traduction neuronale: les systèmes de traduction automatique neuronale à l’épreuve de la reproductibilité de l’expérience, Actes de la journée d’étude sur la robustesse des systèmes de TAL, with the support of ATALA and the STIH laboratory, Caio Corro and Gaël Lejeune (eds.), 25 November, ATALA, 29-32
https://www.atala.org/sites/default/files/robustal2022.pdf
Tighidet, Z. and Ballier, N. (2022) Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision Tree: the Case of Postnominal “that” in Noun Complement Clauses vs. Relative Clauses, ALTA2022, ACL Anthology
Wisniewski, G., Zhu, L., Ballier, N. and Yvon, F. (2022) Analyzing Gender Translation Errors to Identify Information Flows between the Encoder and Decoder of an NMT System, BlackboxNLP2022, EMNLP2022,
https://preview.aclanthology.org/emnlp-22-ingestion/2022.blackboxnlp-1.13/
Nicolas Ballier, Jean-Baptiste Yunès, Guillaume Wisniewski, Lichao Zhu, Maria Zimina-Poirot (2022) The SPECTRANS System Description for the WMT22 Biomedical Task, WMT22.
Publications on the ACL Anthology
CV on HAL
https://cv.hal.science/nicolas-ballier?langChosen=fr
DBLP (Digital Bibliography & Library Project):
https://dblp.org/pid/203/5539.html
Publications on HAL
- [hal-04848426] The acoustic and prosodic correlates of SO when used as a discourse marker: From corpus-based analysis to more... (by Adrien Méli, 19 December 2024 at 12:20)
- [hal-04782648] Overview of the linguistic features: creating measures – Joint presentation (by Nicolas Ballier, 24 November 2024 at 11:14)
- [hal-04781585] Enhancing Translation Quality: A Comparative Study of Fine-Tuning and Prompt Engineering in Dialog-Oriented... (by Lichao Zhu, 16 November 2024 at 16:43)
For this shared task, we have used several machine translation engines to produce translations (en ⇔ fr) by fine-tuning a […]
- [hal-04781595] Enhancing Translation Quality: A Comparative Study of Fine-Tuning and Prompt Engineering in Dialog-Oriented... (by Lichao Zhu, 13 November 2024 at 19:04)