Nicolas Ballier



Background / bio-blurb

I have taught at the Université de Rouen, Paris 13 and Université Paris Diderot, now called ‘Université Paris Cité’. (tbc)

Contact details / personal data

nicolas.ballier AT u-paris DOT fr
Office 712 (7th floor)
Phone: +33 (0)1 57 27 58 74
Bâtiment Olympe de Gouges
Place Paul Ricoeur 75013 Paris
How to get there

Snail mail address:
Université Paris Cité
Nicolas Ballier – UFR EA
Service courrier – case 7046
Bâtiment Olympe de Gouges
27 rue Jean Antoine de Baif
75025 Paris cedex 13


General presentation

Research areas

corpus prosody
neural machine translation
automatic analysis of learner corpora
digital humanities
epistemology of linguistics (third revolution of grammatisation)

My background (PhD) is in the epistemology of linguistics. I have a strong interest in eighteenth-century representations of the phonetics of English and in digital humanities. My research investigates the third revolution of grammatisation: what computers do to linguistic data. I have worked on computational analyses of spoken and written learner data. My recent research focuses on the interpretability of neural machine translation and on the representation of speech in Whisper, an audio large language model.

Funded European Projects I have been involved in:

DOKTORAND (2012-2016)
Project title: PHD STUDENT project, “Modernization of PhD Study Programmes in Natural Sciences and Humanities at UPJŠ (Doctoral Student)”, Agreement No. 001/2010/1.2/OPV, ITMS code: 26110230013, European Social Fund
I coordinated the French team (University Paris Diderot at the time) for a project coordinated by P. J. Šafárik University in Košice (other partners: University of Utrecht, University of Jaén and University of Ostrava).
– Main goal: preparation of joint PhD study programmes with partner universities that will allow double PhD diplomas to be awarded to students from partner institutions.
– Expected result: agreements for joint PhD study programme and for double diplomas signed by all partner institutions.
Two jointly supervised PhDs were successfully defended in Košice as an outcome of this project.

KVARK project (2014-2026): “Quality education and skills development for doctoral and post-doctoral students of Pavol Jozef Šafárik University in Košice”, Contract number 020/2012/1.2/OPV, ITMS code: 26110230084, European Social Fund
– Main goal: preparation of joint PhD study programmes with partner universities that will allow double PhD diplomas to be awarded to students from partner institutions.
– Expected result: agreements for joint PhD study programme and for double diplomas signed by all partner institutions.
The PhD student selected for our jointly supervised PhD became an actor and did not finish his PhD.

PHC Ulysses (2019): Investigating criterial features of learner English and AI-driven automatic language level assessment (ref 43121RJ)
I was PI on the French side for this PHC Ulysses with the University of Galway (NUI Galway at the time), involving two PhD students. The project aimed to investigate criterial features in learner English and to build a proof-of-concept system for language level assessment in English. Our research focus was to identify linguistic features and to integrate them within a system based on Artificial Intelligence (AI). The purpose was to create a system that analyses essays written by learners of English and maps them to specific levels of the Common European Framework of Reference for Languages (CEFRL).

MultiTraiNMT (2021) (coordinated by UAB)
I took part in the ErasmusPlus MultiTraiNMT project towards its end; the project resulted in the Lang Sci Press volume MT for Everyone.
I tested the material during the April summer school and gave a talk on my research during the Autran summer school. We wrote a joint chapter describing our experiments with the MultiTraiNMT project in French universities after our joint presentation at the TRALOGY III conference.

LT-LIder project (2024-Nov 2026) (coordinated by UAB)
We have teamed up with Grenoble as the French partners for the ErasmusPlus LT-LIder project.
We are to write a chapter (Compiling and using linguistic data as a resource for translation learning) for the volume dedicated to Machine Translation Literacy.

Recent (co-)supervised funded projects

PAPTAN (co-lead with Maria Zimina-Poirot)
This is our platform for Neural Machine Translation and NLP/AI experiments.

MAKE-NMT VIZ: led by Grenoble/Swansea (2023-2024)
Work package leader for Université Paris Cité. This project investigates Machine Translation interpretability using visualisation tools.
See for example our paper published in the TAL journal or the project paper we wrote for the EAMT conference.

DLLA Deep Learning for Language Assessment (2022-2023)
Project lead, with Helen Yannakoudakis for KCL. A joint project designed to investigate CEFR levels with keylog data. See our LREC paper that describes our KUPA-KEYS dataset published on Hugging Face.

Neuroviz (2021-2022)
Principal investigator: Guillaume Wisniewski. A project designed to probe information flux neura

SPECTRANS (2020-2022)
Principal investigator of this interdisciplinary project on specialised neural machine translation (SPECTRANS)
Project GitHub repository with the data

Latest publications


Gabriela González Sáez, Fabien Lopez, Mariam Nakhlé, James Turner, Nicolas Ballier, Marco Dinarelli, Emmanuelle Esperança-Rodier, Sui He, Caroline Rossi, Didier Schwab, Jun Yang: The MAKE-NMTViz Project: Meaningful, Accurate and Knowledge-limited Explanations of NMT Systems for Translators. EAMT (2) 2024: 12-13 PDF

Gabriela González Sáez, Mariam Nakhlé, James Turner, Fabien Lopez, Nicolas Ballier, Marco Dinarelli, Emmanuelle Esperança-Rodier, Sui He, Raheel Qader, Caroline Rossi, Didier Schwab, Jun Yang: Exploring NMT Explainability for Translators Using NMT Visualising Tools. EAMT (1) 2024: 396-410 PDF

Bernardo Stearns, Nicolas Ballier, Thomas Gaillat, Andrew Simpkin, John P. McCrae (2024) Evaluating the Generalisation of an Artificial Learner. Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning, 199-208. PDF

Marco Dinarelli, Dimitra Niaouri, Fabien Lopez, Gabriela Gonzalez-Saez, Mariam Nakhlé, Emmanuelle Esperança-Rodier, Caroline Rossi, Didier Schwab and Nicolas Ballier (2024) Context-Aware Neural Machine Translation Models Analysis And Evaluation Through Attention, TAL, n° 64, vol. 3, 67-91. Special issue on interpretability. PDF

Velentzas, G., Caines, A., Borgo, R., Pacquetet, E., Hamilton, C., Arnold, T., Nicholls, D., Gaillat, T., Ballier N. and Yannakoudakis, H. (2024). Logging Keystrokes in Writing by English Learners. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 10725-10746). PDF

Nicolas Ballier, Taylor Arnold, Adrien Méli, Tori Thurston, Jean-Baptiste Yunès (2024) Whisper for L2 speech scoring, The International Journal of Speech Technology. Springer.

Ballier, N & Méli, A. (2024) Investigating Acoustic Correlates of Whisper Scoring for L2 Speech Using Forced alignment with the Italian Component of the ISLE corpus, NLP4CALL 2024, Rennes, 24-25 Oct 2024, 20-32, published in the ACL anthology.

Ballier, N., Burin, L., Namdarzadeh, B., Ng, S, Wright, R. and Yunès, J.-B. (2024) Probing Whisper Predictions for French, English and Persian Transcriptions, 7th International Conference on Natural Language and Speech Processing, October 19-20, 2024, Trento, Italy, 129-138, published in the ACL Anthology.

# talks

Ballier, N. & Yannakoudakis, H. (2022) Towards crowdsourcing research for learner keylogging data, LCR 2022, Padova, 22-24 Sept.
Chamoun, J. & Ballier, N. (2022) Automatic Analysis of Learner Essays based on Complexity Metrics using Machine Learning Algorithms, LCR 2022, Padova, 22-24 Sept.

Ballier, N. (2022) Faut-il former à ce que voit le réseau de neurones pour l’entraînement de la traduction ?, conference at Université libre de Bruxelles, Enseigner la traduction et l’interprétation à l’heure neuronale, 28-29 September 2022

Namdarzadeh, B. & Ballier, N. (2022) What Does Neural Machine Translation Learn? A Snapshot from Google Translate & DeepL (2021-February 2022), conference at Université libre de Bruxelles, Enseigner la traduction et l’interprétation à l’heure neuronale, 28-29 September 2022. https://tradital.ltc.ulb.be/medias/fichier/2022-colloque-tradital-programme-online_1660741236130- pdf

Ballier, N. (2022) Traduire les dislocations de l’oral avec la traduction neuronale. Le cas des dislocations à gauche dans le Corpus de Français Parlé Parisien (CFPP) des années 2000, conference TROL – Traduire l’oralité à l’ère de l’IA, Université de Turin, 5-6 December 2022

# conference papers

Cyriel Mallart, Andrew J. Simpkin, Rémi Venant, Nicolas Ballier, Bernardo Stearns, Jen-Yu Li, Thomas Gaillat (2023) Exploring a New Grammatico-functional Type of Measure as Part of a Language Learning Expert System. BEA@ACL 2023: 466-476

Adrien Méli, Steven Coats, Nicolas Ballier (2023) Methods for Phonetic Scraping of Youtube Videos. ICNLSP 2023: 244-249

Nicolas Ballier, Adrien Méli, Maelle Amand, Jean-Baptiste Yunès (2023) Using Whisper LLM for Automatic Phonetic Diagnosis of L2 Speech, a Case Study with French Learners of English. ICNLSP 2023: 282-292

Lichao Zhu, Maria Zimina, Maud Bénard, Behnoosh Namdar, Nicolas Ballier, Guillaume Wisniewski, Jean-Baptiste Yunès (2023) Investigating Techniques for a Deeper Understanding of Neural Machine Translation (NMT) Systems through Data Filtering and Fine-tuning Strategies. WMT 2023: 275-281

Fabien Lopez, Gabriela González Sáez, Damien Hansen, Mariam Nakhlé, Behnoosh Namdarzadeh, Nicolas Ballier, Marco Dinarelli, Emmanuelle Esperança-Rodier, Sui He, Sadaf Mohseni, Caroline Rossi, Didier Schwab, Jun Yang, Jean-Baptiste Yunès, Lichao Zhu (2023) The MAKE-NMTVIZ System Description for the WMT23 Literary Task. WMT 2023: 287-295

Namdarzadeh, B. & Ballier, N. (2022a) The Neural Machine Translation of Dislocations, Antonis Botinis (ed.) Proceedings of 13th International Conference of Experimental Linguistics (EXLING), Université Paris Cité, 17-19 October 2022, 121-125.
Namdarzadeh, B., Ballier, N., Zhu, L., Wisniewski, G., and Yunès, J.-B. (2022b) Toward a Test Set of Dislocations in Persian for Neural Machine Translation, NSUR Proceedings, ACL

Wisniewski, G., Zhu, L., Yunès, J.-B. & Ballier, N. (2022) La robustesse de la traduction neuronale: les systèmes de traduction automatique neuronale à l’épreuve de la reproductibilité de l’expérience, Actes de la journée d’étude sur la robustesse des systèmes de TAL, with the support of ATALA and the STIH laboratory, Caio Corro and Gaël Lejeune (eds.), 25 November, ATALA, 29-32

Tighidet, Z. and Ballier, N. (2022) Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision Tree: the Case of Postnominal “that” in Noun Complement Clauses vs. Relative Clauses, ALTA2022, ACL anthology

Wisniewski, G., Zhu, L., Ballier, N. and Yvon, F. (2022) Analyzing Gender Translation Errors to Identify Information Flows between the Encoder and Decoder of an NMT System, BlackboxNLP2022, EMNLP2022.

Nicolas Ballier, Jean-Baptiste Yunès, Guillaume Wisniewski, Lichao Zhu, Maria Zimina-Poirot (2022) The SPECTRANS System Description for the WMT22 Biomedical Task, WMT22.

Publications on the ACL Anthology

CV on HAL



DBLP (Digital Bibliography & Library Project):


Publications on HAL

