Beyond CEFR level prediction of texts in learner corpora: Exploring feedback to learners and learning analytics

A one-day workshop at Université de Paris, 30 Oct 2019
organised by Manon Bouyé and Nicolas Ballier
8 Place Paul Ricœur, 75013 Paris
Olympe de Gouges Building, room 115, first floor

Funding

With the financial support of the French Ministry for Europe and Foreign Affairs (Ministère de l’Europe et des Affaires étrangères, MEAE) and the French Ministry of Higher Education, Research and Innovation (Ministère de l’Enseignement supérieur, de la Recherche et de l’Innovation, MESRI).

Program

MORNING: discussing our results

9h00 Opening, Nicolas Ballier : The Ulysse PHC project : aims, data and limitations

9h20 Thomas Gaillat : Investigating learner micro-systems and customizing CEFR criterial features : the micro-system feature set and its regex syntax

9h40 Discussion

10h30 Bernardo Stearns and Annanda Sousa : The user interface prototype demo
We hope to deliver a Docker and a GitHub version of our user interface, which lets you paste a text, have a coffee while the text is processed, and then get the probability that the text belongs to a given CEFR level.

10h45 Discussion

11h15 Bernardo Stearns for Andrew Simpkins : Classifying learner level
Overfitting? Comparison with a graded corpus
As a preliminary step, we have tested our current user interface with the CEFR ASAG corpus to check whether our model is biased toward the A1 level.

11h30 General discussion

12h15 LUNCH BREAK (poster session at Diderot)

Posters are displayed at Diderot and on a shared Google Drive for remote participants.

Thomas Gaillat et al. (Rennes) : Visualisations of linguistic profiles in learner written productions

Elena Volodina (Gothenburg) : Overview over text-based CEFR research for L2 Swedish: on the intersection between NLP, L2 corpora and CALL

Arnold et al. poster (paper presented at the CAp2018 conference). A paper adding syntactic complexity metrics to the CAp2018 dataset was also accepted for this French machine learning conference:
Arnold, T., Ballier, N., Gaillat, T. & Lissón, P. (2018). Predicting CEFRL levels in learners of English on the basis of metrics and full texts. CAp2018 conference, Université de Rouen, 19-21 June 2018. Paper 31 in the proceedings of the conference. arXiv

AFTERNOON: Learner corpora and beyond: collecting and interpreting learning process and product data

A blueprint was circulated pointing out potential future directions.

14h STRAND 1 Adding more metrics/NLP-based methods for error detection / problematic areas for learners

15h STRAND 2 Exploring the relation between Learner corpus annotation, language testing, and individual feedback to learners

16h30 Coffee break

17h STRAND 3 Should we try to link learner corpus and learning analytics research, and what is there to be gained? Ideas for tracking development paths? (Fuchs, Götz & Werner 2016) How to develop learner profiles based on student input?

18h15 Closing remarks and future plans

18h30 End of the workshop

Call for participation

As the closing event of a European-funded project, we invite colleagues to share their ideas about the automatic analysis of learner corpora, how it can be applied to interlanguage analysis, CEFR level prediction, and error detection, and how it can be extended to support individual feedback to learners and learning analytics.

The morning session will present some of the results of this French-Irish project, “PHC Ulysse 2019”: the features of the EFCAMDAT corpus we used as the first step for our experiments, the methodology we developed, and our main findings. We will present our user interface prototype for the automatic detection of CEFR levels and discuss issues such as the overfitting of a model based on the French and Spanish components of EFCAMDAT. We will also discuss the shared task we held on a portion of this

Over the coffee breaks, we will discuss posters recapitulating some of the issues.

Admission is free but registration is compulsory (on a first come, first served basis) on Framapad.

The summary of the Ulysse PHC Project can be found here.

Discussants

Discussants at Diderot

Taylor Arnold (University of Richmond) is Assistant Professor of Statistics there and has a strong interest in NLP as a data scientist and digital humanist; see arXiv.

Detmar Meurers (University of Tübingen) is Professor of Computational Linguistics and head of the research group on Intelligent Computer-Assisted Language Learning there.

Discussants (videoconference)

Mick O’Donnell (Universidad Autónoma de Madrid, Departamento de Filología Española). See the WricLE corpus, the TREACLE Project and the Adaptive Learning of English Grammar Online.

Elena Volodina (Gothenburg). See the SweLL project – research infrastructure for Swedish as a second language.

Olga Vinogradova (Moscow, National Research University Higher School of Economics). See the Realec project, Russian Error-Annotated Learner English Corpus.

See the 59 features : the link and short description attached.

Contact person

Nicolas Ballier

