A one-day workshop at Université de Paris, 30 October 2019
organised by Manon Bouyé and Nicolas Ballier
8 Place Paul Ricœur, 75013 Paris
Olympe de Gouges Building, room 115, first floor

Funding

With the financial support of the French Ministry for Europe and Foreign Affairs (Ministère de l’Europe et des Affaires étrangères, MEAE) and the French Ministry of Higher Education, Research and Innovation (Ministère de l’Enseignement supérieur, de la Recherche et de l’Innovation, MESRI).

Program

MORNING: discussing our results

9h00 Opening. Nicolas Ballier: The Ulysse PHC project: aims, data and limitations

9h20 Thomas Gaillat: Investigating learner micro-systems and customizing CEFR criterial features: the micro-system feature set and its regex syntax

9h40 Discussion

10h30 Bernardo Stearns and Annanda Sousa: The user interface prototype demo
We hope to deliver a Docker version and a GitHub version of our user interface that lets you paste a text, have a coffee while the text is processed, and then get the probability that the text belongs to a given CEFR level.

10h45 Discussion

11h15 Bernardo Stearns for Andrew Simpkins: Classifying learner level
Overfitting? Comparison with a graded corpus
As a preliminary step, we have tested our current user interface with the CEFR ASAG corpus to check whether our model is biased towards the A1 level.

11h30 General discussion

12h15 LUNCH BREAK (poster session at Diderot)

Posters displayed at Diderot and on a shared Google Drive for remote participants.

Thomas Gaillat et al. (Rennes): Visualisations of linguistic profiles in learner written productions

Elena Volodina (Gothenburg): Overview over text-based CEFR research for L2 Swedish: on the intersection between NLP, L2 corpora and CALL

Arnold et al. poster (paper presented at the CAp2018 conference). A paper adding syntactic complexity metrics to the CAp2018 dataset was also accepted for this French machine learning conference:
Arnold, T., Ballier, N., Gaillat, T. & Lissón, P. (2018). Predicting CEFRL levels in learners of English on the basis of metrics and full texts. CAp2018 conference, Université de Rouen, 19-21 June 2018. Paper 31 in the proceedings of the conference. arXiv.

AFTERNOON: Learner corpora and beyond: collecting and interpreting learning process and product data

A blueprint was circulated pointing out potential future directions.

14h STRAND 1 Adding more metrics/NLP-based methods for error detection / problematic areas for learners

15h STRAND 2 Exploring the relation between Learner corpus annotation, language testing, and individual feedback to learners

16h30 Coffee break

17h STRAND 3 Should we try to link learner corpus and learning analytics research – and what is there to be gained? Ideas for tracking development paths? (Fuchs, Götz & Werner 2016) How to develop learner profiles based on student input?

18h15 Closing remarks and future plans

18h30 End of the workshop

Call for participation

For the closing event of a European-funded project, we invite colleagues to share their ideas about the automatic analysis of learner corpora and how it can be applied to interlanguage analysis, CEFR level prediction, and error detection – and extended to support individual feedback to learners and learning analytics.

The morning session will present some of the results of the French-Irish project “PHC Ulysse 2019”: the features of the EFCAMDAT corpus we used as the first step for our experiments, the methodology we developed, and our main findings. We will present our user interface prototype for the automatic detection of CEFR levels and discuss aspects such as the overfitting of a model based on the French and Spanish components of EFCAMDAT. We will also discuss the shared task we held on a portion of this

Over coffee breaks, we will discuss posters recapitulating some of the issues.

Admission is free, but registration on Framapad is compulsory (on a first-come, first-served basis).

The summary of the Ulysse PHC Project can be found here.

Discussants

Discussants at Diderot 

Taylor Arnold (University of Richmond) is Assistant Professor of Statistics and, as a data scientist and digital humanist, has a strong interest in NLP; see arXiv.

Detmar Meurers (University of Tübingen) is Professor of Computational Linguistics and head of the research group on Intelligent Computer-Assisted Language Learning there.

Discussants (videoconference)

Mick O’Donnell (Universidad Autónoma de Madrid, Departamento de Filología Española). See the WricLE corpus, the TREACLE Project and the Adaptive Learning of English Grammar Online.

Elena Volodina (Gothenburg). See the SweLL project – research infrastructure for Swedish as a second language.

Olga Vinogradova (Moscow, National Research University Higher School of Economics). See the Realec project, Russian Error-Annotated Learner English Corpus.

See the 59 features: link and short description attached.

Contact person

Nicolas Ballier
