The 9th International Workshop on Computational Linguistics for Uralic Languages IWCLUL 2024

The 9th International Workshop on Computational Linguistics for Uralic Languages (IWCLUL 2024) will be organized as a self-standing event. The proceedings of the event will be published in the ACL Anthology. The workshop will take place on November 28-29, 2024, in Helsinki, Finland, at Metropolia University of Applied Sciences.

The purpose of IWCLUL is to bring together researchers working on computational approaches to Uralic languages (e.g. Finnish, Hungarian, Estonian, Võro, the Sámi languages, Komi (Zyrian, Permyak), Mordvin (Erzya, Moksha), Mari (Hill, Meadow), Udmurt, Nenets (Tundra, Forest), Enets, Nganasan, Selkup, Mansi, Khanty, Veps, Karelian (Olonets), Karelian, Ingrian (Izhorian), Votic, Livonian and Ludic). All Uralic languages exhibit rich morphological structure, which makes processing them challenging for state-of-the-art computational linguistic approaches. The majority also suffer from a lack of resources, and many are endangered. Appropriate topics include (but are not limited to):

  • Multilingual approaches in NLP presenting work on at least one Uralic language
  • LLMs and their use in the context of (endangered) Uralic languages
  • Position papers
  • Parsers, analysers and processing pipelines of Uralic languages
  • Lexical databases, electronic dictionaries
  • Finished end-user applications aimed at Uralic languages, such as spelling or grammar checkers, machine translation or speech processing
  • Evaluation methods and gold standards, tagged corpora, treebanks
  • Reports on language-independent or unsupervised methods as applied to Uralic languages
  • Surveys and review articles on subjects related to computational linguistics for one or more Uralic languages
  • Any work that aims at combining efforts and reducing duplication of work
  • How to elicit activity from the language community, agitation campaigns, games with a purpose

Paper submission

We solicit original and unpublished work related to NLP approaches for Uralic languages. Short papers can be up to 4 pages in length and long papers up to 8 pages. Both submission formats can have an unlimited number of pages for references. All submissions must follow the ACL stylesheet (Overleaf template).

Submissions must be anonymous and will be peer-reviewed by our program committee. The peer review is double-blind.

Papers must be submitted using OpenReview by the submission deadline. At least one of the authors of an accepted paper must attend the event to present the paper. There will be no registration/publication fees.

Accepted papers (short and long) will be published in the proceedings that will appear in the ACL Anthology. Accepted papers will also be given an additional page to address the reviewers’ comments. A camera-ready submission can therefore be up to 5 pages for a short paper and 9 pages for a long paper, with an unlimited number of pages for references.

You may also contribute to the event by submitting a lightning talk. Lightning talks are submitted as 750-word abstracts and are suited for discussing ideas or presenting work in progress. The abstracts will be published in separate lightning talk proceedings.

Schedule

Thursday 28.11.

10:00-10:10 Workshop opening  
10:10-11:00 Lightning talks  
11:00-12:00 Oral session 1  
11:00-11:20 Aspect Based Sentiment Analysis of Finnish Neighborhoods: Insights from Suomi24 Laleh Davoodi, Anssi Öörni, Ville Harkke
11:20-11:40 Political Stance Detection in Estonian News Media Lauri Lüüsi, Uku Kangur, Roshni Chakraborty, Rajesh Sharma
11:40-12:00 Scaling Sustainable Development Goal Predictions across Languages: From English to Finnish Melany Macias, Leo Huovinen, Lev Kharlashkin, Mika Hämäläinen
12:00-13:00 Lunch  
13:00-14:20 Oral session 2  
13:00-13:20 Multilingual Approaches to Sentiment Analysis of Texts in Linguistically Diverse Languages: A Case Study of Finnish, Hungarian, and Bulgarian Mikhail Krasitskii, Olga Kolesnikova, Grigori Sidorov, Liliana Chanona Hernandez, Alexander Gelbukh
13:20-13:40 Towards standardized inflected lexicons for the Finnic languages Jules Bouton
13:40-14:00 DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT Mika Hämäläinen
14:00-14:20 Keeping Up Appearances—or how to get all Uralic languages included into bleeding edge research and software: generate, convert, and LLM your way into multilingual datasets and more! Flammie A Pirinen
14:20-14:40 Coffee break  
14:40-16:00 Oral session 3  
14:40-15:00 Towards the speech recognition for Livonian Valts Ernštreits
15:00-15:20 Using large language models to transliterate endangered Uralic languages Niko Tapio Partanen
15:20-15:40 Specialized Monolingual BPE Tokenizers for Uralic Languages Representation in Large Language Models Iaroslav Chelombitko, Aleksey Komissarov
15:40-16:00 Leveraging Transformer-Based Models for Predicting Inflection Classes of Words in an Endangered Sami Language Khalid Alnajjar, Jack Rueter, Mika Hämäläinen

Friday 29.11.

10:00-11:00 Keynote Jack Rueter
11:00-12:00 Oral session 4  
11:00-11:20 Compressing Noun Phrases to Discover Mental Constructions in Corpora – A Case Study for Auxiliaries in Hungarian Balázs Indig, Tímea Borbála Bajzát
11:20-11:40 Challenges and Opportunities in Revitalizing Uralic Languages in the Age of Technology Alexander Nazarenko
11:40-12:00 Applying the transformer architecture on the task of headline selection for Finnish news texts Maria Adamova, Maria Khokhlova
12:00-13:00 Lunch  
13:00-14:20 Oral session 5  
13:00-13:20 Kola Saami Christian Text Corpus Michael Rießler
13:20-13:40 Prune or Retrain: Optimizing the Vocabulary of Multilingual Models for Estonian Aleksei Dorkin, Taido Purason, Kairit Sirts
13:40-14:00 Universal-WER: Enhancing WER with Segmentation and Weighted Substitution for Varied Linguistic Contexts Samy Ouzerrout
14:00-14:20 On Erzya and Moksha Corpora and Analyzer Development, ERME-PSLA 1950s Jack Rueter, Olga V. Erina, Nadezhda Kabaeva
14:20-14:40 Coffee break  
14:40-15:40 SIGUR business meeting  

Remote attendance

We aim for an inclusive event, and we understand that some people have difficulty traveling. If you have a valid reason why you cannot attend the event in person (visa issues, health issues, etc.), you may present your paper remotely.

How to attend?

No advance registration is needed. Interested people can just walk in on workshop days.

Important dates

Paper submission (full and short): October 25, 2024 (extended)
Notification of acceptance: November 3, 2024
Camera ready deadline: November 10, 2024
Workshop: November 28-29, 2024

All times are Anywhere on Earth (AoE).

Venue

Metropolia University of Applied Sciences, Helsinki

Hall AR128, Arabia campus, Hämeentie 135 D, 00560 Helsinki, Finland

Arabia is known for art and design. Read more about the neighborhood on Helsinki City’s website.

Organizers

  • Mika Hämäläinen, Metropolia University of Applied Sciences

  • Flammie Pirinen, UiT The Arctic University of Norway

  • Melany Macias, Metropolia University of Applied Sciences

  • Mario Crespo Avila, Complutense University of Madrid

In case of questions, you can send an email to mika.hamalainen@metropolia.fi

Program committee

  • Fejes László - Hungarian Research Centre for Linguistics
  • Heiki-Jaan Kaalep - University of Tartu
  • Gunta Kļava - University of Latvia
  • Oleg Belyaev - Lomonosov Moscow State University
  • Trond Trosterud - The Arctic University of Norway
  • Linda Wiechetek - The Arctic University of Norway
  • Khalid Alnajjar - F-Secure Oyj
  • Niko Partanen - University of Helsinki
  • Jack Rueter - University of Helsinki
  • Miikka Silfverberg - University of British Columbia
  • Janne Kauttonen - Haaga-Helia University of Applied Sciences
  • Michael Rießler - University of Eastern Finland
  • Aleksei Dorkin - University of Tartu
  • Jeremy Bradley - University of Vienna
  • Xinqiao Zhang - UC San Diego
  • Irina Khomchenkova - Lomonosov Moscow State University
  • David Dale - Meta
  • Timofey Arkhangelskiy - University of Hamburg
  • Viktor Martinović - University of Vienna