Swedish Language Technology Conference (SLTC) 2024

Linköping University, Sweden

The Tenth Swedish Language Technology Conference (SLTC) will take place on the 27th–29th of November 2024 at Linköping University, Sweden, with the main conference on Wednesday and Thursday, 27th–28th of November, and workshops on Friday, 29th of November. All researchers and students working on NLP and related fields are invited to participate.

Important Dates

Registration

The official registration date has passed. For late registrations, please contact us first at sltc2024@googlegroups.com.

Program

The preliminary program is available here. The main conference will start on the 27th at 13:00 and end on the 28th at 17:00. For workshop programs, we refer to their respective websites.

We will be in the B-building at Campus Valla, best accessed via entrance 27 or 27C. All oral presentations take place in the Ada Lovelace lecture hall, and poster sessions and coffee breaks in Ljusgården.

Workshops

As is tradition, SLTC 2024 hosts a selection of workshops on relevant topics. This year, four workshops will take place on the 29th of November:

If you can arrive early on the 27th, you may also want to attend this seminar at LiU's TEMA institute: The Rawness of the Data.

Invited Speakers

We will have two keynotes by our exciting invited speakers:


David Samuel is a PhD student in the Language Technology Group at the University of Oslo. His team's submission won the CoNLL 2023 Shared Task on training a language model on a fixed data budget of 10 or 100 million words, known as the BabyLM challenge.

Talk Abstract: The BabyLM Challenge represents a unique opportunity in language model research: by limiting training data to just 10–100M words - comparable to what children encounter during language acquisition - it creates an accessible environment for exploring fundamental questions about language learning. Unlike typical language model research that requires massive computing resources and trillions of training tokens, BabyLM's constrained setting enables rapid experimentation and architectural innovation. We present our findings from multiple submissions to this challenge, including a novel GPT-BERT hybrid that combines masked and causal language modeling objectives, and an innovative layer-selective transformer that allows each layer to choose its inputs from previous layers. We also investigated latent bootstrapping as an alternative to traditional self-supervision. Our experiments demonstrate that architectural innovations can dramatically improve data efficiency, with our hybrid models and layer-selective transformers achieving strong performance on linguistic benchmarks despite using orders of magnitude less data than conventional language models. These results point to promising directions for developing more efficient language models while highlighting the value of constrained experimental settings in advancing our understanding of language learning.

Tiago Pimentel is a Postdoc at ETH Zürich and holds a PhD from the University of Cambridge. His work on information theory as a tool to understand linguistics and language models has been groundbreaking, with important applications such as probing representations for linguistic structure and improving sampling strategies in generative models.

Title: Duplicating Vocabularies to Analyse Generalisation in Language Models

Talk Abstract: Despite many recent advancements in language modelling, there is still much we do not understand about the inner workings of these models. The non-linearity of language models (LMs), along with the complexity of natural language, makes it hard to isolate the factors we want to investigate. In this talk, we explore how duplicating a language model's vocabulary can create targeted and controlled experiments, which we leverage to address two research questions. First, we will use vocabulary duplication to investigate lexical generalisation in LMs; in particular, we will look at the effect of near-duplicate subwords (e.g., vocabulary items such as Now and now) on an LM's performance. Second, we will analyse cross-linguistic generalisation, investigating how language data (im)balance affects performance.

Call for Extended Abstracts

Papers are invited on all theoretical, practical and applied aspects of language technology, including natural language processing, computational linguistics, speech technology and neighbouring areas. Papers can describe completed or ongoing research, as well as practical applications of language technology, and may be combined with system demonstrations. More Information.

The conference does not publish any proceedings, but accepted contributions will be made available on the conference web page as extended abstracts. It is therefore possible to submit abstracts related to work that has been, or will be, published elsewhere, as long as this is compatible with the conditions of the respective publication channels.

Organizers

SLTC 2024 is organized by Linköping University and supported by WARA Media and Language.

Local organization committee:

Past events: 2022 (KTH), 2020 (GU), 2018 (SU), 2016 (UmU), 2014 (UU), 2012 (LU), 2010 (LiU), 2008 (KTH), 2006 (GU).