Swedish Language Technology Conference (SLTC) 2024

Linköping University, Sweden

The Tenth Swedish Language Technology Conference (SLTC) will take place on the 27th–29th of November 2024 at Linköping University, Sweden, with the main conference on Wednesday and Thursday, 27th–28th of November, and workshops on Friday, 29th of November. All researchers and students working on NLP and related fields are invited to participate.

Important Dates

Registration

The official registration date has passed. For late registrations, please contact us first at sltc2024@googlegroups.com.

Program

The preliminary program is available here. The main conference will start on the 27th at 13:00 and end on the 28th at 17:00. For workshop programs, we refer to their respective websites.

We will be in the B-building at Campus Valla, best accessed via entrance 27 or 27C. All oral presentations take place in the Ada Lovelace lecture hall, and poster sessions and coffee breaks in Ljusgården.

Workshops

As is tradition, SLTC 2024 hosts a selection of workshops on relevant topics. This year, four workshops will take place on the 29th of November:

If you can arrive early on the 27th, you may also want to attend this seminar at LiU's TEMA institute: The Rawness of the Data.

Invited Speakers

We will have two keynotes by our exciting invited speakers:


David Samuel is a PhD student in the Language Technology Group at the University of Oslo. His team's submission won the CoNLL 2023 Shared Task on training a language model on a fixed data budget of 10 or 100 million words, known as the BabyLM challenge.

Talk Abstract: The BabyLM Challenge represents a unique opportunity in language model research: by limiting training data to just 10–100M words - comparable to what children encounter during language acquisition - it creates an accessible environment for exploring fundamental questions about language learning. Unlike typical language model research that requires massive computing resources and trillions of training tokens, BabyLM's constrained setting enables rapid experimentation and architectural innovation. We present our findings from multiple submissions to this challenge, including a novel GPT-BERT hybrid that combines masked and causal language modeling objectives, and an innovative layer-selective transformer that allows each layer to choose its inputs from previous layers. We also investigated latent bootstrapping as an alternative to traditional self-supervision. Our experiments demonstrate that architectural innovations can dramatically improve data efficiency, with our hybrid models and layer-selective transformers achieving strong performance on linguistic benchmarks despite using orders of magnitude less data than conventional language models. These results point to promising directions for developing more efficient language models while highlighting the value of constrained experimental settings in advancing our understanding of language learning.

Tiago Pimentel is a Postdoc at ETH Zürich and holds a PhD from the University of Cambridge. His work on information theory as a tool to understand linguistics and language models has been groundbreaking, with important applications such as probing representations for linguistic structure and improving sampling strategies in generative models.

Title: Duplicating Vocabularies to Analyse Generalisation in Language Models

Talk Abstract: Despite many recent advancements in language modelling, there is still much we do not understand about the inner workings of these models. The non-linearity of language models (LMs), along with the complexity of natural language, makes it hard to isolate the factors we want to investigate. In this talk, we explore how duplicating a language model's vocabulary can create targeted and controlled experiments, which we leverage to address two research questions. First, we will use vocabulary duplication to investigate lexical generalisation in LMs; in particular, we will look at the effect of near-duplicate subwords (e.g., vocabulary items such as Now and now) on an LM's performance. Second, we will analyse cross-linguistic generalisation, investigating how language data (im)balance affects performance.

Call for Extended Abstracts

Papers are invited on all theoretical, practical and applied aspects of language technology, including natural language processing, computational linguistics, speech technology and neighbouring areas. Papers can describe completed or ongoing research, as well as practical applications of language technology, and may be combined with system demonstrations. More Information.

The conference does not publish any proceedings, but accepted contributions will be made available on the conference web page as extended abstracts. It is therefore possible to submit abstracts related to work that has been, or will be, published elsewhere, as long as this is compatible with the conditions of the respective publication channels.

Organizers

SLTC 2024 is organized by Linköping University and supported by WARA Media and Language.

Local organization committee:

Past events: 2022 (KTH), 2020 (GU), 2018 (SU), 2016 (UmU), 2014 (UU), 2012 (LU), 2010 (LiU), 2008 (KTH), 2006 (GU).