Department of Computer Science and Technology

When is Multilinguality a Curse? Language Modeling for 350 Languages

Friday, 6 June, 2025 - 15:00 to 16:00

NOTE THE UNUSUAL TIME FOR THIS SEMINAR

Language models work well for a small number of languages. For the remaining languages, the best existing language model is likely a multilingual one, though the vast majority of its training data still comes from English and a few "priority" languages. We show that in many cases...


Robust Alignment of Large Language Models

Friday, 23 May, 2025 - 12:00 to 13:00

The alignment of large language models (LLMs) can often be brittle when faced with the complexities of real-world deployment. In this talk, I share our investigations into two scenarios where special care is required to ensure robust alignment. The first scenario is multi-objective alignment, where balancing competing...


Research Progress in Mechanistic Interpretability

Friday, 9 May, 2025 - 12:00 to 13:00

The goal of Mechanistic Interpretability research is to explain how neural networks compute outputs in terms of their internal components. But how much progress has been made towards this goal? While a large amount of Mechanistic Interpretability research has been produced by academia, frontier AI companies such as Google...


Asymmetry in Supposedly Equivalent Facts: Pre-training Bias in Large Language Models

Friday, 2 May, 2025 - 12:00 to 13:00

Understanding and mitigating hallucinations in Large Language Models (LLMs) is crucial for ensuring reliable content generation. While previous research has primarily focused on “when” LLMs hallucinate, our work explains “why” and directly links model behaviour to the pre-training data that forms their prior knowledge...


LLMs as supersloppers and other metaphors

Friday, 7 February, 2025 - 12:00 to 13:00

The interdisciplinary pilot project ‘Exploring novel figurative language to conceptualise Large Language Models’ is funded by Cambridge Language Sciences. This talk mainly concerns ‘slop’, by which we mean text delivered to a reader which is of little or no value to them (or is even harmful), or is so verbose or...


Analysing Memorisation in Classification and Translation through Localisation and Cartography

Friday, 24 January, 2025 - 12:00 to 13:00

Memorisation is a natural part of learning from real-world data: neural models pick up on atypical input-output combinations and store those training examples in their parameter space. That this happens is well-known, but which examples require memorisation and where in the millions (or billions) of parameters memorisation...


Unveiling the Secret Sauce: A Causal Look at Data Memorisation and Tokenisation in Language Models

Friday, 30 May, 2025 - 12:00 to 13:00

While model design gets much of the spotlight, subtle data choices, such as which documents are seen and how they’re represented, can profoundly shape the behaviour of language models. Nowadays, training data is the secret sauce behind a language model’s success, yet it remains relatively understudied. In this talk, I will...


Typological Diversity in NLP: What, Why and a Way Forward

Friday, 7 March, 2025 - 12:00 to 13:00

To justify the generalisability of multilingual NLP research, multilingual language technology is frequently evaluated on ‘typologically diverse’ language selections. Yet, what this means often remains vague. In this talk, I first discuss what typological diversity means in NLP, and why it matters. Then, I introduce a...


Assessing language-specific capabilities of LLMs: Lessons from Swedish NLP

Friday, 21 February, 2025 - 11:00 to 12:00

In this talk, I discuss benchmarking and interpreting large language models in the context of Swedish. I present a selection of work from my PhD thesis, which analyses LLMs' Swedish-specific capabilities in different areas: English-Swedish language transfer, multi-task benchmarking on Swedish NLU, and targeted...


Formal syntactic theory in the current NLP landscape

Friday, 14 March, 2025 - 12:00 to 13:00

Natural language processing relied on formal methods in its early days, including formal theories of syntax wherever sentence structure was of relevance. In the statistical era, the focus shifted to annotation schemes such as the Penn Treebank and Universal Dependencies, which still rely on formal theory in their...