- PhD Student
I am a PhD student at the University of Cambridge (Gonville & Caius College). I previously completed my BA and MEng in Computer Science & Linguistics at Gonville & Caius College, obtaining a “starred First” and a Distinction respectively. I specialise in Machine Learning and Natural Language Processing, exploring alternatives to Transformer-based Large Language Models (LLMs). My academic work spans Machine Learning and Cognitive Science, with a focus on Explainable and Interpretable Machine Learning, and fundamental questions about the human capacity for natural language.
My research is primarily concerned with engineering scalable, data-efficient Small Language Models, specifically hybrid state-space/Transformer architectures, and with cognitively-inspired AI. This emerging research paradigm aims to enhance the cognitive capabilities of cutting-edge computational systems under cognitively plausible training conditions.
In my PhD, supervised by Professor Paula Buttery, I am working toward cognitively-inspired computational systems, leveraging insights from human cognition to benchmark and interpret state-of-the-art Language Models and to build more adaptive Language Models for small-scale data regimes.
Biography
Before my PhD, I completed the Linguistics and Computer Science Triposes at the University of Cambridge, where I worked on a funded internship at the ALTA Institute with Prof Paula Buttery, Dr Andrew Caines, Dr Russell Moore and Dr Thiemo Wambsganss, as a Research Assistant on a code-switching project with Dr Li Nguyen, and as a research student with Prof Nigel Collier. My past experience includes work on Multimodal Vision-Language Models in the Language Technology Lab with Prof Nigel Collier and Fangyu Liu (now at Google DeepMind). I have probed vision-language models such as CLIP, investigating their semantic representations, and explored Nearest Neighbour Algorithms for Offline Imitation Learning (IL). I have also researched Explainable AI, Argumentation Mining, and Shortcut Learning in Natural Language Inference. Within Linguistics, I have interests in Typology (and typological applications in multilingual NLP), Syntactic Theory (especially Neo-Emergentism and Biolinguistics), and Morphological and Phonological Theory.
Outside of academia, I lead Per Capita Media, Cambridge University's newest independent publication, supported by a team of students and academics from Cambridge and other academic institutions nationwide, including the University of Oxford and the University of the Arts London. I founded the publication in 2024 with the generous support of Lady Stothard and Dr Ruth Scurr FRSL. My journalistic work has seen me collaborate with The One Show and liaise with journalists from The Sunday Times and BBC Radio 5 Live. I am also involved in student policy think tanks as the Head of Policy at The Wilberforce Society, the UK's oldest student think tank, based at the University of Cambridge, and I organise several speaker events throughout the University. In the past, I have helped organise policy events with the Editor of the BBC Russian Service and the Foreign Minister of Sri Lanka.
Research
Small Language Models: The viability of 'Small LMs' as a coherent research programme depends on addressing questions of efficiency, acceleration and architecture in pretraining.
- Our group released PicoLM, the Cambridge Small Language Model & Learning Dynamics Framework, in March 2025 to investigate these research questions. Check out the YouTube video put together by Zeb Goriely: Introducing PicoLM | YouTube.
- I have worked on dynamic tokenization and supported similar projects in the NLIP group and the L65 (Geometric Deep Learning) course on the MPhil ACS.
Cognitively-Inspired AI: The emergent capabilities of Transformers are the subject of a great deal of interpretability work; however, there is a clear mismatch between human language acquisition (which is data-efficient in many regards) and the data-hungriness of Transformers. I am particularly invested in research questions that draw on insights from language acquisition in the context of the BabyLM Shared Task, where I lead and contribute to teams working on the Multimodal, Multilingual and Interaction Tracks.
I also have interests in alignment, interpretability and multilingual NLP. See my Cambridge Language Sciences page for more information on my research interests in Cognitive Science and Linguistics.
Teaching
Guest Lecturer and Teaching Assistant
Guest Lecturer and Teaching Assistant for L95 (ACS/Part III) Introduction to Natural Language Syntax and Parsing (Prof Paula Buttery, Dr Fermin Moscoso del Prado Martin).
Teaching Assistant for Machine Learning & Real World Data (Part IA, Computer Science Tripos)
Delivered a lecture on Language Model Evaluation and Mechanistic Interpretability (Nov 2024).
Thesis & Research Supervision
Supervisor for MPhil Dissertation on Small Language Models (Vision-Language Models) and Learning Dynamics.
Other
Co-organised a Phonological Theory Discussion Group with Prof Bert Vaux, 2022-23.
Supervisions
Machine Learning and Bayesian Inference (Part II, Computer Science Tripos)
Formal Models of Language (Part IB, Computer Science Tripos)
Artificial Intelligence (Part IB, Computer Science Tripos)
Probability (Part IA, Computer Science Tripos)
College Supervisor for the Linguistics Tripos (Gonville & Caius College): Linguistic Theory (Part IIB) and Part I.
Professional Activities
Co-organiser of the Natural Language & Information Processing (NLIP) Seminars 2024.
Reviewer for the BabyLM Shared Task at CoNLL (co-located with EMNLP 2024).
Collaborating with the Kinds of Intelligence Programme at the Leverhulme Centre for the Future of Intelligence (CFI) on cognitively-inspired benchmarking and interpretability.
Publications
Key publications:
Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies.
Suchir Salhan, Richard Diehl Martinez, Zebulon Goriely, Paula Buttery
In Proceedings of the BabyLM Challenge (CoNLL 2024), Paper Track (accepted; poster)
On the Potential for Maximising Minimal Means in Transformer Language Models: A Dynamical Systems Perspective.
Suchir Salhan
In Cambridge Occasional Papers in Linguistics, Department of Theoretical & Applied Linguistics, 2023
Other publications:
LLMs “off-the-shelf” or Pretrain-from-Scratch? Recalibrating Biases and Improving Transparency using Small-Scale Language Models.
Suchir Salhan, Richard Diehl Martinez, Zebulon Goriely, Andrew Caines, Paula Buttery
Learning & Human Intelligence Group, Department of Computer Science & Technology, 2024