
Department of Computer Science and Technology

My current research interests lie at the intersection of machine learning, representation learning, and AI applications in healthcare. More specifically, I am interested in (1) the design of learning algorithms that can construct explanations for their predictions in terms of high-level concepts, and (2) the broad applications such algorithms may have in scenarios where transparency is not optional (such as healthcare) and on-site human feedback may be available.

Research

I am broadly interested in the following four core research subfields within AI and ML:

  1. Explainable Artificial Intelligence (XAI)
  2. Interpretable Deep Neural Architectures
  3. Concept-based Explainability
  4. Representation Learning

Teaching

  • Lent 2023 and 2024: Explainable Artificial Intelligence (MPhil Submodule for 255: Advanced Topics in Machine Learning)
  • Michaelmas and Lent 2021 - 2023: Discrete Mathematics

Current MPhil/Part III Project Proposals

Project 1 - Is Concept Leakage Truly a Bad Thing?

This project will be co-supervised by Prof Mateja Jamnik.

As Deep Neural Networks (DNNs) continue to outperform competing methods in many fields, there has been a growing concern regarding the ethical and legal use of DNNs in sensitive tasks (e.g., healthcare). These concerns have inspired the development of interpretable-by-construction neural architectures that explain their predictions using "high-level concepts" (e.g., "has paws", "has whiskers", etc.). A crucial framework for constructing these architectures is the Concept Bottleneck Model (CBM) [CBM]. CBMs are neural architectures that can be decomposed into two sequential subcomponents: (i) a concept encoder network, which maps input features to a vector (i.e., a bottleneck) where each activation represents whether a specific concept is "off" or "on" (e.g., one of the neurons could represent "has whiskers" while another represents "has a tail"), and (ii) a label predictor network, which maps the concept bottleneck to an output label for a task of interest (e.g., whether the image is that of a "dog" or a "cat"). The utility of these models arises because, to predict a task label, they must always first generate a bottleneck of concept activations that serves as an explanation for the output prediction (e.g., if the CBM predicted that an image has a "cat" in it, a possible concept explanation is that it found the concepts "whiskers", "long ears", and "paws" to be active in the input image). Furthermore, these models enable so-called "concept interventions", where experts can correct mispredicted concepts during deployment and, as a byproduct, improve the downstream performance of the model.
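For concreteness, a minimal sketch of this two-stage structure might look as follows in PyTorch (the layer sizes, names, and the simple linear label predictor here are illustrative assumptions rather than the exact architecture of [CBM]):

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Minimal CBM sketch: an input x is first mapped to k concept
    probabilities (the bottleneck), and only those probabilities are
    used to predict the downstream task label."""

    def __init__(self, n_features, n_concepts, n_tasks, hidden=128):
        super().__init__()
        # (i) concept encoder: input features -> concept logits
        self.concept_encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_concepts),
        )
        # (ii) label predictor: concept bottleneck -> task logits
        self.label_predictor = nn.Linear(n_concepts, n_tasks)

    def forward(self, x):
        c_logits = self.concept_encoder(x)
        c_probs = torch.sigmoid(c_logits)        # each entry ~ "is concept i on?"
        y_logits = self.label_predictor(c_probs)
        return c_probs, y_logits                 # the bottleneck doubles as the explanation
```

At training time, both `c_probs` and `y_logits` would be supervised against concept and task labels (jointly, sequentially, or independently, as in [CBM]); at test time, the returned bottleneck is the concept explanation and is what an expert would edit during an intervention.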

Follow-up work studying CBMs [Leakage] suggests that these models are prone to "information leakage": concept representations learnt by CBM-like models tend to encode unnecessary information about each other ("cross-concept leakage") and about the downstream task ("downstream leakage"). This results in "impure" representations that, as the argument usually goes, are likely not as "interpretable" as they are believed to be. This line of argumentation has led to various approaches that try to mitigate this leakage [GlanceNet, LeakageFree, Orthogonality]. Nevertheless, although an argument can be made for why cross-concept leakage may be detrimental to interventions on CBMs [Impurity], it remains highly unclear why leakage, particularly downstream leakage, is a negative thing in the first place. If anything, preliminary work has shown that increasing downstream leakage may improve both model performance and the effectiveness of interventions [IntCEM].

In this project, you will explore the true nature of information leakage by trying to answer whether information leakage is truly undesirable for CBM-like models. This is a very open-ended research question that could be tackled empirically (by methodically quantifying leakage and understanding when and how it impacts the performance metrics we care about) or theoretically (by formalising leakage and deriving a framework that quantifies its effect on model performance). Answering this question would settle a very active and lively ongoing debate in the XAI community once and for all, potentially leading to an impactful publication. An initial research direction may involve controlling leakage by borrowing ideas from the bias mitigation literature (where models are trained to learn representations that do not capture undesired information [AdversarialMitigation]) and exploring the effects of varying leakage on downstream tasks and concept representations. For inspiration, you can start by looking at the references provided below to read about some recent developments.
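As a deliberately simple starting point for the empirical route, cross-concept leakage could be estimated with a probing classifier: if a simple model can recover concept j's ground-truth labels from the representation a CBM-like model uses for a different concept i, then information about j has leaked into i's representation. The helper below is a hypothetical scikit-learn sketch; the function name and interface are illustrative, not taken from any of the cited works:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def cross_concept_leakage(rep_i, labels_j, seed=0):
    """Probe-based leakage estimate: how well does concept i's learnt
    representation (rep_i, shape [n_samples, d]) predict the ground-truth
    labels of a *different* concept j (labels_j, shape [n_samples])?
    A held-out AUC well above 0.5 suggests concept j "leaks" into
    concept i's representation."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        rep_i, labels_j, test_size=0.3, random_state=seed, stratify=labels_j)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
```

The same probe, trained to predict the downstream task label instead of another concept, would give an analogous estimate of downstream leakage.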

Through this project, we hope a candidate may:

  1. Develop their knowledge within explainable AI (XAI), causal learning, and representation learning, all highly relevant and growing fields of study.
  2. Get hands-on practice with code development, deep learning architecture design, training and deployment (highly desirable in academia and industry).
  3. Get comfortable with analysing and designing interpretable pipelines/architectures which can have practical use in critical-sensitive tasks where interpretability is paramount.

The ideal candidate for this project will have a strong background in deep learning and mathematics (or a strong drive and the mathematical maturity to pick up these concepts quickly). Some familiarity with traditional XAI (feature importance methods, saliency maps, prototype explainability, etc), or concept learning/representation learning is a big plus.

References

Project 2 - Learning to Admit and Explain Ignorance

This project will be co-supervised by Prof Mateja Jamnik.

(If you’ve read the introduction of my previous proposed project, you may skip this introduction)


As Deep Neural Networks (DNNs) continue to outperform competing methods in many fields, there has been a growing concern regarding the ethical and legal use of DNNs in sensitive tasks (e.g., healthcare). These concerns have inspired the development of interpretable-by-construction neural architectures that explain their predictions using "high-level concepts" (e.g., "has paws", "has whiskers", etc.). A crucial framework for constructing these architectures is the Concept Bottleneck Model (CBM) [CBM]. CBMs are neural architectures that can be decomposed into two sequential subcomponents: (i) a concept encoder network, which maps input features to a vector (i.e., a bottleneck) where each activation represents whether a specific concept is "off" or "on" (e.g., one of the neurons could represent "has whiskers" while another represents "has a tail"), and (ii) a label predictor network, which maps the concept bottleneck to an output label for a task of interest (e.g., whether the image is that of a "dog" or a "cat"). The utility of these models arises because, to predict a task label, they must always first generate a bottleneck of concept activations that serves as an explanation for the output prediction (e.g., if the CBM predicted that an image has a "cat" in it, a possible concept explanation is that it found the concepts "whiskers", "long ears", and "paws" to be active in the input image). Furthermore, these models enable so-called "concept interventions", where experts can correct mispredicted concepts during deployment and, as a byproduct, improve the downstream performance of the model.


A fundamental limitation of CBMs and similar models (e.g., Label-free CBMs [LabelFree], Post-hoc CBMs [PostHoc], Concept Embedding Models [CEMs]) is that they always have to generate a prediction for every concept, even if that concept is in principle not available or not predictable from the input (e.g., it may be occluded). This leads these models to accidentally learn to over-exploit inter-concept relationships when predicting concept values [Locality] and, more importantly, to fail to indicate when they simply cannot predict a concept.


Previous work has approached this gap by introducing probabilistic concept representations from which uncertainties may be computed [ProbCBM] or by using a side channel to override the CBM's output [Abstain]. These works, however, still require humans to judge whether a model is abstaining from or failing to predict a concept (e.g., they must decide when the uncertainty is "too" high for the model to be trusted). More importantly, they all fail to explain the source of the uncertainty (is the model uncertain because it is dealing with a difficult input, or because the concept is not predictable at all?). This latter limitation is particularly important when considering concept interventions: when providing test-time feedback to a CBM, we want to avoid even considering intervening on a concept whose value cannot be predicted in the first place. Yet current intervention policies, i.e., models that learn to decide which concepts to intervene on next (e.g., [IntCEM]), completely ignore this. These gaps, therefore, represent a crucial set of hurdles that still need to be cleared for CBMs to become practical.


In this project, you will explore how to design CBM-like models that can abstain from predicting concepts while clarifying why the abstention was made (e.g., was it due to difficulty or impossibility?). This is a very open-ended question whose answer can take multiple forms (Bayesian approaches may be helpful here, as may traditional deep learning approaches). In particular, we encourage students to explore this question within models that use Large Language Models (LLMs) to extract their concepts [LabelFree, LaBo], due to their high potential for impact and usability. For inspiration, you can start by looking at the references provided below to read about some recent developments.
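As one (non-Bayesian) illustration of what such a model could look like, the hypothetical concept head below gives every concept an explicit "abstain" class alongside "off" and "on". How the abstain class should be supervised, and whether it should further distinguish "hard" from "impossible" concepts, is precisely the kind of question this project would investigate; nothing here is taken from the cited works:

```python
import torch
import torch.nn as nn

class AbstainingConceptHead(nn.Module):
    """Sketch of a concept head with an explicit "abstain" option.
    For each concept we predict three logits: off / on / not-predictable."""

    def __init__(self, latent_dim, n_concepts):
        super().__init__()
        self.n_concepts = n_concepts
        self.head = nn.Linear(latent_dim, 3 * n_concepts)

    def forward(self, h):
        logits = self.head(h).view(-1, self.n_concepts, 3)
        probs = torch.softmax(logits, dim=-1)                   # [batch, k, 3]
        abstain = probs[..., 2]                                 # P(concept is not predictable)
        p_on = probs[..., 1] / (probs[..., :2].sum(-1) + 1e-8)  # P(on | concept is predictable)
        return p_on, abstain
```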


Through this project, we hope a candidate may:

  1. Develop their knowledge within explainable AI (XAI), probabilistic machine learning, and representation learning, all highly relevant and growing fields of study.
  2. Get hands-on practice with code development, deep learning architecture design, training and deployment (highly desirable in academia and industry).
  3. Get comfortable with analysing and designing interpretable architectures that can be practical in critical-sensitive tasks where interpretability is paramount.

The ideal candidate for this project will have a strong background in deep learning and mathematics (or a strong drive and the mathematical maturity to pick up these concepts quickly). Some familiarity with traditional XAI (feature importance methods, saliency maps, prototype explainability, etc.), Bayesian and probabilistic modelling, or concept learning/representation learning is a big plus.

References


Past MPhil/Part III Project Proposals

Here are some projects I previously supervised during the 2023-2024 academic year.

Ain’t Nobody Got Time For That: Budget-aware Concept Intervention Policies

This project will be co-supervised by Prof Mateja Jamnik and Dr Zohreh Shams.

As Deep Neural Networks (DNNs) continue to outperform competing methods in a growing number of fields, there has been a growing concern regarding the ethical and legal use of DNNs in sensitive tasks (e.g., healthcare). These concerns have inspired the development of interpretable-by-construction neural architectures that explain their predictions using "high-level concepts" (e.g., "has paws", "has whiskers", etc.). A crucial framework for constructing these architectures is what is referred to as a Concept Bottleneck Model (CBM) [1]. CBMs are neural architectures which can be decomposed into two sequential subcomponents: (i) a concept encoder network, which maps input features to a vector (i.e., a bottleneck) in which each activation represents whether a specific concept is "off" or "on" (e.g., one of the neurons could represent "has whiskers" while another represents "has a tail"), and (ii) a label predictor network, which maps the concept bottleneck to an output label for a task of interest (e.g., whether the image is that of a "dog" or a "cat"). The utility of these models arises because, when they predict a sample's task label, they must always first generate a bottleneck of concept activations which serves as an explanation for the CBM's output prediction (e.g., if the CBM predicted that an image has a "cat" in it, a possible concept explanation is that it found the concepts "whiskers", "long ears", and "paws" to be active in the input image).

A key property of CBMs, and the core property we will explore in this project, is that they enable expert "concept interventions" at test time: during inference, an expert interacting with the CBM can analyse the concept explanation it generates for a prediction and then correct one or multiple mispredicted concepts before passing the updated bottleneck to the CBM's label predictor. This enables the CBM to update its original prediction to take into account the expert knowledge provided at test time, leading to significant improvements in performance when deployed in conjunction with an expert [1, 2]. Further work [3], however, has shown that the order in which concepts are intervened on at test time can have a significant effect on intervention effectiveness. Therefore, recent research [4] has begun to consider designing intervention policies that indicate which concepts one should request from a user to maximise their potential impact on the output prediction. Nevertheless, these frameworks have been preliminary and have only explored greedy policies, which have been shown to lag behind known optimal policies (even optimal greedy ones).
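In code, an intervention conceptually amounts to overwriting part of the predicted bottleneck with expert-provided values and re-running only the label predictor. The sketch below assumes a CBM-like model that exposes `concept_encoder` and `label_predictor` modules (hypothetical attribute names):

```python
import torch

def intervene(model, x, intervened_idx, true_values):
    """Test-time concept intervention sketch. `intervened_idx` lists the
    concepts an expert corrects and `true_values` holds their ground-truth
    (0/1) values, as a tensor broadcastable to [batch, len(intervened_idx)]."""
    with torch.no_grad():
        c_probs = torch.sigmoid(model.concept_encoder(x))  # predicted bottleneck
        c_probs[:, intervened_idx] = true_values           # expert overwrites chosen concepts
        y_logits = model.label_predictor(c_probs)          # re-run only the downstream head
    return y_logits
```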

In this project, you will explore how to design and learn such intervention policies so that you can (1) close the gap between the resulting policy and a known optimal policy, and (2) incorporate real-world constraints into the intervention process, such as cost budgets and uncertainty (real-world experts don't have infinite patience, certainty, and time!). An initial research direction may involve exploring non-greedy intervention policies (say, via deep reinforcement learning) that can take into account predefined "intervention budgets" (the number of interventions an expert may be able to afford) to produce a list of concepts whose values may help reduce the CBM's predictive uncertainty the most. For inspiration, you can start by looking at the references provided below to read about some recent developments in this area as well as some related work in active feature acquisition [5].
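A natural baseline to compare learned policies against is a greedy, oracle-assisted one: with a fixed budget, repeatedly pick the not-yet-intervened concept whose correction most reduces the label predictor's predictive entropy. The sketch below is illustrative only; because it peeks at the ground-truth concepts to simulate the expert, it serves as a reference point rather than a deployable policy:

```python
import torch

def entropy(logits):
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-8)).sum(-1)

def greedy_budgeted_policy(model, x, true_concepts, budget):
    """Spend `budget` interventions one at a time, each time choosing the
    concept whose (oracle-simulated) correction most reduces the mean
    entropy of the label predictor's output."""
    with torch.no_grad():
        c = torch.sigmoid(model.concept_encoder(x))        # predicted bottleneck
        chosen = []
        for _ in range(budget):
            base = entropy(model.label_predictor(c)).mean()
            best_j, best_gain = None, -float("inf")
            for j in range(c.shape[1]):
                if j in chosen:
                    continue
                c_try = c.clone()
                c_try[:, j] = true_concepts[:, j]           # simulate the expert's answer
                gain = base - entropy(model.label_predictor(c_try)).mean()
                if gain > best_gain:
                    best_j, best_gain = j, gain
            chosen.append(best_j)
            c[:, best_j] = true_concepts[:, best_j]         # commit the intervention
        final_logits = model.label_predictor(c)
    return chosen, final_logits
```

A learned, non-greedy policy (e.g., one trained with reinforcement learning) would replace the inner loop with a model that proposes the next concept without access to the oracle.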

Through this project we hope a candidate may (i) develop their own knowledge within the fields of explainable AI (XAI), reinforcement learning, and representation learning, all highly relevant and growing fields of study; (ii) get some hands-on practice with code development as well as deep learning architecture design, training, and deployment (highly desirable in both academia and industry); and (iii) get comfortable with analysing and designing interpretable architectures which can have practical use in critical-sensitive tasks where interpretability is paramount.

The ideal candidate for this project will have a strong background in deep learning and mathematics (or a strong drive and the mathematical maturity to pick up these concepts quickly). Some familiarity with traditional XAI (feature importance methods, saliency maps, prototype explainability, etc), deep reinforcement learning, or concept learning/representation learning is a big plus.

References

[1] Koh, Pang Wei, et al. "Concept bottleneck models." International Conference on Machine Learning. PMLR, 2020.

[2] Espinosa Zarlenga, Mateo, et al. "Concept embedding models: Beyond the accuracy-explainability trade-off." Advances in Neural Information Processing Systems 35 (2022): 21400-21413.

[3] Shin, Sungbin, et al. "A closer look at the intervention procedure of concept bottleneck models." International Conference on Machine Learning. PMLR, 2023.

[4] Chauhan, Kushal, et al. "Interactive concept bottleneck models." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 5. 2023.

[5] Li, Yang, and Junier Oliva. "Active feature acquisition with generative surrogate models." International Conference on Machine Learning. PMLR, 2021.


Train it, Extend it, Retrain it: Iterative Concept Discovery

This project will be co-supervised by Prof Mateja Jamnik.

As Deep Neural Networks (DNNs) continue to outperform competing methods in a growing number of fields, there has been a growing concern regarding the ethical and legal use of DNNs in sensitive tasks (e.g., healthcare). These concerns have inspired the development of interpretable-by-construction neural architectures that explain their predictions using "high-level concepts" (e.g., "has paws", "has whiskers", etc.). Concept Embedding Models (CEMs) [1] are a recent example of such neural architectures, in which a DNN first learns a set of "concept embeddings" that represent the activation or inactivation of known concepts and then uses those embeddings to predict a task of interest. The utility of these models arises from the fact that, at inference time, they first predict a set of concept embeddings whose semantic alignment can be used to explain the CEM's output prediction (e.g., if the CEM predicted that an image has a "cat" in it, a possible concept explanation is that it first predicted embeddings indicating that the concepts "whiskers", "long ears", and "paws" are active in the input image).
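A much-simplified sketch of this idea is shown below: each concept gets an "active" and an "inactive" embedding, the concept's probability of being on is predicted from the pair, and the two embeddings are mixed accordingly. The shapes, shared scorer, and activation choices here are simplifications; see [1] for the actual CEM architecture:

```python
import torch
import torch.nn as nn

class ConceptEmbeddingLayer(nn.Module):
    """Simplified CEM-style layer: per-concept "active"/"inactive" embeddings
    mixed by the predicted concept probability."""

    def __init__(self, latent_dim, n_concepts, emb_dim=16):
        super().__init__()
        self.pos = nn.ModuleList([nn.Linear(latent_dim, emb_dim) for _ in range(n_concepts)])
        self.neg = nn.ModuleList([nn.Linear(latent_dim, emb_dim) for _ in range(n_concepts)])
        self.score = nn.Linear(2 * emb_dim, 1)  # shared concept-probability scorer

    def forward(self, h):
        embs, probs = [], []
        for pos, neg in zip(self.pos, self.neg):
            c_plus, c_minus = torch.relu(pos(h)), torch.relu(neg(h))
            p = torch.sigmoid(self.score(torch.cat([c_plus, c_minus], dim=-1)))
            embs.append(p * c_plus + (1 - p) * c_minus)  # the learnt concept embedding
            probs.append(p)
        # concatenated embeddings feed the label predictor; probs form the explanation
        return torch.cat(embs, dim=-1), torch.cat(probs, dim=-1)
```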

A crucial limitation of CEMs is that one requires a set of concept annotations at train time to learn and align concept embeddings correctly. Such annotations are often costly to obtain or, in some circumstances, even impossible, as the complete set of relevant concepts for a given task is sometimes ill-defined. Nevertheless, it has been hypothesised that CEMs may be capturing concepts not provided at train time as part of their learnt embedding spaces [1]. If true, this would allow the construction of more complete explanations using concepts not included during training, provided such concepts can be extracted from the learnt concept embeddings and assigned semantics via some post-hoc expert analysis. In this project, you will explore this precise question by trying to understand whether unseen concepts are indeed encoded as part of the concept embeddings generated by a CEM and, if so, how such concepts may be extracted and used to construct more complete explanations for a CEM's predictions. In particular, this project will explore whether these discovered concepts can be iteratively reintroduced into a CEM's training process as training-time concepts after they have been discovered through some post-hoc analysis (e.g., via clustering or dimensionality reduction). If successful, this will enable the creation of more interpretable models and the ability to discover valuable concepts beyond those provided as training annotations. For inspiration, you can start by looking at the references provided below to read about recent developments in this area, as well as related work in which models similar to CEMs were shown to automatically discover some concepts in tabular domains [2].
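One simple post-hoc check the project could begin with: cluster the embeddings a trained CEM produces for a single annotated concept and measure how well the clusters align with an annotation that was deliberately held out at training time. The helper below is a hypothetical scikit-learn sketch (the names and the choice of k-means/AMI are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

def discovered_concept_alignment(concept_embs, heldout_labels, n_clusters=2, seed=0):
    """Cluster one concept's learnt embeddings (shape [n_samples, emb_dim]) and
    score how well the clusters match a concept annotation withheld at train
    time. A high adjusted mutual information (AMI) suggests the unseen concept
    is encoded and could be reintroduced as a training-time concept."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(concept_embs)
    return adjusted_mutual_info_score(heldout_labels, clusters)
```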

Through this project we hope a candidate may (i) develop their own knowledge within the fields of explainable AI (XAI) and representation learning, both highly relevant and growing fields of study; (ii) get some hands-on practice with code development as well as deep learning architecture design, training, and deployment (highly desirable in both academia and industry); and (iii) get comfortable with analysing and designing interpretable architectures which can have practical use in critical-sensitive tasks where interpretability is paramount.

The ideal candidate for this project will have a strong background in deep learning and mathematics (or a strong drive and the mathematical maturity to pick up these concepts quickly). Some familiarity with traditional XAI (feature importance methods, saliency maps, prototype explainability, etc) or concept learning/representation learning is a big plus.

References

[1] Espinosa Zarlenga, Mateo, et al. "Concept embedding models: Beyond the accuracy-explainability trade-off." Advances in Neural Information Processing Systems 35 (2022): 21400-21413.

[2] Espinosa Zarlenga, Mateo, et al. "TabCBM: Concept-based Interpretable Neural Networks for Tabular Data." Transactions on Machine Learning Research (2023).

[3] Oikarinen, Tuomas, et al. "Label-Free Concept Bottleneck Models." ICLR (2023).

[4] Kim, Eunji, et al. "Probabilistic Concept Bottleneck Models." ICML (2023).

Publications


Contact Details

Room: FE14
Office address: Computer Laboratory, 15 JJ Thomson Ave, Cambridge CB3 0FD
Office phone: (01223) 7-63533
Email: me466@cam.ac.uk