
Department of Computer Science and Technology

Friday, 31 January, 2020 - 12:00 to 13:00
Haim Dubossarsky (University of Cambridge)
SS03, Computer Laboratory

Recent years have seen the rise of machine learning models in NLP research, which are applied, inter alia, to questions motivated by linguistic theory. Indeed, it has now become relatively easy to model and test research problems. This ease of deployment comes with the risk of careless use, which may lead to unreliable findings and ultimately even hinder our ability to extend our knowledge. Such misuse may stem, for example, from unfamiliarity with the assumptions and hypotheses implicit in the models, or from inherent confounds that demand experimental controls.
In this talk, I will focus on problems that arise in linguistically motivated questions (e.g., semantic change) and in classical NLP research more generally (e.g., polysemy resolution and representation), where word embeddings are the prominent ML models. Major problems include biases induced by word frequency, similarity estimation over noisy word vector representations, and the evaluation of models’ performance in the absence of properly validated evaluation tasks. I will suggest ways to mitigate some of these problems, and share some ideas about performing valid scientific research in the age of all-too-easy modeling.

NLIP Seminar Series
