
Department of Computer Science and Technology

Date: 
Friday, 14 March, 2025 - 12:00 to 13:00
Speaker: 
Olga Zamaraeva (University of A Coruña)
Venue: 
Room SS03, hybrid format. Zoom link for those who wish to join online: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09

In its early days, natural language processing relied on formal methods, including formal theories of syntax wherever sentence structure was relevant. In the statistical era, the focus shifted to annotation schemes such as the Penn Treebank and Universal Dependencies, which still have their origins in formal theory but prioritize simplicity over consistency. Now, in the era of deep learning, most training forgoes annotation altogether, yet annotated corpora remain crucial for evaluating and interpreting the output of language models. In this context, it is important that the theory underlying the annotation be consistent and, furthermore, developed independently of NLP tasks. I will talk about our recent work with the Head-driven Phrase Structure Grammar (HPSG) theory of syntax. We have used HPSG to grow and improve existing corpora of Spanish and to improve the parsing speed of the English Resource Grammar, so that the English corpora can be grown more easily. Currently, we are using the English Resource Grammar to study the linguistic properties of texts generated by LLMs, including looking for any systematic differences from similar texts written by people. I will present this work in progress at the end of the talk.

Seminar series: 
NLIP Seminar Series
