- PhD Student
Hey! I’m Filip, a first-year PhD student at the University of Cambridge (King’s College) in the NLIP group, supervised by Professor Paula Buttery. My research focuses on advancing our understanding of large language models, enhancing their performance, and mitigating the risks associated with their use.
I previously completed a BSc in Computer Science at UCL and an MPhil in Machine Learning and Machine Intelligence at the University of Cambridge, where I worked on both applied NLP and the core foundations of machine learning.
If you’re interested in collaborating or would like to discuss related topics, please feel free to contact me by email.
Research
Current Interests
- Natural Language Processing
- Large Language Models & Generative AI
- AI Safety & Biases
- Robustness & Evaluation of NLP/LLM Systems
Previous Research
- AI-Generated Text Detection and Adversarial Evaluation of Detectors
- Disinformation and News Trustworthiness Detection
- Benchmark/Dataset Creation & Annotation (e.g., RAID, Czech Credibility Dataset)
- Bias Evaluation in Language Models (Political & Media Bias)
- Alignment of LLMs
Professional Activities
I co-founded Verifee, an NGO that monitors and detects disinformation, using AI to evaluate the trustworthiness of news articles. Verifee now processes all news articles published in the Czech Republic and maintains the largest public news collection in Central Europe. This collection powers a browser extension that anyone can access and that serves as an anti-virus for disinformation. During development, the project was supported by grants from Google.org and Microsoft, which allowed us to offer our services for free to everyone, from established institutions to about 5,000 ordinary users daily.
Here is a video of me and the president of Microsoft, Brad Smith, talking about Verifee:
Publications
For the most recent publications, see my Scholar profile.
2025
- Rethinking AI Cultural Alignment — ICLR 2025 Bi-Align Workshop. Michal Bravansky, Filip Trhlík, Fazl Barez.
2024
- Quantifying Generative Media Bias with a Corpus of Real-world and Generated News Articles — Findings of EMNLP 2024, pp. 4420–4445. Filip Trhlík, Pontus Stenetorp.
- RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors — ACL 2024 (Long), pp. 12463–12492. Liam Dugan, Alyssa Hwang, Filip Trhlík, Andrew Zhu, Josh Magnus Ludan, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch.
2023
- Czech-ing the News: Article Trustworthiness Dataset for Czech — ACL 2023 (WASSA), pp. 96–109. Matyas Bohacek, Michal Bravansky, Filip Trhlík, Vaclav Moravec.
2022
- Fine-grained Czech News Article Dataset: An Interdisciplinary Approach to Trustworthiness Analysis — AAAI 2023 (DE-FACTIFY Workshop). Matyáš Boháček, Michal Bravanský, Filip Trhlík, Václav Moravec.

