
Department of Computer Science and Technology

Friday, 29 November, 2024 - 13:00 to 13:55
Shrey Biswas / Radhika Iyer / Kacper Michalik, University of Cambridge
FW11, William Gates Building. Zoom link:


Grey literature is inherently difficult to work with. It is hard to discover, because it is typically hidden deep within websites; hard to analyse, because it follows no standard file formats or structures; and hard to process, because of the sheer volume of existing and newly produced literature. Together, these problems impose substantial costs in money and time on organisations whose work depends on such literature.

We devise and implement a pipeline that uses Common Crawl internet archives to locate and scrape potential grey literature, then processes it for a multistage machine learning classifier that identifies and outputs the relevant media.
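The "locate" step can be illustrated with a minimal sketch. It assumes that candidates are found by querying a Common Crawl CDX index, which returns one JSON record per line with fields such as `url`, `mime`, `status`, `filename`, `offset` and `length`, and that archived PDFs are treated as potential grey literature. The PDF-only rule, the functions `select_candidates` and `warc_range_request`, and the sample records are hypothetical illustrations, not the presenters' implementation. No network request is made.

```python
import json
from typing import Iterable, Iterator

# Hypothetical selection rule: treat successfully archived PDFs as
# grey-literature candidates. The presenters' actual criteria are not stated.
CANDIDATE_MIME = "application/pdf"


def select_candidates(cdx_lines: Iterable[str]) -> Iterator[dict]:
    """Yield CDX records (one JSON object per line) that look like archived PDFs."""
    for line in cdx_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the index response
        record = json.loads(line)
        # Prefer the content-sniffed MIME type when the index provides it.
        mime = record.get("mime-detected") or record.get("mime", "")
        if str(record.get("status")) == "200" and mime == CANDIDATE_MIME:
            yield record


def warc_range_request(record: dict) -> tuple[str, dict]:
    """Return the URL and Range header that fetch only this record's WARC bytes."""
    offset = int(record["offset"])
    length = int(record["length"])
    url = "https://data.commoncrawl.org/" + record["filename"]
    # HTTP byte ranges are inclusive, hence the -1 on the end position.
    return url, {"Range": f"bytes={offset}-{offset + length - 1}"}


# Made-up sample records, for illustration only.
sample = [
    '{"url": "https://example.org/report.pdf", "mime": "application/pdf", '
    '"status": "200", "filename": "crawl-data/example.warc.gz", '
    '"offset": "1000", "length": "500"}',
    '{"url": "https://example.org/index.html", "mime": "text/html", '
    '"status": "200", "filename": "crawl-data/example.warc.gz", '
    '"offset": "2000", "length": "300"}',
]

for rec in select_candidates(sample):
    print(warc_range_request(rec))
```

Requesting only the byte range of each record avoids downloading entire WARC files, which are large; the returned URL and headers could be passed to any HTTP client before the scraped documents are handed to the classification stages.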


*Shrey Biswas* is a second-year Computer Science student at Pembroke College.

*Radhika Iyer* is a second-year Computer Science student at Murray Edwards College.

*Kacper Michalik* is a second-year Computer Science student at Pembroke College.

Seminar series: Energy and Environment Group
