Department of Computer Science and Technology

  • Senior Research Software Engineer

Research

I have an interest in most technologies, particularly those which can be leveraged to improve education.

I work at the Department of Computer Science and Technology on the Isaac Physics and Ada Computer Science projects in collaboration with the Department of Physics and the Raspberry Pi Foundation.

Part II Project Ideas for Students:

  • Automatic question generation from content
    Simplest form would be to generate fill-in-the-gaps/Cloze questions from Wikipedia pages using Part-of-Speech tagging to decide which words to omit. Marking of the answer would also need to be considered. A lot of scope to take this further given recent developments in NLP. The generated questions could be integrated with spaced repetition software such as Anki or Mnemosyne.
  • Reimplement pythontutor.com, an educational tool for tracing simple Python programs, to work entirely in the browser
    Python Tutor: https://pythontutor.com https://dl.acm.org/doi/abs/10.1145/2445196.2445368
    The project would likely use either Pyodide or Skulpt for running Python in the browser.
    Extensions could include supporting multiple languages, or using NLP techniques (LLMs?) to produce English-language explanations of each step to aid younger learners.
  • Predict whether a student will answer the next question correctly given their question history
    The problem is called Knowledge Tracing. A good candidate model to implement would be the transformer-based SAINT. Extension tasks might provide an improvement to the model.
  • Use Machine Learning and/or NLP techniques to automatically tag educational questions with the knowledge components/skills/concepts they are assessing
    This can be approached as a supervised learning problem, as we have a large dataset of tagged questions plus more that are untagged.
    Alternatively, it could be approached as an unsupervised learning problem where the Knowledge Components are learnt from student interaction data.
  • Plagiarism detection of coursework put through a "word spinner"
    Students sometimes try to evade exam board plagiarism detection by putting their copied text through a "word spinner", which replaces words with their synonyms. Word spinner output is generally not very good, and it should be possible to detect its use automatically from an LLM's perplexity values. Given the adversarial nature of this problem, it is not surprising that the same technology also makes it possible to use LLMs to build a better word spinner, or a "word nudger", which tries to preserve semantic meaning after its alterations by comparing sentence embedding distances. It is then another task again to un-spin the text and find the original copied text in a corpus. Word-nudging could potentially be used to break chat-bot statistical watermarking. I would also be interested in similar plagiarism detection mechanisms, perhaps checking that a pupil's idiolect is consistent across submissions or analysing digital document revision history/log data.
  • Use Elo and Glicko chess rating systems to simultaneously rank user ability and question difficulty in an Educational Technology setting
    I would be interested to know the results on the Isaac Physics data, to see whether the computed difficulties match the hand-labelled difficulties. https://dl.acm.org/doi/abs/10.1145/3448139.3448189
  • Use computer vision techniques to detect key features in a free-hand sketch of a graph (not the CS form of a graph) for automatic marking
    Isaac Physics want to assess the graph sketching abilities of students. Extracting key features such as intersection points, stationary points and asymptotes, and the quadrants in which they lie, could allow for the automatic marking of these types of questions. This could also lead to more accessible charts for screen-reader users.
  • Your own idea?
    I'm happy to consider supervising other projects if you have your own idea!
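
For the Elo project idea listed above, the core update can be sketched in a few lines of Python. This is only a minimal illustration of the standard Elo rule with each question treated as the student's "opponent", not a description of how Isaac Physics computes anything. The starting rating of 1500, the K-factor of 32, the 400-point scale and the tiny attempt log are all assumptions chosen for the example.

```python
def expected_score(student_rating: float, question_rating: float) -> float:
    """Probability of a correct answer under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((question_rating - student_rating) / 400.0))


def elo_update(student_rating: float, question_rating: float,
               correct: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (student, question) ratings after one attempt.

    A correct answer raises the student's ability and lowers the
    question's difficulty by the same amount, and vice versa.
    k = 32 is an assumed, conventional default.
    """
    expected = expected_score(student_rating, question_rating)
    delta = k * ((1.0 if correct else 0.0) - expected)
    return student_rating + delta, question_rating - delta


# Hypothetical attempt log: (student id, question id, answered correctly?)
attempts = [("alice", "q1", True), ("bob", "q1", False), ("alice", "q2", False)]

students: dict[str, float] = {}
questions: dict[str, float] = {}
for student, question, correct in attempts:
    s_rating = students.get(student, 1500.0)    # assumed starting ability
    q_rating = questions.get(question, 1500.0)  # assumed starting difficulty
    students[student], questions[question] = elo_update(s_rating, q_rating, correct)
```

The resulting question ratings could then be compared against hand-labelled difficulty levels, for example with a rank correlation. Glicko additionally tracks a rating deviation for each student and question, which this sketch omits.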

Teaching

I supervise and demonstrate for many colleges but mainly focus on courses related to Software Engineering and introductory Machine Learning.
If you are lucky enough to be supervised by me this term, I will be in contact via email :)

Contact Details

Room: SN14
Email: mlt47@cam.ac.uk