
Department of Computer Science and Technology

  • Professor of Language and Machine Learning
  • Fellow of Gonville and Caius College

I am Professor of Language and Machine Learning in the Department of Computer Science and Technology. 

See my other page for more information.

Research

I am a computational linguist interested in fundamental questions about language: Why have languages evolved to be what they are today? How do we learn languages? How can we learn second languages most efficiently? How should we organise language-related information for particular tasks? With my research team, I develop Natural Language Processing and Machine Learning techniques to research language cognition (computational psycholinguistics). We build cognitively-motivated language applications, and we research explainable AI (artificial intelligence that can explain the decisions it has made and can provide feedback). Our research has focused on building Natural Language Processing tools that work with non-canonical forms of natural language (spoken language, learner language, aphasic speech, social media language) and with low-resource languages (endangered languages, dialects). We are interested both in the automatic machine processing of language and in the cognitive processes underlying it; we have built models of artificial learners for first and second language learning. We are also interested in the automatic extraction of units of information from language excerpts (whether written or spoken) and in how to represent the relationships between these units.

Teaching

Students interested in working within my fields of research may want to apply to the MPhil in Advanced Computer Science (ACS). I very rarely accept PhD students who haven't first completed the ACS. If you wish to apply for a PhD, please look for application information on the department web pages. Please do not email me your CV directly. Summer internships are sometimes available to Cambridge University students, but not more widely. To let me know you have read this page before emailing me, please include the word Jabberwock in the subject header.

Professional Activities

I have several roles in the University:

  • Professor of Language and Machine Learning in the Department of Computer Science and Technology;
  • Co-Director of the Cambridge Language Sciences Interdisciplinary Research Centre;
  • Lead Scientific Advisor to Cambridge University Press and Assessment;
  • Principal Investigator of the Cambridge Institute for Automated Language Teaching and Assessment (ALTA), an Artificial Intelligence institute that develops Machine Learning and Natural Language Processing technology to improve the experience of learning and assessment online;
  • Affiliated Lecturer, Faculty of Modern and Medieval Languages and Linguistics;
  • Fellow and Director of Studies in Linguistics at Gonville and Caius College.

Industry

  • I am part of iLexIR, a University spinout that provides natural language processing solutions, specialising in text analytics, mining, classification and search applications.
  • I am Chief Scientist for RegGenome, a University spinout formed by colleagues from the Cambridge Judge Business School.

Publications

Journal articles

  • Goriely, Z., Caines, A. and Buttery, P., 2023. Word segmentation from transcriptions of child-directed speech using lexical and sub-lexical cues. J Child Lang.
    Doi: http://doi.org/10.1017/S0305000923000491
  • Benedetto, L., Cremonesi, P., Caines, A., Buttery, P., Cappelli, A., Giussani, A. and Turrin, R., 2023. A Survey on Recent Approaches to Question Difficulty Estimation from Text. ACM Computing Surveys, v. 55
    Doi: http://doi.org/10.1145/3556538
  • Elliott, M. and Buttery, P., 2022. Non-iterative Conditional Pairwise Estimation for the Rating Scale Model. Educ Psychol Meas, v. 82
    Doi: http://doi.org/10.1177/00131644211046253
  • Katushemererwe, F., Caines, A. and Buttery, P., 2021. Building natural language processing tools for Runyakitara. Applied Linguistics Review, v. 12
    Doi: http://doi.org/10.1515/applirev-2020-2004
  • Caines, A., Altmann-Richer, E. and Buttery, P., 2019. The cross-linguistic performance of word segmentation models over time. J Child Lang, v. 46
    Doi: http://doi.org/10.1017/S0305000919000485
  • Caines, A., Pastrana, S., Hutchings, A. and Buttery, PJ., 2018. Automatically identifying the function and intent of posts in underground forums. Crime Science, v. 7
    Doi: http://doi.org/10.1186/s40163-018-0094-4
  • Bentz, C., Alikaniotis, D., Samardžić, T. and Buttery, P., 2017. Variation in Word Frequency Distributions: Definitions, Measures and Implications for a Corpus-Based Language Typology. Journal of Quantitative Linguistics, v. 24
    Doi: http://doi.org/10.1080/09296174.2016.1265792
  • Thwaites, A., Nimmo-Smith, I., Fonteneau, E., Patterson, RD., Buttery, P. and Marslen-Wilson, WD., 2015. Tracking cortical entrainment in neural activity: auditory processes in human temporal cortex. Front Comput Neurosci, v. 9
    Doi: http://doi.org/10.3389/fncom.2015.00005
  • Bentz, C., Verkerk, A., Kiela, D., Hill, F. and Buttery, P., 2015. Adaptive Communication: Languages with More Non-Native Speakers Tend to Have Fewer Word Forms. PLoS One, v. 10
    Doi: http://doi.org/10.1371/journal.pone.0128254
  • Bentz, C., Kiela, D., Hill, F. and Buttery, P., 2014. Zipf's law and the grammar of languages: A quantitative study of old and modern English parallel texts. Corpus Linguistics and Linguistic Theory, v. 10
    Doi: http://doi.org/10.1515/cllt-2014-0009
  • Andersen, Ø., Briscoe, T., Buttery, P., Carroll, J., Medlock, B., Parish, T. and Watson, R., 2011. Text Processing Tools and Services from iLexIR Ltd
  • McEntyre, JR., Ananiadou, S., Andrews, S., Black, WJ., Boulderstone, R., Buttery, P., Chaplin, D., Chevuru, S., Cobley, N., Coleman, LA., Davey, P., Gupta, B., Haji-Gholam, L., Hawkins, C., Horne, A., Hubbard, SJ., Kim, JH., Lewin, I., Lyte, V., MacIntyre, R., Mansoor, S., Mason, L., McNaught, J., Newbold, E., Nobata, C., Ong, E., Pillai, S., Rebholz-Schuhmann, D., Rosie, H., Rowbotham, R., Rupp, CJ., Stoehr, P. and Vaughan, P., 2011. UKPMC: a full text article resource for the life sciences. Nucleic Acids Res, v. 39
    Doi: http://doi.org/10.1093/nar/gkq1063
  • Poornima, S., Good, J., Su, Q., Huang, CR., Chen, K., Sharma, DM., Dimitriadis, A., Plank, B., van Noord, G., Caines, A. and others, 2010. Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground.
  • Briscoe, T., Buttery, P., Carroll, J., Medlock, B. and Watson, R., 2010. Text Processing Tools and Services from iLexIR Ltd
  • Hawkins, JA. and Buttery, P., 2010. Criterial features in learner corpora: Theory and illustrations. English Profile Journal, v. 1
    Doi: http://doi.org/10.1017/S2041536210000103
  • Hawkins, JA. and Buttery, P., 2009. Using learner language from corpora to profile levels of proficiency: Insights from the English Profile Programme. Language Testing Matters: Investigating the wider social and educational impact of assessment.
  • Briscoe, T. and Buttery, P., 2008. Linguistic adaptations for resolving ambiguity. The Evolution of Language: Proceedings of the 7th International Conference (EVOLANG7), Barcelona, Spain, 12-15 March 2008.
  • 2007. Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
  • Buttery, P., 2006. Computational models for first language acquisition
  • Buttery, P. and Korhonen, A., 2005. Large-scale analysis of verb subcategorization differences between child directed speech and adult speech. Proceedings of the Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes.
  • Buttery, P., 2005. Charles D. Yang. Knowledge and Learning in Natural Language. Oxford University Press, 2002. ISBN 0 19 925414 1 (hardback), Price $60. ISBN 0 19 925415 X (paperback), Price $21.95, 220 pages. Nat. Lang. Eng., v. 11
    Doi: http://doi.org/10.1017/S1351324905213724
  • Buttery, P. and Briscoe, T., 2004. The significance of errors to parametric models of language acquisition. AAAI Spring Symposium - Technical Report, v. 5
  • Buttery, P., 2004. A quantitative evaluation of naturalistic models of language acquisition; the efficiency of the Triggering Learning Algorithm compared to a Categorial Grammar Learner. Coling 2004.
  • Rice, A., Buttery, P., Rai, IA. and Beresford, A., Language learning on a next-generation service platform for Africa

Conference proceedings

  • Rietsche, R., Caines, A., Schramm, C., Pfütze, D. and Buttery, P., 2022. The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education. BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings.
  • Felice, M., Taslimipoor, S. and Buttery, P., 2022. Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers. Proceedings of the Annual Meeting of the Association for Computational Linguistics.
  • Felice, M., Taslimipoor, S., Andersen, ØE. and Buttery, P., 2022. CEPOC: The Cambridge Exams Publishing Open Cloze dataset. 2022 Language Resources and Evaluation Conference, LREC 2022.
  • Davis, C., Bryant, C., Caines, A., Rei, M. and Buttery, P., 2022. Probing for targeted syntactic knowledge through grammatical error detection. CoNLL 2022 - 26th Conference on Computational Natural Language Learning, Proceedings of the Conference.
  • Tyen, G., Brenchley, M., Caines, A. and Buttery, P., 2022. Towards an open-domain chatbot for language practice. BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings.
  • Wambsganss, T., Caines, A. and Buttery, P., 2022. ALEN App: Persuasive Writing Support To Foster English Language Learning. BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings.
  • Caines, A., Bentz, C., Knill, K., Rei, M. and Buttery, P., 2020. Grammatical error detection in transcriptions of spoken English. Proceedings of the 28th International Conference on Computational Linguistics.
    Doi: http://doi.org/10.18653/v1/2020.coling-main.195
  • Caines, A. and Buttery, P., 2020. REPROLANG 2020: Automatic proficiency scoring of Czech, English, German, Italian, and Spanish learner essays. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings.
  • Felice, M. and Buttery, P., 2019. Entropy as a proxy for gap complexity in open cloze tests. International Conference Recent Advances in Natural Language Processing, RANLP, v. 2019-September
    Doi: http://doi.org/10.26615/978-954-452-056-4_037
  • Moore, R., Caines, A., Rice, A. and Buttery, P., 2019. Behavioural cloning of teachers for automatic homework selection. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 11625 LNAI
    Doi: http://doi.org/10.1007/978-3-030-23204-7_28
  • Caines, A., Pastrana, S., Hutchings, A. and Buttery, P., 2018. Aggressive language in an online hacking forum. 2nd Workshop on Abusive Language Online - Proceedings of the Workshop, co-located with EMNLP 2018.
  • Flint, E., Ford, E., Thomas, O., Caines, AP. and Buttery, P., 2017. A text normalisation system for non-standard English words. Proceedings of WNUT.
  • Caines, AP., Flint, E. and Buttery, P., 2017. Collecting fluency corrections for spoken learner English. Proceedings of BEA.
  • Caines, AP., McCarthy, M. and Buttery, P., 2017. Parsing transcripts of speech
  • Graham, C., Buttery, P. and Nolan, F., 2017. Vowel characteristics in the assessment of L2 English pronunciation. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v. 08-12-September-2016
    Doi: http://doi.org/10.21437/Interspeech.2016-1630
  • Caines, A., Zhang, W., Alikaniotis, D. and Buttery, P., 2016. Predicting author age from Weibo microblog posts. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16).
  • Caines, A., Bentz, C., Graham, C., Polzehl, T. and Buttery, P., 2016. Crowdsourcing a multilingual speech corpus: recording, transcription and annotation of the CrowdED Corpus. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16).
  • Caines, A., Moore, R., Graham, C. and Buttery, P., 2016. Automated speech-unit delimitation in spoken learner English. Proceedings of COLING.
  • Caines, A., Moore, R., Buttery, P. and Graham, C., 2015. Incremental Dependency Parsing and Disfluency Detection in Spoken Learner English
  • Caines, A. and Buttery, P., 2012. Annotating progressive aspect constructions in the spoken section of the British National Corpus. Proceedings of the Language Resources and Evaluation Conference (LREC).
  • Caines, A. and Buttery, P., 2012. Reclassifying subcategorization frames for experimental analysis and stimulus generation. Proceedings of the Language Resources and Evaluation Conference (LREC).
  • Thwaites, A., Geertzen, J., Marslen-Wilson, WD. and Buttery, P., 2010. LIPS: A Tool for Predicting the Lexical Isolation Point of a Word. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
  • Williams, C., Thwaites, A., Buttery, P., Geertzen, J., Randall, B., Shafto, M., Devereux, B. and Tyler, L., 2010. The Cambridge Cookie-Theft Corpus: A Corpus of Directed and Spontaneous Speech of Brain-Damaged Patients and Healthy Individuals. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
  • Caines, A. and Buttery, P., 2010. ‘You talking to me?’ A predictive model for zero auxiliary constructions. Proceedings of the Workshop on Natural Language Processing and Linguistics, Finding the Common Ground, Annual Meeting of the Association for Computational Linguistics.
  • Vlachos, A., Buttery, P., Séaghdha, DO. and Briscoe, T., 2009. Biomedical event extraction without training data. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task.
  • Buttery, P. and Korhonen, A., 2007. I will shoot your shopping down and you can shoot all my tins: automatic lexical acquisition from the CHILDES database. Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition.
  • Zaidi, A., Caines, A., Moore, R., Buttery, P. and Rice, A., Adaptive Forgetting Curves for Spaced Repetition Language Learning
  • Craighead, H., Caines, A., Buttery, P. and Yannakoudakis, H., Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions
  • Hughes, J., Aycock, S., Caines, A., Buttery, P. and Hutchings, A., Detecting Trending Terms in Cybersecurity Forum Discussions. Proceedings of the 6th Workshop on Noisy User-generated Text (W-NUT 2020).
    Doi: http://doi.org/10.18653/v1/2020.wnut-1.15
  • Pastrana Portillo, S., Hutchings, A., Caines, A. and Buttery, P., Characterizing Eve: Analysing Cybercrime Actors in a Large Underground Forum
  • Aglionby, G., Davis, C., Mishra, P., Caines, A., Giannakoudaki, E., Rei, M., Shutova, E. and Buttery, P., CAMsterdam at SemEval-2019 Task 6: Neural and graph-based feature extraction for the identification of offensive tweets
  • Pete, I., Hughes, J., Caines, A., Vu, A., Gupta, H., Hutchings, A., Anderson, R. and Buttery, P., POSTCOG: A Tool for Interdisciplinary Research into Underground Forums at Scale
  • Moore, R., Caines, A., Elliott, M., Zaidi, A., Rice, A. and Buttery, P., Skills Embeddings: a Neural Approach to Multicomponent Representations of Students and Tasks
  • Zaidi, A., Caines, A., Davis, C., Moore, R., Buttery, P. and Rice, A., Accurate Modelling of Language Learning Tasks and Students Using Representations of Grammatical Proficiency

Book chapters

  • Caines, A., McCarthy, M. and Buttery, P., 2018. 'You still talking to me?': The zero auxiliary progressive in spoken British English twenty years on
  • Caines, AP. and Buttery, P., 2017. The effect of task and topic on opportunity of use in learner corpora
  • Caines, A. and Buttery, P., 2012. Normalising frequency counts to account for ‘opportunity of use’ in learner corpora
  • Buttery, PJ. and McCarthy, M., 2011. Lexis in Spoken Discourse.
  • Briscoe, E. and Buttery, PJ., 2008. Linguistic adaptations for resolving ambiguity. The Evolution of Language.
  • Buttery, PJ., McCarthy, M. and Carter, R., Chatting in the academy: informality in spoken academic discourse

Reports

  • Caines, AP., Nicholls, D. and Buttery, P., 2017. Annotating errors and disfluencies in transcriptions of speech

Theses / dissertations

  • Moore, R., Skill embeddings: artificial neural network representations for pedagogical policy development.

Datasets

  • Tyen, WHG., Brenchley, M., Caines, A. and Buttery, P., Research data supporting "Towards an open-domain chatbot for language practice"

Contact Details

    Room: GS16
    Office phone: (01223) 7-63832
    Email: pjb48@cam.ac.uk