A corpus of UK court decisions for legal and AI research
We introduce the Cambridge Law Corpus (CLC), a corpus for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases dating back to the 16th century. The corpus contains raw text and metadata. Together with the corpus, we provide case-outcome annotations for 638 cases, produced by legal experts. Reflecting legal and ethical considerations, we release the corpus for research purposes only, subject to restrictions.
Our paper introducing the corpus, "The Cambridge Law Corpus: A Dataset for Legal AI Research", was accepted at the NeurIPS 2023 Datasets and Benchmarks Track.
The DOI of the corpus is 10.17863/CAM.100221. To download an example of the dataset consisting of 15 court cases, please click here.
Terms & Conditions
- Applications for access to the Cambridge Law Corpus (CLC) can only be made by researchers who are employed full-time by a recognised university or other research institution. The applicant must hold a permanent position at the level of Assistant Professor (or higher) or equivalent. Research students and non-permanent researchers can apply via supervisors or mentors fulfilling the above-mentioned criteria. As part of the application process, the applicant must provide their complete contact details at the university or research institution.
- The applicant must provide a research plan.
  a. The research plan must include aims, methods, and specifications of the intended research outputs.
  b. The research plan may list members of a research group that need access to the CLC. The applicant must ensure that any detailed member of their team is compliant with these terms and conditions. Other persons may not be given access to the CLC.
  c. The applicant must submit an electronic copy of the ethical approval for their research plan obtained from their university (or other authority).
- The applicant must agree to the following terms and conditions:
  a. The CLC can only be used for research purposes and must not be used for any other purposes, for example, commercial purposes.
  b. The CLC cannot be transferred to any other individual or entity other than the applicant and the members of the research group specified in the application.
  c. The CLC cannot be published or made publicly available in any form.
  d. All research outputs based on using the CLC must reference the CLC and the original CLC repository.
  e. Research outputs, regardless of their format, must never identify natural persons, legal persons, or similar entities.
  f. No automated decision-making processes may be applied using the CLC, including profiling, which produces legal effects or similarly significant effects on a person.
  g. The applicant agrees to immediately remove any data from the CLC as requested by one of the authors in case legal risks arise. This may involve deleting the entire CLC. The applicant assumes full responsibility that members of the research group also remove such data in copies of the CLC they might have.
  h. The applicant agrees to adhere to policies and codes of conduct that ensure the privacy, confidentiality, and security of the CLC.
  i. The applicant must comply with all applicable laws in any relevant national or international jurisdiction, including the UK Data Protection Act 2018 and the UK GDPR.
- The CLC is a complete corpus. Any references to the CLC include amended versions of the CLC, regardless of who made those amendments.
- The Dean of the Faculty (or equivalent authority) of the researcher confirms all of the above requirements and assumes responsibility for the Faculty (or equivalent) that the above requirements are met. As part of this, the Dean (or equivalent authority) needs to provide their institutional contact details.
To apply for the full version of the CLC dataset, please click on the following link and complete the CLC registration form. For reference, we also provide an example ethical approval form; ethical approval is required to obtain research access to the CLC.
To apply for the removal of a case from the CLC, please click on the following link and complete the Case Removal Form.
For questions on the CLC, please write to clc@law.cam.ac.uk.
To cite our work:
@inproceedings{ostling2023cambridge,
  author = {\"{O}stling, Andreas and Sargeant, Holli and Xie, Huiyuan and Bull, Ludwig and Terenin, Alexander and Jonsson, Leif and Magnusson, M\aa ns and Steffek, Felix},
  title = {The Cambridge Law Corpus: A Dataset for Legal AI Research},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {41355--41385},
  volume = {36},
  year = {2023}
}