
Department of Computer Science and Technology

 
Principal lecturer: 
Students: MPhil ACS, Part III
Term: Michaelmas term
Course code: L330
Class limit: 15

This module can accommodate up to 50 students: a maximum of 35 Part II students and 15 MPhil / Part III students.

Aims

The course will develop core areas of Data Science (e.g., models for regression and classification) from several perspectives: conceptual formulation and properties, solution algorithms and their implementation, and data visualization for exploratory data analysis and for the effective presentation of modelling outputs. The lectures will be complemented by practical classes using Python, scikit-learn and TensorFlow.

Lectures

  • Introduction. Motivation, applications, examples, common data formats (csv, json), loading data with Python, calculating statistics over a dataset with numpy, logistics and overview of the course.
  • Linear Regression. Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.
  • Practical: Linear Regression
  • Classification, part I. Classification, logistic regression, perceptron, multi-class classification, classification performance measures.
  • Practical: Classification I
  • Classification, part II. An overview of other classification techniques (e.g., decision trees, SVMs) and more advanced techniques including ensemble-based models (boosting, bagging, exemplified with AdaBoost and Random Forests).
  • Practical: Classification II
  • Deep learning basics. Neural networks, real-world applications, optimization, stochastic gradient descent, backpropagation, learning rates.
  • Deep learning with TensorFlow. Introduction to TensorFlow, minimal TensorFlow example, symbolic graphs, training a network, practical tips for deep learning.
  • Practical: Deep learning with TensorFlow
  • Deep learning architectures. Convolutional networks, RNNs, LSTMs, autoencoders, regularization.
  • Practical: Deep learning architectures
  • Visualization, part I. Scales and coordinates, depicting comparisons.
  • Visualization, part II. Common plotting patterns, including dimension reduction.
  • Practical: Visualization
  • Challenges in Data Science. Summary of the course, ethics and privacy in data science, p-hacking, the look-elsewhere effect, bias in the training data, interpretability, information about the hand-out test.
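
As a flavour of the early lecture topics (calculating statistics with numpy, least squares regression, gradient descent), here is a minimal sketch. It is not taken from the course materials: the synthetic dataset, the helper name fit_gd, and the learning rate and step count are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical synthetic dataset (an assumption, not course data):
# y = 2x + 1 plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

# Summary statistics over the dataset with NumPy.
print("mean of x:", x.mean(), "std of x:", x.std())

# Design matrix: one column for the slope, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Closed-form least squares fit.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)


def fit_gd(X, y, lr=0.1, steps=2000):
    """Fit linear regression by batch gradient descent on the mean squared error.

    lr and steps are illustrative values, not tuned recommendations.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Gradient of mean((Xw - y)^2) with respect to w.
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
    return w


w_gd = fit_gd(X, y)
print("least squares    [slope, intercept]:", w_lstsq)
print("gradient descent [slope, intercept]:", w_gd)
```

Both fits should recover a slope near 2 and an intercept near 1, and they should agree closely with each other, since on this small, well-conditioned problem gradient descent converges to the least squares solution.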

Objectives

By the end of the course students should be able to:

  • demonstrate understanding and practical skills in Data Science;
  • specify and work with an analytical model;
  • implement Data Science algorithms effectively;
  • understand how data visualization underpins exploring datasets as well as communicating the findings of data science models.

Recommended reading

Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer.
MacKay, D.J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Python Basic Tutorial. Available online: https://www.tutorialspoint.com/python/index.htm
NumPy: Quickstart Tutorial. Available online: https://docs.scipy.org/doc/numpy/user/quickstart.html
Get Started with TensorFlow. Available online: https://www.tensorflow.org/tutorials/

Assessment

There will be six short practicals, together contributing 20% of the final module mark, plus a final assessment worth 80% of the final mark.

Further Information

Due to COVID-19, the method of teaching for this module will be adjusted to cater for physical distancing and students who are working remotely. We will confirm precisely how the module will be taught closer to the start of term.

Current Cambridge undergraduate students who are continuing on to Part III or the MPhil in Advanced Computer Science may only take this module if they did NOT take it as a Unit of Assessment in Part II.

This module is shared with the Part II Computer Science Tripos course Data Science: principles and practice. Assessment will be adjusted for the two groups of students to be at an appropriate level for whichever course the student is enrolled on. Further information about assessment and practicals will follow at the first lecture.