
Department of Computer Science and Technology

Principal lecturer: 
Students: Part II CST 75%
Term: Michaelmas term
Course code: DataSciII
Prerequisites: NST Mathematics
Hours: 16
Class limit: 50

This module can accommodate up to 50 students: a maximum of 35 Part II students and 15 MPhil / Part III students.

Aims

The course will develop core areas of Data Science (e.g. models for regression and classification) from several perspectives: conceptual formulation and properties, solution algorithms and their implementation, data visualization for exploratory data analysis, and the effective presentation of modelling outputs. The lectures will be complemented by practical classes using Python, scikit-learn and TensorFlow.

Lectures

  • Introduction. Motivation, applications, examples, common data formats (CSV, JSON), loading data with Python, calculating statistics over a dataset with NumPy, logistics and overview of the course.
  • Linear Regression. Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.
  • Practical: Linear Regression
  • Classification, part I. Classification, logistic regression, perceptron, multi-class classification, classification performance measures.
  • Practical: Classification I
  • Classification, part II. An overview of other classification techniques (e.g., decision trees, SVMs) and more advanced techniques including ensemble-based models (boosting, bagging, exemplified with AdaBoost and Random Forests).
  • Practical: Classification II
  • Deep learning basics. Neural networks, real-world applications, optimization, stochastic gradient descent, backpropagation, learning rates.
  • Deep learning with TensorFlow. Introduction to TensorFlow, minimal TensorFlow example, symbolic graphs, training a network, practical tips for deep learning.
  • Practical: Deep learning with TensorFlow
  • Deep learning architectures. Convolutional networks, RNNs, LSTMs, autoencoders, regularization.
  • Practical: Deep learning architectures
  • Visualization, part I. Scales and coordinates, depicting comparisons.
  • Visualization, part II. Common plotting patterns, including dimension reduction.
  • Practical: Visualization
  • Challenges in Data Science. Summary of the course, ethics and privacy in data science, p-hacking, look-elsewhere effect, bias in the training data, interpretability, information about the hand-out test.
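
As an informal illustration of the kind of material covered in the first two lectures (loading CSV data, computing statistics with NumPy, least squares regression and gradient descent), a minimal sketch follows. It is not taken from the course materials: the toy dataset, the variable names and the learning rate are all assumptions chosen for illustration.

```python
import io

import numpy as np

# Hypothetical toy data standing in for a CSV file on disk.
csv_text = """x,y
0.0,1.0
1.0,3.1
2.0,4.9
3.0,7.2
4.0,9.0
"""

# Load the CSV into a 2-D array, skipping the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
x, y = data[:, 0], data[:, 1]

# Simple summary statistics over the dataset.
print("mean of y:", y.mean(), "std of y:", y.std())

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Closed-form least squares fit: minimises ||X w - y||^2.
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same fit by batch gradient descent on the mean squared error.
w = np.zeros(2)
learning_rate = 0.05  # assumed value, small enough to converge on this data
for _ in range(5000):
    grad = 2.0 / len(y) * X.T @ (X @ w - y)
    w -= learning_rate * grad

print("least squares [intercept, slope]:", w_exact)
print("gradient descent [intercept, slope]:", w)
```

Both approaches minimise the same objective, so on a small, well-conditioned problem like this one they should agree closely; the practical classes use scikit-learn rather than hand-written solvers.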

Objectives

By the end of the course students should be able to:

  • demonstrate understanding and practical skills in Data Science;
  • specify and work with an analytical model;
  • implement Data Science algorithms effectively;
  • understand how data visualization underpins exploring datasets as well as communicating the findings of data science models.

Recommended reading

Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer.
MacKay, D.J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Python Basic Tutorial. Available online: https://www.tutorialspoint.com/python/index.htm
Numpy: Quickstart Tutorial. Available online: https://docs.scipy.org/doc/numpy/user/quickstart.html
Get Started with TensorFlow. Available online: https://www.tensorflow.org/tutorials/