Mathematics
Te Tari Pāngarau me te Tatauranga
Department of Mathematics & Statistics

MATH4SL Data Mining, Inference and Prediction

Second Semester
10 points
 

Recent years have witnessed an explosion in the quantity and variety of data available, creating computational, mathematical and statistical challenges. The tools being developed to understand these data blur the boundaries between disciplines. The goal of this paper is to provide an introduction to these tools or, more correctly, to the concepts and mathematics underlying them.

Outline

The paper will be built on a selection of topics from chapters 3-7 & 11 of

Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

which can be downloaded for free from here.

We will also discuss neural networks, PAC learning and dimension reduction (material taken from various sources). A provisional list of topics

Prerequisites

This paper is designed primarily for mathematicians, who may have little formal training in statistics or probability beyond what is covered in a first-year paper. Some experience with programming, particularly in R, will help you make use of the techniques introduced in the paper, but it is not required.

Lecturer

Prof David Bryant, Mathematics and Statistics, Room 514.

Assessment

Two written assignments (which I plan to make available as soon as possible), each worth 20% of the grade. One three-hour exam, worth 60% of the grade.

Lectures

There will be (a maximum of) 20 one-hour lectures, starting in week two of the semester and finishing in week 39. There will be no lectures in week 34 (the week before the break).

Some of these lectures will be held in a computer lab, and may need to be rescheduled because of that (we’ll let you know).

Resources

THE BIG PROBLEM

You have an unknown function f that takes inputs X and returns outputs Y (e.g. X = an email, Y = spam or not spam).

You are given a training sample (X1,Y1), …, (Xn,Yn), where each Yi = f(Xi) with possible noise/error.

Can you predict f(X) for future values of X?
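
As a rough illustration of this setup, the sketch below simulates a noisy training sample from a function that the learner is not allowed to inspect, fits a straight line by least squares, and uses the fitted line to predict at a new input. Everything here is an assumption made for the example: the names true_f, make_sample and fit_line are hypothetical, the choice of a linear f and Gaussian noise is arbitrary, and least squares is only one of many possible prediction methods.

```python
import random


def true_f(x):
    # Hypothetical "unknown" function. In a real problem we never see this;
    # it is defined here only so that we can simulate data and check the fit.
    return 2.0 * x + 1.0


def make_sample(n, noise_sd, seed=0):
    # Draw a training sample (X1, Y1), ..., (Xn, Yn) with Yi = f(Xi) + noise.
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for a single input variable.
    # Returns a function f_hat that predicts Y from X.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Learn from the training sample only, then predict at an unseen input.
xs, ys = make_sample(n=200, noise_sd=0.5)
f_hat = fit_line(xs, ys)
x_new = 4.0
print("predicted:", f_hat(x_new), "true:", true_f(x_new))
```

The prediction is close to the true value here because the assumed model (a straight line) matches the simulated f. When the form of f is unknown, the choice of model and the way its prediction error is assessed become the central questions.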

Final mark

While we strive to keep details as accurate and up-to-date as possible, information given here should be regarded as provisional. Individual lecturers will confirm teaching and assessment methods.