Statistics 2 (MATH 20800), 2012/13


Statistics is about inference under uncertainty, i.e. in situations where deductive logic cannot give a clear-cut answer. In these situations our decisions must be assessed in terms of their probabilities of being correct or incorrect. Such decisions include estimating the parameters of a statistical model, making predictions, and testing hypotheses. It is often possible to identify 'optimal' or at least good decisions, and Statistics is about these decisions, and knowing where they apply. A thorough grounding in Statistics, as provided by this course, is crucial not only for anyone contemplating a career in finance or industry, but also for scientists and policymakers, as we realise that some of the biggest issues, like climate change, natural hazards, or health, are also some of the most uncertain.

In case you have any doubts, see this article in the NY Times.

See also Tails You Win: The Science of Chance



Course outline

Formal details of this course are available on the unit description page, http://www.stats.bris.ac.uk/study/undergrad/current_units/unit/?id=414 . Please email any comments you have about the course or this webpage to me, Christophe Andrieu, at c.andrieu@bris.ac.uk.

The course is taught through a combination of timetabled lectures and problems classes, 'homework' (problem sheets) and self-directed computer practicals.

Key dates

Textbooks

The textbook for this course is:

I also frequently consult:

Both books provide a good review of probability. It is strongly recommended that you brush up on your probability at the beginning of term.

Additional books that I consult (but that are not appropriate as textbooks for this course):

Outline of the lectures

Lectures normally take place on Mondays (12.00, SM2), Thursdays (14.00, SM1), and Fridays (10.00, SM2), and problems classes on Thursdays (15.00, SM1). Note that these are the times at which the lecture actually starts; aim to be in your seat well before then, bearing in mind that we are quite a large group and it may take several minutes for everyone to enter and be seated.

Most weeks are accompanied by readings from the textbook. In an ideal world you would glance at the readings before the lectures, but it is OK if you read after the lectures. It is not OK to skip the reading. Note that the given readings are very topic-specific: you will need to consult your textbook much more widely than just the sections given here.

Lecture outline and reading

Links to handouts and supplementary notes will be given below. I will compile a glossary of the most important concepts as we go through the course.

This is a list of lectures as they will be given (tentative).

Frequentist inference

  1. Introduction, concepts, notation (Mon. 8th Oct)
    Reading: Rice chs 1 and 2, for revision and the basic distributions.
  2. The likelihood function: definition, properties (Th. 11th Oct)
  3. Maximum likelihood (ML) estimates: definition, invariance (Fri. 12th Oct)
    Reading: Rice ch 7 [HW 01, due 18th October].
  4. ML estimators: definition, methods for finding ML (Mon. 15th Oct)
    Reading: Rice, 8.1 - 8.3, start of 8.5
  5. Properties of estimators, bias, standard error, MSE (Th. 18th Oct)
    Reading: Rice, ch 5
  6. Probabilistic convergence, Slutsky's theorem (Fri. 19th Oct) [HW 02, due 25th October].
  7. Properties of estimators (continued) (Mon. 22nd Oct)
  8. Fisher information, regularity conditions, score function (Th. 25th Oct)
  9. Efficiency of estimators, the Cramér-Rao lower bound (Fri. 26th Oct)
    Reading: Rice, 8.5.2 [HW 03, due 1st November].
  10. Asymptotic distribution of ML estimators (Mon. 29th Oct)
    Reading: Rice, 8.6
  11.  (Th. 1st Nov)
  12. Confidence sets (CS): definition, the Normal special case (Fri. 2nd Nov) [HW 04, due 8th November].
  13. Approximating CIs: asymptotic approaches based on the Wald method (Mon. 5th Nov)
    Reading: Rice, 5.2, 8.5.3
    Numerically maximising likelihood functions in R: detailed handout and text file to cut and paste the R code.
  14. Reparametrizations and confidence intervals (Th. 8th Nov)
    More on how to optimise a likelihood and compute confidence intervals using R (includes Allan's rule of thumb): pdf file, text file to cut and paste into R.
  15. Another approach based on likelihood ratios: Wilks' approach (Fri. 9th Nov) [HW 05, due 15th November].
  16. Invariance property of Wilks' CIs (Mon. 12th Nov)
  17. Introduction to hypothesis tests (Th. 15th Nov)
    Reading: Rice, 9.1 - 9.3
  18. The Neyman-Pearson approach, and p-values (Fri. 16th Nov) [HW 06, due 22nd November].
  19. Examples: computing the power function, the p-value (Mon. 19th Nov)
    Example code as pdf file, as text file.
  20. More general approaches to testing hypotheses: UMP tests (Th. 22nd Nov)
    Reading: Rice, 9.4 - 9.5
  21. Generalised likelihood ratio (GLR) tests (Fri. 23rd Nov) [HW 07, due Thursday 29th November].
  22. Hypothesis tests and confidence intervals (Mon. 26th Nov.)
  23. GLR tests for the multinomial distribution (Th. 29th Nov)
    Reading: Rice, 9.6
  24. Pearson's chi-squared test (Fri. 30th Nov) [HW 08, due Th. 6th Dec].
    Reading: Rice, 9.8 - 9.9
  25. Chi-squared tests in practice, QQ plots, hanging chi-grams (Mon. 3rd Dec)
    Example code as text file, as pdf file, and the associated dataset.
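
As a rough illustration of lectures 3, 4, 13 and 14 (this is not the course handout), the sketch below maximises a likelihood numerically and forms a Wald interval. The Exponential model, the simulated data, the seed and all object names are assumptions made for the example; only base R functions (optim, dexp, qnorm) are used.

```r
# Minimal sketch: numerical ML and a 95% Wald interval for an Exponential rate.
set.seed(1)                      # illustrative seed
x <- rexp(200, rate = 2)         # simulated data; the true rate 2 is an assumption

# Negative log-likelihood in terms of log(rate), so the optimiser is unconstrained
negloglik <- function(lograte) {
  -sum(dexp(x, rate = exp(lograte), log = TRUE))
}

fit <- optim(par = 0, fn = negloglik, method = "BFGS", hessian = TRUE)

lograte_hat <- fit$par
# Standard error from the observed information (the Hessian at the optimum)
se_lograte <- sqrt(1 / fit$hessian[1, 1])

# Wald interval on the log scale, mapped back to the rate scale
ci_rate <- exp(lograte_hat + c(-1, 1) * qnorm(0.975) * se_lograte)

rate_hat    <- exp(lograte_hat)  # invariance of the ML estimate
rate_closed <- 1 / mean(x)       # closed-form ML estimate, for comparison
```

Working on the log scale keeps the search away from the boundary at zero; the interval is then transformed back, which ties in with the lecture on reparametrizations.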

Outline of Bayesian inference

  1. The postulates of Bayesian inference, Bayes's theorem (Th. 6th Dec)
    Reading: Rice ch 15
  2. Credible intervals (Fri 7th Dec.)
  3. Hypothesis testing, Bayes factors (Mon. 10th Dec).

Vacation reading

There is much interesting material on hypothesis tests in Rice, chapters 11, 12, and 13. This material should be straightforward, and reading it carefully is a good way to check that you have understood hypothesis testing properly.

Further information on the exam

Some other issues:

  1. Several people have asked about R in the exam. My view is that it is not constructive to assess programming in a timed and unseen exam. However, you may find it useful to describe calculations or algorithms in terms of R code.
  2. You are permitted to bring into the exam one sheet (two sides) of A4 notes. You may want to bring this example glossary, or you may want to prepare your own. But you should be aware that the exam is set with this in mind: you should not expect to score very many marks simply by referring to your sheet of notes.
  3. You will be expected to know the probability mass/density functions of the basic distributions (e.g. Binomial, Poisson, Geometric, Exponential, Normal). So if you are unsure about these, you should include them on your sheet of notes.

Homework

There are Homeworks for most but not all weeks. The 'answers' will be posted once the homework has been handed in. The schedule for the homeworks is as follows. Worksheets are distributed on Thursday at the 14.00 lecture. Homework is handed in the next Thursday in the folder next to my office door by 5pm. The previous week's homework can be collected from me after the Thursday (if available) or Friday lecture.

The homeworks usually take the form of worksheets, i.e. they are designed to lead you through an analysis, to increase understanding. This is not the same format as the questions in the exam, as you will see from consulting previous exam papers. Most of the homeworks contain a question that requires you to use R.

Please remember that communication is a very important part of statistics. When you do your homework, you must communicate with your reader. This means, among other things, using proper sentences, and making the progress of your answer as clear as possible.

Marking scheme:

A: Your work is neat and well laid-out. The answers are logical.
B: Somewhere between A and C.
C: Your work is messy and hard to follow, and/or quite a lot of it is wrong.
N: Really, it would have been better had you not submitted :)

| Problem sheet | Due on | Mistakes, comments | With answers |
|---|---|---|---|
| SWS 01 | | | SWS01ans |
| HW01 | 18th October by 5pm | | HW01ans |
| HW02 | 25th October by 5pm | | HW02ans |
| HW03 | 1st November by 5pm | | HW03ans |
| HW04 | 8th November by 5pm | | HW04ans |
| HW05 | 15th November by 5pm | | HW05ans |
| HW06 | 22nd November by 5pm | | HW06ans |
| HW07 | 29th November by 5pm | | HW07ans |
| HW08 | 6th December by 5pm | | HW08ans |
| HW09 | | | HW09ans |

Code snippets to help with the homeworks

Homework 2

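# Pareto(x0, theta) density: theta * x0^theta / x^(theta + 1) for x >= x0, else 0
dpareto <- function(x, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0))
  ifelse(x < x0, 0, (theta / x) * (x0 / x)^theta)
}

# Distribution function: 1 - (x0 / x)^theta for x >= x0, else 0
ppareto <- function(x, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0))
  ifelse(x < x0, 0, 1 - (x0 / x)^theta)
}

# Quantile function: the inverse of ppareto for u in [0, 1]
qpareto <- function(u, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0), all(u >= 0), all(u <= 1))
  x0 * (1 - u)^(-1/theta)
}

# Random sample of size n by inversion: qpareto applied to Uniform(0, 1) draws
rpareto <- function(n, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0), n >= 0)
  u <- runif(n)
  qpareto(u, x0 = x0, theta = theta)
}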
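
One possible way to check code like this is sketched below; it is not part of the homework. It draws a Pareto sample with the same inverse-CDF formula as qpareto, computes the closed-form ML estimate of theta when x0 is known, and compares empirical and theoretical distribution functions. The parameter values, seed and object names are illustrative assumptions, and the block restates the formulas inline so that it runs on its own.

```r
# Minimal sketch: simulate Pareto data by inversion and sanity-check it.
set.seed(42)                         # illustrative seed
x0 <- 1; theta <- 3; n <- 1000       # hypothetical parameter values

u <- runif(n)
x <- x0 * (1 - u)^(-1 / theta)       # same inverse-CDF formula as qpareto

# With x0 known, the log-likelihood is n*log(theta) + n*theta*log(x0)
# - (theta + 1)*sum(log(x)), which is maximised at:
theta_hat <- n / sum(log(x / x0))

# Compare empirical and theoretical distribution functions at a few points
q    <- c(1.2, 1.5, 2)
emp  <- sapply(q, function(z) mean(x <= z))
theo <- 1 - (x0 / q)^theta           # same formula as ppareto for q >= x0
```

If emp and theo disagree by much more than sampling error, or theta_hat is far from theta, that usually points to a mistake in the quantile or distribution function.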

Computer practicals

Andrew Smith and Fei Xuang give two weekly computer practical office hours in the computer classroom on the ground floor of the Maths Dept. This is in case you are struggling with aspects of R: attendance is voluntary but strongly encouraged.

There are some tips for using R in the Computer Classroom, and other useful resources.

There are three computer practicals that must be handed in. The first practical functions as a dry-run.

The second and third practicals each contribute 10% of your overall mark in Statistics 2. You may want to consult the policy on late submissions, available on the unit webpage.

Plagiarism was an issue last year, and offenders were penalised through a formal process. Although discussion of the work between students is allowed, you should write your own code and write your answers on your own. Make sure that you understand the departmental guidelines on the matter in Appendix C of the second year handbook, and ask me questions if you are in any doubt.

| Problem sheet | Comments | Handed in by | With answers |
|---|---|---|---|
| CP01 | | 5pm Wed. 28th Nov | CP01ans |
| CP02 | Histograms of marks | 5pm Fri. 14th Dec | CP02ans |
| CP03 | | 5pm Fri. 25th Jan | CP03ans |

Note that the answers are the 'bare-bones' answers: your answers should also include some explanations of your calculations, and possibly comments in the R code.

Tips on using R in the Computer Classroom

  1. File management. There are clear instructions on accessing your files in the computer classroom.
  2. Using files. It is best to enter your commands into a text file, and copy and paste these commands into R. You can do
    File > New Document
    to start a new file, or
    File > Open Document...
    to open an existing file. If you make or modify a file you will be asked if you want to save it when you exit the file.

    You may want to execute all of the commands in the file, in which case use

    File > Source file...
  3. You can save the objects you create during your R session by answering 'y' to the question "Save workspace image? [y/n/c]: ", when you quit. They won't come back automatically, though. You have to issue the command
    load('.RData')
    first.
  4. The R comment character '#' does not appear on the Mac keyboard. You can find it as Alt-3.
  5. You highlight text using Shift and the cursor keys, or by dragging with the mouse. On the Mac, cut is cmd-x, copy is cmd-c, and paste is cmd-v. The cmd key is the one to the left of the space bar.
  6. If you are working in R, you can recover previous commands by using the up and down arrows. This is the easiest way to build up a complicated command: keep adding arguments one at a time, until you get something that looks good.
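
Tips 2 and 3 can also be carried out from the R prompt rather than the menus. The sketch below uses the base R functions source(), save.image() and load(); the scratch directory, the file name commands.R and the objects x and m are hypothetical, chosen so that the example does not overwrite any of your own files.

```r
# Minimal sketch: run a file of commands, then save and restore the workspace.
work_dir <- tempfile("r_session_")   # hypothetical scratch directory
dir.create(work_dir)

# A small text file of R commands, as you would keep for a practical
script <- file.path(work_dir, "commands.R")
writeLines(c("x <- c(2, 4, 6)", "m <- mean(x)"), script)

source(script)                       # same effect as File > Source file...

# Save the workspace explicitly instead of answering 'y' when quitting
image_file <- file.path(work_dir, ".RData")
save.image(file = image_file)

rm(x, m)                             # simulate starting a fresh session
load(image_file)                     # x and m are available again
```

In the computer classroom you would normally call load('.RData') in the directory where R was last closed, as described in tip 3.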

Useful resources

We use the open-source Statistical Computing Environment R, whose homepage is http://www.r-project.org/. This gives access to the source code, which you can also get directly at http://www.stats.bris.ac.uk/R/, and also to documentation.

You might be interested in the document An Introduction to R; despite its name, this is fairly comprehensive. For a more gentle introduction, try R: A self-learn tutorial, written for undergraduates. Another good document can be found here, but this is more advanced.

When you become an R power-user, you will want to access the contributed packages on CRAN, the Comprehensive R Archive Network (click on Packages in the left-hand panel).