Statistics 2 (MATH 20800), 2012/13
Statistics is about inference under uncertainty, i.e. in situations
where deductive logic cannot give a clear-cut answer. In these
situations our decisions must be assessed in terms of their
probabilities of being correct or incorrect. Such decisions include
estimating the parameters of a statistical model, making predictions,
and testing hypotheses. It is often possible to identify 'optimal' or
at least good decisions, and Statistics is about these decisions, and
knowing where they apply. A thorough grounding in Statistics, as
provided by this course, is crucial not only for anyone contemplating
a career in finance or industry, but also for scientists and
policymakers, as we realise that some of the biggest issues, like
climate change, natural hazards, or health, are also some of the most
uncertain.
In case you have any doubts, see this article in the NY Times.
See also Tails You Win: The Science of Chance
Course outline
Formal details of this course are available on the unit description page, http://www.stats.bris.ac.uk/study/undergrad/current_units/unit/?id=414
Please email any comments you have about the course or this webpage to me, Christophe Andrieu, at c.andrieu@bris.ac.uk.
The course is taught through a combination of timetabled lectures and problems
classes, 'homework' (problem sheets) and self-directed computer practicals.
- There are three lectures and one problems class per week, see
the outline below.
- There is a homework for most weeks.
- There are three computer practicals,
to be completed in your own time.
- There is a weekly office hour on Mondays at 11am, in my office
(4.14, top
floor, turn left out of the stairs/lift).
Please come along at the start of the hour, if you can. For short
questions, you should ask me after the lectures.
- For teaching weeks 4 to 11 there is one Maths Café for statistics each week,
run by Patrick Cannon (3rd year student), on Tuesdays at 1pm in SM3,
where you can work on and discuss the Statistics 2 material.
- For computing support, Fei Xuang and Andrew Smith give weekly Computing
Office Hours, 11am - 12 noon on Mondays (Fei) and 9am - 10am on Fridays (Andrew), in the computer classroom on the ground floor
of the Maths Dept. (G9). See below for more details. The office hour
(see above) can also be used for computing support.
Key dates
- First lecture, Mon. 8th Oct 2012 at 12 noon.
- First computer practical deadline, Fri. 23rd Nov 2012.
- Second computer practical deadline, Fri. 14th Dec 2012.
- End of first Term, Fri 14 Dec 2012.
- Start of second Term, Fri 11 Jan 2013.
- End of first Teaching Block, Fri 25th Jan 2013.
- Third computer practical deadline, Fri 25th Jan 2013.
Textbooks
The textbook for this course is:
- John A. Rice, 2007, Mathematical Statistics and Data Analysis, Duxbury, 3rd ed. This is the 'Rice' cited in the readings below.
I also frequently consult:
Both books provide a good review of probability. It is strongly recommended
that you brush up on your probability at the beginning of Term.
Additional books that I consult (but that are not appropriate as
textbooks for this course):
- Geoffrey Grimmett and David Stirzaker, 2001, Probability and
Random Processes, OUP, 3rd ed. Good for things like convergence, with
quite general versions of theorems and precise proofs.
- Larry Wasserman, 2004, All of Statistics: A Concise Course in
Statistical Inference, Springer. Written for the machine-learning
community; an unusual blend, sometimes deep, sometimes broad.
- Yudi Pawitan, 2001, In All Likelihood: Statistical Modelling
and Inference Using Likelihood, Oxford University Press. Good
introductory chapters.
- Colin Howson and Peter Urbach, 2006, Scientific reasoning: The
Bayesian approach, Open Court, 3rd ed. Be careful if you read
this book: there are lots of mistakes (the 2nd edition was less buggy
but more prolix).
- Mark J. Schervish, 1995, Theory of Statistics, Springer.
This is a full-on theory book; it has some illuminating examples but
you will find it difficult to follow.
- David Cox, 2006, Principles of Statistical Inference, Cambridge
University Press. Very deep book, also hard. Also the predecessor book: D.R. Cox and D.V. Hinkley, 1974, Theoretical Statistics, Chapman and Hall.
Lectures normally take place on Mondays (12.00, SM2), Thursdays (14.00, SM1), and Fridays
(10.00, SM2),
and problems classes on Thursdays (15.00, SM1). Note that these times are when the lecture actually starts:
aim to be in your seat well before then, bearing in mind that we are
quite a large group and it may take several minutes for everyone to
enter and be seated.
Most weeks are accompanied by readings from the textbook. In an
ideal world you would glance at the readings before the lectures, but
it is OK if you read after the lectures. It is not OK to skip the
reading. Note that the given readings are very topic-specific: you
will need to consult your textbook much more widely than just the
sections given here.
Lecture outline and reading
Links to handouts and supplementary notes will be given below. I will
compile a glossary of the most important concepts as we go through the
course.
This is a (tentative) list of the lectures as they will be given.
Frequentist inference
- Introduction, concepts, notation (Mon. 8th Oct)
Reading: Rice chs 1 and 2, for revision and the basic distributions.
- The likelihood function: definition, properties (Th. 11th Oct)
- Maximum likelihood (ML) estimates: definition, invariance (Fri. 12th Oct)
Reading: Rice ch 7 [HW 01, due 18th October].
- ML estimators: definition, methods for finding ML (Mon. 15th Oct)
Reading: Rice, 8.1 - 8.3, start of 8.5
- Properties of estimators, bias, standard error, MSE (Th. 18th Oct) Reading: Rice, Ch 5
- Probabilistic convergence, Slutsky's theorem (Fr. 19th Oct) [HW 02, due 25th October].
- Properties of Estimators (continued) (Mon. 22nd Oct).
- Fisher information, regularity conditions, the score function (Th. 25th October).
- Efficiency of estimators, the Cramér-Rao lower bound (Fr. 26th Oct) Reading: Rice, 8.5.2 [HW 03, due 1st November].
- Asymptotic distribution of ML estimators (Mon. 29th Oct.)
Reading: Rice, 8.6
- (Th. 1st Nov)
- Confidence sets (CS): definition, the Normal special case (Fr 2nd Nov) [HW 04, due 8th November].
- Approximating CIs: asymptotic
approaches based on the Wald method (Mon. 5th Nov).
Reading: Rice, 5.2, 8.5.3. Numerically maximising likelihood functions in R: detailed handout and text file to cut and paste the R code (a minimal sketch also appears after this list).
- Reparametrizations and confidence intervals (Th. 8th
Nov). More on how to optimise a likelihood and compute confidence
intervals using R (includes Allan's rule of thumb): pdf file, text file to cut and paste into R.
- Another approach based on likelihood ratios, Wilks's approach (Fr. 9th Nov.) [HW 05, due 15th November].
- Invariance property of Wilks's CIs (Mon. 12th Nov)
- Introduction to hypothesis tests (Th 15th Nov) Reading: Rice, 9.1 - 9.3
- The Neyman-Pearson approach, and p-values (Fr. 16th Nov)
[HW 06, due 22nd November].
- Examples: computing the power function and the p-value.
Example code as a pdf file and
as a text file (Mon. 19th Nov.)
- More general approaches to testing hypotheses: UMP tests (Th. 22nd Nov). Reading: Rice 9.4 - 9.5
- Generalised likelihood ratio (GLR) tests (Fr. 23rd Nov) [HW 07, due Thursday 29th November].
- Hypothesis tests and confidence intervals (Mon. 26th Nov.)
- GLR tests for the multinomial distribution (Th. 29th Nov.) Reading: Rice 9.6
- Pearson's Chi-squared test (Fri 30th Nov).
[HW 08, due Th. 6th Dec.]. Reading: Rice 9.8 - 9.9
- Chi-squared tests in practice, QQ plots, hanging chi-grams. Example code
as a text file and as a pdf file, plus the associated dataset (Mon. 3rd Dec)
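Several of the handouts above involve numerically maximising a log-likelihood in R. As a taster, here is a minimal sketch of the general recipe, with an approximate 95% Wald interval built from the observed information; the Exponential model, the simulated data, and the starting value are my own illustrative assumptions, not the handout's example.
negloglik <- function(theta, x) {
  # negative log-likelihood for an assumed Exponential(rate = theta) model
  -sum(dexp(x, rate = theta, log = TRUE))
}
set.seed(1)
x <- rexp(50, rate = 2)   # illustrative data; the true rate is 2
fit <- optim(par = 1, fn = negloglik, x = x,
             method = "L-BFGS-B", lower = 1e-6, hessian = TRUE)
theta.hat <- fit$par                # ML estimate
se <- sqrt(1 / fit$hessian[1, 1])   # standard error from the observed information
theta.hat + c(-1, 1) * qnorm(0.975) * se   # approximate 95% Wald CI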
Outline of Bayesian inference
- The postulates of Bayesian inference, Bayes's theorem
Reading: Rice ch 15 (Th. 6th Dec)
- Credible intervals (Fri 7th Dec.); see the sketch after this list.
- Hypothesis testing, Bayes factors (Mon. 10th Dec).
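To give a feel for these Bayesian lectures, here is a minimal sketch of conjugate Beta-Binomial updating and an equal-tailed 95% credible interval; the prior and the data are my own illustrative numbers, not an example from Rice.
a <- 1; b <- 1     # flat Beta(1, 1) prior for the success probability
y <- 13; n <- 20   # assumed data: y successes in n Bernoulli trials
a.post <- a + y        # the posterior is Beta(a + y, b + n - y)
b.post <- b + n - y
qbeta(c(0.025, 0.975), a.post, b.post)   # equal-tailed 95% credible interval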
Vacation reading
There is much interesting material on hypothesis tests in Rice,
chapters 11, 12, and 13. This material should be straightforward, and
reading it carefully is a good way to check that you have understood
hypothesis testing properly.
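If you want to experiment as you read, R's built-in chisq.test will carry out the Pearson chi-squared calculations seen in the lectures; the die-roll counts below are made up purely for illustration.
counts <- c(18, 24, 17, 21, 25, 15)   # made-up counts from 120 die rolls
chisq.test(counts, p = rep(1/6, 6))   # goodness-of-fit test for a fair die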
Some other issues:
- Several people have asked about R in the exam. My view is that it
is not constructive to assess programming in a timed and unseen
exam. However, you may find it useful to describe
calculations or algorithms in terms of R code.
- You are permitted to bring into the exam one sheet (two sides) of
A4 notes. You may want to bring this example glossary, or you may want to prepare
your own. But you should be aware that the exam is set with this in
mind: you should not expect to score very many marks simply by
referring to your sheet of notes.
- You will be expected to know the probability mass/density
functions of the basic distributions (e.g. Binomial, Poisson,
Geometric, Exponential, Normal). So if you are unsure about these,
you should include them on your sheet of notes.
There are Homeworks for most but not all weeks. The 'answers' will be
posted once the homework has been handed in.
The schedule for the homeworks is as follows. Worksheets are
distributed on Thursdays at the 14.00 lecture. Homework is handed in by
5pm the following Thursday, in the folder next to my
office door. The previous week's homework can be collected
from me after the Thursday (if available) or Friday lecture.
The homeworks usually take the form of worksheets,
i.e. they are designed to lead you through an analysis, to increase
understanding. This is not the same format as the questions in the
exam, as you will see from consulting previous exam papers. Most of
the homeworks contain a question that requires you to use R.
Please remember that communication is a very
important part of statistics. When you do your homework, you must
communicate with your reader. This means, among other things, using
proper sentences, and making the progress of your answer as clear as
possible.
Marking scheme:
- A: Your work is neat and well laid-out. The answers are logical.
- B: Somewhere between A and C.
- C: Your work is messy and hard to follow, and/or quite a lot of it is wrong.
- N: Really, it would have been better had you not submitted :)
Code snippets to help with the homeworks
Homework 2
## Density, distribution function, quantile function, and random
## generation for the Pareto distribution with scale x0 > 0 and
## shape theta > 0 (support x >= x0).
dpareto <- function(x, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0))
  ifelse(x < x0, 0, (theta / x) * (x0 / x)^theta)
}
ppareto <- function(x, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0))
  ifelse(x < x0, 0, 1 - (x0 / x)^theta)
}
qpareto <- function(u, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0), all(u >= 0), all(u <= 1))
  x0 * (1 - u)^(-1/theta)   # inverse of the CDF
}
rpareto <- function(n, x0, theta) {
  stopifnot(all(x0 > 0), all(theta > 0), n >= 0)
  u <- runif(n)                        # uniform draws ...
  qpareto(u, x0 = x0, theta = theta)   # ... pushed through the inverse CDF
}
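As a quick sanity check on these functions (and a preview of the ML material), you can simulate from rpareto and compare the closed-form ML estimate of theta, which for known x0 is n / sum(log(x / x0)), against the value used to simulate; the numbers below are only an illustration.
set.seed(1)
x <- rpareto(1000, x0 = 1, theta = 2.5)
length(x) / sum(log(x / 1))   # ML estimate of theta; should be close to 2.5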
Andrew Smith and Fei Xuang give two weekly computer practical office hours in the computer classroom on the ground floor
of the Maths Dept. These are in case you are struggling with aspects of
R: attendance is voluntary but strongly encouraged.
There are some tips for using R in the
Computer Classroom, and other useful
resources.
There are three computer practicals that must be handed in. The first
practical functions as a dry-run.
The second and third practicals each contribute 10% of your
overall mark in Statistics 2. You may want to consult the policy on late
submissions, available on
the unit
webpage.
Plagiarism was an issue last year, and offenders
were penalized through a formal process. Although
discussion of the work between students is allowed, you should write
your own code and write up your answers on your own. Make sure that you
understand the departmental guidelines on the matter in Appendix C of the second year handbook, and ask me questions if you are in any doubt.
Note that the answers are the 'bare-bones' answers: your answers should also include
some explanations of your calculations, and possibly comments in the R code.
- File management. There are clear instructions on accessing your files in the computer classroom.
- Using files. It is best to enter your commands into a text file, and copy and paste these commands into R. You can do
File > New Document
to start a new file, or File > Open Document...
to open an existing file. If you create or modify a file, you will be asked whether you want to save it when you close it.
You may want to execute all of the commands in the file, in which case use
File > Source file...
- You can save the objects you create during your R session by answering 'y'
to the question "Save workspace image? [y/n/c]: ", when you quit.
They won't come back automatically, though. You have to issue
the command
load('.RData') first.
- The R comment character '#' does not appear on the Mac keyboard. You can
find it as Alt-3.
- You highlight text using Shift and the cursor keys, or by dragging the
mouse. On the Mac, cut is cmd-x, copy is cmd-c, and paste is cmd-v. The cmd key is the
one to the left of the space bar.
- If you are working in R, you can recover previous commands by using the up
and down arrows. This is the easiest way to build up a complicated command:
keep adding arguments one at a time, until you get something that looks good.
We use the open-source Statistical Computing Environment R, whose homepage is
http://www.r-project.org/.
This gives access to the source code, which you can also get directly
at http://www.stats.bris.ac.uk/R/, and to
documentation.
You might be interested in the document An
Introduction to R; despite its
name, this is fairly comprehensive. For a gentler introduction, try R: A self-learn tutorial, written for undergraduates. Another good document can be found here, but this is more advanced.
When you become an R power-user, you will want to access the contributed
packages on CRAN, the Comprehensive
R Archive Network (click on Packages in the left-hand panel).
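For example, a package is installed once with install.packages and then loaded in each R session with library; MASS below is just an arbitrary illustration.
install.packages('MASS')   # fetch and install the package from CRAN
library(MASS)              # load it into the current session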