The course provides the cornerstones of inferential statistics: the
concept of statistical model and the tools of point estimation, interval
estimation and statistical hypothesis testing. Simple linear regression
is also introduced.
Alternatives:
1) Hogg, Tanis. Probability and Statistical Inference (>= 7 ed), Pearson Prentice Hall (somewhat technical). For this alternative, the linear regression model is covered by the corresponding chapter of the textbook in alternative 2.
2) McClave, Benson, Sincich. Statistics (>= 8 ed), Pearson. This second textbook is supplemented by additional material covering topics it does not address (details on the course web page).
Learning Objectives
Statistics deals with collecting, organizing and interpreting numerical data. Statistical
literacy is an essential skill for understanding and making sensible decisions based on the analysis of numerical information. Within this framework, the course provides the cornerstones of inferential statistics: the concept of statistical model and the tools of point estimation, interval estimation and statistical hypothesis testing.
Prerequisites
1) Mathematical concepts: basic operations and properties; capital sigma (generalized sum) and pi (generalized product) operators and their properties; functions; special functions (power, exponential, logarithm); derivatives; basic notions of series and integrals
2) At least 9 CFU in Statistics
3) At least the basic notions of probability (random experiment, sample space, events, probability and its properties, conditional probability and its properties). Students without this background can review the specific chapter of the textbook
Teaching Methods
Traditional lectures
Further information
None
Type of Assessment
The exam consists of 2 written tests:
1) Statistical theory: this test is composed of theoretical questions,
possibly requiring a few computations. No notes can be used. Duration
45'; weight 40%.
2) Statistical exercises: this test is composed of statistical exercises
requiring some computations. The student can use formulas written
on the back of the 9 sheets of statistical tables. Duration 1h30'; weight 60%.
Additional oral explanations may be requested by the teacher.
Course program
1) Presentation of the course. Random Variable (r.v.): definition;
examples; domain of a r.v.; discrete and continuous r.v.'s. Discrete r.v.:
distribution of a discrete r.v. via probability mass function (p.m.f.);
properties; examples.
2) Discrete r.v. The distribution of a discrete r.v. via the cumulative
distribution function (c.d.f.). Properties of the c.d.f. Examples.
Expectations of discrete r.v.'s: mean, variance, standard deviation (s.d.).
3) Discrete r.v. The mean and the variance of some transformations of a
r.v. X: mean and variance of a constant (c), of the de-meaned r.v. (X -
mu), of the standardized r.v. (X - mu)/sigma, of a linear transformation (a
+ b X). Continuous r.v. Motivations: why the p.m.f. does not make sense
while the c.d.f. can still play a role. Using the c.d.f. for computing
probabilities.
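The transformation rules of point 3 can be verified exactly from a p.m.f.; a minimal sketch in Python (the fair-die p.m.f. and the constants a, b are purely illustrative, and NumPy is assumed available):

```python
import numpy as np

# Illustrative discrete r.v.: a fair die roll
values = np.arange(1, 7)          # support of X
pmf = np.full(6, 1 / 6)           # uniform p.m.f.

mu = np.sum(values * pmf)                  # E[X] = 3.5
var = np.sum((values - mu) ** 2 * pmf)     # Var(X) = 35/12

a, b = 2.0, 3.0                   # arbitrary constants for a + bX
y = a + b * values                # transformed support
mu_y = np.sum(y * pmf)
var_y = np.sum((y - mu_y) ** 2 * pmf)

# The rules E[a + bX] = a + b E[X] and Var(a + bX) = b^2 Var(X)
assert np.isclose(mu_y, a + b * mu)
assert np.isclose(var_y, b ** 2 * var)

# The standardized r.v. (X - mu)/sigma has mean 0 and variance 1
z = (values - mu) / np.sqrt(var)
assert np.isclose(np.sum(z * pmf), 0.0)
assert np.isclose(np.sum(z ** 2 * pmf), 1.0)
```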
4) Continuous r.v. Definition, interpretation and properties of the p.d.f.
Link between c.d.f. and p.d.f. of the same r.v. Expectations of continuous
r.v.'s, with a specific emphasis on the mean and on the variance.
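The c.d.f./p.d.f. link of point 4 can be illustrated numerically; a sketch assuming SciPy is available (the Exponential(1) distribution and the interval endpoints are illustrative choices):

```python
# For a continuous r.v., P(a < X <= b) = F(b) - F(a), and the p.d.f.
# is the derivative of the c.d.f.  Shown here for an Exponential(1) r.v.
from scipy.stats import expon
from scipy.integrate import quad

a, b = 0.5, 2.0
prob_cdf = expon.cdf(b) - expon.cdf(a)   # probability via the c.d.f.
prob_pdf, _ = quad(expon.pdf, a, b)      # same probability via the p.d.f.
assert abs(prob_cdf - prob_pdf) < 1e-8

# A numerical derivative of the c.d.f. recovers the p.d.f.
h = 1e-6
x = 1.0
deriv = (expon.cdf(x + h) - expon.cdf(x - h)) / (2 * h)
assert abs(deriv - expon.pdf(x)) < 1e-4
```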
5) Multiple r.v.: definition; examples; domain of a multiple r.v.; discrete,
continuous and mixed multiple r.v.'s. Multiple discrete r.v.: definition of
the joint p.m.f.; relationships with the marginal p.m.f. and the conditional
p.m.f.; properties.
6) Multiple discrete r.v. Expectations involving multiple discrete r.v.'s:
mean, variance and standard deviation of the marginal components;
covariance and correlation between pairs of random variables and
their interpretation. Multiple continuous r.v. Definition of the joint p.d.f.
7) Multiple continuous r.v. Properties of the joint p.d.f. Joint, marginal and
conditional p.d.f.'s. Expectations involving multiple continuous r.v.'s:
mean, variance and standard deviation of the marginal components;
covariance and correlation between pairs of random variables.
Multiple r.v. Independence of r.v's; independence versus absence of
correlation. Examples.
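The distinction between independence and absence of correlation in point 7 admits a classic counterexample, sketched here with exact p.m.f. computations (the choice X uniform on {-1, 0, 1} with Y = X^2 is a standard illustration):

```python
# X uniform on {-1, 0, 1} and Y = X^2: Cov(X, Y) = 0, yet the two
# variables are clearly dependent (Y is a function of X).
xs = [-1, 0, 1]
p = 1 / 3                                   # p.m.f. of X

EX = sum(x * p for x in xs)                 # E[X] = 0
EY = sum(x ** 2 * p for x in xs)            # E[Y] = 2/3
EXY = sum(x * x ** 2 * p for x in xs)       # E[XY] = E[X^3] = 0
cov = EXY - EX * EY
assert abs(cov) < 1e-12                     # uncorrelated

# Not independent: P(X = 1, Y = 0) = 0, but P(X = 1) * P(Y = 0) = 1/9
joint = 0.0
assert joint != p * (1 / 3)
```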
8) Multiple r.v. Properties of covariance and correlation coefficient. Mean
and variance of a portfolio (linear combination) of random variables and
some useful special cases. Special r.v.'s. Summary of the points touched
in handling special r.v's: definition (in terms of p.m.f. or p.d.f.); main
expectations (mean and variance); properties; some practical examples
(when possible). The Bernoulli r.v.
9) Special r.v.'s. The Binomial r.v.; The Poisson r.v.
10) Special r.v.'s. The Continuous Uniform r.v. The Normal (or Gaussian) r.v.
11) Special r.v.'s. The use of the Standard Normal tables for computing
probabilities and intervals with Normal r.v.'s
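The table lookups of point 11 can be reproduced in software; a sketch assuming SciPy is available, where `cdf` plays the role of the Standard Normal table and `ppf` (the quantile function) inverts it:

```python
from scipy.stats import norm

# P(Z <= 1.96) for a standard Normal, as read from the table
p = norm.cdf(1.96)
assert abs(p - 0.975) < 1e-3

# For X ~ N(mu, sigma^2), standardize: P(X <= x) = Phi((x - mu) / sigma)
mu, sigma, x = 10.0, 2.0, 13.0   # illustrative parameter values
assert abs(norm.cdf(x, loc=mu, scale=sigma)
           - norm.cdf((x - mu) / sigma)) < 1e-12

# Inverse use of the table: the z value leaving 2.5% in the upper tail
z = norm.ppf(0.975)
assert abs(z - 1.959964) < 1e-5
```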
12) Special r.v.'s. The Gamma r.v., the Chi-squared r.v., the Student-T r.v., the Fisher-F r.v.
13) Point Estimation. Introduction to the problem and to the concepts of
population, sample, parameter, statistic and estimator, statistic value
and estimate, sample distribution of a statistic and related synthetic
indices.
14) Point estimation. Properties of estimators: the mean squared error
(MSE) and the concepts of relative and absolute efficiency. In search of
the most efficient estimator: motivations for restricting the set of
possible estimators taken into account; decomposition of the MSE as
variance plus bias^2; unbiased estimators.
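The MSE decomposition of point 14 follows by adding and subtracting the mean of the estimator; the cross term vanishes because the first factor has zero expectation:

```latex
\mathrm{MSE}(\hat\theta)
  = \mathbb{E}\!\left[(\hat\theta - \theta)^2\right]
  = \mathbb{E}\!\left[\bigl(\hat\theta - \mathbb{E}[\hat\theta]
        + \mathbb{E}[\hat\theta] - \theta\bigr)^2\right]
  = \underbrace{\mathbb{E}\!\left[(\hat\theta - \mathbb{E}[\hat\theta])^2\right]}_{\mathrm{Var}(\hat\theta)}
  + \underbrace{\bigl(\mathbb{E}[\hat\theta] - \theta\bigr)^2}_{\mathrm{Bias}(\hat\theta)^2},
```

since the cross term $2\,\mathbb{E}\!\left[\hat\theta - \mathbb{E}[\hat\theta]\right]\bigl(\mathbb{E}[\hat\theta] - \theta\bigr) = 0$. For unbiased estimators the MSE reduces to the variance.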
15) Point estimation. In search of the most efficient estimator: the
Cramér-Rao bound as a benchmark for checking the absolute efficiency of
unbiased estimators. The Maximum Likelihood (ML) method: definition of
likelihood, log-likelihood, score vector. The ML method at work: the
estimation of p in the Bernoulli model.
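The Bernoulli example of point 15 can be worked out in a few lines. For an i.i.d. sample $x_1, \dots, x_n$ with $x_i \in \{0, 1\}$:

```latex
L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
     = p^{\sum_i x_i}\,(1-p)^{\,n - \sum_i x_i},
\qquad
\ell(p) = \Bigl(\sum_i x_i\Bigr)\log p
        + \Bigl(n - \sum_i x_i\Bigr)\log(1-p).
```

Setting the score to zero,

```latex
\ell'(p) = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1-p} = 0
\quad\Longrightarrow\quad
\hat p_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar x,
```

i.e. the ML estimator of p is the sample proportion of successes.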
16) Point estimation. The ML method at work: the estimation of lambda in
the Poisson model; the estimation of mu and/or sigma^2 (depending on
whether one or both parameters are unknown) in the Normal model.
17) Point estimation. Derivation of the properties (sample distribution,
bias, variance, MSE, check of the Cramér-Rao bound for unbiased
estimators) of the ML estimators computed.
18) Point estimation. ML estimation of parameters of the Gamma model
as a motivation for introducing asymptotic properties. Asymptotic
properties: consistency, asymptotic unbiasedness, asymptotic efficiency,
asymptotic sample distribution.
19) Point estimation. ML estimators as C.A.N.E. (Consistent
Asymptotically Normal Efficient) estimators. Interval Estimation.
Introduction to the statistical problem by comparing interval estimation
with point estimation.
20) Interval Estimation. Definition of interval estimate (confidence
interval), confidence level, size of the interval. The Pivot method for
finding confidence intervals: definition of pivot quantity and illustration of
how the method works in practice. Interval Estimation. Pivots and
corresponding intervals for: the mean of a Normal r.v. (variance known).
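The known-variance interval of point 20 comes from the pivot (Xbar - mu)/(sigma/sqrt(n)) ~ N(0,1); a sketch assuming SciPy is available (the sample mean, sigma, n and confidence level below are made-up numbers):

```python
# Confidence interval for the mean of a Normal r.v., variance known:
# Xbar -/+ z_{alpha/2} * sigma / sqrt(n)
from math import sqrt
from scipy.stats import norm

xbar, sigma, n, conf = 50.0, 4.0, 25, 0.95   # illustrative values
z = norm.ppf(1 - (1 - conf) / 2)             # ~1.96 for 95%
half = z * sigma / sqrt(n)                   # half-width of the interval
ci = (xbar - half, xbar + half)
assert ci[0] < xbar < ci[1]
```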
21) Interval Estimation. Pivots and corresponding intervals for: the mean
of a Normal r.v. (variance unknown); the variance and the s.d. of a
Normal r.v. (mean known and unknown).
22) Interval Estimation. Pivots and corresponding intervals for: the
probability of a Bernoulli r.v.; the mean of a Poisson r.v. How to use the
theory behind interval estimation for computing the sample size of a
survey aiming at estimating a probability or a mean.
23) Testing Hypotheses. Motivations, framework, definition of statistical
hypothesis (simple and composite), definition of statistical test.
24) Testing Hypotheses. Table of decisions, type I and type II errors,
significance level and power of a test. The Neyman-Pearson lemma and
ensuing remarks. Examples.
25) Testing Hypotheses. Comparison of different specifications of the
alternative hypothesis (pointwise, unidirectional, bidirectional) and
consequences on the rejection region. More on the role of the power of a
test. The factors influencing the power of a test.
26) Testing Hypotheses. Testing hypotheses concerning: the mean
parameter of a Normal r.v. (cases sigma^2 known and sigma^2
unknown); the probability parameter of a Bernoulli r.v. The p-value:
definition, computation and interpretation.
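The p-value computation of point 26 for the known-sigma case reduces to a Standard Normal tail probability; a sketch assuming SciPy is available (mu0, sigma, n and the observed mean are illustrative numbers):

```python
# Two-sided z-test for H0: mu = mu0 against H1: mu != mu0, sigma known.
# The p-value is P(|Z| >= |z_obs|) under H0.
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar = 100.0, 15.0, 36, 106.0   # illustrative values
z_obs = (xbar - mu0) / (sigma / sqrt(n))       # 6 / 2.5 = 2.4
p_value = 2 * (1 - norm.cdf(abs(z_obs)))
# small p-value: the data are unlikely under H0, so reject at the 5% level
assert p_value < 0.05
```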
27) Testing Hypotheses. Testing hypotheses concerning: the variance of
a Normal r.v. (cases mu known and mu unknown); the difference
between the probabilities of two independent Bernoulli distributions (and
remarks on point estimation and interval estimation in the same
situation).
28) Testing Hypotheses. Testing hypotheses concerning: the difference
between the means of two Normal r.v.'s by means of independent
samples (with the two variances known; with large samples and the two
variances unknown; with the two variances unknown but equal and,
related to this case, the pooled sample variance).
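The pooled-variance case of point 28 can be sketched and cross-checked against SciPy's implementation (the two samples below are made up; NumPy and SciPy are assumed available):

```python
# Pooled two-sample t statistic for equal (unknown) variances:
# s_p^2 = ((n1-1) s1^2 + (n2-1) s2^2) / (n1 + n2 - 2)
# t = (Xbar1 - Xbar2) / (s_p * sqrt(1/n1 + 1/n2))
import numpy as np
from scipy.stats import ttest_ind

x = np.array([5.1, 4.8, 5.5, 5.0, 4.9])   # illustrative sample 1
y = np.array([4.4, 4.7, 4.2, 4.6])        # illustrative sample 2
n1, n2 = len(x), len(y)

sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Cross-check against the library routine with equal_var=True
t_scipy, p_scipy = ttest_ind(x, y, equal_var=True)
assert np.isclose(t, t_scipy)
```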
29) Testing Hypotheses. Testing hypotheses concerning: the difference
between the means of two Normal r.v.'s, by means of independent
samples, with the Welch-Satterthwaite statistic; the difference between
the means of two Normal r.v.'s by means of paired data.
30) Linear Regression Model. Introduction; model definition and
corresponding properties; Ordinary Least Squares (OLS) estimators of the
parameters; fitted values and residuals.
32) Linear Regression Model. Properties of OLS estimators: their sample distribution; Best Linear
Unbiased Estimators (BLUE) and discussion of the Gauss-Markov
property. Examples.
33) Linear Regression Model. Deviance decomposition and R^2 index;
predictions of the conditional mean and of the dependent variable for a
given value of the independent variable.
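The OLS estimators, deviance decomposition and R^2 of points 30 and 33 fit in a few closed-form lines; a sketch with made-up data (NumPy assumed available):

```python
# Simple linear regression y = b0 + b1 x by OLS, with the deviance
# decomposition SST = SSR + SSE and the R^2 index.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # illustrative data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS estimators in closed form
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

fitted = b0 + b1 * x
residuals = y - fitted

# Deviance decomposition: total = explained + residual
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((fitted - y.mean()) ** 2)
sse = np.sum(residuals ** 2)
assert np.isclose(sst, ssr + sse)

r2 = ssr / sst            # share of total deviance explained by the model
assert 0.0 <= r2 <= 1.0
```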