Population

The entire group of individuals or instances about which we hope to learn

Sample

A (representative) subset of a population, examined in the hope of learning about the population

Parameter

A numerically valued attribute of a model

Examples: µ, σ

Statistic

A value calculated from data to summarize aspects of data

Examples: ȳ, s

Parameter vs. Statistic

Parameter: describes the whole population

Statistic: describes a sample

Sampling distribution

The probability distribution of a statistic, obtained by repeatedly drawing samples of the same size from a specific population

Example:

Consider a population in which the digits 0, 1, ..., 9 are represented in equal proportions. Suppose we take a sample of size 10 and calculate the average of the digits.

One sample: 2, 3, 2, 0, 6, 8, 0, 1, 7, 3

Another one: 4, 1, 4, 5, 6, 1, 5, 3, 3, 8

The first and second samples have sample means of 3.2 and 4.0, respectively.
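As a quick check of the arithmetic, and of the parameter vs. statistic distinction, the two samples above can be run through Python's standard `statistics` module. This is only a sketch: the variable names (`mu`, `sigma`, `y_bar_1`, `s_1`, ...) are chosen here for illustration, and the population is taken to be exactly the ten digits in equal proportions.

```python
from statistics import mean, pstdev, stdev

# Population from the example: digits 0..9 in equal proportions.
population = list(range(10))

# Parameters describe the whole population.
mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divides by N)

# Statistics are calculated from each sample.
sample_1 = [2, 3, 2, 0, 6, 8, 0, 1, 7, 3]
sample_2 = [4, 1, 4, 5, 6, 1, 5, 3, 3, 8]
y_bar_1, s_1 = mean(sample_1), stdev(sample_1)  # stdev divides by n - 1
y_bar_2, s_2 = mean(sample_2), stdev(sample_2)

print(f"population: mu = {mu}, sigma = {sigma:.3f}")
print(f"sample 1:   y_bar = {y_bar_1}, s = {s_1:.3f}")
print(f"sample 2:   y_bar = {y_bar_2}, s = {s_2:.3f}")
```

The parameters stay fixed, while the statistics change from one sample to the next.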

What happens when we repeat this procedure many times?

The distribution of sample means is bell-shaped and roughly symmetric

The distribution is centered around the population mean 4.5

The distribution of means has less spread than the population values (0-9)
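The repeated-sampling experiment can be sketched as a small simulation. The number of repetitions (10,000), the fixed seed, and the helper name `sample_mean` are assumptions made for this illustration, and digits are drawn independently with equal probability.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)           # fixed seed so the run is repeatable
population = list(range(10))     # digits 0..9 in equal proportions

def sample_mean(n):
    """Draw n digits with equal probability and return their average."""
    return sum(rng.choice(population) for _ in range(n)) / n

# Repeat the "sample of size 10" procedure many times.
sample_means = [sample_mean(10) for _ in range(10_000)]

center = mean(sample_means)      # should land close to the population mean
spread = pstdev(sample_means)    # should be well below the population SD

print(f"mean of sample means: {center:.3f} (population mean {mean(population)})")
print(f"SD of sample means:   {spread:.3f} (population SD {pstdev(population):.3f})")
```

For independent draws, the standard deviation of the sample mean is σ/√n, which here is about 2.87/√10 ≈ 0.91. That is why the sample means are much less spread out than the individual digits.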

What are reasonable values we can expect as a sample mean?

Largest sample mean possible: 9

Smallest sample mean possible: 0

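While either bound is theoretically possible, it is extremely unlikely we would ever see one: a sample mean of 9 requires all ten digits to be 9, and a sample mean of 0 requires all ten to be 0.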
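To put a number on "extremely unlikely", the sketch below computes the exact probability of hitting either bound. It assumes the ten digits are drawn independently, each with probability 1/10, which matches equal proportions in a very large population; the variable names are illustrative.

```python
from fractions import Fraction

n = 10                        # sample size
p_digit = Fraction(1, 10)     # chance that a single draw is one given digit

# Every one of the n draws must be a 9 (or every one a 0).
p_all_nines = p_digit ** n
p_all_zeros = p_digit ** n
p_either_bound = p_all_nines + p_all_zeros

print(f"P(sample mean = 9) = {p_all_nines}")
print(f"P(sample mean = 0) = {p_all_zeros}")
print(f"P(either bound)    = {float(p_either_bound):.1e}")
```

Under this assumption each bound has probability 1 in 10 billion.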

Confidence Interval

When we take random samples, our sample mean is usually a reasonable estimate of the population mean. What are some reasons why our estimate might be drastically off?

Every time we take a sample from a population, we get a slightly different result. If the samples are chosen randomly and the sample size is sufficiently large, they should generally reflect the population.

What are some reasons why our estimate might be drastically off?

sampling from a biased group

not taking outliers into account

a sample size that is too small

large variance in distribution

human mistakes during data collection
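Two of these reasons, a sample size that is too small and sampling from a biased group, can be illustrated with a short simulation on the digit population from earlier. Everything specific here is an assumption for illustration: the helper `mean_abs_error`, the sample sizes 3 and 100, and the "biased group" defined as only the digits 5 to 9.

```python
import random

rng = random.Random(1)                 # fixed seed so the run is repeatable
population = list(range(10))           # digits 0..9, true mean 4.5
true_mean = sum(population) / len(population)

def mean_abs_error(pool, n, reps=2000):
    """Average distance between the sample mean and the true population mean."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.choice(pool) for _ in range(n)]
        total += abs(sum(sample) / n - true_mean)
    return total / reps

err_small = mean_abs_error(population, n=3)      # random, but tiny sample
err_large = mean_abs_error(population, n=100)    # random, larger sample

# Hypothetical biased group: only the digits 5..9 can ever be selected.
biased_pool = [d for d in population if d >= 5]
err_biased = mean_abs_error(biased_pool, n=100)

print(f"typical error, n = 3:           {err_small:.2f}")
print(f"typical error, n = 100:         {err_large:.2f}")
print(f"typical error, biased, n = 100: {err_biased:.2f}")
```

A larger sample shrinks the error of a random sample, but it does not repair a biased one: the biased estimate stays far from 4.5 no matter how many values are drawn.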

Sampling Distributions: Skewed Data

What happens to our sampling distribution when we have a highly skewed, finite data set?

With a sufficiently large sample size, the sampling distribution of the mean is approximately symmetric despite the skewed population.

Again, the distribution is centered around the population mean, µ = 1.29

What happens when we have a smaller sample size?

When the sample size is small, the sampling distribution is still skewed

Many statistical methods don’t apply without approximate normality of the sampling distribution

If a small sample size must be used, the population should be approximately normal, or transformable to approximate normality
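The effect of sample size on a skewed population can be sketched as follows. The population below is a hypothetical right-skewed data set invented for this example (it is not the data set behind µ = 1.29), samples are drawn with replacement, and the sample sizes 5 and 100 and the helper names are likewise assumptions.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical right-skewed finite population: mostly small values, a few large.
population = [0] * 50 + [1] * 25 + [2] * 12 + [4] * 8 + [10] * 4 + [25]

def skewness(values):
    """Moment-based skewness: third central moment divided by SD cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n, drawn with replacement."""
    return [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

means_small = sample_means(5)
means_large = sample_means(100)

print(f"population skewness:         {skewness(population):.2f}")
print(f"skewness of means, n = 5:    {skewness(means_small):.2f}")
print(f"skewness of means, n = 100:  {skewness(means_large):.2f}")
```

A skewness near 0 indicates a roughly symmetric distribution. In this sketch the means of small samples remain clearly right-skewed, while the means of larger samples are much closer to symmetric.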
