Population
The entire group of individuals or instances about whom we hope to learn
Sample
A (representative) subset of a population, examined in the hope of learning about the population
Parameter
A numerically valued attribute of a model
Examples: µ, σ
Statistic
A value calculated from data to summarize aspects of data
Examples: ȳ, s
Parameter vs. Statistic
Parameter: describes the whole population
Statistic: describes a sample
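The distinction can be sketched in a few lines of Python. This is only an illustration under assumed data: the population below (10,000 values with the digits 0-9 in equal proportions), the seed, and the variable names are all hypothetical choices, not part of the notes.

```python
import random
import statistics

# Hypothetical population: 10,000 values, digits 0-9 in equal proportions.
population = [d for d in range(10) for _ in range(1000)]

# Parameters: computed from the whole population.
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from one random sample of size 10.
rng = random.Random(0)                  # fixed seed, arbitrary choice
sample = rng.sample(population, 10)
y_bar = statistics.fmean(sample)        # sample mean
s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: y_bar = {y_bar:.2f}, s = {s:.2f}")
```

For this population the parameters are fixed (µ = 4.5, σ ≈ 2.87), while ȳ and s change whenever a different sample is drawn.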
Sampling distribution
The probability distribution of a statistic, obtained through repeated sampling (with a fixed sample size) from a specific population
Example:
Consider a population in which the digits 0, 1, ..., 9 are represented in equal proportions. Suppose we take a sample of size 10 and calculate the average of the digits.
One sample: 2, 3, 2, 0, 6, 8, 0, 1, 7, 3
Another one: 4, 1, 4, 5, 6, 1, 5, 3, 3, 8
The first and second samples have means of 3.2 and 4.0, respectively.
What happens when we repeat this procedure many times?
The distribution of sample means is bell-shaped and roughly symmetric
The distribution is centered around the population mean 4.5
The distribution of means has less spread than the population values (0-9)
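The repeated-sampling experiment can be approximated with a short simulation. This is a minimal sketch that assumes the digits are drawn independently with replacement; the number of repetitions (10,000) and the seed are arbitrary choices, not values from the notes.

```python
import random
import statistics

rng = random.Random(1)        # fixed seed, arbitrary choice
digits = list(range(10))      # digits 0-9, equally likely
n = 10                        # sample size from the example
reps = 10_000                 # number of repeated samples (assumed)

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.choices(digits, k=n)) for _ in range(reps)
]

center = statistics.fmean(sample_means)   # center of the sampling distribution
spread = statistics.stdev(sample_means)   # spread of the sampling distribution
pop_sd = statistics.pstdev(digits)        # spread of the population values

print(f"mean of sample means: {center:.2f}")
print(f"sd of sample means:   {spread:.2f}")
print(f"population sd:        {pop_sd:.2f}")
```

Under these assumptions the mean of the sample means should land close to 4.5, and their standard deviation should be close to σ/√n ≈ 2.87/√10 ≈ 0.91, well below the population standard deviation.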
What are reasonable values we can expect as a sample mean?
Largest sample mean possible: 9
Smallest sample mean possible: 0
While either bound is theoretically possible, it is extremely unlikely that we would ever observe one.
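How unlikely? A rough worked calculation, assuming the ten digits are drawn independently and each digit is equally likely (an assumption; the notes do not say how the sample is drawn): a sample mean of 9 requires every one of the ten digits to be 9.

```latex
P(\bar{y} = 9) = P(\text{all ten digits are } 9) = \left(\frac{1}{10}\right)^{10} = 10^{-10}
```

By the same argument, the probability of a sample mean of 0 is also 10^{-10}, or one in ten billion.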
Confidence Interval
When we take random samples, the sample mean is usually a reasonable estimate of the population mean.
Every time we take a sample from a population, we get something a bit different. If we pick samples randomly and use a sufficiently large sample size, they should generally reflect the population.
What are some reasons why our estimate might be drastically off?
sampling from a biased group
not taking outliers into account
a sample size that is too small
large variance in the population distribution
human mistakes during data collection
Sampling Distributions: Skewed Data
What happens to our sampling distribution when we have a highly skewed, finite data set?
With a sufficiently large sample size, the sampling distribution is roughly symmetric despite the skewed population.
Again, the distribution is centered around the population mean, µ = 1.29
What happens when we have a smaller sample size?
When the sample size is small, the sampling distribution is still skewed
Many statistical methods don’t apply without approximate normality of the sampling distribution
If a small sample size must be used, the population should be approximately normal, or able to be transformed to be approximately normal
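The effect of sample size on a skewed population can be checked with a small simulation. This is a sketch only: the right-skewed population below is hypothetical (5,000 exponential draws, not the data set with µ = 1.29 from the notes), and `skewness` and `sample_means` are illustrative helper names.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(2)  # fixed seed, arbitrary choice

# Hypothetical right-skewed, finite population (NOT the data from the notes).
population = [rng.expovariate(1.0) for _ in range(5_000)]

def sample_means(n, reps=3_000):
    """Means of `reps` random samples of size n, each drawn without replacement."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

print(f"population: mean = {statistics.fmean(population):.2f}, "
      f"skewness = {skewness(population):.2f}")
for n in (5, 100):
    means = sample_means(n)
    print(f"n = {n:3d}: mean of means = {statistics.fmean(means):.2f}, "
          f"skewness = {skewness(means):.2f}")
```

Under these assumptions, the sample means for n = 5 should still show clear right skew, while those for n = 100 should be much closer to symmetric. In both cases the sample means are centered near the population mean.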