Unit 4: Stratified Sampling

STAT2300

by Sylvie B.

Simple Random Sampling vs. Stratified Sampling

Stratified Random Sample

Stratified Random Sample: A sample of size n = n_1 + n_2 + ··· + n_L taken by dividing the population into L non-overlapping, homogeneous groups (called strata) and selecting a simple random sample of size n_i from each stratum, i = 1, ..., L.
Stratification lets you group the population into similar subgroups and ensures that you sample from each one.
Examples:
- geographically, e.g., from all the lakes over a certain size in the Whiteshell, take 40 water samples from each lake
- by association, e.g., from each section of STAT 1000, sample 20 students to take a survey
- by age
- any way that groups people likely to have a similar response
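The definition above (split the population into strata, then draw a simple random sample within each) can be sketched in Python. This is a minimal illustration only: the function name, the section labels, and the student IDs are all hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of size sizes[name] from each stratum.

    strata: dict mapping stratum name -> list of population units
    sizes:  dict mapping stratum name -> sample size n_i for that stratum
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n_i = sizes[name]
        if n_i > len(units):
            raise ValueError(f"n_i={n_i} exceeds the size of stratum {name!r}")
        # sample() draws without replacement: an SRS within the stratum
        sample[name] = rng.sample(units, n_i)
    return sample

# Hypothetical population: three course sections with made-up student IDs
sections = {
    "A01": [f"A01-{k}" for k in range(120)],
    "A02": [f"A02-{k}" for k in range(90)],
    "A03": [f"A03-{k}" for k in range(150)],
}
chosen = stratified_sample(sections, {"A01": 20, "A02": 20, "A03": 20}, seed=1)
total_n = sum(len(units) for units in chosen.values())  # n = n_1 + n_2 + n_3
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of 60 students from all sections would not guarantee.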

Selecting Sample Sizes for Stratification:

proportional and disproportional

We do not need to take the same number of units from each group. In fact, it is more common to sample each group in proportion to its size.

The choice between proportional and disproportional stratified sampling depends on the research objectives, the characteristics of the population, and the available resources.

Proportional Stratified Sampling

Proportional stratified sampling is used when you want to maintain the same proportion of each stratum in the sample as it exists in the population.
It works well when the strata have similar variability and similar sampling costs, so that no stratum needs to be deliberately over- or under-sampled.
Example: If you are conducting a political opinion survey in a country with equal numbers of Democrats, Republicans, and Independents, you might use proportional stratified sampling to ensure that the sample also contains an equal proportion of each group.

Proportional Stratified Sampling: Example

Suppose you are taking a stratified random sample of cats at animal shelters in the city: shelter A has 100 cats, shelter B has 30 cats, shelter C has 40 cats, and shelter D has 30 cats.

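If you wanted to take a sample of size n = 40 from the N = 200 cats, each shelter would contribute n_i = n(N_i/N): 40(100/200) = 20 from shelter A, 40(30/200) = 6 from shelter B, 40(40/200) = 8 from shelter C, and 40(30/200) = 6 from shelter D.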
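The proportional rule n_i = n · N_i / N behind these numbers can be sketched in Python. The function name is hypothetical, and the sketch returns unrounded values: in general the results are not whole numbers and have to be rounded so that they still add up to n.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of size n in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    # n_i = n * N_i / N for each stratum
    return {name: n * N_i / N for name, N_i in stratum_sizes.items()}

# Shelter sizes from the example above
shelters = {"A": 100, "B": 30, "C": 40, "D": 30}
allocation = proportional_allocation(shelters, 40)
print(allocation)  # {'A': 20.0, 'B': 6.0, 'C': 8.0, 'D': 6.0}
```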

Disproportional Stratified Sampling

Disproportional stratified sampling is used when the subgroups within the population are not equally represented, and you want to oversample certain strata to ensure adequate representation in the sample.
This approach is beneficial when some strata are of particular interest, rare, or underrepresented, and you need more precise estimates for those subgroups.
Example: In a healthcare study, if you are interested in studying a rare medical condition that affects a small proportion of the population, you may use disproportional stratified sampling to oversample individuals from that stratum to ensure you have enough data for meaningful analysis.

Sample Size and Cost

It is common in industry to take samples based on the cost of sampling.
We want to minimize cost while accounting for stratum size and variability.
It may not cost much to survey a student in STAT 1000, but it could be very costly if you're taking blood samples from wildlife in the Canadian Shield.
Different strata will have different costs, different amounts of variability, and different original sizes. All of these may need to be taken into account.

Advantage of Stratified Sampling

Stratification usually produces a lower estimated variance than simple random sampling, and hence a smaller bound on the error of estimation.
Logical groupings may be easier to locate together (e.g. multiple people in the same school), lowering costs to conduct the survey.
Allows you to obtain estimates for subgroups in your population assuming that the subgroups match your strata.

Estimator of the population mean µ
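In the usual textbook notation (assumed here: N_i is the size of stratum i, N = N_1 + ··· + N_L, and ȳ_i is the sample mean in stratum i), the stratified estimator of µ weights each stratum sample mean by the stratum's share of the population:

```latex
\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\bar{y}_i
```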

Estimated Variance of ȳ_st
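With s_i² denoting the sample variance in stratum i (notation assumed here), the standard estimated variance sums the within-stratum variances, each with its finite population correction:

```latex
\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i}
```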

Estimator of the population total τ
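The usual estimator of the total scales the stratified mean by the population size N, which is the same as adding up the estimated stratum totals (N_i and ȳ_i are the size and sample mean of stratum i, notation assumed here):

```latex
\hat{\tau} = N\,\bar{y}_{st} = \sum_{i=1}^{L} N_i\,\bar{y}_i
```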

Estimated Variance of estimated population total τ

Estimated variance of Nȳ_st:
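Since the estimated total is Nȳ_st, its estimated variance is N² times that of ȳ_st: V̂(Nȳ_st) = N² V̂(ȳ_st) = Σ N_i² (1 − n_i/N_i) s_i²/n_i. A minimal Python sketch of the mean and total estimators with their estimated variances follows. The function name, argument names, and summary statistics are made up for illustration, and the bound is taken as two estimated standard errors, which is an assumption about the convention used in the course.

```python
from math import sqrt

def stratified_estimates(N_i, n_i, ybar_i, s2_i):
    """Stratified estimates of the mean and total with estimated variances.

    N_i: stratum population sizes, n_i: stratum sample sizes,
    ybar_i: stratum sample means, s2_i: stratum sample variances.
    """
    N = sum(N_i)
    mean_st = sum(Ni * yb for Ni, yb in zip(N_i, ybar_i)) / N
    # Sum of N_i^2 (1 - n_i/N_i) s_i^2 / n_i: estimated variance of N * ybar_st
    var_total = sum(
        Ni**2 * (1 - ni / Ni) * s2 / ni
        for Ni, ni, s2 in zip(N_i, n_i, s2_i)
    )
    var_mean = var_total / N**2
    return {
        "mean": mean_st,
        "var_mean": var_mean,
        "total": N * mean_st,
        "var_total": var_total,
        "bound_mean": 2 * sqrt(var_mean),    # assumed: bound = 2 standard errors
        "bound_total": 2 * sqrt(var_total),
    }

# Made-up summary statistics for three strata (illustration only)
est = stratified_estimates(
    N_i=[100, 60, 40],
    n_i=[20, 10, 10],
    ybar_i=[30.0, 25.0, 20.0],
    s2_i=[16.0, 25.0, 9.0],
)
```

Only per-stratum summaries are needed, so the same function works whether the raw data are available or just the stratum means and variances.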

Approximate sample size required to estimate µ with a bound B on the error of estimation
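In the standard form of this result, σ_i² is the (planning value of the) variance in stratum i and a_i = n_i/n is the fraction of the sample allocated to stratum i; both symbols are assumed notation. With D = B²/4 when estimating µ (and D = B²/(4N²) when estimating the total τ):

```latex
n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2},
\qquad D = \frac{B^2}{4}
```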

Approximate allocation that minimizes cost for a fixed value of V(ȳ_st) or minimizes V(ȳ_st) for a fixed cost

Find the sample size
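Writing c_i for the cost of one observation in stratum i and σ_i for the stratum standard deviation (notation assumed here), substituting the cost-optimal allocation fractions into the general sample size formula gives the usual expression, with D = B²/4 for estimating µ:

```latex
n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}\right)\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}
```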

Allocate your sample
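In the standard cost-optimal allocation, a stratum receives more of the sample when it is large (N_i), variable (σ_i), or cheap to sample (small c_i). The symbols σ_i and c_i are assumed notation for the stratum standard deviation and the cost per observation:

```latex
n_i = n\,\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}
```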

Neyman Allocation

When costs are known to be equal across strata, or are unknown and presumed roughly equal, the sample can be allocated in proportion to N_i σ_i, the product of stratum size and stratum standard deviation.

Find the sample size
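Under Neyman allocation the usual sample size formula simplifies because the cost terms cancel. With σ_i the stratum standard deviation (assumed notation) and D = B²/4 for estimating µ:

```latex
n = \frac{\left(\sum_{i=1}^{L} N_i\sigma_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}
```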

Allocate your sample
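The standard Neyman allocation gives each stratum a share of the sample proportional to N_i σ_i, where σ_i is the stratum standard deviation (assumed notation):

```latex
n_i = n\,\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}
```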

This is simpler than the cost-based allocation: the formulas are the same except that the cost terms c_i drop out.

Allocation with varying costs takes more work.
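The relationship between the two allocations can be sketched in Python: the allocation fractions are proportional to N_i σ_i / √c_i, and leaving the costs out (or making them all equal) reduces this to Neyman allocation. The function names and planning values below are hypothetical, and the sample size uses the usual formula with D = B²/4 for a mean.

```python
from math import sqrt, ceil

def allocation_fractions(N_i, sigma_i, c_i=None):
    """Fractions a_i = n_i / n, proportional to N_i * sigma_i / sqrt(c_i).

    With c_i omitted, all costs are treated as equal and the result
    is the Neyman allocation, proportional to N_i * sigma_i.
    """
    if c_i is None:
        c_i = [1.0] * len(N_i)
    weights = [Ni * s / sqrt(c) for Ni, s, c in zip(N_i, sigma_i, c_i)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_size_for_mean(N_i, sigma_i, a_i, B):
    """n = sum(N_i^2 sigma_i^2 / a_i) / (N^2 D + sum(N_i sigma_i^2)), D = B^2/4."""
    N = sum(N_i)
    D = B**2 / 4
    num = sum(Ni**2 * s**2 / a for Ni, s, a in zip(N_i, sigma_i, a_i))
    den = N**2 * D + sum(Ni * s**2 for Ni, s in zip(N_i, sigma_i))
    return num / den

# Made-up planning values (illustration only)
N_i = [100, 60, 40]
sigma_i = [4.0, 5.0, 3.0]
costs = [9.0, 9.0, 16.0]

neyman = allocation_fractions(N_i, sigma_i)
with_costs = allocation_fractions(N_i, sigma_i, costs)

n_neyman = ceil(sample_size_for_mean(N_i, sigma_i, neyman, B=1.0))
n_costs = ceil(sample_size_for_mean(N_i, sigma_i, with_costs, B=1.0))
```

Rounding n up keeps the bound B satisfied; the stratum sizes n_i = n · a_i then need rounding as well.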

Estimator of the population proportion p
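With p̂_i denoting the sample proportion in stratum i and N_i the stratum size (notation assumed here), the usual stratified estimator of p has the same form as the estimator of a mean:

```latex
\hat{p}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\hat{p}_i
```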

Estimated Variance of p̂_st
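In the standard form, with p̂_i the sample proportion in stratum i and q̂_i = 1 − p̂_i (assumed notation), the within-stratum variance term is p̂_i q̂_i / (n_i − 1):

```latex
\hat{V}(\hat{p}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{\hat{p}_i\,\hat{q}_i}{n_i-1}
```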

Approximate sample size required to estimate p with a bound B on the error of estimation
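The usual formula replaces the stratum variance in the sample size formula for a mean with p_i q_i, where p_i is a planning value for the proportion in stratum i, q_i = 1 − p_i, and a_i = n_i/n is the allocation fraction (all assumed notation):

```latex
n = \frac{\sum_{i=1}^{L} N_i^2\,p_i q_i / a_i}{N^2 D + \sum_{i=1}^{L} N_i\,p_i q_i},
\qquad D = \frac{B^2}{4}
```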

Approximate allocation that minimizes cost for a fixed value of V(p̂_st) or minimizes V(p̂_st) for a fixed cost
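The standard cost-optimal allocation for a proportion uses √(p_i q_i) in place of the stratum standard deviation, with c_i the cost per observation in stratum i (assumed notation). Setting all c_i equal gives the Neyman version:

```latex
n_i = n\,\frac{N_i\sqrt{p_i q_i / c_i}}{\sum_{k=1}^{L} N_k\sqrt{p_k q_k / c_k}}
```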

Stratification After Selection of the Sample: Post-Stratification

The main reason for stratifying is to make sure the sample includes units from every subgroup.
If left to random chance, we could over- or under-sample a subgroup.
However, we can't always stratify in advance to guard against that.
Examples:
- If I’m using a telephone survey, I likely don’t know if the homeowner is of working age or retired.
- If I’m sending out an email survey to a random selection of U of M email addresses, I cannot tell in advance whether I’m polling students that live at home with their parents or whether they live with roommates/independently.
Post-stratification can be used to weight the final results of the subgroups if we need to make sure we get a proportional representation of the views of these subgroups but cannot tell in advance how many of each group we will get.
This assumes that we know, at least approximately, the total size of the subgroup for each stratum.

We can then estimate the overall population mean by taking the sample mean in each stratum and re-weighting it by that stratum's proportion in the population.
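The re-weighting step can be sketched in Python. The function name, the group labels, the responses, and the population sizes are all made up for illustration; the sketch assumes every stratum has at least one respondent.

```python
from collections import defaultdict

def post_stratified_mean(responses, pop_sizes):
    """Post-stratified estimate of the population mean.

    responses: list of (stratum, y) pairs from a simple random sample,
               where the stratum is only learned after sampling.
    pop_sizes: dict stratum -> known (or approximate) population size N_i.
    """
    groups = defaultdict(list)
    for stratum, y in responses:
        groups[stratum].append(y)
    missing = set(pop_sizes) - set(groups)
    if missing:
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")
    N = sum(pop_sizes.values())
    # Weight each stratum's sample mean by its population share N_i / N
    return sum(
        (N_s / N) * (sum(groups[s]) / len(groups[s]))
        for s, N_s in pop_sizes.items()
    )

# Hypothetical telephone survey: employment status is learned during the call
responses = [
    ("working", 4), ("working", 6), ("retired", 10),
    ("working", 5), ("retired", 8),
]
pop_sizes = {"working": 700, "retired": 300}

estimate = post_stratified_mean(responses, pop_sizes)
unweighted = sum(y for _, y in responses) / len(responses)
```

Here retired respondents make up 40% of the sample but only 30% of the population, so the re-weighted estimate is pulled toward the working group's mean relative to the unweighted mean.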

Post-Stratification Estimates for a Mean
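In the usual textbook treatment, with W_i = N_i/N the known population share of stratum i, ȳ_i and s_i² the sample mean and variance among the sampled units that fall in stratum i, and n the total sample size (all assumed notation), the post-stratified estimator and an approximate estimated variance are:

```latex
\bar{y}_{post} = \sum_{i=1}^{L} W_i\,\bar{y}_i,
\qquad
\hat{V}_p(\bar{y}_{post}) \approx \frac{1}{n}\left(1-\frac{n}{N}\right)\sum_{i=1}^{L} W_i\,s_i^2
  + \frac{1}{n^2}\sum_{i=1}^{L} (1-W_i)\,s_i^2
```

The first term matches the variance under proportional allocation; the second reflects the extra variability from not fixing the stratum sample sizes in advance.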

Information

Last changed
2 years ago
