Simple Random Sampling vs. Stratified Sampling
Stratified Random Sample
Stratified Random Sample: A sample of size n = n1 + n2 + ··· + nL taken by dividing the population into L non-overlapping homogeneous groups (called strata) and selecting a simple random sample of size n_i from each stratum, i = 1,..., L.
Stratification groups the population into similar subgroups and ensures that the sample includes units from each one.
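The definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the population is already split into strata and stored as a dict from a hypothetical stratum name to a list of units. `random.Random.sample` draws without replacement, so each stratum gets its own simple random sample.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of size sizes[name] from each stratum.

    strata: dict mapping stratum name -> list of population units
    sizes:  dict mapping stratum name -> n_i (must not exceed that stratum's size)
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        # sampling without replacement within the stratum = simple random sample
        sample[name] = rng.sample(units, sizes[name])
    return sample

# Hypothetical population: student IDs grouped by course section
strata = {
    "section_A": list(range(0, 120)),
    "section_B": list(range(120, 200)),
    "section_C": list(range(200, 350)),
}
sample = stratified_sample(strata, {"section_A": 20, "section_B": 20, "section_C": 20}, seed=1)
for name, units in sample.items():
    print(name, len(units))
```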
Example:
geographically, e.g., from all the lakes over a certain size in the Whiteshell, take 40 water samples from each lake
by association, e.g., from each section of STAT 1000, sample 20 students to take a survey
by age
any way that groups people likely to have a similar response
Selecting Sample Sizes for Stratification:
proportional and disproportional
We do not need to take the same number of units from each group. In fact, it is more common to take samples proportionally from each group.
The choice between proportional and disproportional stratified sampling depends on the research objectives, the characteristics of the population, and the available resources.
Proportional Stratified Sampling
Proportional stratified sampling is used when you want to maintain the same proportion of each stratum in the sample as it exists in the population.
This method is appropriate when you want the sample to mirror the population's composition, whatever the relative sizes of the strata: each stratum's share of the sample equals its share of the population.
Example: If you are conducting a political opinion survey in a country with equal numbers of Democrats, Republicans, and Independents, proportional stratified sampling gives a sample with equal numbers from each group. If the groups were unequal in size, their shares of the sample would be unequal in the same way.
Proportional Stratified Sampling: Example
Suppose you are taking a stratified random sample of cats at animal shelters in the city: Shelter A has 100 cats, Shelter B has 30 cats, Shelter C has 40 cats, and Shelter D has 30 cats.
If you wanted to take a sample of size 40 from the N = 200 cats, you would allocate n_i = n × N_i/N = 40 × N_i/200 to each shelter: 20 from Shelter A, 6 from Shelter B, 8 from Shelter C, and 6 from Shelter D.
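Here is a small sketch of proportional allocation, n_i = n × N_i/N. Real stratum sizes rarely divide evenly, so this version floors each share using integer arithmetic. It then hands any leftover units to the strata with the largest remainders. That rounding rule is an assumption made for illustration; other rounding conventions are also used.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of size n so that n_i is proportional to N_i."""
    N = sum(stratum_sizes)
    # integer arithmetic: split n * N_i / N into a whole part and a remainder
    parts = [divmod(n * N_i, N) for N_i in stratum_sizes]
    alloc = [q for q, _ in parts]
    # give any leftover units to the strata with the largest remainders
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(parts)), key=lambda i: parts[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc

# Shelters A-D from the cat example
print(proportional_allocation(40, [100, 30, 40, 30]))  # → [20, 6, 8, 6]
```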
Disproportional Stratified Sampling
Disproportional stratified sampling is used when the subgroups within the population are not equally represented, and you want to oversample certain strata to ensure adequate representation in the sample.
This approach is beneficial when some strata are of particular interest, rare, or underrepresented, and you need more precise estimates for those subgroups.
Example: In a healthcare study, if you are interested in studying a rare medical condition that affects a small proportion of the population, you may use disproportional stratified sampling to oversample individuals from that stratum to ensure you have enough data for meaningful analysis.
Sample Size and Cost
It is common in industry to choose sample sizes based on the cost of sampling.
We want to minimize cost while accounting for strata size and variability.
It may not cost much to survey a student in STAT 1000, but it could be very costly to take blood samples from wildlife in the Canadian Shield.
Different strata will have different costs, different amounts of variability, and different original sizes. All of these may need to be taken into account.
Advantage of Stratified Sampling
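When the strata are homogeneous, stratification usually produces an estimator with a smaller variance than a simple random sample of the same size, and hence a smaller bound on the error of estimation.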
Logical groupings may be easier to locate together (e.g. multiple people in the same school), lowering costs to conduct the survey.
Allows you to obtain estimates for subgroups in your population assuming that the subgroups match your strata.
Estimator of the population mean µ
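The usual textbook form of this estimator is shown below, with N = N_1 + N_2 + ··· + N_L and ȳ_i the sample mean in stratum i. This is the standard formula, not one restated from these notes.

```latex
\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\bar{y}_i
             = \frac{1}{N}\left[N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L\right]
```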
Estimated Variance of ȳ
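In its standard form, the estimated variance combines the within-stratum sample variances s_i². Each term carries a finite population correction (1 − n_i/N_i):

```latex
\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\frac{s_i^2}{n_i}
```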
Estimator of the population total τ
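The total is usually estimated by scaling the stratified mean by N, which amounts to adding up the estimated stratum totals:

```latex
N\bar{y}_{st} = \sum_{i=1}^{L} N_i\,\bar{y}_i
```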
Estimated variance of the estimated population total, Nȳ_st:
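Because the total is N times the mean, its estimated variance is V̂(Nȳ_st) = N² V̂(ȳ_st). The sketch below computes the four quantities from this part of the notes. The input is a hypothetical list of (N_i, sample values) pairs. It assumes at least two observations per stratum so that s_i² exists. It also assumes the common two-standard-error convention for the bound, B = 2√V̂.

```python
import math
from statistics import mean, variance

def stratified_mean_estimates(strata):
    """strata: list of (N_i, sample_values) pairs, one per stratum (n_i >= 2)."""
    N = sum(N_i for N_i, _ in strata)
    # stratified mean: each stratum sample mean weighted by N_i / N
    y_st = sum(N_i * mean(ys) for N_i, ys in strata) / N
    # estimated variance of y_st; statistics.variance uses the n_i - 1 denominator (s_i^2)
    v_mean = sum(
        N_i**2 * (1 - len(ys) / N_i) * variance(ys) / len(ys)
        for N_i, ys in strata
    ) / N**2
    total = N * y_st
    v_total = N**2 * v_mean
    return {
        "mean": y_st,
        "var_mean": v_mean,
        "bound_mean": 2 * math.sqrt(v_mean),
        "total": total,
        "var_total": v_total,
        "bound_total": 2 * math.sqrt(v_total),
    }

# Hypothetical samples from two strata of sizes 100 and 50
res = stratified_mean_estimates([(100, [2, 4, 6]), (50, [10, 12])])
print(round(res["mean"], 3), round(res["total"], 1))
```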
Approximate sample size required to estimate µ with a bound B on the error of estimation
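The standard approximation is shown below. Here w_i = n_i/n is the fraction of the sample allocated to stratum i, σ_i² is a prior guess of the stratum variance, and D = B²/4 when estimating µ.

```latex
n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / w_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2},
\qquad D = \frac{B^2}{4}
```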
Approximate allocation that minimizes cost for a fixed value of V(ȳ_st) or minimizes V(ȳ_st) for a fixed cost
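If c_i is the cost of one observation in stratum i, the usual optimal allocation samples a stratum more heavily when it is large or variable, and less heavily when it is expensive:

```latex
n_i = n\,\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}
```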
Find the sample size
Allocate your sample
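The two steps can be sketched together. Substituting the cost-optimal w_i into the sample-size formula gives n = (Σ N_kσ_k/√c_k)(Σ N_iσ_i√c_i) / (N²D + Σ N_iσ_i²), with D = B²/4. The function name and the inputs below are hypothetical. The σ_i are assumed known from prior data, and each n_i is rounded up, which can push the total slightly above n.

```python
import math

def cost_optimal_design(N_sizes, sigmas, costs, B):
    """Return (n, [n_i]) for estimating mu with bound B under cost-optimal allocation."""
    N = sum(N_sizes)
    D = B**2 / 4  # D for estimating the population mean
    a = [Ni * s / math.sqrt(c) for Ni, s, c in zip(N_sizes, sigmas, costs)]
    b = [Ni * s * math.sqrt(c) for Ni, s, c in zip(N_sizes, sigmas, costs)]
    denom = N**2 * D + sum(Ni * s**2 for Ni, s in zip(N_sizes, sigmas))
    # Step 1: find the sample size (round up to a whole unit)
    n = math.ceil(sum(a) * sum(b) / denom)
    # Step 2: allocate n in proportion to N_i * sigma_i / sqrt(c_i), rounding each up
    n_i = [math.ceil(n * ai / sum(a)) for ai in a]
    return n, n_i

# Hypothetical two-stratum design where stratum 1 costs four times as much per unit
print(cost_optimal_design([100, 200], [5, 10], [4, 1], B=2))
```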
Neyman Allocation
When costs are known to be equal across strata, or are unknown and presumed roughly equal, the sample can be allocated as n_i = n · N_iσ_i / (N_1σ_1 + N_2σ_2 + ··· + N_Lσ_L).
This is even simpler than the cost-based allocation: it is the same formula with the c_i terms removed.
Allocation with variable costs is harder, because the cost c_i of sampling from each stratum must be known.
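With a common cost c in every stratum, √c cancels from the numerator and denominator of the cost-optimal allocation. Under that assumption, the sample-size formula for µ also reduces to the standard Neyman form:

```latex
n_i = n\,\frac{N_i\sigma_i/\sqrt{c}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c}}
    = n\,\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k},
\qquad
n = \frac{\left(\sum_{i=1}^{L} N_i\sigma_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}
```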
Estimator of the population proportion p
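The usual estimator mirrors the stratified mean, with p̂_i the sample proportion in stratum i:

```latex
\hat{p}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\hat{p}_i
```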
Estimated Variance of p̂_st
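In its standard form, the estimated variance has the same structure as V̂(ȳ_st), with p̂_i q̂_i/(n_i − 1) in place of s_i²/n_i:

```latex
\hat{V}(\hat{p}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\frac{\hat{p}_i\hat{q}_i}{n_i - 1},
\qquad \hat{q}_i = 1 - \hat{p}_i
```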
Approximate sample size required to estimate p with a bound B on the error of estimation
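The standard approximation replaces σ_i² with p_i q_i. Here w_i = n_i/n, and p_i is a prior guess for the stratum proportion (0.5 is the conservative choice):

```latex
n = \frac{\sum_{i=1}^{L} N_i^2 p_i q_i / w_i}{N^2 D + \sum_{i=1}^{L} N_i p_i q_i},
\qquad D = \frac{B^2}{4}
```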
Approximate allocation that minimizes cost for a fixed value of V(p̂_st) or minimizes V(p̂_st) for a fixed cost
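The usual cost-optimal allocation for a proportion again substitutes √(p_i q_i) for σ_i:

```latex
n_i = n\,\frac{N_i\sqrt{p_i q_i / c_i}}{\sum_{k=1}^{L} N_k\sqrt{p_k q_k / c_k}}
```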
Stratification After Selection of the Sample: Post-Stratification
The main reason for using stratification is to make sure we sample some people from every subgroup.
If this is left to random chance, we could over- or under-sample a subgroup.
However, we can’t always stratify in advance to ensure against that.
Examples:
If I’m using a telephone survey, I likely don’t know if the homeowner is of working age or retired.
If I’m sending out an email survey to a random selection of U of M email addresses, I cannot tell in advance whether I’m polling students that live at home with their parents or whether they live with roommates/independently.
Post-stratification can be used to reweight the subgroup results when we need a proportional representation of their views but cannot tell in advance how many members of each group we will get.
This assumes that we know, at least approximately, the population size N_i of each stratum.
We can then estimate the overall population mean by taking the sample means in each strata and re-weight them by their proportion in the population.
Post-Stratification Estimates for a Mean
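The post-stratified mean reweights each subgroup's sample mean by W_i = N_i/N, the same form as ȳ_st. In the sketch below, the survey data and the function name are hypothetical. It assumes every post-stratum received at least one response, since an empty subgroup has no sample mean to reweight.

```python
from statistics import mean

def post_stratified_mean(values, groups, population_sizes):
    """Post-stratified estimate of the population mean.

    values:           the observed responses
    groups:           the post-stratum label of each response (same length as values)
    population_sizes: dict mapping each label to its (approximately) known N_i
    """
    N = sum(population_sizes.values())
    by_group = {}
    for y, g in zip(values, groups):
        by_group.setdefault(g, []).append(y)
    missing = set(population_sizes) - set(by_group)
    if missing:
        # an empty post-stratum has no sample mean to reweight
        raise ValueError(f"no responses in post-strata: {sorted(missing)}")
    # weight each post-stratum sample mean by W_i = N_i / N
    return sum(population_sizes[g] / N * mean(ys) for g, ys in by_group.items())

# Hypothetical telephone survey: 70% of the population is working age and 30% retired,
# but retirees made up most of the responses.
values = [4, 6, 8, 8, 8, 8]
groups = ["working", "working", "retired", "retired", "retired", "retired"]
print(mean(values))  # naive sample mean, pulled toward the retirees
print(post_stratified_mean(values, groups, {"working": 7000, "retired": 3000}))
```

The naive mean treats every response equally. The post-stratified estimate restores the 70/30 split in the population.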