Element
An object on which a measurement is taken.
Population
A collection of elements about which we wish to make an inference.
Sampling Unit
Nonoverlapping collections of elements from the population that cover the entire population.
(Sampling) Frame
A list of sampling units.
Sample
A collection of sampling units drawn from a frame or frames.
A unit and an element may or may not be the same thing.
What could your sampling units be instead so that your elements and sampling units would not be the same?
List of addresses/households
--> but you are actually surveying individuals
--> the elements would be the people who live there, but the company only has the addresses
Suppose I wanted to survey adults in the City of Winnipeg about how often they recycle. Our elements would be adults and our population would be adults in the City of Winnipeg.
What might we use for sampling units?
What might we use for a sampling frame?
Sampling units: individuals, households
Sampling frame: census, record of taxpayers, list of telephone numbers, list of registered voters
Sample size
What sample size should we use?
We estimate a population parameter with our best guess, a sample statistic.
Every sample is different but ideally we’d like to make sure our estimate is within a certain amount, say B.
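The bound B above corresponds to the standard sample-size formula for estimating a mean, n = (zσ/B)². A minimal sketch, assuming a 95% confidence level and an illustrative guess at the population standard deviation (the numbers here are made up, not from the notes):

```python
import math

def sample_size_for_mean(sigma, B, z=1.96):
    """Sample size needed so the estimate of a population mean falls
    within B of the true mean with ~95% confidence (z = 1.96).
    sigma is a prior guess at the population standard deviation."""
    return math.ceil((z * sigma / B) ** 2)

# Illustrative numbers: guessed sd of 2, desired bound B = 0.25
n = sample_size_for_mean(sigma=2, B=0.25)  # 246
```

A tighter bound B or a larger guessed sigma both drive the required sample size up quadratically.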
Sampling Method
How do we choose a sampling method?
This depends largely on the context of the study so we must ask ourselves:
How are our elements spread out throughout our sampling frame? How easy is it for me to access them?
Are elements easy to sample with no groupings of common elements —> simple random sampling
Are there defined subgroups? —> stratified sampling
Are our sampling units very geographically spread apart? —> multistage sampling
Are there defined groups, with little difference between the groups but plenty of diversity within each group? —> cluster sampling
Do I have an easily accessible list where the order is more or less random? —> systematic sampling
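The first and last methods above can be sketched with Python's standard library; the frame and sample size here are made up for illustration:

```python
import random

# Hypothetical frame of 1000 sampling units, target sample size 50
frame = [f"unit{i}" for i in range(1000)]
n = 50

# Simple random sampling: every subset of n units is equally likely
srs = random.sample(frame, k=n)

# Systematic sampling: random start, then every k-th unit in list order
k = len(frame) // n
start = random.randrange(k)
systematic = frame[start::k]
```

Systematic sampling only behaves like simple random sampling when the list order is unrelated to what is being measured, which is exactly the condition stated above.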
Errors of Non-Observation
Errors of nonobservation are related to our sample making up only part of the target population.
Examples:
Sampling Error
Undercoverage
Nonresponse
Errors of Non-Observation: Sampling Error
Sampling Error: The distance between the recorded statistic and the population parameter due to only collecting a sample of the population.
Example: our statistic changes between each sample merely because each sample is different, not because the parameter is changing
Sampling error is something we have to live with as the price of being statisticians —> we can at least control it through the sample size and the confidence level we set via α
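A short simulation illustrates the point above: the population mean mu stays fixed, yet each sample produces a different estimate. The population and sample sizes are arbitrary choices for the sketch:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = sum(population) / len(population)  # the parameter: fixed, never changes

# Each sample yields a slightly different mean even though mu is constant
means = []
for _ in range(5):
    sample = random.sample(population, 100)
    means.append(sum(sample) / len(sample))
```

The spread of the sample means around mu is exactly the sampling error; increasing the sample size from 100 would shrink it.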
Choice of Significance Level α
We can choose α greater than or smaller than the standard 0.05 if it is suitable for the study.
What are some scenarios in which a small α may be preferred? What about a larger α?
Small alpha: medical trials --> avoid the more costly mistake of a false positive
Large alpha: exploratory studies where a general overview suffices and accuracy is less critical
Important: set alpha before the analysis; do not choose it based on the p-value just to make your results significant
Errors of Non-Observation: Undercoverage
Undercoverage: When a sampling frame does not include the entire target population.
Note: There can also be issues with the sampling frame containing units not in the target population, e.g., Non-Canadians.
Correction: Issues with coverage are hard to correct after the fact, as there is usually a reason the missing units were not in the original sampling frame. Responsibly, you should report what your sampling frame was and how it compares to your target population.
Question: What are some examples of scenarios where your sampling frame may be missing elements from the population?
Answer: A list of telephone numbers, since not everybody has a number --> if a significant group of individuals has no phone number, this can lead to undercoverage
Question: What about where your sampling frame contains elements not in the target population?
Answer: Asking people questions on the street even though some of them are not actually Canadians and thus outside the target population
Often those missing are missing for reasons that may make them important and unique parts of your population to survey. In particular, vulnerable or low-income populations can be marginalized from participation in opinion polling.
Errors of Non-Observation: Nonresponse
Nonresponse: When you cannot collect measurements on selected units in your sample.
Example: when people do not respond by mail, by phone, etc.
Name the three causes of non-response.
We can broadly classify non-response into three causes
An inability to physically reach a sampling unit, e.g., no internet connection, no phone line, no permanent address.
An inability of the sampling unit to give the correct response.
A person may refuse to answer the survey because, e.g.,
They do not feel comfortable answering certain questions
The survey is too long and would take too much time
They are simply not interested
Errors of Observation
Errors of observation are related to what is recorded about our sampling units being inaccurate.
We can broadly group errors of observation as being due to:
Interviewers: Tone, age, gender, physical appearance, and demeanor of an interviewer can all affect how truthful people will be, intentionally or unintentionally
Respondents: Respondents might not understand questions, may not seek clarification, may be embarrassed or afraid to answer truthfully, may exaggerate, may make up answers to not appear uninformed, or confuse units of measurement.
Measurement Instrument: Confusion around what the unit of measurement is or how something is defined.
Method of Data Collection: Accuracy can be affected by conducting personal interviews vs. telephone interviews vs. self-administered questionnaires vs. direct observation.
Personal interviews vs. telephone interviews vs. mailed questionnaires
Personal interviews
non-response may be lower with personal interviews, though some people may not have time for a personal meeting
could result in more detailed responses
may get biased responses from trying to agree with the interviewer
Telephone interviews
people may be more willing to participate
Mailed questionnaire
participants may be more honest
non-response may be higher, and there is nobody available to clarify questions
Reducing Error
Name five ways to reduce the errors in surveys.
There are many ways research companies and researchers attempt to reduce errors in their surveys:
Callbacks: Making follow-up calls or sending reminder surveys (by mail or email) can help response rates. Follow-up calls should vary by time of day and week to catch people on different schedules.
Rewards and Incentives: Surveys can offer monetary incentives for participating or put respondents into draws for a potential reward. Large survey companies with panels of people they select from may earn points towards gift cards or other rewards.
Interviewer Training: Interviewers should have opportunities to practice asking questions under watchful eyes that can suggest improvements in intonation or pronunciation or demeanor that may get more truthful answers.
Data Checks: Data can be cross-referenced (e.g., age against year of birth); obvious “wrong” answers can be eliminated or corrected by follow-ups if possible.
Questionnaire Construction: Questions can be constructed to help get honest and truthful answers from respondents and help eliminate people from lying due to not understanding questions.
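The data-check item above can be sketched as a simple consistency rule; the field names and survey year here are hypothetical:

```python
def check_age_consistency(record, survey_year=2024):
    """Flag records where the reported age disagrees with the reported
    birth year by more than one year (the respondent's birthday may
    not have occurred yet in the survey year)."""
    implied_age = survey_year - record["birth_year"]
    return abs(implied_age - record["age"]) <= 1

# Hypothetical records: the first is consistent, the second is not
ok = check_age_consistency({"age": 30, "birth_year": 1994})   # True
bad = check_age_consistency({"age": 30, "birth_year": 1980})  # False
```

Inconsistent records would then be routed to a follow-up rather than silently kept or dropped.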
Question Ordering: Primacy Effect
Primacy effect: makes people more likely to select the first choice on a list, or to stop reading when they reach their first agreeable option.
Example: In an online survey asking participants which food item they consume the most from a list of 30+ foods, items near the top of the list may get selected more than those at the bottom.
--> in long lists, respondents tend to select items at the beginning of the list
--> try to categorize the items and split them into multiple questions
Question Ordering: Recency Effect
Recency effect: makes people more likely to choose from the last few options on a list because they remember them
Example: In a telephone survey asking participants to choose the most important political issue to them, more responses may be received for the options at the end of the list because they remember them the best.
—> avoid long lists
Question Ordering: Context Effect
Context effects are particularly common when one goes from a specific question to a more general one, or vice versa.
Example 1
People were asked if they were happy in their marriage and if they were happy with their life in general.
When asked about life then marriage, 52% said they were very happy in life in general.
When asked marriage then life, 38% said they were very happy in life in general.
The theory is that people who had just been thinking about their marriage specifically found life in general less great in comparison.
Example 2
A person who is given a long list of questions about crimes might respond differently to a question about whether they’ve been a victim of crime.
The first question “primes” them and gives them opportunities to remember things that have happened to them in the past.
Example 3:
Respondents may be asked the following
A: Will you support an increase in taxes for education?
B: Will you support an increase in taxes?
If A were asked first, those who support a tax increase for education and respond with “Yes” may think that B implies an increase in taxes not for education and respond with “No”.
If B were asked first, those same people may respond “Yes” to both.
Note: ask the general question first, then the specific one.
Question Ordering: Combat unintended effects
Solutions to combat unintended effects include:
Randomization amongst all participants
1) randomize question order (when it makes sense)
2) for lists: randomize order of options
3) for ordinal questions: flip the order of options, e.g., not satisfied ... highly satisfied vs. highly satisfied ... not satisfied
Write out or restate questions
—> help participants refocus on the given question.
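The randomization strategies above (shuffling list options per respondent, flipping ordinal scales) can be sketched as follows; the option lists are made up for illustration:

```python
import random

# Hypothetical list of answer options for one survey question
options = ["pizza", "pasta", "salad", "soup", "sushi"]

def options_for_respondent(options):
    """Return a freshly shuffled copy so each respondent sees the
    options in a different order, spreading primacy and recency
    effects evenly across all choices."""
    shuffled = options[:]
    random.shuffle(shuffled)
    return shuffled

# For ordinal scales, keep the ordering but flip its direction
# for (roughly) half of the respondents
scale = ["not satisfied", "neutral", "satisfied", "highly satisfied"]
flipped = scale[::-1]
```

Shuffling does not remove the primacy effect for any single respondent; it averages the effect out across the option list over many respondents.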
Closed questions
Closed questions: have a predetermined set of answers or a finite numerical answer (e.g. age or pain rating 1 to 10)
Pros: easy to analyze, may be easier for participant
Cons: difficult to capture all possible choices, responses may be biased based on order of choices, some categories may get chosen more than would be in an open question
Open questions
Open questions: allow people to answer however they would like
Pros: allows respondent to give detailed response, may receive responses that would not be on a closed list and that they didn’t even think of
Cons: difficult to analyze since respondents don’t have a specific format, difficult to compare across questionnaires
Pre-surveys
Sometimes a pre-survey is done to get the most common options for the real survey.
This helps capture what the public will actually answer, as opposed to what the surveyors think they might answer.
Response Options: Forced Choice
Forced choice questions: make a respondent select a yes or a no, one option or the other.
Forcing people to make decisions on questions they know nothing about is not valuable
Questions about which everyone has enough information to form an opinion should be stated without a “No Opinion” or “Don’t Know” option
Fewer options may be given as an attempt to polarize opinion on one side or the other
Example: Consider the following question: “Do you think the enforcement of traffic laws in our city is too strict or too lenient?” —> No middle choice is given as respondents may tend to choose it too often as an easy way out.
In certain situations it seems highly likely that people are more accurate when doing forced choice rather than “select all that apply”. The select-all-that-apply format yielded a large primacy effect, while the forced-choice format showed no primacy effect.
Example: Someone is unlikely to report they’ve suffered from addiction when they haven’t. However, someone may report they haven’t suffered from addiction when they have (to not embarrass themselves) so the reporting method with higher results is likely more accurate here.
Question Wording: Leading Questions
Leading questions: include extra information that purposefully influences people in a particular direction.
Example
When people were asked “Do you think the United States should forbid public speeches against democracy?”, 21.4% said yes
When they were asked “Do you think the United States should allow public speeches against democracy?”, 47.8% said no
Forbid is a strong word, allow is much milder
—> the outcome can be misrepresented depending on the wording
It is generally good to give someone two options in the wording rather than a straight “Do you favour. . . ?”
Example: “Do you favor or oppose the use of capital punishment?” over “Do you favor capital punishment?”
Asking “Do you agree with. . . ?” may make the interviewee feel like the interviewer thinks it’s agreeable and make them more likely to respond with yes
Question Wording: Double Barreled Question
Double barreled question: a question that addresses two ideas at once
Only one question should be asked at a time.
Example: “Do you believe the IB program helped promote thinking about global issues and multiculturalism?”
—> people could hold different opinions on the two parts of the question
Question Wording: Double Negatives
Don’t use double negatives
Example: “Do you favour or oppose not allowing teenage drivers to drive alone after midnight?”
Question Wording: Measurement Instruments
Be clear in writing out the measurement instruments for questions with quantitative responses
Example: “How much do you work a week?” could be better phrased as “On average, how many hours per week are you paid for work?”
For in person interviews, a prop might be helpful to demonstrate height or volume
Hospitals often give pain scales with descriptors for each number when asking questions of patients