Buffl

Statistics

by Lea H.

Suggest an appropriate visualization and implement it with ggplot2 to display a possible association between coffee consumption and “datavizitis” disease risk, measured in deaths per 1000 individuals. Does this plot by itself seem consistent with a causal effect of coffee on datavizitis?



Investigate the full dataset. Do you see evidence for a third variable influencing the association? Support your statement with an appropriate plot. Draw a graph of the potential causal relationships that you find consistent with the data. Relate it to one of the situations from figure 6.3 of the lecture script or to Simpson’s paradox.

# Taken by itself, the plot seems consistent with a causal effect of coffee on datavizitis.

ggplot(coffee_dt, aes(coffee_cups_per_day, datavizitis_risk)) + geom_boxplot() + labs(x = "Cups of coffee per day", y = "Deaths per 1,000")
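One box per consumption level only appears when the x variable is discrete. The following is a minimal sketch, assuming the column could be stored as numeric; `toy_dt` and its values are hypothetical stand-ins, not the exercise data. It converts the column to a factor, prints the per-level medians (the middle line of each box), and draws the plot only if ggplot2 is installed.

```r
# Hypothetical toy data, NOT the exercise's coffee_dt.
toy_dt <- data.frame(
  coffee_cups_per_day = c(0, 0, 1, 1, 2, 2),
  datavizitis_risk    = c(1.0, 1.2, 2.1, 2.3, 3.0, 3.4)
)

# A numeric x typically yields a single box plus a warning in geom_boxplot();
# a factor gives one box per consumption level.
toy_dt$coffee_cups_per_day <- factor(toy_dt$coffee_cups_per_day)

# Median risk per level, i.e. the middle line of each box.
medians <- tapply(toy_dt$datavizitis_risk, toy_dt$coffee_cups_per_day, median)
print(medians)

# Draw the plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_dt, aes(coffee_cups_per_day, datavizitis_risk)) +
    geom_boxplot() +
    labs(x = "Cups of coffee per day", y = "Deaths per 1,000")
  print(p)
}
```

If the column in `coffee_dt` is already a factor or character vector, the conversion is harmless and the original call works unchanged.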




# This is the way it looks for smoking

ggplot(coffee_dt, aes(packs_cigarettes_per_day, datavizitis_risk)) + geom_boxplot() + labs(x = "Packs of cigarettes per day", y = "Deaths per 1,000")



# And this is the proper way to look at it:
# coffee effects are always the same within each smoking group.

ggplot(coffee_dt, aes(packs_cigarettes_per_day, datavizitis_risk, fill = coffee_cups_per_day)) +
  geom_boxplot() +
  labs(x = "Packs of cigarettes per day", y = "Deaths per 1,000") +
  guides(fill = guide_legend(title = "Cups of coffee"))




# But the effect of smoking is not the same
# within each coffee consumption group.

ggplot(coffee_dt, aes(coffee_cups_per_day, datavizitis_risk, fill = packs_cigarettes_per_day)) +
  geom_boxplot() +
  labs(x = "Cups of coffee per day", y = "Deaths per 1,000") +
  guides(fill = guide_legend(title = "Packs of cigarettes"))
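One candidate structure worth checking against these plots is a common cause (confounding): smoking influences both coffee consumption and datavizitis risk, while coffee has no effect of its own. The sketch below simulates that structure to show what it would look like. Everything in it is an assumption for illustration: `sim_dt`, the probabilities, and the effect sizes are hypothetical and are not taken from `coffee_dt`.

```r
set.seed(1)
n <- 3000

# Assumed (hypothetical) structure: smoking drives both coffee and risk;
# coffee does not enter the risk equation at all.
packs  <- sample(0:2, n, replace = TRUE)
coffee <- rbinom(n, 2, prob = c(0.2, 0.5, 0.8)[packs + 1])  # smokers drink more coffee
risk   <- 2 + 3 * packs + rnorm(n, sd = 0.5)                # risk depends on smoking only

sim_dt <- data.frame(
  packs_cigarettes_per_day = factor(packs),
  coffee_cups_per_day      = factor(coffee),
  datavizitis_risk         = risk
)

# Marginal view: mean risk per coffee level, ignoring smoking.
marginal <- tapply(sim_dt$datavizitis_risk, sim_dt$coffee_cups_per_day, mean)

# Stratified view: mean risk per cell (rows = packs, columns = cups).
stratified <- tapply(sim_dt$datavizitis_risk,
                     list(packs = sim_dt$packs_cigarettes_per_day,
                          cups  = sim_dt$coffee_cups_per_day),
                     mean)

print(round(marginal, 2))
print(round(stratified, 2))

# Optional plot in the same style as above, only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim_dt, aes(packs_cigarettes_per_day, datavizitis_risk,
                          fill = coffee_cups_per_day)) +
    geom_boxplot() +
    labs(x = "Packs of cigarettes per day", y = "Deaths per 1,000") +
    guides(fill = guide_legend(title = "Cups of coffee"))
  print(p)
}
```

In the simulated data, the marginal means rise with coffee consumption, while the means within each row of the stratified table stay roughly constant. Running the same two `tapply()` summaries on `coffee_dt` is one way to check whether the real data follow this pattern.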












































