df.iloc[2,1]
df is the dataframe
-> 55000 (3rd row, 2nd column)
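Sketch of the lookup above, assuming a small made-up dataframe (the column names and values are hypothetical, chosen only so that the result matches the 55000 on this card):

```python
import pandas as pd

# Hypothetical data: only the cell at row position 2, column position 1 matters.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [40000, 48000, 55000],
})

# iloc is position-based and zero-indexed:
# row 2 is the 3rd row, column 1 is the 2nd column.
value = df.iloc[2, 1]
print(value)  # 55000
```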
How to calculate the mean for the income variable for our population (sampled_users)
1st Option: sampled_users['income'].mean()
2nd Option: sampled_users.income.mean()
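Quick check that both options give the same result, assuming a hypothetical sampled_users dataframe with three made-up incomes:

```python
import pandas as pd

# Hypothetical stand-in for sampled_users.
sampled_users = pd.DataFrame({"income": [40000, 50000, 60000]})

mean_bracket = sampled_users["income"].mean()  # 1st option: bracket access
mean_attr = sampled_users.income.mean()        # 2nd option: attribute access

print(mean_bracket, mean_attr)  # 50000.0 50000.0
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the bracket form is the safer default.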
Pick one observation from the sample list
sampled_users.income.sample(n=1).iloc[0]
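Step-by-step version of the line above, assuming a hypothetical sampled_users dataframe. The random_state argument is an addition for reproducibility and is not part of the original card:

```python
import pandas as pd

# Hypothetical stand-in for sampled_users.
sampled_users = pd.DataFrame({"income": [40000, 50000, 60000, 70000]})

# sample(n=1) returns a Series of length 1 (it keeps the original index label).
one_row = sampled_users.income.sample(n=1, random_state=0)

# iloc[0] extracts the single value from that Series.
one_value = one_row.iloc[0]
print(one_value)
```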
Difference between stats.norm.ppf (percentage point function) and stats.norm.cdf (cumulative distribution function)
stats.norm.cdf() -> takes a value x -> returns a probability (area under the curve to the left of x) -> "XYZ% of the observations are smaller than some value x" (we are looking for XYZ%)
stats.norm.ppf() -> takes a probability (i.e., area under the curve to the left), e.g., 97.5% -> returns a value of the normal distribution, e.g., 1.96 -> "XYZ% of the observations are smaller than some value x" (we are looking for x)
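Small sketch showing that cdf and ppf are inverses of each other, using the standard normal distribution (mean 0, standard deviation 1, which are the scipy defaults):

```python
from scipy import stats

# cdf: value -> probability (area to the left of the value)
p = stats.norm.cdf(1.96)
print(round(p, 3))  # 0.975

# ppf: probability -> value (the inverse of cdf)
x = stats.norm.ppf(0.975)
print(round(x, 2))  # 1.96

# Applying one after the other returns the starting point.
roundtrip = stats.norm.ppf(stats.norm.cdf(1.0))
```

Note that 1.96 belongs to a left-tail probability of 97.5%, which is why it is the cutoff for a two-sided 95% interval; a left-tail probability of 95% gives about 1.645.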
How to formulate a one sample test in Python:
one sided
two sided
One sided:
t_statistic_1s_g, p_value_1s_g = stats.ttest_1samp(a=sample.income, popmean=50000, alternative="greater")
t_statistic_1s_l, p_value_1s_l = stats.ttest_1samp(a=sample.income, popmean=50000, alternative="less")
Two sided:
t_statistic_2s, p_value_2s = stats.ttest_1samp(a=sample.income, popmean=50000, alternative="two-sided")
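Runnable sketch of the three one-sample variants, assuming a randomly generated income sample (the data, seed, and short variable names are made up for illustration). The alternative argument requires SciPy 1.6 or newer:

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 40 incomes drawn around 52000.
rng = np.random.default_rng(0)
income = rng.normal(loc=52000, scale=8000, size=40)

# H1: mean > 50000
t_g, p_g = stats.ttest_1samp(a=income, popmean=50000, alternative="greater")
# H1: mean < 50000
t_l, p_l = stats.ttest_1samp(a=income, popmean=50000, alternative="less")
# H1: mean != 50000
t_2, p_2 = stats.ttest_1samp(a=income, popmean=50000, alternative="two-sided")

print(t_g, p_g, p_l, p_2)
```

The t-statistic is the same in all three cases; only the p-value changes. The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.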
How to formulate a two-sample test in Python:
one-sided
two-sided
One-sided:
t_statistic_1s_g, p_value_1s_g = stats.ttest_ind(sample1.value, sample2.value, alternative="greater")
Two-sided:
t_statistic_2s, p_value_2s = stats.ttest_ind(sample1.value, sample2.value, alternative="two-sided")
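Runnable sketch of the two-sample test, assuming two randomly generated groups (data, seed, and variable names are made up). With alternative="greater" the alternative hypothesis is that the mean of the first sample is greater than the mean of the second. The equal_var=False call (Welch's t-test) is an extra option not on the original card, shown because ttest_ind assumes equal variances by default:

```python
import numpy as np
from scipy import stats

# Hypothetical groups with slightly different means.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=9.0, scale=2.0, size=30)

# One-sided, H1: mean(group_a) > mean(group_b)
t_g, p_g = stats.ttest_ind(group_a, group_b, alternative="greater")

# Two-sided, H1: mean(group_a) != mean(group_b)
t_2, p_2 = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# Welch's t-test: does not assume equal variances in the two groups.
t_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False,
                           alternative="two-sided")

print(p_g, p_2, p_w)
```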
Compute the critical t-score with Python (two-sided test at significance level alpha)
t_score = stats.t.ppf(1 - alpha/2, df=degrees_of_freedom)
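Worked example of the formula above, assuming alpha = 0.05 and a hypothetical sample of size 30 in a one-sample test, so that the degrees of freedom are n - 1 = 29. The one-sided variant is added for comparison:

```python
from scipy import stats

alpha = 0.05
n = 30                      # hypothetical sample size
degrees_of_freedom = n - 1  # one-sample t-test: n - 1

# Two-sided: alpha is split across both tails.
t_two_sided = stats.t.ppf(1 - alpha / 2, df=degrees_of_freedom)
print(round(t_two_sided, 3))  # 2.045

# One-sided: all of alpha sits in one tail.
t_one_sided = stats.t.ppf(1 - alpha, df=degrees_of_freedom)
print(round(t_one_sided, 3))  # 1.699
```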