Data Analysis with R

by Maja L.

Which of the following is NOT an arithmetic operator in R?

A) +
B) -
C) **
D) %/%

How do you install a package in R?

install.packages("package_name")

Which of the following modes does NOT exist in R?

A) logical
B) integer
C) array
D) complex

C) array

What function is used to create a vector?

A) vector()
B) c()
C) create_vector()
D) list()

B) c()

How can you access the second element of a vector named v?

A) v[2]
B) v(2)
C) v{2}
D) v<2>

A) v[2]

What does sample(v, size=3, replace=TRUE) do?
- A) Selects three unique elements from v
- B) Selects three elements from v with replacement
- C) Replaces three elements in v
- D) Sorts v in ascending order

B) Selects three elements from v with replacement
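
A minimal sketch of sampling with replacement. The vector `v` and the seed are illustrative assumptions, not part of the original card.

```r
# Illustrative vector (assumed for this example)
v <- c(10, 20, 30, 40, 50)

set.seed(1)  # arbitrary seed, only so the draw is reproducible
s <- sample(v, size = 3, replace = TRUE)

# With replace = TRUE the same element may be drawn more than once
length(s)      # 3
all(s %in% v)  # TRUE
```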

What function is used to count occurrences of a categorical variable?
- A) summary()
- B) count()
- C) table()
- D) unique()

C) table()
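
As a small illustration, `table()` can be tried on a made-up categorical vector (the name `colour` and its values are assumptions for this example):

```r
# Hypothetical categorical variable
colour <- c("red", "blue", "red", "green", "red", "blue")

counts <- table(colour)  # one count per distinct value
counts[["red"]]          # 3
counts[["blue"]]         # 2
```

The resulting `counts` object is also what `barplot()` expects when plotting a categorical variable.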

Which function generates a bar plot for a categorical variable?
- A) hist()
- B) plot()
- C) barplot()
- D) scatter()

C) barplot()

What function is used to create a histogram in R?
- A) hist()
- B) barplot()
- C) scatterplot()
- D) pie()

A) hist()

What parameter in boxplot() determines the color of the boxes?
- A) color
- B) fill
- C) col
- D) boxcolor

C) col

Which function is used to apply a transformation to every element of a vector?
- A) lapply()
- B) apply()
- C) transform()
- D) mutate()

A) lapply()
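
A short sketch of what `lapply()` returns, using an assumed example vector. Note that `lapply()` always returns a list, `sapply()` simplifies the result to a vector where possible, and many base functions are already vectorised.

```r
v <- c(1, 4, 9)  # illustrative input

roots_list <- lapply(v, sqrt)  # list with three elements
roots_vec  <- sapply(v, sqrt)  # simplified to a numeric vector
roots_direct <- sqrt(v)        # sqrt() is vectorised, so no apply call is needed
```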

What does scale() do in R?
- A) Standardizes data to have mean 0 and standard deviation 1
- B) Converts data into categorical form
- C) Normalizes data between 0 and 1
- D) Computes the logarithm of the data

A) Standardizes data to have mean 0 and standard deviation 1
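
The effect of `scale()` can be checked on a small assumed vector: after scaling, the mean is 0 and the standard deviation is 1 (up to floating-point error).

```r
x <- c(2, 4, 6, 8, 10)  # illustrative data

z <- scale(x)           # returns a one-column matrix
z <- as.numeric(z)      # drop the matrix structure for convenience

mean(z)  # approximately 0
sd(z)    # approximately 1

# Equivalent manual computation
z_manual <- (x - mean(x)) / sd(x)
```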

What function calculates Pearson’s correlation in R?
- A) pearson()
- B) cor()
- C) lm()
- D) cov()

B) cor()

In linear regression using lm(y ~ x), what does y represent?
- A) The dependent variable
- B) The independent variable
- C) The correlation coefficient
- D) The residual error

A) The dependent variable
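
A minimal sketch of `lm(y ~ x)` on simulated data. The true intercept (3), slope (2), and noise level are assumptions chosen for illustration; `y` is the dependent variable and `x` the independent one.

```r
set.seed(42)  # arbitrary seed for reproducibility
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 0.5)  # simulated response

fit <- lm(y ~ x)
coef(fit)     # estimated intercept and slope, close to 3 and 2
summary(fit)  # coefficient table, R-squared, p-values
```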

Which function is used to perform a t-test in R?
- A) t.test()
- B) test.t()
- C) hypothesis()
- D) stat.test()

A) t.test()

What does a p-value less than 0.05 indicate?
- A) Strong evidence against the null hypothesis
- B) The null hypothesis is true
- C) The data is normally distributed
- D) No conclusion can be made

A) Strong evidence against the null hypothesis

Which function is used to perform an ANOVA test in R?
- A) chisq.test()
- B) aov()
- C) anova_test()
- D) lm()

B) aov()

The Chi-square test is used to analyze which type of data?
- A) Continuous
- B) Categorical
- C) Binary
- D) Ordinal

B) Categorical

Which function in R is commonly used for PCA?
- A) pca.test()
- B) prcomp()
- C) factor.pca()
- D) pc.analysis()

B) prcomp()

What does PCA aim to do?
- A) Standardize the data
- B) Reduce dimensionality
- C) Perform hypothesis testing
- D) Normalize categorical variables

B) Reduce dimensionality
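
A short PCA sketch using `prcomp()` on the built-in `USArrests` data set (the choice of data set is an assumption for illustration). Scaling is switched on because the variables are measured on different scales.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Share of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Reduced representation: keep only the first two components
scores <- pca$x[, 1:2]
```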

Which function is used for hierarchical clustering in R?
- A) cluster.hclust()
- B) hclust()
- C) hier.cluster()
- D) dendro.cluster()

B) hclust()

K-means clustering requires specifying which parameter?
- A) The number of variables
- B) The number of clusters (k)
- C) The correlation matrix
- D) The Euclidean distance

B) The number of clusters (k)
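
Both clustering approaches can be sketched on the built-in `USArrests` data. The choice of data set, of `k = 3`, and of complete linkage are assumptions for illustration only.

```r
X <- scale(USArrests)  # standardise so no variable dominates the distances

# Hierarchical clustering: build the tree first, choose the cut afterwards
hc <- hclust(dist(X), method = "complete")
groups_hc <- cutree(hc, k = 3)

# K-means: the number of clusters must be given up front
set.seed(1)  # k-means starts from random centres
km <- kmeans(X, centers = 3, nstart = 10)
km$size      # number of observations per cluster
```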

When should you use a one-sample t-test?
- A) When comparing the mean of a sample to a known population mean
- B) When comparing two independent samples
- C) When analyzing paired observations
- D) When testing for categorical relationships

A) When comparing the mean of a sample to a known population mean

When should you use a paired t-test?
- A) When comparing two measurements from the same subjects
- B) When comparing two independent samples
- C) When testing for correlation
- D) When testing multiple variables simultaneously

A) When comparing two measurements from the same subjects
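
The t-test variants are all called through `t.test()`. The sketch below uses simulated data; the variable names, means, and sample sizes are assumptions.

```r
set.seed(1)
before <- rnorm(15, mean = 50, sd = 5)         # first measurement
after  <- before + rnorm(15, mean = 2, sd = 1) # same subjects, measured again
other  <- rnorm(15, mean = 55, sd = 5)         # an unrelated second group

one_sample  <- t.test(before, mu = 50)               # sample mean vs. known value
paired      <- t.test(before, after, paired = TRUE)  # two measurements per subject
independent <- t.test(before, other)                 # two independent groups

one_sample$method
paired$method
independent$method
```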

When should you use a one-way ANOVA?
- A) When comparing means across more than two groups
- B) When analyzing paired data
- C) When comparing two independent groups
- D) When testing for correlation

A) When comparing means across more than two groups

What is the null hypothesis in a Chi-square test?
- A) There is no association between categorical variables
- B) The means of multiple groups are equal
- C) The dataset follows a normal distribution
- D) The correlation coefficient is zero

A) There is no association between categorical variables
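
A minimal chi-square sketch on a made-up 2x2 contingency table (all counts and labels are hypothetical):

```r
# Hypothetical counts: rows are groups, columns are answers
tab <- matrix(c(30, 10, 15, 25), nrow = 2,
              dimnames = list(group = c("A", "B"), answer = c("yes", "no")))

res <- chisq.test(tab)  # H0: group and answer are not associated
res$p.value             # a small value is evidence against H0
res$expected            # counts expected under H0
```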

When should you use the Wilcoxon Rank-Sum Test instead of a t-test?
- A) When data is not normally distributed
- B) When testing for correlation
- C) When comparing means across more than two groups
- D) When working with categorical data

A) When data is not normally distributed

What is the Kruskal-Wallis test used for?
- A) Comparing more than two independent groups when data is not normally distributed
- B) Testing for correlation
- C) Analyzing paired data
- D) Checking for multicollinearity

A) Comparing more than two independent groups when data is not normally distributed

What does the Mann-Whitney U test compare?
- A) Differences between two independent groups when normality cannot be assumed
- B) Differences between paired samples
- C) The variance of multiple groups
- D) The presence of outliers

A) Differences between two independent groups when normality cannot be assumed
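
In base R the Mann-Whitney U test is run with `wilcox.test()` (it is the same test as the Wilcoxon rank-sum test) and the Kruskal-Wallis test with `kruskal.test()`. The skewed example data below are simulated and purely illustrative.

```r
set.seed(2)
g1 <- rexp(20, rate = 1)     # skewed, clearly non-normal data
g2 <- rexp(20, rate = 0.5)
g3 <- rexp(20, rate = 0.25)

mw <- wilcox.test(g1, g2)             # two independent groups
kw <- kruskal.test(list(g1, g2, g3))  # more than two independent groups

mw$p.value
kw$p.value
```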

What is the purpose of logistic regression?
- A) Predicting categorical outcomes
- B) Predicting continuous outcomes
- C) Testing for normality
- D) Comparing means across groups

A) Predicting categorical outcomes

When should you use multiple linear regression?
- A) When predicting a continuous outcome based on multiple predictor variables
- B) When analyzing a categorical dependent variable
- C) When testing for correlation between two variables
- D) When checking for normality

A) When predicting a continuous outcome based on multiple predictor variables

What is multicollinearity in regression analysis?
- A) A situation where predictor variables are highly correlated with each other
- B) A method to check if residuals are normally distributed
- C) A test for model accuracy
- D) A measure of goodness-of-fit

A) A situation where predictor variables are highly correlated with each other

Which statistical test should you use to compare the means of two independent groups?
- A) Paired t-test
- B) Independent t-test
- C) One-way ANOVA
- D) Chi-square test

B) Independent t-test

If you want to test whether a dataset follows a normal distribution, which test should you use?
- A) Chi-square test
- B) One-way ANOVA
- C) Shapiro-Wilk test
- D) Mann-Whitney U test

C) Shapiro-Wilk test

You want to compare the means of three different independent groups. Which test should you use?
- A) Independent t-test
- B) One-way ANOVA
- C) Chi-square test
- D) Wilcoxon test

B) One-way ANOVA

What statistical test should be applied to analyze the relationship between two categorical variables?
- A) Independent t-test
- B) One-way ANOVA
- C) Chi-square test
- D) Pearson correlation

C) Chi-square test

You have two independent groups with non-normally distributed data. Which test should you use?
- A) Independent t-test
- B) Mann-Whitney U test
- C) One-way ANOVA
- D) Paired t-test

B) Mann-Whitney U test

You want to measure the relationship between two continuous variables. Which test should you use?
- A) One-way ANOVA
- B) Independent t-test
- C) Pearson correlation
- D) Chi-square test

C) Pearson correlation

If you suspect that the relationship between two variables is not linear, which correlation test should you use?
- A) Pearson correlation
- B) Spearman correlation
- C) Chi-square test
- D) Shapiro-Wilk test

B) Spearman correlation
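
The difference between the correlation types shows up on a relationship that is monotone but not linear. The data below (`y = x^3`) are an assumed toy example.

```r
x <- 1:10
y <- x^3  # strictly increasing, but not linear

r_pearson  <- cor(x, y)                       # default method is "pearson"
r_spearman <- cor(x, y, method = "spearman")  # rank-based
r_kendall  <- cor(x, y, method = "kendall")   # rank-based

r_pearson   # below 1, because the relationship is not a straight line
r_spearman  # 1, because the ranks agree perfectly
r_kendall   # 1
```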

You are comparing the means of two independent groups with small sample sizes. Which assumption must be met for using an independent t-test?
- A) Normality of data in both groups
- B) Data must be categorical
- C) Groups must be paired
- D) The number of samples must be equal

A) Normality of data in both groups

You are comparing means across four different groups. If your ANOVA result is significant, what should be your next step?
- A) Conclude all group means are different
- B) Perform post-hoc tests (e.g., Tukey's test)
- C) Conduct a Pearson correlation test
- D) Recalculate p-values with a lower alpha

B) Perform post-hoc tests (e.g., Tukey's test)
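
A sketch of the ANOVA-then-post-hoc workflow with `aov()` and `TukeyHSD()`. The four groups and their means are simulated assumptions.

```r
set.seed(4)
d <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 10)),
  value = c(rnorm(10, 10), rnorm(10, 10), rnorm(10, 12), rnorm(10, 15))
)

fit <- aov(value ~ group, data = d)
summary(fit)          # overall test: is any group mean different?

tuk <- TukeyHSD(fit)  # pairwise comparisons with adjusted p-values
tuk$group             # one row per pair of groups
```

A significant overall ANOVA only says that at least one mean differs; the pairwise table shows which pairs.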

You are analyzing survey responses with two categorical variables. Which statistical test is appropriate?
- A) Independent t-test
- B) One-way ANOVA
- C) Chi-square test
- D) Spearman correlation

C) Chi-square test

You suspect that your two groups do not follow a normal distribution. What is the best alternative to an independent t-test?
- A) Mann-Whitney U test
- B) One-way ANOVA
- C) Pearson correlation
- D) Wilcoxon signed-rank test

A) Mann-Whitney U test

You need to compare medians of two related samples. Which test should you use?
- A) Independent t-test
- B) One-way ANOVA
- C) Wilcoxon signed-rank test
- D) Pearson correlation

C) Wilcoxon signed-rank test

You need to predict whether customers will buy a product (yes/no) based on various predictor variables. Which statistical method should you use?
- A) Linear regression
- B) One-way ANOVA
- C) Logistic regression
- D) Pearson correlation

C) Logistic regression
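
Logistic regression is fitted in base R with `glm()` and `family = binomial`. The predictor `income`, the coefficients, and the sample size below are hypothetical and only serve to simulate a yes/no outcome.

```r
set.seed(5)
n <- 200
income <- rnorm(n, mean = 50, sd = 10)                # hypothetical predictor
buy <- rbinom(n, 1, plogis(-10 + 0.2 * income))       # simulated 0/1 outcome

fit <- glm(buy ~ income, family = binomial)

# Predicted purchase probabilities for two new customers
pred <- predict(fit, newdata = data.frame(income = c(40, 60)),
                type = "response")
pred
```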

You are performing a regression analysis, but your residuals show strong heteroscedasticity. What should you do?
- A) Transform the dependent variable or use robust regression
- B) Ignore the issue if p-values are significant
- C) Apply a Pearson correlation test instead
- D) Remove outliers until residuals are homogeneous

A) Transform the dependent variable or use robust regression

You suspect collinearity among predictor variables in your regression model. How can you check for it?
- A) Calculate Variance Inflation Factor (VIF)
- B) Perform a t-test on each predictor
- C) Run a Shapiro-Wilk test
- D) Use a Chi-square test

A) Calculate Variance Inflation Factor (VIF)
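
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all other predictors. The helper `vif_manual()` below is a hypothetical base-R sketch on simulated data; in practice a packaged implementation such as `car::vif()` is commonly used.

```r
set.seed(6)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                 # unrelated predictor

# Hypothetical helper: VIF of `target` given the other predictors
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_x1 <- vif_manual(x1, data.frame(x2, x3))  # very large
vif_x3 <- vif_manual(x3, data.frame(x1, x2))  # close to 1
```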

How do you append the rows of one data frame to another?

D = rbind(D, new_students)

How do you create a new ID column in a data frame?

ID = paste0("ID_", 1:nrow(D))

D = cbind(ID, D)

What does D$ID = ID do?

It adds ID to D as a new column, but as the last column (cbind(ID, D) puts it first).

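How do you merge two data frames by a shared column?

M = merge(D1, D2, by = "ID")

By default this keeps only the rows whose ID occurs in both data frames. With all = TRUE, all rows from both data frames are kept and missing values are filled with NA.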
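
The data frame cards above can be tried together on toy data. All names and values below (`new_students`, `D2`, the scores and ages) are made up for illustration.

```r
D <- data.frame(name = c("Ana", "Ben"), score = c(1.3, 2.0))
new_students <- data.frame(name = "Cem", score = 1.7)

D <- rbind(D, new_students)       # append rows; column names must match
D$ID <- paste0("ID_", 1:nrow(D))  # new ID column, added as the last column

D2 <- data.frame(ID = c("ID_1", "ID_3", "ID_9"), age = c(21, 23, 30))

inner <- merge(D, D2, by = "ID")              # only IDs present in both
outer <- merge(D, D2, by = "ID", all = TRUE)  # all IDs, gaps filled with NA

nrow(inner)  # 2
nrow(outer)  # 4
```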

How do you export a data frame to a file?

write.table(D, "my_dataframe.csv", sep = ";", col.names = TRUE, row.names = FALSE)

How do you import data?

D = read.table("name.csv", sep = ";", header = TRUE)
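
A round trip shows that export and import match when the same separator and header settings are used. The data frame and the temporary file path are assumptions for this sketch.

```r
D <- data.frame(ID = c("ID_1", "ID_2"), score = c(1.3, 2.0))

path <- tempfile(fileext = ".csv")  # temporary file, removed with the session
write.table(D, path, sep = ";", col.names = TRUE, row.names = FALSE)

D_in <- read.table(path, sep = ";", header = TRUE)
D_in
```

Note that R's logical constants are `TRUE` and `FALSE` in upper case; `True` and `False` are not defined.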

What types of correlations can be applied?

Pearson, Spearman and Kendall

What does the Pearson correlation assume?

A) Assumes a monotone relationship but makes no assumption about the distribution of the variables

B) Assumes linearity, a monotone relationship and that both variables are normally distributed

C) No assumptions about distribution or relationship (can work with ties)

What does the Kendall correlation assume?

A) Assumes a monotone relationship but makes no assumption about the distribution of the variables

B) Assumes linearity, a monotone relationship and that both variables are normally distributed

C) No assumptions about distribution or relationship (can work with ties)

What does the Spearman correlation assume?

A) Assumes a monotone relationship but makes no assumption about the distribution of the variables

B) Assumes linearity, a monotone relationship and that both variables are normally distributed

C) No assumptions about distribution or relationship (can work with ties)

What is inferential statistics?

Inferential statistics uses a sample to draw conclusions about the population it was drawn from.

When does the p-value get smaller?

When comparing two groups, the p-value gets smaller with:

- A greater difference between the means of the two groups
- Smaller variances within the groups
- Larger sample sizes
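
The three effects can be checked with a small deterministic sketch. The helper `p_for()` and its toy data pattern are assumptions made up for this illustration; each call changes one factor relative to the reference case.

```r
# Hypothetical helper: p-value of a two-group t-test on constructed data
p_for <- function(delta, spread, reps) {
  base <- rep(c(-1, 0, 1), times = reps) * spread  # group 1
  t.test(base, base + delta)$p.value               # group 2 is shifted by delta
}

p_ref         <- p_for(delta = 1, spread = 1,   reps = 5)   # reference case
p_bigger_diff <- p_for(delta = 2, spread = 1,   reps = 5)   # larger mean difference
p_less_var    <- p_for(delta = 1, spread = 0.5, reps = 5)   # smaller variance
p_more_data   <- p_for(delta = 1, spread = 1,   reps = 20)  # larger sample

c(p_ref, p_bigger_diff, p_less_var, p_more_data)
```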

What is the p-value?

The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis (e.g. no association between the two variables) is true.

Which test do you use to compare multiple groups that are normally distributed and have equal variances?

ANOVA, followed by Tukey's HSD test

Which test do you use for multiple groups that are normally distributed but have unequal variances?

Welch's ANOVA, followed by the Games-Howell test

Which test do you use for two groups that are normally distributed but have unequal variances?

Welch's t-test
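
In base R, the Welch variants are available through `t.test()` (which does not assume equal variances by default) and `oneway.test()`. The simulated groups below are assumptions. The Games-Howell post-hoc test is not part of base R and would have to come from an add-on package.

```r
set.seed(7)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 12)),
  value = c(rnorm(12, 10, 1), rnorm(12, 11, 3), rnorm(12, 13, 5))
)
two <- droplevels(subset(d, group != "C"))  # keep only groups A and B

welch_t   <- t.test(value ~ group, data = two)                    # Welch (default)
student_t <- t.test(value ~ group, data = two, var.equal = TRUE)  # classic t-test

welch_anova <- oneway.test(value ~ group, data = d, var.equal = FALSE)
welch_anova$method
```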

Which test do you use for two groups that are not normally distributed and have unequal variances?

Mann-Whitney U test

Which test do you use for multiple groups that are not normally distributed and have unequal variances?

Kruskal-Wallis test, followed by Dunn's test

How do you check for normal distribution?

With the Shapiro-Wilk test: shapiro.test()

How do you check for equal variance?

With Bartlett's test: bartlett.test()
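
Both assumption checks can be sketched on simulated data. The groups and their standard deviations are assumptions; group C is deliberately given a larger spread.

```r
set.seed(8)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  value = c(rnorm(30, 10, 1), rnorm(30, 10, 1), rnorm(30, 10, 4))
)

# Normality: one Shapiro-Wilk p-value per group (H0: data are normal)
normality_p <- tapply(d$value, d$group, function(x) shapiro.test(x)$p.value)

# Equal variances: Bartlett's test (H0: all group variances are equal)
bt <- bartlett.test(value ~ group, data = d)
bt$p.value  # small here, because group C has a much larger variance
```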
