What is statistics?

Statistics is the science of collecting, organizing, presenting and interpreting data

In the field of statistics one learns from data

What is statistics useful for, and what is the common procedure of a statistical task?

Statistics enables exploration and visualization of large and complicated datasets

Statistics compresses data to extract useful information and summarize data

Statistics models real world applications (e.g. radioactive decay)

Statistics estimates and predicts unknown parameters or quantities

Statistics tests research questions and hypotheses

Common procedure

Explore

Summarize

Model

Estimate

Test

Why is it important to learn statistics?

Solving your own statistical problems

Understanding statistical methods in scientific papers

Being comfortable and competent around data and uncertainty

Statistics is the foundation of scientific research and part of our daily life

How does the media use statistics to twist or hide facts?

Presenting poll results that sum to more than 100%

Asking imprecise or unrelated questions in polls

Presenting graphs with inconsistent time intervals

Comparing statistical maps whose categories use different scales

Presenting data on manipulated scales

Confusing correlation and causation

Turning graphs upside-down

What is data in statistics?

Data refers to numerical facts

What is a model in statistics?

A model is a system of assumptions and equations that describes the data you are interested in

What is statistical hypothesis testing?

Statistical hypothesis testing is the use of data in deciding between different possibilities
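
As an illustration of deciding between two possibilities with data, here is a minimal sketch of an exact two-sided binomial test. The coin example, the function name `two_sided_binomial_p_value` and the 0.05 threshold are assumptions chosen for illustration; they are not part of the notes above.

```python
from math import comb


def two_sided_binomial_p_value(heads: int, tosses: int) -> float:
    """Exact two-sided p-value for the hypothesis 'the coin is fair' (p = 0.5)."""
    expected = tosses / 2
    observed_distance = abs(heads - expected)
    # Count all outcomes at least as far from the expected number of heads
    # as the observed one. This works because Binomial(n, 0.5) is symmetric.
    extreme_outcomes = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme_outcomes / 2 ** tosses


# Hypothetical data: 60 heads in 100 tosses.
p_value = two_sided_binomial_p_value(heads=60, tosses=100)

# Decide between "fair coin" and "biased coin" with an assumed 5% threshold.
reject_fair_coin = p_value < 0.05
print(f"p-value = {p_value:.4f}, reject fair coin: {reject_fair_coin}")
```

A small p-value means the observed data would be unusual if the coin were fair; whether it is small enough is a convention that has to be fixed before looking at the data.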

What are the two main categories in statistics and how do they differ?

Descriptive statistics (empirical statistics)

Given data is described and summarized to gain more information

Typical descriptive methods are tables, graphs, charts and summary statistics

Inductive statistics (mathematical/inferential statistics)

Given data is used to predict or answer research questions

Draw conclusions from a sample and generalize them to a population

Probability theory is often used with inductive statistics

What are the two key elements of combinatorics?

Permutation: How many possibilities exist to arrange n elements in different sequences?

Combination: How many possibilities exist to select k elements from a set of n elements?
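
Both counts can be checked with a short sketch. It assumes the simplest case: all n elements are distinct, nothing is repeated, and the order of a selection does not matter. The element names are made up for illustration.

```python
import math
from itertools import combinations, permutations

elements = ["A", "B", "C", "D"]  # hypothetical set with n = 4 elements
n = len(elements)
k = 2

# Permutations: n! ways to arrange n distinct elements in a sequence.
number_of_permutations = math.factorial(n)

# Combinations: n! / (k! * (n - k)!) ways to select k of n elements.
number_of_combinations = math.comb(n, k)

# Cross-check the formulas by enumerating every arrangement and selection.
assert number_of_permutations == len(list(permutations(elements)))
assert number_of_combinations == len(list(combinations(elements, k)))

print(number_of_permutations, number_of_combinations)
```

Enumerating is only feasible for small n, which is why the closed formulas are used in practice.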

What is important regarding data collection and what two different ways of data collection exist?

Important for data collecting

Data collecting should be objective (independent of the person who is collecting the data)

Data collecting should be valid (it measures precisely what is intended to be measured)

Data collecting should be reliable (it should be replicable under constant conditions)

Two different ways of collecting data

Primary Data („Field Research“): firsthand collection of data by a researcher through observations, experiments or surveys

Secondary Data („Desk Research“): data has already been collected by someone else (e.g. government organizations) and is available (e.g. through publications, journals, newspapers, …)

What is the basic terminology of statistics?

Empirical population: a finite set of objects that is clearly (spatially, temporally, objectively) defined, e.g. the students who are sitting in HS7 at 13:00

Sample: a selection of objects from a population, e.g. the students who sit in the first row

Observational unit: an entity whose characteristics are measured, e.g. the students whose grades are to be statistically analyzed

Attribute: a characteristic or feature that is measured for each observational unit, e.g. the grade of each student

Attribute value: the specific measured or observed value or the specific characteristic of an object, e.g. each student has one grade in the range of 1-5

Parameters: the „true values“ of a population, which can be estimated by a sample statistic, e.g. based on a sample of students, it is estimated that the average grade is 2.5
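
The terms above can be tied together in a minimal sketch. The grades are made up, and treating the first four entries as "the first row" is an assumption for illustration only.

```python
from statistics import mean

# Population: the grade (attribute) of every student (observational unit)
# in the lecture hall. Each number is one attribute value. Hypothetical data.
population_grades = [1, 2, 2, 3, 3, 3, 4, 2, 1, 5, 3, 2]

# Sample: the students in the first row, assumed here to be the first four.
sample_grades = population_grades[:4]

# Parameter: the "true value" for the whole population (usually unknown).
population_mean = mean(population_grades)

# Sample statistic: computed from the sample to estimate the parameter.
sample_mean = mean(sample_grades)

print(f"parameter = {population_mean:.2f}, estimate = {sample_mean:.2f}")
```

The estimate differs from the parameter here, which is expected: a first-row sample is small and not selected at random.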

What are the different levels of measurement (scales of measurement) and how do they differ?

Nominal data: only categories with no meaningful order, e.g. color, gender, origin

Ordinal data: meaningful order, ranking according to this order is possible and can be used to analyze the data, e.g. job classification, bond rating, school grade

Quantitative data: values are counted or measured („numbers with a scale unit“)

quantitative-discrete data: only values from a fixed list of numbers can occur; the values are points on the number line

quantitative-continuous data: all values from a „continuum“ are possible; the values fill an interval on the number line (e.g. measuring the weight of an apple in grams ➔ theoretically the weight can be measured with infinite precision)

What is the key difference between probability and statistics?

Statistics: Presentation of the data and generalization of the data to the „real world“

Probability: What if we know how the world works? What kind of data and results can we expect?

What is important to consider regarding the quality of data collection?

Does the source of the data profit financially from it?

Is the raw data available?

Are the respondents selected at random?

Does the interviewer use suggestive questions?

Does an independent confirmation exist?

What is the difference between a disjoint and a complete classification of an attribute?

Based on an attribute, the population can be divided into classes so that this classification is:

disjoint, i.e. no object may fall into more than one class

complete, i.e. each object must fall into at least one class (together with disjointness: exactly one class)
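
Both properties can be checked mechanically. The following is a minimal sketch; the helper names `is_disjoint` and `is_complete` and the student data are hypothetical.

```python
def is_disjoint(classes: dict[str, set[str]]) -> bool:
    """True if no object appears in more than one class."""
    seen: set[str] = set()
    for members in classes.values():
        if seen & members:  # an object was already placed in another class
            return False
        seen |= members
    return True


def is_complete(classes: dict[str, set[str]], population: set[str]) -> bool:
    """True if every object of the population appears in some class."""
    covered = set().union(*classes.values())
    return population <= covered


population = {"Ana", "Ben", "Cleo", "Dev"}

# A classification by exam result: every student is in exactly one class.
by_result = {"pass": {"Ana", "Ben", "Cleo"}, "fail": {"Dev"}}

# A flawed classification: Ben is in two classes and Dev is in none.
overlapping = {"grade <= 2": {"Ana", "Ben"}, "grade >= 2": {"Ben", "Cleo"}}

print(is_disjoint(by_result), is_complete(by_result, population))
print(is_disjoint(overlapping), is_complete(overlapping, population))
```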

What is the difference between interval and ratio data?

Interval Scale: the zero point is defined arbitrarily (e.g. calendar date, …); only addition and subtraction are meaningful

Ratio Scale: the zero point is defined naturally (e.g. scale units in physics, …); addition, subtraction, multiplication and division are meaningful
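
A small sketch of why the zero point matters. Temperature in degrees Celsius (interval scale) versus Kelvin (ratio scale) is an example chosen here for illustration; it does not appear in the notes above.

```python
# Two hypothetical temperature readings in degrees Celsius (interval scale).
celsius_a, celsius_b = 10.0, 20.0

# The same readings in Kelvin (ratio scale, zero is the physical minimum).
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15

# Differences agree on both scales, so subtraction is meaningful on either.
difference_celsius = celsius_b - celsius_a
difference_kelvin = kelvin_b - kelvin_a

# Ratios disagree: "20 degrees C is twice as hot as 10 degrees C" is an
# artifact of where the Celsius zero point happens to sit.
ratio_celsius = celsius_b / celsius_a
ratio_kelvin = kelvin_b / kelvin_a

print(f"differences: {difference_celsius:.2f} vs {difference_kelvin:.2f}")
print(f"ratios: {ratio_celsius:.3f} vs {ratio_kelvin:.3f}")
```

Only the Kelvin ratio describes a physical relationship; the Celsius ratio would change if the zero point were moved.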

How are the following terminologies in statistics defined?

Multivariate data

Raw data list

Stock data

Flow data

Multivariate data: In contrast to univariate data, several attributes of each object are recorded and analyzed simultaneously (e.g. height and weight)

Raw data list: Original uncompressed recording of all information regarding a population

Stock data: Data is measured at one specific time point and represents a quantity existing at that point in time

Flow data: Data is measured over an interval of time
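
The relation between stock and flow data can be sketched with a warehouse inventory. The scenario and all numbers are assumptions made up for illustration.

```python
# Stock: number of items in a warehouse at one point in time (start of day 1).
opening_stock = 100

# Flow: net number of items received (+) or shipped (-) during each day.
daily_net_flow = [20, -35, 10]

# Each new stock level is the previous stock plus the flow over the interval.
stock_levels = [opening_stock]
for flow in daily_net_flow:
    stock_levels.append(stock_levels[-1] + flow)

print(stock_levels)
```

Summing flows over several days is meaningful (total net movement), whereas summing the stock levels of several days is not.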

What is the difference between intensive and extensive data?

Extensive data: The sum of the data is meaningful, e.g. the combined income of all tech companies in the US is 6 billion

Intensive data: The sum of the data is not meaningful, but the average is, e.g. the average height of a Google employee is 1.82 m

How is the absolute frequency defined?

Absolute frequency: Number of times that a specific attribute value occurs in a population, which is divided into classes by this attribute

All absolute frequencies sum to the size of the population

How is the relative frequency defined?

Relative frequency: The absolute frequency of a specific attribute value divided by the size of the population, i.e. the absolute frequency normalized by the total number of observations

All relative frequencies sum up to 100%
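
Absolute and relative frequencies can be computed in a few lines. This is a minimal sketch using made-up grades; the variable names are chosen for illustration.

```python
from collections import Counter

# Hypothetical grades (attribute values) of a population of ten students.
grades = [1, 2, 2, 3, 3, 3, 4, 5, 2, 3]
population_size = len(grades)

# Absolute frequency: how often each attribute value occurs.
absolute_frequency = Counter(grades)

# Relative frequency: absolute frequency divided by the population size.
relative_frequency = {
    value: count / population_size
    for value, count in sorted(absolute_frequency.items())
}

for value, share in relative_frequency.items():
    print(f"grade {value}: {absolute_frequency[value]} times, {share:.0%}")
```

The absolute frequencies add up to the population size, and the relative frequencies add up to 1 (that is, 100%).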

How is the cumulative frequency defined?

Cumulative frequency: Sum of the absolute frequencies of all attribute values less than or equal to a specific attribute value. If the relative frequencies are used instead of the absolute frequencies, the result is called the relative cumulative frequency

Cumulative frequencies provide useful information only if the data is at least ordinally scaled

The cumulative frequency of the largest attribute value equals the size of the population
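
A minimal sketch of the cumulative frequency as a running total over the sorted attribute values. The grades are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical grades (ordinal attribute values) of ten students.
grades = [1, 2, 2, 3, 3, 3, 4, 5, 2, 3]

absolute_frequency = Counter(grades)
values = sorted(absolute_frequency)  # sorting requires at least ordinal data

# Running total of the absolute frequencies in ascending order of the values.
running_totals = accumulate(absolute_frequency[value] for value in values)
cumulative_frequency = dict(zip(values, running_totals))

# Relative cumulative frequency: the same running total as a share of all data.
relative_cumulative_frequency = {
    value: total / len(grades) for value, total in cumulative_frequency.items()
}

print(cumulative_frequency)
print(relative_cumulative_frequency)
```

The running total for the largest value is the population size, and the corresponding relative cumulative frequency is 1.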
