Data
The information we gather through experiments and surveys.
Variables
The characteristic that varies from one person or thing to another and is observed for the subjects in a study.
Population
The collection of all individuals or items of interest
Sample
The part of the population from which the data is obtained.
Inferential statistics
Involves making decisions or predictions about a population based on information obtained from a sample of that population.
Inferential statistics
Then you use __________ to make predictions about the population.
Descriptive statistics
Involves gathering, organizing and summarizing data.
Parameter
A numerical summary of the population. ______ values are usually unknown.
Statistic
A numerical summary of the sample.
Simple random sampling
Randomness is crucial to ensuring that the sample is representative of the population so that powerful inferences can be made. Each subject in the population has the same chance of being included in the sample.
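A minimal sketch of drawing a simple random sample in Python. The population of 100 ID numbers and the sample size of 10 are made up for illustration; the fixed seed is only there so the draw can be repeated.

```python
import random

# Hypothetical population: ID numbers for 100 subjects (illustration only).
population = list(range(1, 101))

# Fixed seed so the same sample is drawn on every run.
rng = random.Random(0)

# random.sample draws without replacement, so each subject
# has the same chance of ending up in the sample.
sample = rng.sample(population, k=10)

print(sample)
```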
Quantitative
Numerical; measures how much of something. Ex) Age, IQ, GPA, height, weight.
Population Parameter
A numerical summary of the population. ______ values are usually unknown.
Sample Statistic
A numerical summary of the sample.
Categorical data
Non-numerical; some numeric data can still be classified as categorical (e.g. phone number, zip code, year born). Ex) Blood type, gender, major, dating status.
Quantitative Data
Numerical; Measures how much of something. Ex) Age, IQ, GPA, height, weight.
Discrete Data
Possible values form a set of separate numbers such as 0, 1, 2, etc. A _____ variable is usually a count. Ex) The number of pets in a household, the number of children in a family.
Continuous Data
Possible values form an interval of numbers like [0, 10]. __________ variables have infinitely many possible values. Ex) Time, height, weight, age.
Histogram
A _________ uses bars to portray the frequencies or the relative frequencies of the possible outcomes for a quantitative variable.
Pie Chart
A circle having a "slice of the pie" for each category. The size of a slice corresponds to the percentage of observations in the category. USED FOR CATEGORICAL VARIABLES.
Bar Graph
Displays a vertical bar for each category.
The height of the bar is the percentage of observations in the category. IT IS EASIER TO COMPARE CATEGORICAL VARIABLES W/ a ____ GRAPH.
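A rough sketch of where the bar heights come from: count each category, then convert the counts to percentages. The blood types below are made-up data, and the text "bars" are only a stand-in for a real plot.

```python
from collections import Counter

# Made-up blood types for 8 subjects (illustration only).
blood_types = ["A", "O", "B", "O", "A", "AB", "O", "O"]

counts = Counter(blood_types)
n = len(blood_types)

# Percentage of observations in each category = height of that category's bar.
percentages = {category: 100 * count / n for category, count in counts.items()}

# Text version of a bar graph: roughly one '#' per 5 percentage points.
for category, pct in sorted(percentages.items()):
    print(f"{category:>2} | {'#' * round(pct / 5)} {pct:.1f}%")
```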
Symmetric
The side of the distribution below a central value is a mirror image of the side above that central value.
Left skewed
left tail is longer than the right tail
Right skewed
right tail is longer than the left tail
Mean
The sum of the observations divided by the number of observations.
The "average". Symbol: x̄ (x-bar).
Median
the midpoint of the observations when they are ordered from smallest to largest. the point that splits the data in two, half the data below it and half the data above it.
Outlier
an observation that falls well above or well below the overall bulk of the data; the mean can be highly influenced by an _____.
Resistant
A numerical summary of the observations is called _________ if extreme observations have little, if any, influence on its value. Median, IQR, 1st & 3rd quartiles. UNAFFECTED BY OUTLIERS
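A small sketch of what "resistant" means in practice, using made-up incomes (in thousands of dollars). Adding one extreme observation moves the mean a lot but barely moves the median.

```python
import statistics

# Made-up incomes in thousands of dollars (illustration only).
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]  # add one extreme observation

mean_before = statistics.mean(incomes)          # 175 / 5 = 35
median_before = statistics.median(incomes)      # middle value = 35

mean_after = statistics.mean(with_outlier)      # 575 / 6, about 95.8
median_after = statistics.median(with_outlier)  # (35 + 38) / 2 = 36.5

print("mean:  ", mean_before, "->", round(mean_after, 1))
print("median:", median_before, "->", median_after)
```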
Symmetrical
mean=median
Right-skewed
mean is larger than the median
Left-skewed
mean is smaller than the median
median
if a distribution is very highly skewed, the ______ is usually preferred over the mean because it better represents what is typical.
mean
if the distribution is close to symmetric or only mildly skewed, the ______ is usually preferred because it uses the numerical values of all the observations.
mode
The value that occurs most frequently. There can be more than one mode. It corresponds to the highest bar in the histogram and is most often used with categorical data.
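A short sketch of finding the mode with Python's standard library. Both data sets are made up; `statistics.multimode` is used for the categorical example because it returns every value tied for most frequent.

```python
import statistics

# Made-up categorical data: two blood types tie for most frequent.
blood_types = ["A", "O", "B", "O", "A", "AB"]
modes = statistics.multimode(blood_types)  # all tied modes, in order first seen

# Made-up discrete data: number of pets in 7 households.
pets = [0, 1, 1, 2, 3, 1, 0]
pet_mode = statistics.mode(pets)  # single most frequent value

print(modes, pet_mode)
```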
standard deviation
gives a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of the deviations; Describes how far the data fall from the mean. It is the most important measure of spread. The symbol for the _____________ of a sample is 's'.
(s_x: sample standard deviation; σ_x: population standard deviation)
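A sketch of the calculation on made-up data, assuming the usual convention that the sample standard deviation divides the sum of squared deviations by n - 1 (the "adjusted average") and the population version divides by n.

```python
import math
import statistics

# Made-up observations (illustration only); their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

s = math.sqrt(sum(squared_devs) / (n - 1))  # sample standard deviation
sigma = math.sqrt(sum(squared_devs) / n)    # population standard deviation

# The standard library gives the same two values.
print(s, statistics.stdev(data))
print(sigma, statistics.pstdev(data))
```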
Range
Maximum observation minus minimum observation.
percentiles
The pth ______ is a value such that p percent of the observations fall below or at the value. Three useful _______ are the quartiles.
Quartiles
______ split the distribution into four parts, each containing one quarter (25%) of the observations.
Interquartile range (IQR)
The distance between the third and first quartiles.
IQR = Q3 - Q1. Gives the spread of the middle 50% of the data.
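A sketch of one common way to find the quartiles and the IQR. It assumes the "median of each half" convention (the overall median is left out of both halves when the number of observations is odd); textbooks and software differ slightly on this, so other tools may give slightly different quartiles. The helper name `quartiles` and the data are made up.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle observation when n is odd.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

# Made-up observations (illustration only).
data = [1, 3, 4, 7, 8, 10, 12, 15, 20]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```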
z-score
Data can be standardized so that different data sets can be compared, or so that values within the same data set can be compared. The z-score of an observation is the number of standard deviations that it falls from the mean: z = (observation - mean) / standard deviation. MOST z-scores will fall between -3 and 3.
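A sketch of using z-scores to compare values from two different data sets. The exam means and standard deviations are made-up numbers, and `z_score` is a hypothetical helper name.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation falls from the mean."""
    return (observation - mean) / sd

# Made-up exams: which score is better relative to its own class?
z_exam_a = z_score(80, mean=70, sd=5)    # (80 - 70) / 5
z_exam_b = z_score(90, mean=85, sd=10)   # (90 - 85) / 10

print(z_exam_a, z_exam_b)
```

The 80 on the first exam is the stronger result even though 90 is the larger raw score, because it falls further above its class mean in standard deviation units.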
Response variable
the dependent variable, the y-variable, the outcome variable. Ex) Blood alcohol level (response) vs. beers consumed (explanatory).
Explanatory variable
the independent variable, also known as the predictor variable; the x-variable. Ex) Amount of study time (explanatory) vs. grade on test (response).
upper & lower limits
An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile.
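A sketch of the 1.5 x IQR rule on made-up data. The quartiles are entered by hand here (Q1 = 12.5 and Q3 = 20, assuming the median-of-halves convention for this data set), and `outlier_limits` is a hypothetical helper name.

```python
def outlier_limits(q1, q3):
    """Return the (lower, upper) limits from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up observations (illustration only), already sorted.
data = [10, 12, 13, 15, 16, 18, 19, 21, 50]

# Quartiles assumed for this data under the median-of-halves convention.
q1, q3 = 12.5, 20.0

lower, upper = outlier_limits(q1, q3)

# Anything outside the limits is flagged as a potential outlier.
potential_outliers = [x for x in data if x < lower or x > upper]

print(lower, upper, potential_outliers)
```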
non-resistant
Mean (average), range, standard deviation (SD), correlation. Range and SD are the non-resistant measures of spread.
linear correlation
The quantity r, called the ____________ coefficient, measures the strength and the direction of a linear relationship between two variables; sometimes referred to as the Pearson product-moment correlation coefficient. Takes a value between -1 and 1.
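A sketch of computing r by hand, assuming the standard formula: the sum of the products of the x and y deviations, divided by the square root of the product of the two sums of squared deviations. The data and the helper name `pearson_r` are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 3))
```

A positive r means y tends to increase as x increases; a value close to -1 or 1 indicates a strong linear relationship, and a value near 0 a weak one.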
regression equation