Statistics, Science, and Observations



 Science
 Science is based on the empirical method for making
observations, that is, for systematically obtaining information.
It consists of methods for making observations.
 Observations
 Observations are the basic empirical "stuff" of science.
 Statistics
 Statistics is a set of methods and rules for organizing,
summarizing and interpreting information.
The methods and rules enable scientific researchers
to describe and analyze the observations they have made.
Statistical methods are tools for science.
Science consists of methods for making observations;
Statistics consists of methods for describing and analyzing
the observations.
Here are some of the "observations" we gathered in the
survey we did on the first day of
class in 1997 and 1998.
Populations & Samples
 Populations
 A population is the set of all individuals of interest
in a particular study. We will also refer to populations
of scores.
 Samples
 A sample is a set of individuals selected from a population,
usually intended to represent the population in a study.
We will also refer to samples of scores.
The data we gathered in class are a "sample" of scores
obtained with a sample of individuals. The population
we sampled from is the population of UNC undergraduates.
 Parameters
 A Parameter is a value, usually a numerical
value, that describes a Population. A Parameter
may be obtained from a single measurement, or it may be
derived from a set of measurements from the Population.
 Statistics
 A Statistic is a value, usually a numerical
value, that describes a Sample. A Statistic
may be obtained from a single measurement, or it may be
derived from a set of measurements from the Sample.
Here are some "statistics" computed from our sample
of data:
 Data
 Data (plural) are measurements or observations. A data
set is a collection of measurements or observations.
A datum (singular) is a single measurement or observation
and is commonly called a data value, a score, or a raw
score.
 Descriptive Statistics
 Descriptive Statistics are statistical procedures used
to summarize, organize and simplify data. The term also names
the branch of statistical activity focusing on the use of
such procedures. These procedures are the focus of chapters
1 through 5.
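As a small illustration of what "summarize, organize and simplify" can mean in practice, here is a minimal sketch using Python's standard statistics module. The scores are made up for the example (they are not from the class survey), and the variable names are our own.

```python
import statistics

# Hypothetical raw scores (illustrative only, not the class data).
scores = [3, 5, 5, 6, 8, 9]

# A few common descriptive statistics that summarize the data set.
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequent score
score_range = max(scores) - min(scores)   # spread from lowest to highest

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
print("range:", score_range)
```

Each number condenses the six raw scores into a single descriptive value.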
 Statistical Visualization
 Statistical visualization refers to recently developed
computational statistical procedures used to visually
summarize, organize and simplify data.
The statistical system we are using is named ViSta for
"Visual Statistics", because it includes statistical visualization.
A statistical visualization of our data is shown below.
It shows the relationship between GPA and Satisfaction
with the UNC experience. Higher satisfaction is associated
with higher GPA.
 Exploratory Statistics
 Exploratory statistics is the process of exploring data by
using descriptive and visualization methods to "see what the
data seem to say". It is also the branch of statistics that
focuses on "seeing what the data seem to say" (Tukey, 19??).
 Inferential Statistics
 Inferential Statistics consist of techniques that allow
us to study samples and then to make generalizations about
the populations from which the samples were selected.
The term also names the branch of statistical activity focusing
on the use of such procedures. These procedures are the
focus of chapters 8 through the remainder of the text.
The groundwork for statistical inference is laid in chapters
6 and 7.
 Sampling Error
 Sampling error is the discrepancy, or amount of error,
that exists between a sample statistic and the corresponding
population parameter.
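The relationship among a parameter, a statistic, and sampling error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is made-up random data (not the UNC survey), and the names population, sample, mu and m are our own choices.

```python
import random
import statistics

# Hypothetical population of 1,000 GPA-like scores (made-up data).
rng = random.Random(42)
population = [round(rng.uniform(2.0, 4.0), 2) for _ in range(1000)]

# Parameter: a value that describes the whole population.
mu = statistics.mean(population)

# Statistic: the same kind of value, computed from a sample only.
sample = rng.sample(population, 25)
m = statistics.mean(sample)

# Sampling error: the discrepancy between the statistic and the parameter.
sampling_error = m - mu

print(f"population mean (parameter): {mu:.3f}")
print(f"sample mean (statistic):     {m:.3f}")
print(f"sampling error:              {sampling_error:+.3f}")
```

Drawing a different sample (for example, by changing the seed) would generally give a different statistic and therefore a different sampling error, while the parameter stays fixed.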
The Scientific Method and the Design of Experiments
Science attempts to discover orderliness in the universe, that
is, to discover regularity in changes. Something that can change
is called a variable.
 Variables
 A variable is a characteristic or condition that changes
or has different values for different individuals. In
the data we gathered, the variables include "Gender",
"Age", etc.
A constant is a characteristic or condition
that does not vary, and is the same for every individual.
 The Correlational Method
 The scientific method in which two (or more) variables
are observed without manipulation (i.e., as they exist
naturally) to see if there is any relationship between
them.
The correlational method cannot establish cause-and-effect:
Correlation is not causation!
The data we gathered are an example of the correlational
method. We can say that "Higher satisfaction is associated
with higher GPA", but we can't say that "Higher GPA
causes higher satisfaction" (or the converse).
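To make "associated with" concrete, the sketch below computes a Pearson correlation coefficient by hand. The GPA and satisfaction values are invented for illustration (they are not the class data), and pearson_r is a helper name we introduce here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical observations: both variables are measured, neither is manipulated.
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
satisfaction = [3, 4, 4, 6, 7]

r = pearson_r(gpa, satisfaction)
print(f"r = {r:.2f}")
```

A positive r says only that the two variables tend to rise together. Because nothing was manipulated, the coefficient by itself cannot say which variable, if either, influences the other.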
 The Experimental Method
 The scientific method which can establish a cause-and-effect
relationship between two (or more) variables. Some important
points:
 The researcher manipulates one variable and
observes what happens on the other. More than one
variable may be manipulated or observed.
 To correctly establish cause-and-effect, the researcher
must exercise some control over the experimental
situation to ensure that no other variable
influences the relationship being watched.
 Random Assignment can be used to eliminate
other variables' influence on results.
 The experimental conditions must be identical,
other than differing on values of the manipulated
variable.
 Independent Variable (also called the predictor
variable)
 The variable which is manipulated by the researcher.
 Dependent Variable (also called the response
variable)
 The variable which is observed by the researcher
for changes in order to assess the effect of the treatment.
(The treatment is the manipulation of the predictor
variable).
 Confounding Variable
 An uncontrolled variable that is unintentionally
allowed to vary systematically with the independent
variable. Confounds the results (bad, bad, bad!).
 The control group
 This is a condition of the independent variable
that does not receive the experimental treatment.
Usually, the control group receives either no treatment
or a placebo treatment.
 The experimental group
 This is a condition of the independent variable
that does receive an experimental treatment.
There may be several experimental groups.
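Random assignment to a control group and an experimental group, as described above, can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the participant labels and the function name randomly_assign are hypothetical, and we assume just two groups of equal size.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "control": shuffled[:half],        # no treatment, or a placebo
        "experimental": shuffled[half:],   # receives the treatment
    }

# Hypothetical participant labels P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=1)

print("control:     ", groups["control"])
print("experimental:", groups["experimental"])
```

Because chance alone decides who lands in which group, characteristics such as age or motivation are not expected to differ systematically between the groups.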
 The Quasi-Experimental Method
 Examines differences between preexisting groups of
subjects (such as men vs. women) or differences between
groups of scores obtained at different times (before and
after treatment).
 Hypotheses
 A hypothesis is a prediction about the outcome of an
experiment. In experimental research, a hypothesis makes
a prediction about how the manipulation of the independent
(predictor) variable will affect the dependent (response)
variable.
Measurement
Data are measurements of observations which involve categorizing,
ordering or using numbers to characterize amount. Several levels
of measurement are involved. These in turn determine what
statistics can be computed. Measurements may also be discrete
or continuous.
 Scales (Levels) of Measurement
 Nominal
 The nominal level of measurement labels observations
so that they fall into different categories. Football
jersey numbers and home street addresses are common
examples.
In ViSta, nominal variables are called "Category"
variables.
 Ordinal
 The ordinal level of measurement consists of categories
that are ordered in a sequence. Order of finish in a
race is a common example.
In ViSta, ordinal variables are called "Ordinal"
variables.
 Interval
 The interval level of measurement consists of ordered
categories where all of the categories are intervals
of exactly the same size. Temperature in degrees Celsius
or Fahrenheit is a common example.
Here, equal differences between numbers reflect equal
differences in magnitude of the observed variable.
 Ratio
 The ratio level of measurement is an interval scale
with an absolute zero point. Length and weight are common
examples. Here, ratios of numbers reflect ratios of variable
magnitude.
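The difference between the interval and ratio levels can be checked numerically. The sketch below is our own illustration: it compares two temperatures in degrees Celsius (an interval scale) with the same temperatures in kelvins (a scale with an absolute zero). The function name celsius_to_kelvin is ours.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (zero kelvins is absolute zero)."""
    return c + 273.15

a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on an interval scale: they survive the conversion.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are not: 20 C is "twice" 10 C only because of where the zero was placed.
ratio_c = b_c / a_c
ratio_k = b_k / a_k

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (kelvins)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvins)")
```

The differences agree, but the ratios do not, which is why statements like "twice as much" are reserved for ratio scales such as length and weight.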
In ViSta, interval and ratio variables are called
"Numeric" variables.
 Discrete and Continuous Variables
 Discrete
 A discrete variable has separate, indivisible categories.
No values can exist in between two neighboring categories.
 Continuous
 A continuous variable has an infinite number of possible
values falling between any two observed values.
Mathematical Notation
In statistical calculations you will constantly be required
to add a set of values to find a specific total. We use algebraic
expressions to represent the values being added. For example,
X means "scores on a variable": X = [1 2 3] refers to a
variable with three observations, which are 1, 2, and 3.
We will use the Greek letter sigma (Σ) to signify the summation
process. Thus, we write ΣX to mean "the sum of the scores on X";
for X = [1 2 3], ΣX = 1 + 2 + 3 = 6.
Note that
 All calculations within parentheses are done first.
 Squaring, multiplying, and dividing are done second,
and should be completed in order from left to right.
 Adding and subtracting (including summation) are third,
and should be completed in order from left to right.
The following term, which is called the "squared sum",
works as shown for X = [1 2 3]:
(ΣX)² = (1 + 2 + 3)² = 6² = 36
Because of the order of operations, the following term,
which is called "the sum of squares", works as shown:
ΣX² = 1² + 2² + 3² = 1 + 4 + 9 = 14
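The two terms are easy to confuse, so here is a short Python check using X = [1 2 3] from the example earlier in this section. The variable names are our own. The only difference between the two quantities is whether the squaring happens after or before the summation.

```python
X = [1, 2, 3]

# Plain summation: add up the scores.
sum_x = sum(X)

# Squared sum: parentheses first, so sum the scores, then square the total.
squared_sum = sum(X) ** 2

# Sum of squares: squaring comes before summation, so square each score, then add.
sum_of_squares = sum(x ** 2 for x in X)

print(sum_x, squared_sum, sum_of_squares)  # prints: 6 36 14
```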
Consider how the following summation equation works:
On the other hand, the next summation equation works differently:
Finally, consider how this last summation equation works:
