Correlation Indices and Scatterplots
- Definition of Correlation:
Correlation is a statistical technique that is used
to measure and describe the STRENGTH and
DIRECTION of the relationship between two
variables.
- Correlation requires two scores from the SAME
individuals. These scores are normally identified
as X and Y. The pairs of scores can be listed in
a table or presented in a scatterplot. Usually the
two variables are observed, not manipulated.
- Definition of a Scatterplot:
A scatterplot is a statistical graphic that displays
the STRENGTH, DIRECTION and SHAPE
of the relationship between two variables.
- A scatterplot requires two scores from the SAME
individuals. These scores are normally identified
as X and Y. A scatterplot displays the X variable
on the horizontal (X) axis, and the Y variable on
the vertical (Y) axis. Each individual is identified
by a single point (dot) on the graph which is located
so that the coordinates of the point (the X and
Y values) match the individual's X and Y scores.
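To make the point-placement rule concrete, here is a minimal text-only sketch in Python. It is an illustration, not a standard plotting tool: each individual's X score picks a column (horizontal axis) and its Y score picks a row (vertical axis). The function name `text_scatterplot`, its `width`/`height` parameters, and the data pairs are hypothetical. A real analysis would use a plotting library.

```python
def text_scatterplot(pairs, width=20, height=10):
    """Render (X, Y) pairs as a rough text scatterplot (hypothetical helper).

    Each individual becomes one '*': its X score sets the column
    (horizontal axis) and its Y score sets the row (vertical axis).
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        # Scale each score onto the grid; "or 1" avoids dividing by a zero range.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        # Row 0 prints first, so flip the row to keep large Y values at the top.
        grid[height - 1 - row][col] = "*"
    # A left edge stands in for the Y axis, a bottom edge for the X axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical (X, Y) pairs, one pair per individual.
for line in text_scatterplot([(2.3, 710), (3.1, 560), (3.7, 680), (2.8, 500)]):
    print(line)
```

Because rows are flipped before drawing, individuals with larger Y scores appear higher up, matching the usual orientation of a scatterplot.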
- Example: Consider the correlation
between the SAT-M scores and GPA of the 1997 Psych 30
class. Here are the Math SAT scores and the GPA scores
of 13 of the students in the class, and the scatterplot
for all 41 students:
- Scatterplot: The scatterplot
has the X variable (GPA) on the horizontal (X) axis, and
the Y variable (MathSAT) on the vertical (Y) axis. Each
individual is identified by a single point (dot) on the
graph which is located so that the coordinates of the
point (the X and Y values) match the individual's X (GPA)
and Y (MathSAT) scores.
- For example, the student named "Obs5" (in the sixth
row of the datasheet) has GPA=2.30 and MathSAT=710. This
student is represented in the scatterplot by the highlighted
dot labeled "5" in the upper-left part of the scatterplot.
Note that this dot lies directly above a GPA of 2.30 on the
horizontal axis and directly to the right of a MathSAT of 710
on the vertical axis.
- Pearson Correlation: The Pearson
correlation (explained below) between these two variables
is .32.
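For reference, the standard definitional formula for the Pearson correlation of n pairs of scores (X_i, Y_i), with means X̄ and Ȳ, divides the sum of products of deviations by the square root of the product of the two sums of squared deviations:

```latex
$$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$
```

The numerator carries the sign (direction) of the relationship, and the denominator rescales it so that r always falls between -1.00 and +1.00.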
- Correlations and Scatterplots:
Correlations can tell us about the direction
and the degree (strength) of the relationship
between two variables. Scatterplots can also tell
us about the form (shape) of the relationship.
- The Direction of a Relationship: The correlation
measure tells us about the direction of the relationship
between the two variables. The direction can be positive
or negative.
- Positive: In a positive relationship both
variables tend to move in the same direction: If one
variable increases, the other also tends to increase.
If one decreases, the other also tends to decrease.
In the example above, GPA and MathSAT are positively
related. As GPA (or MathSAT) increases, the other
variable also tends to increase.
- Negative: In a negative relationship the
variables tend to move in opposite directions:
If one variable increases, the other tends to decrease,
and vice versa.
The direction of the relationship between two variables
is identified by the sign of the correlation coefficient
for the variables. Positive relationships have a positive
coefficient ("plus" sign), whereas negative relationships
have a negative coefficient ("minus" sign).
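A minimal Python sketch of reading the direction from the sign of r. The helper names `pearson_r` and `direction` and the small data lists are hypothetical, chosen only for illustration. `pearson_r` uses the deviation-score form of Pearson's r (sum of products divided by the square root of the product of the sums of squares).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (SP) and sums of squared deviations (SS).
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sp / sqrt(ss_x * ss_y)


def direction(r):
    """The sign of r gives the direction of the relationship."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


x = [1, 2, 3, 4, 5]
print(direction(pearson_r(x, [2, 4, 5, 4, 6])))  # Y tends to rise with X: prints positive
print(direction(pearson_r(x, [9, 7, 8, 5, 3])))  # Y tends to fall as X rises: prints negative
```

Only the sign of the numerator (the sum of products) decides the direction. The denominator is always positive.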
- The Degree (Strength) of a Relationship:
A correlation coefficient measures the degree (strength)
of the relationship between two variables. The Pearson
Correlation Coefficient measures the strength of the
linear relationship between two variables. Two
specific strengths are:
- Perfect Relationship: When two variables
are exactly (linearly) related the correlation coefficient
is either +1.00 or -1.00. They are said to be perfectly
linearly related, either positively or negatively.
- No relationship: When two variables have
no (linear) relationship at all, their correlation is 0.00.
Correlations of intermediate strength fall between -1.00, 0.00,
and +1.00. Note, though, that +1.00 is the largest positive
correlation and -1.00 is the most extreme negative correlation
that is possible.
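An equivalent way to compute Pearson's r is as the average product of paired z-scores, using population standard deviations. The Python sketch below uses that form to check the boundary cases: an exact rising straight line gives +1.00, an exact falling one gives -1.00, and an imperfect relationship lands in between. The function name `pearson_r_z` and the data are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r_z(xs, ys):
    """Pearson r as the mean product of z-scores (population SDs)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # r is undefined if either SD is 0
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / len(xs)


x = [1, 2, 3, 4, 5]
print(round(pearson_r_z(x, [2 * v + 3 for v in x]), 2))     # exact rising line: 1.0
print(round(pearson_r_z(x, [-0.5 * v + 1 for v in x]), 2))  # exact falling line: -1.0
print(round(pearson_r_z(x, [2, 1, 4, 3, 5]), 2))            # imperfect: 0.8
```

The slope of the line does not matter: any exact increasing linear relationship gives +1.00, and any exact decreasing one gives -1.00.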
Examples: Here are three examples. These examples
concern variables measuring characteristics of automobiles.
The variables are their weight, miles-per-gallon, horsepower
and drive ratio (number of revolutions of the engine
per revolution of the wheels).
- Weight and Horsepower: The relationship between weight and
horsepower is strong, linear, and positive, though not
perfect. The Pearson correlation coefficient is +.92.
- Drive Ratio and Horsepower: The relationship between drive
ratio and horsepower is moderately negative, though not
perfect. The Pearson correlation coefficient is -.59.
- Drive Ratio and Miles-Per-Gallon: The relationship between
drive ratio and MPG is weakly positive, though not zero.
The Pearson correlation coefficient is +.42.
- Scatterplots and The Form (Shape) of a Relationship:
The form or shape of a relationship refers to whether
the relationship is straight or curved.
- Linear: A straight relationship is called
linear, because it approximates a straight
line. The GPA, MathSAT example shows a relationship
that is, roughly, a linear relationship.
- Curvilinear: A curved relationship is called
curvilinear, because it approximates a curved
line. An example is the relationship between miles-per-gallon
and engine displacement of various automobiles sold
in the USA in 1982, shown below. This relationship is
curvilinear (and negative).
- Miles-per-gallon and Engine Displacement: The relationship
between miles-per-gallon and engine displacement is strongly
negative, but curvilinear. The Pearson correlation coefficient
is not appropriate.
The Pearson correlation coefficient is only appropriate
as a measure of linear relationships. We will see other
correlation coefficients that measure curvilinear relationships.
- Where & Why we use Correlation:
Correlations are used for Prediction, Validity,
Reliability, and Theory Verification.
- Prediction: Correlations can be used to help
make predictions. If two variables have been known in
the past to correlate, we can reasonably expect them to
continue to correlate in the future, provided the
circumstances stay similar. We can then use the value of one
variable that is known now to predict the value that the
other variable will take on in the future.
For example, we require high school students to take
the SAT exam because we know that in the past SAT scores
correlated well with the GPA scores that the students
get when they are in college. Thus, we predict that students
with high SAT scores will tend to earn high GPAs, and that
students with low SAT scores will tend to earn low GPAs.
- Validity: Suppose we have developed a new test
of intelligence. We can determine if it is really measuring
intelligence by correlating the new test's scores with,
for example, the scores that the same people get on standardized
IQ tests, or their scores on problem solving ability tests,
or their performance on learning tasks, etc.
This is a process for validating the new test of intelligence.
The process is based on correlation.
- Reliability: Correlations can be used to determine
the reliability of some measurement process. For example,
we could administer our new IQ test on two different occasions
to the same group of people and see what the correlation
is. If the correlation is high, the test is reliable.
If it is low, it is not.
- Theory Verification: Many psychological theories
make specific predictions about the relationship between
two variables. For example, it is predicted that the
intelligence of parents and that of their children are
positively related. We can test this prediction by
administering IQ tests to the parents and their children,
and measuring the correlation between the two sets of scores.
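For the Prediction use listed above, the text does not say how the prediction is actually made. One standard choice is a least-squares regression line, whose slope is r times the ratio of the two standard deviations. The Python sketch below assumes that approach. The function name `fit_prediction_line` and the SAT/GPA numbers are hypothetical, not the Psych 30 data.

```python
from statistics import mean, pstdev


def fit_prediction_line(xs, ys):
    """Least-squares line for predicting Y from X: Y' = a + b * X.

    Slope b = r * (SD_Y / SD_X); intercept a = mean_Y - b * mean_X.
    """
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    # Pearson r from deviation products, using population SDs throughout.
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)
    b = r * sy / sx
    a = my - b * mx
    return a, b


# Hypothetical past data: SAT-M scores and the college GPAs that followed.
sat = [450, 520, 580, 640, 700, 760]
gpa = [2.4, 2.6, 3.1, 2.9, 3.4, 3.6]
a, b = fit_prediction_line(sat, gpa)
print(f"predicted GPA for SAT-M 650: {a + b * 650:.2f}")
```

The weaker the correlation, the flatter the line relative to the spread of Y, so predictions pull toward the mean GPA; with r = 0 every prediction is simply the mean.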