| 
                  
                    Correlation Indices 
                      and Scatterplots  
                       
                   
                    
                    
                    
                      
                        |  
                            Definition of Correlation: Correlation is a statistical technique that is used 
                              to measure and describe the STRENGTH and 
                              DIRECTION of the relationship between two 
                              variables.
  
                             Correlation requires two scores from the SAME 
                              individuals. These scores are normally identified 
                              as X and Y. The pairs of scores can be listed in 
                              a table or presented in a scatterplot. Usually the 
                              two variables are observed, not manipulated. 
                           |  
 
                      
                        |  
                             
                             Definition of a Scatterplot: A scatterplot is a statistical graphic that displays 
                              the STRENGTH, DIRECTION and SHAPE 
                              of the relationship between two variables.
  
                            A scatterplot requires two scores from the SAME 
                              individuals. These scores are normally identified 
                              as X and Y. A scatterplot displays the X variable 
                              on the horizontal (X) axis, and the Y variable on 
                              the vertical (Y) axis. Each individual is identified 
                              by a single point (dot) on the graph which is located 
                              so that the coordinates of the point (the X and 
                              Y values) match the individual's X and Y scores. 
                           |   
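The placement rule in the definition above can be sketched in a few lines of Python. This is only an illustrative text-mode plot, not the software used to draw the figures in these notes. The function name `ascii_scatter` and the five GPA and MathSAT pairs are made up for the illustration, and the sketch assumes the X scores are not all identical and the Y scores are not all identical.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Return a text scatterplot as a list of strings (top row first).

    X runs left to right along the horizontal axis and Y runs bottom to
    top along the vertical axis, so each individual becomes one "*".
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each score to a column (X) and a row counted from the bottom (Y).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid[0] is the top of the plot
    return ["".join(line) for line in grid]


# Hypothetical scores for five individuals (not the Psych 30 data).
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]       # X variable, horizontal axis
math_sat = [450, 500, 600, 650, 750]  # Y variable, vertical axis

plot_rows = ascii_scatter(gpa, math_sat)
for line in plot_rows:
    print("|" + line)
print("+" + "-" * 21)
```

Each pair of scores produces exactly one mark, and the individual with the highest X and the highest Y lands in the top-right corner of the grid.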
                      Example: Consider the correlation 
                        between the SAT-M scores and GPA of the 1997 Psych 30 
                        class. Here are the Math SAT scores and the GPA scores 
                        of 13 of the students in the class, and the scatterplot 
                        for all 41 students: 
                         
                          
                       Scatterplot: The scatterplot 
                        has the X variable (GPA) on the horizontal (X) axis, and 
                        the Y variable (MathSAT) on the vertical (Y) axis. Each 
                        individual is identified by a single point (dot) on the 
                        graph which is located so that the coordinates of the 
                        point (the X and Y values) match the individual's X (GPA) 
                        and Y (MathSAT) scores. 
                         
                       For example, the student named "Obs5" (in the sixth 
                        row of the datasheet) has GPA=2.30 and MathSAT=710. This 
                        student is represented in the scatterplot by the highlighted 
                        and labeled ("5") dot in the upper-left part of the scatterplot. 
                        Note that the dot lies directly to the right of 710 on the 
                        MathSAT (vertical) axis and directly above 2.30 on the GPA 
                        (horizontal) axis. 
                         
                      Pearson Correlation: The Pearson 
                        correlation (explained below) between these two variables 
                        is .32. 
                      
                     
 
                      
                        | Correlations and Scatterplots: Correlations can tell us about the direction, 
                            and the degree (strength) of the relationship 
                            between two variables. Scatterplots can also tell 
                            us about the form (shape) of the relationship.
 |  
  
                     
                      The Direction of a Relationship 
                        The correlation measure tells us about the direction of the 
                        relationship between the two variables. The direction can be 
                        positive or negative. 
                        
                           
                           Positive: In a positive relationship both 
                            variables tend to move in the same direction: If one 
                            variable increases, the other tends to also increase. 
                            If one decreases, the other also tends to decrease. 
                             In the example above, GPA and MathSAT are positively 
                              related. As GPA (or MathSAT) increases, the other 
                              variable also tends to increase. 
                              
                          Negative: In a negative relationship the 
                            variables tend to move in opposite directions: 
                            If one variable increases, the other tends to decrease, 
                            and vice-versa. 
                          The direction of the relationship between two variables 
                          is identified by the sign of the correlation coefficient 
                          for the variables. Positive relationships have a "plus" 
                          sign, whereas negative relationships have a "minus" 
                          sign. 
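To see why the sign carries the direction, it may help to look at the standard definition of the Pearson coefficient, which these notes use but do not write out. For n pairs of scores, with X-bar and Y-bar the two means:

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\;
          \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

The denominator is positive whenever both variables vary, so the sign of r is the sign of the numerator. The numerator is positive when individuals who are above the mean on X tend to be above the mean on Y as well, and negative when individuals above the mean on X tend to be below the mean on Y.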
                          
                      The Degree (Strength) of a Relationship 
                         A correlation coefficient measures the degree (strength) 
                          of the relationship between two variables. The Pearson 
                          Correlation Coefficient measures the strength of the 
                          linear relationship between two variables. Two 
                          specific strengths are: 
                         
                           
                           Perfect Relationship: When two variables 
                            are exactly (linearly) related the correlation coefficient 
                            is either +1.00 or -1.00. They are said to be perfectly 
                            linearly related, either positively or negatively. 
                             
                          No relationship: When two variables have 
                            no relationship at all, their correlation is 0.00. 
                          A correlation can take any value between -1.00 and +1.00, 
                          so there are strengths in between these specific cases. 
                          Note, though, that +1.00 is the largest positive correlation 
                          and -1.00 is the largest negative correlation that is 
                          possible. 
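The specific strengths described above can be checked numerically. The following is a minimal sketch that computes the Pearson coefficient from its standard definition (the sum of products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the three small data sets are made up for illustration, and the function assumes that neither variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired scores (assumes neither list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: its sign gives the direction.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: Y = 2X + 1, an exact increasing straight line.
r_perfect_pos = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical data: the same Y scores in reverse, an exact decreasing line.
r_perfect_neg = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

# Hypothetical data: every X value is paired with every Y value once,
# so knowing X says nothing about Y.
r_none = pearson_r([1, 2, 1, 2], [1, 1, 2, 2])

print(r_perfect_pos, r_perfect_neg, r_none)
```

The first two results are +1 and -1 (up to floating-point rounding), and the third is 0. Any real data set, such as the automobile data, falls somewhere between these extremes.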
                          Examples: Here are three examples. These examples 
                          concern variables measuring characteristics of automobiles. 
                          The variables are their weight, miles-per-gallon, horsepower 
                          and drive ratio (number of revolutions of the engine 
                          per revolution of the wheels). 
                          
                         
                          
                            | 
                                
                                  | Weight and Horsepower |  
                                  | The relationship between Weight and Horsepower 
                                    is strong, linear, and positive, though not 
                                    perfect. The Pearson correlation coefficient is 
                                      +.92. 
                                   |   |  |   
                         
                          
                            | 
                                
                                  | Drive Ratio and Horsepower |  
                                  | The relationship between drive ratio and 
                                    Horsepower is weakly negative, though not 
                                    zero. The Pearson correlation coefficient is 
                                      -.59. 
                                   |   |  |   
                         
                          
                            | 
                                
                                  | Drive Ratio and Miles-Per-Gallon |  
                                  | The relationship between drive ratio and 
                                    MPG is weakly positive, though not zero. The Pearson correlation coefficient is 
                                      +.42. 
                                   |   |  |   
                      Scatterplots and The Form (Shape) of a Relationship: 
                        The form or shape of a relationship refers to whether 
                        the relationship is straight or curved. 
                         
                         
                          Linear: A straight relationship is called 
                            linear, because it approximates a straight 
                            line. The GPA, MathSAT example shows a relationship 
                            that is, roughly, a linear relationship. 
                             
                          Curvilinear: A curved relationship is called 
                            curvilinear, because it approximates a curved 
                            line. An example of the relationship between the Miles-per-gallon 
                            and engine displacement of various automobiles sold 
                            in the USA in 1982 is shown below. This is curvilinear 
                            (and negative).
                             
                         
                          
                            | 
                                
                                  | Miles-per-gallon and engine displacement |  
                                  | The relationship between Miles-per-gallon 
                                    and engine displacement is strongly negative, 
                                    but curvilinear. The Pearson correlation coefficient is 
                                      not appropriate.
                                   |   |  |   
                          The Pearson correlation coefficient is only appropriate 
                          as a measure of linear relationship. We will see other 
                          correlation coefficients that measure curvilinear relationships. 
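A small numerical sketch shows why the Pearson coefficient can mislead when the form is curved. Both data sets below are made up for illustration (they are not the automobile data), and in both Y is determined exactly by X. The helper `pearson_r` is the standard definition of the coefficient and assumes that neither variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired scores (assumes neither list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# U-shaped curve: Y = X squared, with X values symmetric around zero.
xs_u = [-2, -1, 0, 1, 2]
ys_u = [x ** 2 for x in xs_u]
r_u = pearson_r(xs_u, ys_u)

# Decreasing curve: Y = 8 / X, steep at first and then flattening out.
xs_c = [1, 2, 4, 8]
ys_c = [8 / x for x in xs_c]
r_c = pearson_r(xs_c, ys_c)

print(round(r_u, 2), round(r_c, 2))
```

For the U-shaped data the coefficient is 0 even though Y can be predicted perfectly from X. For the decreasing curve it is about -.84, which understates a relationship that is in fact exact. In both cases the scatterplot reveals what the single number hides.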
                      
                     
 
                      
                        | Where & Why we use Correlation: 
                            Correlations are used for Prediction, Validity, 
                            Reliability, and Verification. 
                         |  
  
                     
                      Prediction: Correlations can be used to help 
                        make predictions. If two variables have been known in 
                        the past to correlate, then we can assume they will continue 
                        to correlate in the future. We can use the value of one 
                        variable that is known now to predict the value that the 
                        other variable will take on in the future. 
                         For example, we require high school students to take 
                          the SAT exam because we know that in the past SAT scores 
                          correlated well with the GPA scores that the students 
                          get when they are in college. Thus, we predict that students 
                          with high SAT scores will tend to earn high GPA scores, and 
                          that students with low SAT scores will tend to earn low ones. 
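One common way to turn a correlation into an actual prediction is to fit a straight line to the past scores and read the predicted value off that line. The notes do not say which method is intended, so the least-squares line below is an assumption, and the four SAT and GPA pairs are made up for illustration. The function name `fit_line` is likewise hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line for predicting Y from X; returns (slope, intercept).

    Assumes the X scores are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past records: SAT score and later college GPA.
past_sat = [400, 500, 600, 700]
past_gpa = [2.2, 2.4, 3.1, 3.3]

slope, intercept = fit_line(past_sat, past_gpa)

# Predict the college GPA of a new student whose SAT score is known now.
new_sat = 650
predicted_gpa = intercept + slope * new_sat
print(round(predicted_gpa, 2))
```

With these made-up records the predicted GPA for an SAT score of 650 is about 3.15. The weaker the correlation, the less accurate such predictions will be for any one student.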
                          
                      Validity: Suppose we have developed a new test 
                        of intelligence. We can determine if it is really measuring 
                        intelligence by correlating the new test's scores with, 
                        for example, the scores that the same people get on standardized 
                        IQ tests, or their scores on problem solving ability tests, 
                        or their performance on learning tasks, etc. 
                         This is a process for validating the new test of intelligence. 
                          The process is based on correlation. 
                          
                      Reliability: Correlations can be used to determine 
                        the reliability of some measurement process. For example, 
                        we could administer our new IQ test on two different occasions 
                        to the same group of people and see what the correlation 
                        is. If the correlation is high, the test is reliable. 
                        If it is low, it is not. 
                         
                      Theory Verification: Many psychological theories 
                        make specific predictions about the relationship between 
                        two variables. For example, it is predicted that parents' 
                        and children's intelligence scores are positively related. We 
                        can test this prediction by administering IQ tests to 
                        the parents and their children, and measuring the correlation 
                        between the two scores. 
                     
 |