Multiple Regression
Forrest Young's Notes
Copyright © 1997-9 by Forrest W. Young.
Multiple Regression Example:
1997-99 GPA with Math and Verbal SAT
We return to the GPA, Math SAT and Verbal SAT variables.
The visualization for these data produces the plots shown below. We see positive relationships of Verbal and Math SAT with GPA. Note that the black dots are for Psych30 1997, the blue for Psych30C 1998, and the red for Psych30C 1999.
Rescale the data. We now prepare to do the regression analysis. We convert the data by dividing the two SAT variables by 100 to clarify the discussion of the slope (so that we can see a change of one unit on the plot). This change in the variables (dividing by a constant) does not change the relationship between the SAT variables and GPA: the correlations and the significance tests are the same as before. Only the units of the slopes change, so that each slope now refers to a 100 point change in SAT rather than a 1 point change.
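The effect of dividing by a constant can be checked with a small sketch. The scores below are made up for illustration (they are not the class data), and the helper names `pearson_r` and `slope` are our own. For simplicity the sketch uses one predictor only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope for predicting y from the single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up scores for illustration only; these are NOT the class data.
math_sat = [520, 580, 610, 650, 700, 740]
gpa = [2.6, 3.1, 2.9, 3.2, 3.6, 3.4]

# Rescale the predictor: one unit is now 100 SAT points.
math_sat_100 = [m / 100 for m in math_sat]

r_raw = pearson_r(math_sat, gpa)
r_scaled = pearson_r(math_sat_100, gpa)
b_raw = slope(math_sat, gpa)
b_scaled = slope(math_sat_100, gpa)

print("correlation (raw, rescaled):", r_raw, r_scaled)
print("slope (raw, rescaled):", b_raw, b_scaled)
```

The two correlations agree, and the rescaled slope is 100 times the raw slope, because it describes the change in GPA per 100 SAT points rather than per single point.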
We do this by clicking on the data object, and then typing in the listener:
(transform :use pstat9799
           :variables '(GPA MSAT/100 VSAT/100)
           :program (let ((a (/ MathSAT 100))
                          (b (/ VerbSAT 100)))
                      (list gpa a b)))
Now we do the regression analysis using ViSta's Regression Analysis module, which can be done by clicking on the Regres button on the workmap, and selecting GPA as the response variable and the two SAT variables as the predictor variables.
Report: We then ask for the regression report. It is shown below.
The regression analysis report has three major sections, each containing important information about the analysis:
- Parameter Estimates: The parameter estimates section of the report presents information about the slopes for each SAT variable, as well as about the intercept.
Under the "Estimate" column the report presents the values of the intercept and slopes of the function that, according to the regression analysis, gives the best fit to the points.
The intercept and slopes are often called the "coefficients", because they are the coefficients of the regression function. They are called "estimates" (short for "estimated coefficients") because they are estimates of what the coefficients are in the population.
- Intercept: The report calls the intercept the "Constant". Regression analysis estimates it to be a=1.08. This means that if we had someone with Math and Verbal SAT scores of zero, we would estimate that person's GPA to be 1.08. Notice that this value doesn't make sense! In fact, the intercept is usually not interpreted, especially if a value of zero for the predictor variables can't really be obtained in practice.
- Slope for MathSAT/100: The report presents the slope for the "MathSAT/100" variable. Regression analysis estimates it to be b=.15. This means that for a given level of Verbal SAT, for every 100 point change in Math SAT we expect a change in GPA of .15 units. Thus, for two people with the same Verbal SAT, if one person's Math SAT is 100 points higher than the other's, we would predict that the first person's GPA would be .15 points higher than the second person's. Note that this is a significant effect (p=.0039).
- Slope for VerbSAT/100: The report presents the slope for the "VerbSAT/100" variable. Regression analysis estimates it to be b=.19. This means that for a given level of Math SAT, for every 100 point change in Verbal SAT we expect a change in GPA of .19 units. Thus, for two people with the same Math SAT, if one person's Verbal SAT is 100 points higher than the other's, we would predict that the first person's GPA would be .19 points higher than the second person's. This is also a significant effect (p=.0023).
- Regression Line: The equation for the regression line (also known as the best linear combination) is:
GPA = 1.52 + 0.15 (MathSAT/100) + 0.19 (VerbSAT/100)
The (partial) regression plots (also called added variable plots) are shown below. Note the mild positive relationship of both variables with GPA.
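The meaning of the two slopes can be made concrete with a small sketch. It uses only the slopes quoted in these notes (.15 and .19 per 100 SAT points). The function name and the SAT scores of the two students are hypothetical. Because we compare two students, the intercept cancels and is not needed.

```python
# Slopes as quoted in the notes, in GPA units per 100 SAT points.
B_MATH = 0.15
B_VERB = 0.19

def predicted_gpa_difference(math_a, verb_a, math_b, verb_b):
    """Predicted GPA of student A minus that of student B.

    SAT scores are given on the original scale and divided by 100 here.
    The intercept appears in both predictions, so it cancels.
    """
    return (B_MATH * (math_a - math_b) / 100
            + B_VERB * (verb_a - verb_b) / 100)

# Hypothetical students: same Verbal SAT, Math SAT differs by 100 points.
same_verbal = predicted_gpa_difference(700, 600, 600, 600)

# Hypothetical students: same Math SAT, Verbal SAT differs by 100 points.
same_math = predicted_gpa_difference(600, 700, 600, 600)

# Both scores differ by 100 points: the two effects add.
both = predicted_gpa_difference(700, 700, 600, 600)

print(round(same_verbal, 2), round(same_math, 2), round(both, 2))
```

Each slope is the predicted change in GPA when that SAT score changes by 100 points and the other SAT score is held fixed. When both scores change, the predicted changes add.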
- Std. Error: The report has a column labeled Standard Error. This column presents the standard errors of the estimated coefficients. This measures the stability of the estimates.
Note: This is not what the book calls the "Standard Error of Estimate". That value is presented below by the name "Sigma Hat (RMS Error)".
- t-ratio, P-Value: These provide a significance test for each of the estimated coefficients. The test is of the null hypothesis that the tested coefficient (intercept or slope) is zero.
Note: The question of whether a slope is zero gets at the nature of the relationship between the variables. This is important because the question is: Does one variable change when the other does? (Note that a zero intercept makes little interpretive sense, and the test of the intercept is usually ignored.)
- Summary of Fit
- R Squared: This is the square of the multiple correlation between the best linear combination (the equation above) of the two predictor variables and the response variable. It is the coefficient of determination that measures the variance shared between the variables.
- Sigma Hat (RMS Error): This is what the book calls the "Standard Error of Estimate". It specifies the Root-Mean-Squared (RMS) average (or "standard") distance between the observed points and the values predicted by the regression equation, measured vertically.
- Analysis of Variance: An analysis of variance is reported that tells us whether the entire regression model significantly fits the response variable. The entire model includes both slopes and the intercept simultaneously. The null hypothesis is that there is no relation between the response variable and the predictor variables, that is, that both slopes are zero. The F-Ratio and P-Value summarize this test's results. The R-Squared tells us the proportion of variance in GPA that is accounted for by the two SAT variables.
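The quantities in the three sections of the report can be sketched in a few lines of code. This is only an illustration of the usual least-squares formulas, not ViSta's own implementation. The function names are our own and the scores are made up (they are not the class data). We assume that Sigma Hat uses n-3 degrees of freedom (n observations minus the intercept and two slopes), which is the usual convention. P-values are omitted because they require the t and F distributions.

```python
from math import sqrt

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    # Augment the matrix with the identity matrix.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def regression_report(predictors, y):
    """Least-squares fit of y on an intercept plus the given predictor columns."""
    n, k = len(y), len(predictors)
    p = k + 1  # number of coefficients, including the intercept
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    # Parameter estimates: intercept first, then one slope per predictor.
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_model = ss_total - ss_error
    df_error = n - p
    sigma_hat = sqrt(ss_error / df_error)  # assumed definition of Sigma Hat
    std_err = [sigma_hat * sqrt(inv[a][a]) for a in range(p)]
    return {
        "coef": coef,
        "std_err": std_err,
        "t_ratio": [c / s for c, s in zip(coef, std_err)],
        "r_squared": ss_model / ss_total,
        "sigma_hat": sigma_hat,
        "f_ratio": (ss_model / k) / (ss_error / df_error),
        "fitted": fitted,
    }

# Made-up scores (already divided by 100); these are NOT the class data.
math_sat = [5.2, 5.8, 6.1, 6.5, 7.0, 7.4, 6.0, 5.5]
verb_sat = [6.0, 5.1, 6.6, 5.9, 6.8, 6.2, 7.1, 5.0]
gpa = [2.6, 2.9, 3.1, 3.0, 3.6, 3.3, 3.4, 2.5]

report = regression_report([math_sat, verb_sat], gpa)
for name in ("coef", "std_err", "t_ratio", "r_squared", "sigma_hat", "f_ratio"):
    print(name, report[name])
```

Each t-ratio is an estimate divided by its standard error. R Squared is the model sum of squares divided by the total sum of squares. The F-Ratio compares the variance accounted for by the two slopes with the error variance.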