Simple Regression
When you want to use one variable to predict another variable, you can do a simple regression. "Simple" means that you have just one predictor, or independent variable, and "regression" means that you want to predict something. I'm going to be using how students did on the first quiz in a course to predict how well they did in their final percent grade.

To do the regression, you go to Analyze, Regression, Linear. Then you place whatever it is that you want to predict in the dependent variable box; sometimes this is called a criterion variable. You place your predictor, or independent variable, in the independent variable box. In Statistics, you'll want to check off Descriptives, then press Continue. We're going to leave everything else the same for now, then press OK.

SPSS gives you a lot of different tables. The first one is the descriptive statistics that we requested, so you get the mean and standard deviation for both variables. Both of your variables in a regression should be scale, or continuous, variables. After that you get the correlations table, with the correlation between your predictor variable and your dependent variable. Here it says that our correlation is .818, which means a positive relationship, and this relationship is significant based on the significance value here.

The next table refers to our model and which variables have been entered and removed, using a method called Enter. To understand this, we need to understand that what we are actually doing in a regression is trying to fit the relationship between our two variables onto a straight line. Our simple equation for a line looks like this: y = mx + b, where y is our dependent variable, x is our independent variable, and b is some sort of intercept or constant. The m in this equation is the slope, or the coefficient: the relationship between our dependent variable and our independent variable. So basically, if we were to plot our two variables and stick a line of best fit through that data, then that line is our model, or our representation of the data.
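To make the line-of-best-fit idea concrete outside of SPSS, here is a minimal Python sketch. The quiz and final grade values are made up for illustration (they are not the data from the video), and using NumPy's polyfit is just one way to get a least-squares line; it is not what SPSS runs internally.

```python
import numpy as np

# Hypothetical data for illustration only (not the data set from the video):
# quiz 1 scores out of 10 and final percent grades for eight students.
quiz1 = np.array([6, 9, 4, 8, 7, 5, 10, 3], dtype=float)
final = np.array([64, 88, 60, 85, 78, 70, 92, 58], dtype=float)

# Fit y = m*x + b by least squares. polyfit returns the highest power first,
# so for a degree-1 fit the result is (slope m, intercept b).
m, b = np.polyfit(quiz1, final, deg=1)

# The fitted line is the "model": one predicted final grade per quiz score.
predicted = m * quiz1 + b

print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
```

The slope m here plays the role of the coefficient and b plays the role of the constant in the equation of the line.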
So what this table is saying is that we have one model, one line, and we are entering one variable, quiz 1. We haven't removed any variables. This method of forcing all the variables that you entered to be in the equation is called the Enter method. This is just a simple regression, so we only have one independent variable and only one model.

After that you get the model summary, which indicates the relationship between the independent and dependent variables. In this case we only have one independent variable, or predictor variable, which means that this R here is going to be exactly the same as the correlation between those two variables.

Afterwards we get the significance of our model. Because this p-value is less than .05, this table indicates that our model is significant, which means that our line, or our equation for a line, can be used to represent our data. The residual row is the error.

Lastly you get the coefficients table, and this gives you the numbers that you can use to construct your equation. You get unstandardized coefficients and standardized coefficients. To interpret this, I'm going to show you the form of the equation. It follows the basic equation of a line, but it looks slightly different. Here we have y hat. Whenever you see a hat on top of a variable, it means that it's being predicted. So this is saying that our dependent variable is final percent grade, and that's the variable that we're interested in predicting. Next we have our B coefficient and our independent variable, which is quiz 1, plus some sort of constant. So it follows the same form and format; things are just called slightly different names. To fill this equation out, you just have to read the values from the table. From the B column, the unstandardized coefficients, you would put 3.99 in for the B.
You can leave the variable name as it is. It just indicates which independent variable's relationship you're looking at. Then for the constant, you read it from this line here, and that's it. This is the equation for our line. Because we're using the unstandardized coefficient, this is saying that for every one unit increase in quiz 1, so if a student got one mark higher on quiz 1, there would be a 3.99 increase in their final percent grade.

I'll show you an example of how to use this equation to figure out a predicted value. Let's say that a student got 6 out of 10 on their first quiz. To calculate what you would predict for their final percent grade in the course, you would put 3.99 times 6, the quiz value, plus 50.48, the constant. That tells us that, based on our model, we would predict a final percent grade of about 74 percent for that student. If we go to look at our data, we can see that the first student here did get a 6 on quiz 1, and their final percent grade is 64. So we're close, but it's not a perfect model. The difference between our observed, or real, final percent grade and our predicted value is called the residual. That's how the error in prediction is determined.

You can also use the standardized coefficients to construct your equation, but they're interpreted in a slightly different way. To get these standardized coefficients, quiz 1 and final percent grade are standardized, or converted to z-score variables, and then the regression is conducted. What this means is that for every one standard deviation increase in the independent variable, there's a .818 standard deviation increase in the dependent variable. Generally, people find the standardized coefficients more difficult to interpret because the units are in standard deviations and not their original units.
But if you were to do comparisons across groups, then you would need to use the standardized coefficients.
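The prediction and residual from the worked example, and the idea behind the standardized coefficient, can be sketched in Python. The coefficient 3.99, the constant 50.48, and the observed grade of 64 come from the walkthrough above. The small data set in the second half is hypothetical, and the names predict_final and zscore are made up for this sketch; it only illustrates that, with a single predictor, regressing z-scores on z-scores gives a slope equal to the correlation, which is why the .818 shows up in both places.

```python
import numpy as np

# Values read from the coefficients table in the walkthrough.
B_QUIZ1 = 3.99     # unstandardized coefficient for quiz 1
CONSTANT = 50.48   # constant (intercept)

def predict_final(quiz1_score):
    """Predicted final percent grade: y-hat = B * quiz1 + constant."""
    return B_QUIZ1 * quiz1_score + CONSTANT

predicted = predict_final(6)   # student who got 6 out of 10 on quiz 1
residual = 64 - predicted      # observed minus predicted final grade
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")

# Hypothetical data for illustration only (not the data set from the video).
quiz1 = np.array([6, 9, 4, 8, 7, 5, 10, 3], dtype=float)
final = np.array([64, 88, 60, 85, 78, 70, 92, 58], dtype=float)

def zscore(x):
    """Convert a variable to z-scores using the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)

# Regress standardized final grade on standardized quiz 1.
beta, intercept = np.polyfit(zscore(quiz1), zscore(final), deg=1)
r = np.corrcoef(quiz1, final)[0, 1]

# With one predictor, the standardized slope matches the correlation and the
# standardized intercept is zero (up to floating-point rounding).
print(f"beta = {beta:.3f}, r = {r:.3f}")
```

The predicted value comes out to 74.42 and the residual to about -10.42, so this model overpredicts that student's final grade by roughly ten points.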