Linear regression


Linear regression models the relationship between two quantitative variables: an explanatory variable and a response variable. Running a linear regression gives the equation of the regression line and lets us interpret both the slope and the intercept in contextually meaningful ways.

Before we can run a linear regression, we have to check all of the conditions for correlation.

 

For example, suppose you want to run a linear regression on the relationship between household size (explanatory variable) and the amount of plastic waste (response variable), and you have already carried out the appropriate correlation analysis on the data set. The first step is to check for homoscedasticity (the "does the plot thicken?" condition). To do this, run the linear regression, navigate to the residual plot, and judge whether the spread of the residuals widens or narrows across the plot. In this scenario the plot does not thicken, although there are a few minor outliers.
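The "does the plot thicken?" check can also be approximated numerically. The sketch below is only an illustration: the data values are hypothetical (they are not the data set behind this example), and `spread_ratio` is a made-up helper name, not an SPSS feature. It fits a least-squares line, computes the residuals, and compares their typical size for the smaller and larger household sizes. A ratio near 1 suggests a roughly even spread; it does not replace looking at the residual plot.

```python
# Hypothetical data for illustration only; NOT the data set used in this article.
household_size = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
plastic_waste = [0.9, 1.1, 1.4, 1.5, 1.8, 1.9, 2.2, 2.3, 2.6, 2.8]


def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys):
    """Observed value minus predicted value for every data point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, ys):
    """Mean absolute residual for the larger x values divided by that for the smaller x values."""
    pairs = sorted(zip(xs, residuals(xs, ys)))
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))


print(f"spread ratio: {spread_ratio(household_size, plastic_waste):.2f}")
```

A ratio well above 1 would hint that the residuals fan out as household size grows, which is the pattern the residual plot is meant to reveal.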

 

 

 

Since the data pass this condition, the next step is to find the equation of the regression line, which takes the form $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted value of the response variable (amount of plastic waste), $x$ is the explanatory variable (household size), $b_0$ is the intercept, and $b_1$ is the slope.

 

 

Coefficients (a)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | .392 | .195 | | 2.014 | .049 |
| 1 Household size | .409 | .047 | .751 | 8.798 | .000 |

a. Dependent Variable: Weight of discarded plastic goods
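As a quick sanity check on the Coefficients table, each t value is the unstandardized coefficient B divided by its standard error. The sketch below recomputes the ratios from the rounded values printed in the table; because SPSS computes t from unrounded numbers, small differences from the reported values are expected. The dictionary layout and function name are just illustrative choices.

```python
# Values copied from the SPSS Coefficients table (already rounded to three decimals).
coefficients = {
    "(Constant)": {"B": 0.392, "std_error": 0.195, "reported_t": 2.014},
    "Household size": {"B": 0.409, "std_error": 0.047, "reported_t": 8.798},
}


def t_statistic(b, std_error):
    """t is the unstandardized coefficient divided by its standard error."""
    return b / std_error


for name, row in coefficients.items():
    t = t_statistic(row["B"], row["std_error"])
    print(f"{name}: t = {t:.2f} (SPSS reports {row['reported_t']})")
```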

 

 

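By looking at the Coefficients table, we can state the equation to be: $\hat{y} = 0.392 + 0.409x$, where the intercept (0.392) is the B value for (Constant) and the slope (0.409) is the B value for Household size.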
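To make the equation concrete, the B values from the Coefficients table (0.392 for the constant and 0.409 for household size) can be wrapped in a small prediction helper. This is a minimal sketch; the function and constant names are illustrative and are not part of SPSS.

```python
# B values taken from the Coefficients table.
INTERCEPT = 0.392  # B for (Constant)
SLOPE = 0.409      # B for Household size


def predict_plastic_waste(household_size):
    """Predicted weight of discarded plastic goods for a given household size."""
    return INTERCEPT + SLOPE * household_size


# Example: predicted plastic waste for a household of 4.
print(f"{predict_plastic_waste(4):.3f}")  # prints 2.028
```

Predictions are only trustworthy for household sizes close to those observed in the data.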

 

 

 

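Now we have to interpret the slope in a contextually meaningful way. The slope is 0.409, so the model predicts that each increase of 1 in household size is associated with an increase of 0.409 units in the amount of plastic waste. Equivalently, when the household size increases by 10, the predicted amount of plastic waste increases by 4.09 units.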
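The 4.09 figure follows directly from the regression equation built from the Coefficients table (intercept 0.392, slope 0.409). As a short worked step, the intercept cancels when two predictions are subtracted, leaving only the slope times the change in household size:

```latex
\begin{aligned}
\hat{y}(x + 10) - \hat{y}(x)
  &= \bigl(0.392 + 0.409\,(x + 10)\bigr) - \bigl(0.392 + 0.409\,x\bigr) \\
  &= 0.409 \times 10 \\
  &= 4.09
\end{aligned}
```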

 

We also provide a literal interpretation of the intercept: the model predicts that when the household size is 0, the amount of plastic waste is 0.392 units. In context, this literal interpretation is not appropriate, because a house with no people in it cannot produce any plastic waste. However, the scatterplot shows us that this is not extrapolation, since the predicted value is not far from the actual data.

 

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .751 (a) | .563 | .556 | .70990 |

 

The R² value tells us that 56.3% of the variability of the amount of plastic waste can be explained by the variability in the household size.
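In a simple linear regression with one explanatory variable, R Square is the square of the correlation R reported in the Model Summary table. A quick check using the rounded values from that table:

```latex
R^2 = (0.751)^2 \approx 0.564
\qquad\text{and}\qquad
1 - R^2 \approx 1 - 0.563 = 0.437
```

The small gap between 0.564 and the reported .563 comes from squaring the rounded R rather than the unrounded one. The second expression says that about 43.7% of the variability in plastic waste is left unexplained by household size.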

 

 

 

Linear regression in SPSS:

 

 

Once you have generated the SPSS output, you can find the coefficients needed to interpret the slope and the intercept in the "Coefficients" table. Scroll down to the bottom of the output to find the residual plot and check whether the plot thickens.

 
