
# Correlation table


A correlation table is a statistical tool that shows whether quantitative variables are related to one another and, if so, how strongly. Correlation measures the strength of the linear association between two quantitative variables. Before interpreting a correlation, certain assumptions need to be checked. We have to assume there is a true underlying linear relationship between the two variables, but since we generally cannot know that, we can only check whether the data meet certain conditions, usually by examining a scatterplot:

• Quantitative variables condition: make sure both data sets are quantitative data.
• Straight enough condition: check on the scatterplot that the points look reasonably straight. This is a judgment call, but not a difficult one.
• No outlier condition: check the scatterplot for outliers. Outliers are important to notice because a single one can distort the correlation dramatically.

Correlations are often reported without supporting data or plots, but it is important to think about the conditions above so that reporting the correlation is meaningful.

The correlation coefficient, r, also called the Pearson correlation, measures the strength and the direction of a linear relationship between two quantitative variables. The sign of the correlation coefficient gives the direction of the association. The correlation coefficient is always between -1 and +1, and it has no units. If the relationship appears linear in the scatterplot, statistical software may attach stars (*) to the coefficient to mark its significance level; more stars indicate a stronger level of significance (for example, ** commonly means significant at the 0.01 level). Whether a given r earns stars depends on the sample size and other factors. The correlation coefficient treats x and y symmetrically: the correlation of x with y is the same as the correlation of y with x.
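The definition above can be sketched in a few lines of Python. This is a minimal hand-rolled version of Pearson's r (illustrative data values, not from the example that follows), showing both the [-1, +1] bound and the symmetry in x and y:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: the co-variation of x and y,
    scaled by each variable's own spread (hence unitless)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [2.1, 3.0, 4.5, 5.1, 6.8]
y = [1.0, 2.2, 2.9, 4.4, 5.0]
print(pearson_r(x, y))  # about 0.97: strong positive association
print(pearson_r(y, x))  # identical value: r is symmetric in x and y
```

Because the covariance is divided by both spreads, the units of x and y cancel, which is why r carries no units.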

For example, suppose we have data on burgers from different burger places. Calories, fat, and sodium are recorded for each burger. From the following table, we can see the correlation coefficient of each variable with every other.

| Correlations | | Fat | Sodium | Calories |
|---|---|---|---|---|
| Fat | Pearson Correlation | 1 | .199 | .961** |
| | Sig. (2-tailed) | | .669 | .001 |
| | N | 7 | 7 | 7 |
| Sodium | Pearson Correlation | .199 | 1 | .265 |
| | Sig. (2-tailed) | .669 | | .566 |
| | N | 7 | 7 | 7 |
| Calories | Pearson Correlation | .961** | .265 | 1 |
| | Sig. (2-tailed) | .001 | .566 | |
| | N | 7 | 7 | 7 |

**. Correlation is significant at the 0.01 level (2-tailed).

For this particular example, let's consider the first row. The first correlation coefficient we see is for Fat content versus Fat content, which is 1. This makes sense because a variable has a perfect linear relationship with itself. The second correlation coefficient, 0.199, is for Fat content versus Sodium content; since 0.199 has no stars attached to it and is a rather low value, this data set shows no apparent relationship between Fat and Sodium. The third correlation coefficient is 0.961** for Fat content versus Calories. The sign of the correlation coefficient gives the direction of the association, so there is a positive relationship between the Fat content and the Calories content. A coefficient of 0.961 indicates a very strong association between the Fat content and the Calories of the burgers. The two stars attached to the correlation coefficient (0.961**) indicate that the correlation is significant at the 0.01 level. From the above set of data, the only value that could be reported as a possibly significant correlation coefficient is 0.961**.
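A rough sketch of where the stars come from: the usual significance test for a correlation coefficient converts r into a t statistic, t = r·√(n−2)/√(1−r²), with n−2 degrees of freedom. Plugging in r = 0.961 and N = 7 from the table:

```python
from math import sqrt

r, n = 0.961, 7                        # Fat vs. Calories cell from the table
t = r * sqrt(n - 2) / sqrt(1 - r**2)   # test statistic with n - 2 = 5 df
print(round(t, 2))                     # 7.77

# The standard two-tailed critical value for 5 df at the 0.01 level is 4.032,
# so t clears it easily -- consistent with ** and Sig. = .001 in the table.
print(t > 4.032)  # True
```

By contrast, the Fat versus Sodium coefficient of 0.199 produces a t of only about 0.45, nowhere near the critical value, which is why that cell earns no stars.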

A scatterplot of Fat versus Calories is generated. The scatterplot shows a roughly linear relationship, but only a few of the data points lie on the line and many do not. Also, due to the small sample size, a single data value can make a large difference in the correlation coefficient, so it might not be safe to report the correlation coefficient in this case. This example shows how important it is to check the three conditions before reporting a correlation coefficient. It is useless to report a correlation without the support of data or plots, so be cautious in interpreting a correlation when you cannot check the conditions. Once the correlation coefficient makes sense, interpreting it correctly is important. There are two common ways the interpretation can go wrong. The first is treating correlation as if it implied causation. For example, we see a very strong positive correlation coefficient between the Fat content and the Calories content of the burgers; as tempting as it is to say that the explanatory variable (Fat content) has caused the response variable (Calories content) to change, correlation alone does not establish cause and effect. There can always be a lurking variable behind the relationship between the two variables, affecting them both simultaneously and producing this strong correlation. The second wrong interpretation is trying to correlate categorical variables.

Generating a Correlation Table in SPSS:

• Go to the "Analyze" menu, select "Correlate", and click on "Bivariate".
• Drag the quantitative variables into the box called "Variables".
• Click "OK". The correlation table will appear in the Output window.