# Two-sample t-interval for the difference between means

A two-sample t-interval for the difference between means is a confidence interval that, at a chosen confidence level, estimates the difference between two population means from two independent samples. It is useful because it gives a range of plausible values for that difference rather than a single point estimate. The interval is calculated as:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Here $\bar{x}_1 - \bar{x}_2$ is the difference between the two sample means, and $t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ is the margin of error (ME). $t^*$ is the critical value determined by the specified confidence level and the number of degrees of freedom (which come from a rather complicated formula), $s_1$ and $s_2$ are the two sample standard deviations, and $n_1$ and $n_2$ are the two sample sizes.
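The page does not spell out the degrees-of-freedom formula. For reference, the usual choice for this interval when equal variances are not assumed is the Welch-Satterthwaite approximation, sketched here with $s_1, s_2$ as the sample standard deviations and $n_1, n_2$ as the sample sizes. That this is the formula the page has in mind is an assumption:

$$df \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

The result is usually not a whole number, which is one reason software is normally used to find the critical value.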

To calculate a two-sample t-interval, we need one categorical variable and one quantitative variable. The categorical variable must have at least two categories, and we compare the quantitative variable between two of them.

In our example, a sports reporter suggests that professional baseball players must, on average, be older than professional football players, since football is a contact sport and its players are more susceptible to concussions and serious injuries (www.sports.yahoo.com). One player was selected at random from each team in both professional baseball (MLB) and professional football (NFL). The data are summarized below.

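| | MLB | NFL |
|---|---|---|
| n (sample size) | 29 | 32 |
| x̄ (sample mean age, years) | 27.03 | 26.16 |
| s (sample standard deviation, years) | 3.05 | 2.78 |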

Before calculating the t-interval, we can compare box plots of the two groups to see whether there is any visible difference between them.

Although the box plots show no obvious difference between the two groups, we can still proceed to calculate the t-interval.

To calculate the confidence interval, we must first check the conditions. There are four conditions:

1. Randomization Condition (for both groups)
2. Sample size is under 10% of the population (for both groups)
3. Nearly Normal Condition (for both groups)
   • n > 40
   • or check a histogram (unimodal and symmetric) or a Q-Q plot (straight enough)
4. Independent Group Assumption

For our example, the Randomization Condition is met because one player from each team in both sports was selected at random. The sample sizes of 29 and 32 should be below 10% of their respective populations, that is, all MLB players and all NFL players. To check the Nearly Normal Condition, we use Descriptive Statistics in SPSS to produce Q-Q plots (see below). The Q-Q plot for MLB is straight enough, but the one for NFL is less straight. Even so, the departure is mild and the NFL sample size of 32 is fairly close to 40, so we treat both groups as nearly normal. Finally, the two groups are independent of each other because each selected player plays only one of the two sports.

To calculate the t-interval, we can either plug our data into the formula above or use SPSS to generate it for us. To do it by hand, we already know the sample sizes, sample means, and sample standard deviations, but we still need SPSS to generate the critical t-score (for instructions, see the section "Generating a Critical T-Score in SPSS" on the page One-sample t-interval for the mean).

If the question you are given doesn't provide all the information needed to calculate the interval, you can use Descriptive Statistics in SPSS to find those values.

Now we have $n_1 = 29$, $\bar{x}_1 = 27.03$, $s_1 = 3.053$ and $n_2 = 32$, $\bar{x}_2 = 26.16$, $s_2 = 2.784$. Plugging these numbers into the formula, we should get a t-interval of approximately (-.625, 2.382). We are 95% confident that the true difference in mean age (MLB minus NFL) is captured in the interval (-.625, 2.382) years. In context, MLB players could, on average, be anywhere from about 0.6 years younger to about 2.4 years older than NFL players. Because the interval contains 0, there could be a difference between the mean ages of MLB and NFL players, but the data do not clearly show one. The difference is calculated as the mean age of MLB players minus the mean age of NFL players.
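The by-hand calculation described above can also be sketched in Python using only the standard library. This is a minimal illustration, not the page's SPSS procedure: the function names (`t_pdf`, `t_cdf`, `t_critical`, `welch_interval`) are hypothetical, the degrees of freedom are assumed to follow the Welch-Satterthwaite approximation, and the critical value is found numerically (Simpson's rule plus bisection) instead of being read from SPSS.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Find t* with P(-t* < T < t*) = confidence by bisection.

    Assumes t* lies below 100, which holds for typical confidence
    levels unless df is very small.
    """
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_interval(mean1, s1, n1, mean2, s2, n2, confidence=0.95):
    """Two-sample t-interval for mean1 - mean2, equal variances not assumed."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)  # standard error of the difference
    # Welch-Satterthwaite degrees of freedom (assumed, see lead-in)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t_star = t_critical(confidence, df)
    diff = mean1 - mean2
    me = t_star * se  # margin of error
    return diff - me, diff + me, df, t_star


# Summary statistics from the example: group 1 = MLB, group 2 = NFL
low, high, df, t_star = welch_interval(27.03, 3.053, 29, 26.16, 2.784, 32)
print(f"df = {df:.1f}, t* = {t_star:.3f}, interval = ({low:.3f}, {high:.3f})")
```

With the rounded means, this comes out at roughly (-0.63, 2.37), slightly different from the (-.625, 2.382) quoted above. The gap is presumably because SPSS works from the unrounded raw data.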

# Generating a Two-sample t-interval using SPSS

• Go to "Analyze" menu, select "Compare Means" and then "Independent-Samples T Test".
• Drag the quantitative variable under "Test Variable".
• Drag the categorical variable under "Grouping Variable".
• Click "Define Groups".
• Decide which groups you want to identify as Group 1 and Group 2 (remember that it's Group 1 minus Group 2).
• Type in Group 1 and 2 exactly as they're stated in the dataset, and click "Continue".
• Click "OK". Then the "Independent Samples Test" table will appear in the Output window.
• Look at the confidence interval in the second row of the table, labeled "Equal Variances Not Assumed".