# Chapter 1 - What is Statistics?

Chapters

1. Introduction
2. Graphs
3. Descriptive statistics
4. Basic probability
5. Discrete distributions
6. Continuous distributions
7. Central limit theorem
8. Estimation
9. Hypothesis testing
10. Two-sample tests
13. Linear regression

14. Multivariate regression

## Chapter 10 Two Sample Tests

03/01/2020, Towson University - J. Jung

## 1.1 Comparing Two Populations

Previously we looked at techniques to estimate and test parameters for one population:

- Population mean: $\mu$
- Population proportion: $p$

We will still consider these parameters when we are looking at two populations; however, our interest will now be:

- The difference between two means.
- The ratio of two variances.
- The difference between two proportions.

## Difference between Two Means

In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another.

For Population 1:

- Sample size: $n_1$
- Parameters: $\mu_1$ and $\sigma_1^2$
- Statistics: $\bar{x}_1$ and $s_1^2$

(Likewise, we consider $n_2$, $\mu_2$, $\sigma_2^2$, $\bar{x}_2$ and $s_2^2$ for Population 2.)

Because we are comparing two population means, we use the statistic

$$\bar{x}_1 - \bar{x}_2$$

which is an unbiased and consistent estimator of $\mu_1 - \mu_2$.

## Sampling Distribution of Differences of Means

$\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations are normal, or approximately normal if the populations are non-normal and the sample sizes are large ($n_1, n_2 > 30$). (Remember the condition for the CLT.)

The expected value of $\bar{x}_1 - \bar{x}_2$ is:

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

The variance of $\bar{x}_1 - \bar{x}_2$ is:

$$V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

and the standard error is:

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

## Making Inferences About $\mu_1 - \mu_2$

Since $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations are normal, or approximately normal if the populations are non-normal and the sample sizes are large, then

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is a $N(0,1)$ (or approximately normal) random variable. We could use this to build the test statistic and the confidence interval estimator for $\mu_1 - \mu_2$, except that, in practice, the z statistic is rarely used since the population variances are unknown.
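For the idealized case in which both population variances are known, the z statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `two_sample_z` and all numeric inputs are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z statistic for mu1 - mu2 when the population variances are known."""
    # Standard error of xbar1 - xbar2: sqrt(sigma1^2/n1 + sigma2^2/n2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    z_value = ((xbar1 - xbar2) - hypothesized_diff) / standard_error
    return z_value, standard_error


# Hypothetical inputs (not taken from the slides)
z, se = two_sample_z(xbar1=10.0, xbar2=8.0, var1=16.0, var2=9.0, n1=64, n2=36)

# Upper-tail p-value for H1: mu1 - mu2 > 0 under the standard normal
upper_tail_p = 1.0 - NormalDist().cdf(z)

print(f"standard error = {se:.4f}, z = {z:.4f}, one-tail p = {upper_tail_p:.4f}")
```

With these inputs the standard error is $\sqrt{16/64 + 9/36} = \sqrt{0.5}$, so the z value is $2/\sqrt{0.5}$.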

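Instead we use a t-statistic. We consider two cases for the unknown population variances:

1. when we believe they are equal: $\sigma_1^2 = \sigma_2^2$
2. when we believe they are not equal: $\sigma_1^2 \neq \sigma_2^2$

## Test Statistic for $\mu_1 - \mu_2$ (equal variances: $\sigma_1^2 = \sigma_2^2$)

Calculate the pooled variance estimator

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

and use it here:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \nu = n_1 + n_2 - 2 \text{ degrees of freedom}$$

## CI Estimator for $\mu_1 - \mu_2$ (equal variances)

The confidence interval estimator for $\mu_1 - \mu_2$ when the population variances are equal is given by:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \nu = n_1 + n_2 - 2 \text{ degrees of freedom}$$

where $s_p^2$ is the pooled variance estimator.

## Test Statistic for $\mu_1 - \mu_2$ (unequal variances: $\sigma_1^2 \neq \sigma_2^2$)

The test statistic for $\mu_1 - \mu_2$ when the population variances are unequal is given by:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \text{ degrees of freedom}$$

Likewise, the confidence interval estimator is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

## Which test to use?

Which test statistic do we use: equal variances or unequal variances? Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal-variances t-test. This is because, for any two given samples:

$$\text{degrees of freedom for the equal-variances case} \;\geq\; \text{degrees of freedom for the unequal-variances case}$$

Larger numbers of degrees of freedom have the same effect as having larger sample sizes.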
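The degrees-of-freedom comparison above can be checked numerically. The sketch below assumes the equal-variances case uses $n_1 + n_2 - 2$ and the unequal-variances case uses the usual Welch-Satterthwaite approximation. The function names and the summary statistics are hypothetical and serve only as an illustration.

```python
def pooled_variance(var1, var2, n1, n2):
    """Pooled variance estimator s_p^2 for the equal-variances t-test."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def df_equal_variances(n1, n2):
    """Degrees of freedom for the equal-variances t-test."""
    return n1 + n2 - 2


def df_unequal_variances(var1, var2, n1, n2):
    """Welch-Satterthwaite degrees of freedom (assumed form of the unequal-variances df)."""
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical sample variances and sample sizes (not from the slides)
s1_sq, s2_sq, n1, n2 = 4.0, 25.0, 15, 25

sp_sq = pooled_variance(s1_sq, s2_sq, n1, n2)
df_equal = df_equal_variances(n1, n2)
df_unequal = df_unequal_variances(s1_sq, s2_sq, n1, n2)

print(f"pooled variance = {sp_sq:.3f}")
print(f"df (equal variances)   = {df_equal}")
print(f"df (unequal variances) = {df_unequal:.2f}")
```

For these inputs the unequal-variances degrees of freedom come out below the equal-variances value of 38, in line with the inequality stated above.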

## Example

Millions of investors buy mutual funds, choosing from thousands of possibilities. Some funds can be purchased directly from banks or other financial institutions, while others must be purchased through brokers, who charge a fee for this service. This raises the question: can investors do better by buying mutual funds directly than by purchasing mutual funds through brokers?

To help answer this question, a group of researchers randomly sampled the annual returns from mutual funds that can be acquired directly and mutual funds that are bought through brokers, and recorded the net annual returns, which are the returns on investment after deducting all relevant fees. Can we conclude at the 5% significance level that directly purchased mutual funds outperform mutual funds bought through brokers?

To answer the question we need to compare the population of returns from direct and the returns from broker-bought mutual funds. The data are obviously interval (we've recorded real numbers). This problem objective and data type combination tells us that the parameter to be tested is the difference between two means, $\mu_1 - \mu_2$.

The hypothesis to be tested is that the mean net annual return from directly purchased mutual funds ($\mu_1$) is larger than the mean of broker-purchased funds ($\mu_2$). Hence the alternative hypothesis is

$$H_1: \mu_1 - \mu_2 > 0$$

and the null hypothesis is

$$H_0: \mu_1 - \mu_2 = 0$$

To decide which of the t-tests of $\mu_1 - \mu_2$ to apply, we conduct the F-test of $\sigma_1^2 / \sigma_2^2$. (Not covered in class, so just roll with it for now.)

Assume the F-test concluded that there is not enough evidence to infer that the population variances differ. It follows that we must apply the equal-variances t-test of $\mu_1 - \mu_2$.

t-Test: Two-Sample Assuming Equal Variances

| | Direct | Broker |
|---|---|---|
| Mean | 6.63 | 3.72 |
| Variance | 37.49 | 43.34 |
| Observations | 50 | 50 |
| Pooled Variance | 40.41 | |
| Hypothesized Mean Difference | 0 | |
| df | 98 | |
| t Stat | 2.29 | |
| P(T<=t) one-tail | 0.0122 | |
| t Critical one-tail | 1.6606 | |
| P(T<=t) two-tail | 0.0243 | |
| t Critical two-tail | 1.9845 | |

## Example: Result

The value of the test statistic is 2.29. The one-tail p-value is .0122. We observe that the p-value of the test is small (and the test statistic falls into the rejection region). As a result, we conclude that there is sufficient evidence to infer that, on average, directly purchased mutual funds outperform broker-purchased mutual funds.

## Confidence Interval Estimator

Suppose we wanted to compute a 95% confidence interval estimate of the difference between the mean returns of directly purchased and broker-purchased mutual funds. The equal-variances estimator is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

t-Estimate of the Difference Between Two Means (Equal-Variances)

| | Sample 1 | Sample 2 |
|---|---|---|
| Mean | 6.63 | 3.72 |
| Variance | 37.49 | 43.34 |
| Sample size | 50 | 50 |
| Pooled Variance | 40.42 | |
| Confidence level | 0.95 | |

| Confidence Interval Estimate | |
|---|---|
| Estimate | 2.91 ± 2.52 |
| Lower confidence limit | 0.39 |
| Upper confidence limit | 5.43 |

We estimate that the return on directly purchased mutual funds is on average between .39 and 5.43 percentage points larger than broker-purchased mutual funds.
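The test statistic and confidence interval reported in the Excel output can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library and takes the two critical values (1.6606 and 1.9845, for 98 degrees of freedom) directly from the output rather than computing them. The variable names are illustrative assumptions.

```python
import math

# Summary statistics from the Excel output in the example
mean_direct, mean_broker = 6.63, 3.72
var_direct, var_broker = 37.49, 43.34
n_direct, n_broker = 50, 50

# Critical values copied from the Excel output (df = 98, 5% level)
t_crit_one_tail = 1.6606
t_crit_two_tail = 1.9845

# Equal-variances (pooled) t-test of H0: mu1 - mu2 = 0 against H1: mu1 - mu2 > 0
df = n_direct + n_broker - 2
pooled_var = ((n_direct - 1) * var_direct + (n_broker - 1) * var_broker) / df
std_error = math.sqrt(pooled_var * (1 / n_direct + 1 / n_broker))
diff = mean_direct - mean_broker
t_stat = diff / std_error
reject_h0 = t_stat > t_crit_one_tail

# 95% confidence interval for mu1 - mu2 (equal-variances estimator)
margin = t_crit_two_tail * std_error
lower, upper = diff - margin, diff + margin

print(f"df = {df}, pooled variance = {pooled_var:.3f}")
print(f"t = {t_stat:.2f}, reject H0 at 5%: {reject_h0}")
print(f"95% CI: {diff:.2f} +/- {margin:.2f} = ({lower:.2f}, {upper:.2f})")
```

Because the inputs are the rounded values shown in the output, small differences in the last digit are possible. For example, the pooled variance works out to 40.415 here, which sits between the 40.41 and 40.42 shown in the two tables.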