PART I. HYPOTHESIS TESTING

PROBLEM 1

A certain brand of fluorescent light tube was advertised as having an effective life span before burning out of 4000 hours. A random sample of 84 bulbs was burned out with a mean illumination life span of 1870 hours and with a sample standard deviation of 90 hours. Construct a 95% confidence interval based on this sample and be sure to interpret this interval.

Answer

Since the population standard deviation is unknown, the t distribution can be used to construct the confidence interval. The 95% confidence interval is given by

\[ \left( \bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \right) \]

Details: Confidence Interval Estimate for the Mean

Data
  Sample Standard Deviation: 90
  Sample Mean: 1870
  Sample Size: 84
  Confidence Level: 95%

Intermediate Calculations
  Standard Error of the Mean: 9.819805061
  Degrees of Freedom: 83
  t Value: 1.988959743
  Interval Half Width: 19.53119695

Confidence Interval
  Interval Lower Limit: 1850.47
  Interval Upper Limit: 1889.53

Interpretation: We can be 95% confident that the true mean life span of these tubes lies between 1850.47 and 1889.53 hours. The advertised 4000 hours lies well outside this interval, so this sample does not support the advertised life span.

PROBLEM 2

Given the following data from two independent data sets, conduct a one-tail hypothesis test to determine if the means are statistically equal using alpha = 0.05. Do NOT do a confidence interval.

  n1 = 35      n2 = 30
  xbar1 = 32   xbar2 = 25
  s1 = 7       s2 = 6

Answer

H0: µ1 = µ2
H1: µ1 > µ2

The test statistic used is

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}, \qquad \text{where } S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]

Decision rule: Reject the null hypothesis if the calculated value of the test statistic is greater than the critical value.

Details: t Test for Differences in Two Means

Data
  Hypothesized Difference: 0
  Level of Significance: 0.05
  Population 1 Sample
    Sample Size: 35
    Sample Mean: 32
    Sample Standard Deviation: 7
  Population 2 Sample
    Sample Size: 30
    Sample Mean: 25
    Sample Standard Deviation: 6
Intermediate Calculations
  Population 1 Sample Degrees of Freedom: 34
  Population 2 Sample Degrees of Freedom: 29
  Total Degrees of Freedom: 63
  Pooled Variance: 43.01587
  Difference in Sample Means: 7
  t Test Statistic: 4.289648

Upper-Tail Test
  Upper Critical Value: 1.669402
  p-Value: 3.14E-05
  Reject the null hypothesis

Conclusion: Reject the null hypothesis. The calculated t of 4.289648 exceeds the upper critical value of 1.669402, so the sample provides enough evidence to support the claim that the first population mean is greater than the second.

PROBLEM 3

A test was conducted to determine whether the gender of a display model affected the likelihood that consumers would prefer a new product.
A survey of consumers at a trade show which used a male spokesperson determined that 120 of 300 customers preferred the product, while 92 of 280 customers preferred the product when it was shown by a female spokesperson. Do the samples provide sufficient evidence to indicate that the gender of the salesperson affects the likelihood of the product being favorably regarded by consumers? Evaluate with a two-tail, alpha = .01 test. Do NOT do a confidence interval.

Answer

H0: There is no significant gender-wise difference in the proportion of customers who preferred the product.
H1: There is a significant gender-wise difference in the proportion of customers who preferred the product.

The test statistic used is the Z test

\[ Z = \frac{P_1 - P_2}{\sqrt{P(1 - P)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{where } P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2} \]

Decision rule: Reject the null hypothesis if the absolute value of the calculated test statistic is greater than the critical value.

Details: Z Test for Differences in Two Proportions

Data
  Hypothesized Difference: 0
  Level of Significance: 0.01
  Group 1 (Male)
    Number of Successes: 120
    Sample Size: 300
  Group 2 (Female)
    Number of Successes: 92
    Sample Size: 280

Intermediate Calculations
  Group 1 Proportion: 0.4
  Group 2 Proportion: 0.328571429
  Difference in Two Proportions: 0.071428571
  Average Proportion: 0.365517241
  Z Test Statistic: 1.784981685

Two-Tail Test
  Lower Critical Value: -2.575829304
  Upper Critical Value: 2.575829304
  p-Value: 0.074264288
  Do not reject the null hypothesis

Conclusion: Fail to reject the null hypothesis. The sample does not provide enough evidence to support the claim that there is a significant gender-wise difference in the proportion of customers who preferred the product.
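The Part I figures up to this point can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the t critical values (1.988959743 and 1.669402) are copied from the Details tables rather than computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(xbar, s, n, t_crit):
    """Confidence interval for a mean, given a t critical value."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (xbar1 - xbar2) / sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return t, df


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic and two-tail p-value for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Problem 1: t value taken from the Details table (assumption: not recomputed).
low, high = mean_ci(1870, 90, 84, 1.988959743)
print(f"Problem 1: CI = ({low:.2f}, {high:.2f})")

# Problem 2: upper-tail test, critical value taken from the Details table.
t_stat, df = pooled_t(32, 7, 35, 25, 6, 30)
print(f"Problem 2: t = {t_stat:.6f}, df = {df}, reject H0: {t_stat > 1.669402}")

# Problem 3: two-tail test at alpha = 0.01, critical value from the normal.
z_stat, p_val = two_proportion_z(120, 300, 92, 280)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)
print(f"Problem 3: z = {z_stat:.6f}, p = {p_val:.6f}, reject H0: {abs(z_stat) > z_crit}")
```

The same `pooled_t` helper can be applied to any other pooled-variance comparison by passing that problem's sample means, standard deviations, and sizes.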
PROBLEM 4

Assuming that the population variances are equal for Male and Female GPA’s, test the following sample data to see if Male and Female PhD candidate GPA’s (Means) are equal. Conduct a two-tail hypothesis test at α = .01 to determine whether the sample means are different. Do NOT do a confidence interval.

                        Male GPA’s   Female GPA’s
  Sample Size           12           13
  Sample Mean           2.8          4.95
  Sample Standard Dev   .25          .8

Answer

H0: There is no significant difference in the mean GPA of males and females.
H1: There is a significant difference in the mean GPA of males and females.

The test statistic used is the independent-samples t test

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}, \qquad \text{where } S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]

Decision rule: Reject the null hypothesis if the absolute value of the calculated test statistic is greater than the critical value.

Details: t Test for Differences in Two Means

Data
  Hypothesized Difference: 0
  Level of Significance: 0.01
  Population 1 Sample
    Sample Size: 12
    Sample Mean: 2.8
    Sample Standard Deviation: 0.25
  Population 2 Sample
    Sample Size: 13
    Sample Mean: 4.95
    Sample Standard Deviation: 0.8

Intermediate Calculations
  Population 1 Sample Degrees of Freedom: 11
  Population 2 Sample Degrees of Freedom: 12
  Total Degrees of Freedom: 23
  Pooled Variance: 0.363804
  Difference in Sample Means: -2.15
  t Test Statistic: -8.90424

Two-Tail Test
  Lower Critical Value: -2.80734
  Upper Critical Value: 2.807336
  p-Value: 0.0000
  Reject the null hypothesis

Conclusion: Reject the null hypothesis. The sample provides enough evidence to support the claim that there is a significant difference in the mean GPA of males and females.

PART II REGRESSION ANALYSIS

Problem 5

You wish to run the regression model (less Intercept and coefficients) shown below:

VOTE = URBAN + INCOME + EDUCATE
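Written out in full, with the intercept and coefficients that the problem statement leaves out, the model would take the usual multiple-regression form. The symbols \beta_0 through \beta_3 and the error term \varepsilon_t are notation assumed here for illustration; they do not appear in the original problem.

```latex
\[
\mathrm{VOTE}_t = \beta_0 + \beta_1\,\mathrm{URBAN}_t + \beta_2\,\mathrm{INCOME}_t
                + \beta_3\,\mathrm{EDUCATE}_t + \varepsilon_t,
\qquad t = 1970, \dots, 2006
\]
```

In this form VOTE is the dependent (Y) variable, and URBAN, INCOME, and EDUCATE are the independent (X) variables.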
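Given the Excel spreadsheet below for annual data from 1970 to 2006 (with the data for row 5 thru row 35 not shown), complete all necessary entries in the Excel Regression Window shown below the data.

        A      B      C       D        E
  1     YEAR   VOTE   URBAN   INCOME   EDUCATE
  2     1970   49.0   62.0    7488     4.3
  3     1971   58.3   65.2    7635     8.3
  4     1972   45.2   75.0    7879     4.5
  (rows 5 thru 35 not shown)
  36    2004   50.1   92.1    15321    4.9
  37    2005   67.7   94.0    15643    4.7
  38    2006   54.2   95.6    16001    5.1

Regression

Input
  Input Y Range: B1:B38
  Input X Range: C1:E38
  Labels: checked
  Constant is Zero: unchecked
  Confidence Level: 95%

VOTE (column B) is the dependent variable, and URBAN, INCOME, and EDUCATE (columns C through E) are the independent variables. YEAR (column A) is not part of the model. Labels is checked because row 1 holds the column headings, and Constant is Zero is left unchecked because the model includes an intercept.

Output options
  Output Range:
  New Worksheet Ply: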
  New Workbook

Residuals
  Residuals
  Standardized Residuals
  Residual Plots
  Line Fit Plots

Normal Probability
  Normal Probability Plots

PROBLEM 6

Use the following regression output to determine the following: A real estate investor has devised a model to estimate home prices in a new suburban development. Data for a random sample of 100 homes were gathered on the selling price of the home ($ thousands), the home size (square feet), the lot size (thousands of square feet), and the number of bedrooms. The following multiple regression output was generated:

Regression Statistics
  Multiple R: 0.8647
  R Square: 0.7222
  Adjusted R Square: 0.6888
  Standard Error: 16.0389
  Observations: 100

                     Coefficients   Standard Error   t Stat    P-value
  Intercept          -24.888        38.3735          -0.7021   0.2154
  X1 (Square Feet)   0.2323         0.0184           9.3122    0.0000
  X2 (Lot Size)      11.2589        1.7120           4.3256    0.0001
  X3 (Bedrooms)      15.2356        6.8905           3.2158    0.1589

a. Why is the coefficient for BEDROOMS a positive number?
The selling price increases when the number of bedrooms increases, with home size and lot size held constant. Thus the relationship is positive.

b. Which is the most statistically significant variable? What evidence shows this?
The most statistically significant variable is the one with the smallest p-value.
Here the most statistically significant variable is Square Feet (p-value 0.0000).

c. Which is the least statistically significant variable? What evidence shows this?
The least statistically significant variable is the one with the largest p-value. Here the least statistically significant variable is Bedrooms (p-value 0.1589).

d. For a 0.05 level of significance, should any variable be dropped from this model? Why or why not?
The variable Bedrooms can be dropped from the model because its p-value (0.1589) is greater than 0.05.

e. Interpret the value of R squared. How does this value differ from the adjusted R squared?
R2 measures the adequacy of the model. Here R2 suggests that 72.22% of the variability in selling price can be explained by the model. Adjusted R2 is a modification of R2 that adjusts for the number of explanatory terms in a model. Unlike R2, the adjusted R2 increases only if a new term improves the model more than would be expected by chance.

f. Predict the sales price of a 1134-square-foot home with a lot size of 15,400 square feet and 2 bedrooms.
Lot size is measured in thousands of square feet, so 15,400 square feet enters the equation as 15.4.
Selling Price = -24.888 + 0.2323*1134 + 11.2589*15.4 + 15.2356*2 = 442.398
Selling price is measured in $ thousands, so the predicted price is about $442,398.

PART III SPECIFIC KNOWLEDGE SHORT-ANSWER QUESTIONS

Problem 7

Define Autocorrelation in the following terms:

a. In what type of regression is it likely to occur?
Regressions involving time series data.

b. What is bad about autocorrelation in a regression?
The coefficient estimates are no longer efficient and their reported standard errors are unreliable (with positive autocorrelation they are typically understated), so significance tests can be misleading.

c. What method is used to determine if it exists? (Think of statistical test to be used)
The Durbin-Watson statistic is used to detect autocorrelation in a regression.

d. If found in a regression how is it eliminated?
Appropriate transformations of the data, such as differencing, can be adopted to eliminate autocorrelation.

Problem 8

Define Multicollinearity in the following terms:

a) In what type of regression is it likely to occur?
Multicollinearity occurs in multiple regressions when two or more independent variables are highly correlated.

b) Why is multicollinearity in a regression a difficulty to be resolved?
Multicollinearity in regression models is an unacceptably high level of intercorrelation among the independents, such that the effects of the independents cannot be separated. Under multicollinearity, estimates are unbiased but assessments of the relative strength of the explanatory variables and their joint effect are unreliable.

c) How can multicollinearity be determined in a regression?
Multicollinearity refers to excessive correlation of the predictor variables. When correlation is excessive (some use the rule of thumb of r > 0.90), standard errors of the b and beta coefficients become large, making it difficult or impossible to assess the relative importance of the predictor variables. The measures Tolerance and VIF are commonly used to measure multicollinearity. Tolerance is 1 - R2 for the regression of that independent variable on all the other independents, ignoring the dependent. There will be as many tolerance coefficients as there are independents. The higher the inter-correlation of the independents, the more the tolerance will approach zero. As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is indicated.
When tolerance is close to 0 there is high multicollinearity of that variable with the other independents, and the b and beta coefficients will be unstable. The greater the multicollinearity, the lower the tolerance and the larger the standard errors of the regression coefficients.

d) If multicollinearity is found in a regression, how is it eliminated?
Multicollinearity occurs because two (or more) variables are related; they measure essentially the same thing. If one of the variables doesn’t seem logically essential to your model, removing it may reduce or eliminate multicollinearity.
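The tolerance measure described in part c) can be sketched for the simplest case of exactly two predictors, where the R2 from regressing one predictor on the other equals their squared Pearson correlation. This is a minimal illustration under that assumption, not a general implementation: the function names and the sample data are hypothetical, and VIF is taken as the reciprocal of tolerance.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance (1 - R^2) and VIF (1 / tolerance) for a two-predictor model.

    With only two predictors, R^2 of one regressed on the other is r^2.
    """
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1 - r_squared
    return tolerance, 1 / tolerance


# Hypothetical data: home size and bedroom count usually move together.
square_feet = [1000, 1200, 1500, 1800, 2100, 2500]
bedrooms = [2, 2, 3, 3, 4, 5]

tol, vif = tolerance_and_vif(square_feet, bedrooms)
flag = "possible multicollinearity" if tol < 0.20 else "no problem indicated"
print(f"tolerance = {tol:.4f}, VIF = {vif:.2f}: {flag}")
```

With three or more predictors, each tolerance would instead come from a full regression of that predictor on all the others, which needs a least-squares routine rather than a single correlation.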