Statistical Inference for the Center of a Population

Wolfe, Douglas A.; Schneider, Grant

doi:10.1007/978-3-319-56072-4_7


Part of the book series: Springer Texts in Statistics (STS)


Abstract

In this chapter we consider the commonly encountered statistical problem of using sample data for a quantitative variable along with the sampling distribution of an appropriate summary statistic to make inferences about the center of the corresponding population distribution. For example, Sciulli and Carlisle (1975) used skeletal remains to obtain sample data on the stature of a number of prehistoric Amerindian populations living in the Ohio Valley over the years from 200 BC to 1200 AD. A number of questions naturally arose in this study. What was the typical height for male and female Amerindians living in this region of the country during that period of time? As the degree of plant cultivation increased and the reliance on the availability or scarcity of fresh game decreased over the years, was there a noticeable change in the stature of the Amerindian populations? Questions such as these can be addressed only through the use of appropriate statistical techniques.


Notes

1. If there are ties among the |D|'s, we assign average ranks to the tied values. For example, if n = 5 and D₁ = -4, D₂ = 9, D₃ = -3, D₄ = 4, and D₅ = 6, the ordered |D| values are 3, 4, 4, 6, 9. The tied absolute values |D₁| = |D₄| = 4 occupy ordered positions 2 and 3, so each is assigned the average rank \( \frac{2+3}{2} = 2.5 \). The complete set of ranks for (|D₁|, …, |D₅|) is then (R₁, …, R₅) = (2.5, 5, 1, 2.5, 4).
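The tie-handling rule in this note is easy to mechanize. Below is a Python sketch of our own, not code from the chapter. The function names `average_ranks` and `signed_rank_statistic` are hypothetical. The second function assumes the usual definition of W⁺ as the sum of the ranks attached to the positive D's, with no D equal to zero.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the run of values tied with the one at sorted position `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) hold ranks start+1..end+1; give each their average.
        shared = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def signed_rank_statistic(d):
    """W+ = sum of the ranks of |D_i| over the positive D_i (assumes no D_i equals 0)."""
    ranks = average_ranks([abs(x) for x in d])
    return sum(r for r, x in zip(ranks, d) if x > 0)


# The example from the note: D = (-4, 9, -3, 4, 6).
d = [-4, 9, -3, 4, 6]
print(average_ranks([abs(x) for x in d]))  # [2.5, 5.0, 1.0, 2.5, 4.0]
print(signed_rank_statistic(d))            # 11.5
```

In this example the positive differences 9, 4, and 6 carry ranks 5, 2.5, and 4, which is where the value 11.5 comes from.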

Bibliography

Al Jarad, N., Gellert, A. R., & Rudd, R. M. (1993). Bronchoalveolar lavage and ⁹⁹ᵐTc-DTPA clearance as prognostic factors in asbestos workers with and without asbestosis. Respiratory Medicine, 87, 365–374.
Ali, A., Rasheed, A., Siddiqui, A. A., Naseer, M., Wasim, S., & Akhtar, W. (2015). Non-parametric test for ordered medians: The Jonckheere Terpstra test. International Journal of Statistics in Medical Research, 4, 203–207.
Anderson, N. L. (1999). Personal communication for report in Statistics 661. Columbus: Ohio State University.
Chu, S. (2001). Pricing the C’s of diamond stones. Journal of Statistics Education, 9(2), 12 pages online.
Groom. (1999). Personal communication for report in Statistics 661. Columbus: Ohio State University.
Hines, C. (1999). Personal communication for report in Statistics 661. Columbus: Ohio State University.
Hollander, M., Wolfe, D. A., & Chicken, E. (2014). Nonparametric Statistical Methods (3rd ed.). New York: Wiley.
Kayle, K. A. (1984). Personal communication for report in Statistics 661. Columbus: Ohio State University.
Kerr, H. (1983). Personal communication for report in Statistics 661. Columbus: Ohio State University.
Larson, D. W. (1999). Personal communication.
Larson, D. W., Matthes, U., Gerrath, J. A., Larson, N. W. K., Gerrath, J. M., Nekola, J. C., Walker, G. L., Porembski, S., & Charlton, A. (2000). Evidence for the widespread occurrence of ancient forests on cliffs. Journal of Biogeography, 27, 319–331.
Mackowiak, P. A., Wasserman, S. S., & Levine, M. M. (1992). A critical appraisal of 98.6 °F, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. Journal of the American Medical Association, 268(12), 1578–1580.
March, G. L., John, T. M., McKeown, B. A., Sileo, L., & George, J. C. (1976). The effects of lead poisoning on various plasma constituents in the Canada goose. Journal of Wildlife Diseases, 12, 14–19.
Moore, T. L. (2006). Paradoxes in film ratings. Journal of Statistics Education, 14(1), 8 pages online.
Perez, H. D., Horn, J. K., Ong, R., & Goldstein, I. M. (1983). Complement (C5)-derived chemotactic activity in serum from patients with pancreatitis. The Journal of Laboratory and Clinical Medicine, 101, 123–129.
Sciulli, P. W., & Carlisle, R. (1975). Analysis of the dentition from three Western Pennsylvania late woodland sites. I. Descriptive statistics, partition of variation and asymmetry. Pennsylvania Archaeologist, 45(4), 47–55.
Shkedy, Z., Aerts, M., & Callaert, H. (2006). The weight of Euro coins: Its distribution might not be as normal as you would expect. Journal of Statistics Education, 14(2), 15 pages online.
Shoemaker, A. L. (1996). What’s normal?—Temperature, gender, and heart rate. Journal of Statistics Education, 4(2), 4 pages online.
Woodard, R., & Leone, J. (2008). A random sample of Wake County, North Carolina residential real estate plots. Journal of Statistics Education, 16(3), 3 pages online.


Author information

Authors and Affiliations

Department of Statistics, The Ohio State University, Columbus, OH, USA
Douglas A. Wolfe
Upstart Network, San Carlos, CA, USA
Grant Schneider


Chapter 7 Comprehensive Exercises

7.A. Conceptual

7.A.1. Consider a random sample of size n = 5 from a population with median η = 0. If the value of the signed rank statistic for these data is W⁺ = 13, what are the possible values for the sign statistic B?

7.A.2. Consider a random sample of size n = 8 from a population with median η = 0. If the value of the sign statistic for these data is B = 3, what are the possible values for the signed rank statistic W⁺?

7.A.3. Consider a random sample of size n = 10 from a population with median η = 0. If the value of the signed rank statistic for these data is W⁺ = 9, what are the possible values for the sign statistic B?

7.A.4. Consider a random sample of size n = 15 from a population with median η = 0. If the value of the sign statistic for these data is B = 10, what are the possible values for the signed rank statistic W⁺?
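For small n, the relationships asked about in Exercises 7.A.1 through 7.A.4 can be explored by brute force. The sketch below makes two assumptions:

- There are no ties and no zeros among the D's, so the ranks of the |D|'s are exactly 1, …, n.
- Each possible data set is therefore summarized by the set S of ranks attached to positive D's, with B = |S| and W⁺ equal to the sum of S.

The helper names are hypothetical.

```python
from itertools import combinations


def possible_sign_counts(n, w_plus):
    """Values of B that can occur together with a given W+ (ranks 1..n: no ties, no zeros)."""
    ranks = range(1, n + 1)
    # A data set is summarized by the set S of ranks carried by positive D's:
    # B = len(S) and W+ = sum(S), and every subset S is attainable with continuous data.
    return sorted({size for size in range(n + 1)
                   for s in combinations(ranks, size) if sum(s) == w_plus})


def possible_signed_ranks(n, b):
    """Values of W+ that can occur together with a given B, under the same assumptions."""
    return sorted({sum(s) for s in combinations(range(1, n + 1), b)})
```

Even for n = 15 the enumeration visits only 2¹⁵ = 32,768 subsets, so a hand-derived answer can be checked directly, for example with `possible_signed_ranks(15, 10)`.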

7.A.5. Construct a set of data for which the sign test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but the t-test does not reject H₀ in favor of H_A.

7.A.6. Construct a set of data for which the signed rank test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but the t-test does not reject H₀ in favor of H_A.

7.A.7. Construct a set of data for which the t-test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but the sign test does not reject H₀ in favor of H_A.

7.A.8. Construct a set of data for which the t-test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but the signed rank test does not reject H₀ in favor of H_A.

7.A.9. Construct a set of data for which the signed rank test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but the sign test does not reject H₀ in favor of H_A.

7.A.10. Construct a set of data for which the sign test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but the signed rank test does not reject H₀ in favor of H_A.

7.A.11. Construct a set of data for which the t-test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but neither the sign test nor the signed rank test rejects H₀ in favor of H_A.

7.A.12. Construct a set of data for which the sign test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but neither the t-test nor the signed rank test rejects H₀ in favor of H_A.

7.A.13. Construct a set of data for which the signed rank test rejects H₀: η = 0 in favor of H_A: η > 0 at significance level α = .05 but neither the sign test nor the t-test rejects H₀ in favor of H_A.
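When constructing data sets for Exercises 7.A.5 through 7.A.13, it helps to compute the three one-sided P-values directly and compare each with α = .05. The sketch below is our own illustration:

- The sign test and signed rank test P-values are exact. The signed rank version assumes no zeros and no ties among the |D|'s.
- The t-test P-value comes from integrating the Student t density with Simpson's rule. This is a stand-in for a t table or statistical software.

The function names and the sample data are hypothetical.

```python
import math
from statistics import mean, stdev


def sign_test_p(x, eta0=0.0):
    """Exact P-value of the sign test of H0: eta = eta0 vs HA: eta > eta0 (values equal to eta0 dropped)."""
    d = [v - eta0 for v in x if v != eta0]
    n, b = len(d), sum(v > 0 for v in d)
    # Under H0, B ~ Binomial(n, 1/2); the P-value is P(B >= observed b).
    return sum(math.comb(n, k) for k in range(b, n + 1)) / 2 ** n


def signed_rank_test_p(x, eta0=0.0):
    """Exact P-value of the signed rank test vs HA: eta > eta0 (assumes no zeros, no ties among |D|)."""
    d = [v - eta0 for v in x if v != eta0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    # counts[w] = number of subsets of {1, ..., n} with sum w; under H0 each subset has probability 1/2**n.
    counts = [1] + [0] * (n * (n + 1) // 2)
    for r in range(1, n + 1):
        for w in range(len(counts) - 1, r - 1, -1):
            counts[w] += counts[w - r]
    return sum(counts[w_plus:]) / 2 ** n


def t_test_p(x, mu0=0.0, steps=4000):
    """P-value P(T >= t) of the one-sample t-test vs HA: mu > mu0, via Simpson's rule on the t density."""
    n = len(x)
    df = n - 1
    t = (mean(x) - mu0) / (stdev(x) / math.sqrt(n))
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = density(0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3  # approximately P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


x = [0.4, 1.1, 2.3, 3.0, 4.8, 5.2, -0.7, 6.1]  # arbitrary illustrative data
for name, p in [("sign", sign_test_p(x)), ("signed rank", signed_rank_test_p(x)), ("t", t_test_p(x))]:
    print(f"{name:12s} P = {p:.4f}  reject at .05: {p <= 0.05}")
```

A candidate data set satisfies, say, Exercise 7.A.5 when `sign_test_p` returns a value of at most .05 and `t_test_p` returns a value above .05.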

7.A.14. How many possible values are there for the sign statistic B and the signed rank statistic W⁺ for a sample of 15 observations?

7.A.15. How many possible values are there for the sign statistic B for a sample of 10 observations? To what size would you have to increase the sample to double the number of possible values for B?

7.A.16. How many possible values are there for the signed rank statistic W⁺ for a sample of 10 observations? To what size would you have to increase the sample to double the number of possible values for W⁺?

7.A.17. Suppose we are interested in a 100CL% confidence interval for the median η of a population based on a random sample of n = 15 observations. List the exact levels CL that are available if we base our inferences on the sign statistic.

7.A.18. Suppose we are interested in a 100CL% confidence interval for the median η of a population based on a random sample of n = 15 observations. List the exact levels CL that are available if we base our inferences on the signed rank statistic.
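The exact levels in Exercises 7.A.17 and 7.A.18 follow from the null distributions of B and W⁺. The sketch below assumes the standard forms of the two intervals:

- (X_(k), X_(n+1-k)) from the ordered observations, with coverage probability 1 - 2P(B ≤ k - 1) for B ~ Binomial(n, 1/2).
- (W_(k), W_(M+1-k)) from the M = n(n + 1)/2 ordered Walsh averages, with coverage probability 1 - 2P(W⁺ ≤ k - 1).

The function names are hypothetical.

```python
from math import comb


def sign_interval_levels(n):
    """Exact confidence levels of (X_(k), X_(n+1-k)) for k = 1, ..., n // 2.

    Level = 1 - 2 * P(B <= k - 1), where B ~ Binomial(n, 1/2).
    """
    levels, lower = {}, 0
    for k in range(1, n // 2 + 1):
        lower += comb(n, k - 1)  # running count of outcomes with B <= k - 1
        levels[k] = 1 - 2 * lower / 2 ** n
    return levels


def signed_rank_interval_levels(n):
    """Exact levels of (W_(k), W_(M+1-k)) from the M = n(n+1)/2 ordered Walsh averages.

    Level = 1 - 2 * P(W+ <= k - 1), using the no-ties null distribution of W+.
    """
    m = n * (n + 1) // 2
    counts = [1] + [0] * m  # counts[w] = number of subsets of {1, ..., n} with sum w
    for r in range(1, n + 1):
        for w in range(m, r - 1, -1):
            counts[w] += counts[w - r]
    levels, lower = {}, 0
    for k in range(1, m // 2 + 1):
        lower += counts[k - 1]
        levels[k] = 1 - 2 * lower / 2 ** n
    return levels


for k, cl in sign_interval_levels(15).items():
    print(k, round(cl, 4))
```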

7.A.19. Construct a set of sample observations for which the sample median, \( \tilde{X} \), is positive, but the median of the Walsh averages, \( \tilde{W} \), is negative.

7.A.20. Construct a set of sample observations for which the sample median, \( \tilde{X} \), is negative, but the median of the Walsh averages, \( \tilde{W} \), is positive.

7.A.21. Consider a sample of n = 20 observations for which the observation with the largest absolute value is positive. What is the minimum number of Walsh averages that could be positive?

7.A.22. Consider a sample of n = 15 observations for which the observations with the largest and second largest absolute values are both negative. What is the maximum number of Walsh averages that could be positive?

7.A.23. Consider a sample of n = 10 observations for which the observation with the largest absolute value is positive and the observation with the second largest absolute value is negative. What are the minimum and maximum number of Walsh averages that could be positive?
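For Exercises 7.A.19 through 7.A.23 it is convenient to compute the Walsh averages (X_i + X_j)/2, i ≤ j, directly. The sketch below uses hypothetical function names. It returns three quantities:

- the sample median,
- the median of the Walsh averages,
- the number of positive Walsh averages.

When there are no zeros and no ties among the |X|'s, that last count equals the signed rank statistic W⁺ computed from the same sample. This standard identity gives a handy check.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(x):
    """All n(n+1)/2 Walsh averages (X_i + X_j)/2 with i <= j (pairs taken by position)."""
    return [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]


def summarize(x):
    """Return (sample median, median of the Walsh averages, number of positive Walsh averages)."""
    w = walsh_averages(x)
    return median(x), median(w), sum(v > 0 for v in w)


print(summarize([-1, 2, 3]))  # (2, 1.5, 5)
```

For the sample (-1, 2, 3) the ranks of the positive observations are 2 and 3, so W⁺ = 5, matching the five positive Walsh averages.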

7.B. Data Analysis/Computational

7.B.1. Diamonds. In the February 18, 2000 edition of Singapore’s Business Times, an advertisement (as discussed in Chu, 2001) listed data (weight in carats, color purity, grade of clarity, certification body, and value in Singapore dollars) for 308 round diamond stones. These data are provided in the dataset diamonds_carats_color_cost. Viewing these data as a random sample of size n = 308 from the population of all round diamond stones, complete the following statistical analyses. Make only the minimal assumption about the underlying population, with unknown median diamond size η (in carats).

(a) Obtain a point estimate of η and find an approximate 94% confidence interval for η.
(b) Find the P-value for a test of the null hypothesis H₀: η = 0.5 carats versus the one-sided alternative H_A: η > 0.5 carats.
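Under only the minimal assumption, the analyses in Exercise 7.B.1 rest on the sign statistic. The sketch below:

- takes the sample median as the point estimate;
- chooses the order statistics for an approximate interval using a large-sample normal approximation to the Binomial(n, 1/2) distribution of B;
- computes the exact upper-tail P-value, dropping observations equal to η₀.

The rounding rule for k and the function names are our assumptions, and the chapter's own approximation may round differently. Loading diamonds_carats_color_cost is not shown because its file format is not described here.

```python
import math
from statistics import NormalDist, median


def sign_interval(x, cl):
    """Approximate 100*cl% interval (X_(k), X_(n+1-k)) for the population median.

    Assumed large-sample rule: k = floor(n/2 - z*sqrt(n)/2), where z is the upper (1 - cl)/2
    standard normal quantile; rounding down makes the interval slightly wider (conservative).
    """
    xs = sorted(x)
    n = len(xs)
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    k = max(1, math.floor(n / 2 - z * math.sqrt(n) / 2))
    return xs[k - 1], xs[n - k]  # X_(k) and X_(n+1-k) as 1-based order statistics


def sign_test_upper_p(x, eta0):
    """Exact P-value for H0: eta = eta0 vs HA: eta > eta0; observations equal to eta0 are dropped."""
    d = [v for v in x if v != eta0]
    n, b = len(d), sum(v > eta0 for v in d)
    return sum(math.comb(n, j) for j in range(b, n + 1)) / 2 ** n


sizes = list(range(1, 101))  # placeholder data; substitute the carat values from the dataset
print(median(sizes), sign_interval(sizes, 0.94), sign_test_upper_p(sizes, 50))
```

Python integers have arbitrary precision, so the exact binomial sum stays feasible even with n = 308.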

7.B.2. Diamonds Round Two. Carry out the same statistical analyses prescribed in Exercise 7.B.1, but now under the more stringent assumption that the population of round diamond stone sizes is symmetrically distributed about its unknown median size (in carats) η. Compare your findings with those obtained under the minimal assumption of Exercise 7.B.1. Are you comfortable with the assumption of distributional symmetry? Why or why not?
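Under the added symmetry assumption of Exercise 7.B.2, the natural counterparts are the median of the Walsh averages (the Hodges-Lehmann estimator) and the signed rank statistic. The sketch below uses large-sample normal approximations for both the interval and the P-value. The following choices are assumptions made for illustration:

- the rounding rule for k;
- the use of average ranks without a tie correction to the variance;
- the function names and the variable `carats` mentioned below.

```python
import math
from itertools import combinations_with_replacement
from statistics import NormalDist, median


def walsh_interval(x, cl):
    """Hodges-Lehmann estimate and approximate 100*cl% interval (W_(k), W_(M+1-k)).

    Assumed large-sample rule: k = floor(M/2 - z*sqrt(n(n+1)(2n+1)/24)), M = n(n+1)/2.
    """
    n = len(x)
    w = sorted((a + b) / 2 for a, b in combinations_with_replacement(x, 2))
    m = len(w)
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    k = max(1, math.floor(m / 2 - z * math.sqrt(n * (n + 1) * (2 * n + 1) / 24)))
    return median(w), (w[k - 1], w[m - k])


def signed_rank_upper_p(x, eta0):
    """Normal-approximation P-value for H0: eta = eta0 vs HA: eta > eta0.

    Observations equal to eta0 are dropped, tied |D| values get average ranks,
    and no tie correction is made to the variance (a simplifying assumption).
    """
    d = [v - eta0 for v in x if v != eta0]
    n = len(d)
    a = [abs(v) for v in d]
    # Average rank = (number of strictly smaller values) + (size of tie group + 1) / 2.
    ranks = [sum(b < v for b in a) + (sum(b == v for b in a) + 1) / 2 for v in a]
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 1 - NormalDist().cdf((w_plus - mean_w) / sd_w)
```

For the diamond data one would call `walsh_interval(carats, 0.94)` and `signed_rank_upper_p(carats, 0.5)`, with `carats` (hypothetical) holding the 308 sizes. With n = 308 this involves 47,586 Walsh averages, which is still quick.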

7.B.3. Diamonds Round Three. Carry out the same statistical analyses prescribed in Exercise 7.B.1, but now under the even more stringent assumption that the population of round diamond stone sizes (in carats) is normal with mean (and median) μ. Compare your findings with those obtained in Exercises 7.B.1 and 7.B.2 under minimal and symmetry assumptions, respectively. Are you comfortable with the assumption that the population of sizes (in carats) for round diamond stones is normally distributed? Why or why not?

7.B.4. How Much Do Euros Weigh? The Euro is the common currency shared by the countries of the euro area within the European Union. According to information from the “National Bank of Belgium”, the 1 Euro coin is stipulated to weigh 7.5 grams. Shkedy et al. (2006) obtained eight separate packages of 250 Euros each from a Belgian bank, and their assistants Sofie Bogaerts and Saskia Litière individually weighed each of these 2000 coins using a Sartorius BP310 weighing scale, which provides an accurate reading to one thousandth of a gram. These two thousand weights, indexed by package number, are provided in the dataset weight_of_Euros. Using only the 250 coins from package number 1, conduct the following analyses under the assumption that the population of Euro weights is normally distributed with mean μ.

(a) Obtain a point estimate of μ and find an approximate 96% confidence interval for μ.
(b) Find the P-value for a test of the null hypothesis H₀: μ = 7.5 grams versus the one-sided alternative H_A: μ > 7.5 grams.
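Under a normality assumption, as in this exercise, the point estimate is the sample mean. The interval and P-value use Student's t distribution with n - 1 degrees of freedom. To stay within the Python standard library, the sketch below approximates t tail areas with Simpson's rule and finds t quantiles by bisection. In practice a t table or statistical software would supply these values. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_upper_tail(t, df, steps=4000):
    """P(T > t) for Student's t with df degrees of freedom, by Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = density(0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3  # approximately P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_quantile(p, df):
    """Value t with P(T > t) = p, for 0 < p < 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_analysis(x, cl, mu0):
    """Sample mean, two-sided 100*cl% t interval, and P-value for HA: mu > mu0."""
    n, xbar = len(x), mean(x)
    se = stdev(x) / math.sqrt(n)
    half = t_quantile((1 - cl) / 2, n - 1) * se
    p_upper = t_upper_tail((xbar - mu0) / se, n - 1)
    return xbar, (xbar - half, xbar + half), p_upper
```

For a one-sided upper confidence bound, as in the body temperature exercise, one would instead use xbar + t_quantile(1 - cl, n - 1) * se. The P-value for the alternative μ < μ₀ is then 1 minus the upper-tail value returned here.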

7.B.5. How Much Do Euros Weigh—Again? Repeat the statistical analyses from Exercise 7.B.4 using the 500 Euros obtained from combining packages 1 and 2. Compare and contrast the outcome of these two sets of statistical analyses.

7.B.6. How Much Do Euros Weigh—Once More? Repeat the statistical analyses from Exercise 7.B.4 using only the 250 Euros from package number 8. Compare and contrast the results for package 1 versus the results for package 8.

7.B.7. Is 98.6 Degrees Fahrenheit Truly the Mean Body Temperature? It is a widely held belief that the normal body temperature for humans is 98.6 °F. Mackowiak et al. (1992) provide a critical evaluation of this statement through the collection of data from 148 individuals aged 18 through 40 years. The dataset body_temperature_and_heart_rate contains body temperature and heart rate values (artificially generated by Shoemaker, 1996, to closely recreate the original data considered by Mackowiak et al.) for 65 male and 65 female subjects. Conduct the following analyses under the assumption that the population of human body temperatures for healthy individuals is normally distributed with mean μ.

(a) Obtain a point estimate of μ and find an approximate 94% upper confidence bound for μ.
(b) Find the P-value for a test of the null hypothesis H₀: μ = 98.6 °F versus the one-sided alternative H_A: μ < 98.6 °F.
(c) Carry out similar analyses separately for the 65 male and 65 female subjects. Discuss the results.

7.B.8. House Sizes in North Carolina. The dataset house_lot_sizes contains information on house and lot sizes for a random sample of 100 properties in Wake County, North Carolina, as collected by Woodard and Leone (2008). Make only the minimal assumption about the underlying population of house sizes in Wake County, North Carolina, with unknown median η.

(a) Obtain a point estimate of η.
(b) Find an approximate 96% confidence interval for η.

7.B.9. House Sizes in North Carolina Round Two. Carry out the same statistical analyses prescribed in Exercise 7.B.8, but now under the more stringent assumption that the population of house sizes in Wake County, North Carolina is symmetrically distributed about its unknown median η. Compare your findings with those obtained under the minimal assumption of Exercise 7.B.8. Are you comfortable with the assumption of distributional symmetry? Why or why not?

7.B.10. Lot Sizes in North Carolina. The dataset house_lot_sizes contains information on house and lot sizes for a random sample of 100 properties in Wake County, North Carolina, as collected by Woodard and Leone (2008). Make only the minimal assumption about the population of lot sizes in Wake County, North Carolina, with unknown median η.

(a) Obtain a point estimate of η.
(b) Find an approximate 92% confidence interval for η.

7.B.11. Lot Sizes in North Carolina Round Two. Carry out the same statistical analyses prescribed in Exercise 7.B.10, but now under the more stringent assumption that the population of lot sizes in Wake County, North Carolina is normal with mean (and median) μ. Compare your findings with those obtained in Exercise 7.B.10 under only the minimal assumption about the population. Are you comfortable with the assumption that the population of lot sizes is normally distributed? Why or why not?

7.B.12. Healthy Heart Rate. In Exercise 7.B.7 we discussed the dataset body_temperature_and_heart_rate generated by Shoemaker (1996) for 65 healthy female and 65 healthy male subjects. Conduct the following analyses under the assumption that the population of human heart rates for healthy individuals is normally distributed with mean μ.

(a) Obtain a point estimate of μ and find an approximate 95% confidence interval for μ.
(b) Repeat the calculations in part (a) separately for the 65 male and 65 female subjects. Discuss the results.

7.B.13. Movie Ratings. The Movie and Video Guide is a ratings and information guide to movies prepared annually by Leonard Maltin. Moore (2006) selected a random sample of 100 movies from the 1996 edition of the Guide. He compiled the dataset movie_facts containing relevant information about the selected movies. One of the pieces of information provided is the rating that Maltin gave to each of the movies on a rising (worst to best) scale of 1, 1.5, 2, 2.5, 3, 3.5, 4. Make only the minimal assumption about the population of movie ratings, with unknown median η.

(a) Obtain a point estimate of η and find an approximate 90% confidence interval for η.
(b) Find the P-value for a test of the null hypothesis H₀: η = 2.5 versus the one-sided alternative H_A: η < 2.5.

7.B.14. How Long Are Movies? The Movie and Video Guide is a ratings and information guide to movies prepared annually by Leonard Maltin. Moore (2006) selected a random sample of 100 movies from the 1996 edition of the Guide. He compiled the dataset movie_facts containing relevant information about the selected movies. One of the pieces of information provided is the running length of the movies, in minutes. Conduct the following analyses under the assumption that the running length of movies is normally distributed with mean μ.

(a) Obtain a point estimate of μ and find an approximate 93% confidence interval for μ.
(b) Find the P-value for a test of the null hypothesis H₀: μ = 90 min versus the one-sided alternative H_A: μ > 90 min.

7.B.15. How Big Are Movies? The Movie and Video Guide is a ratings and information guide to movies prepared annually by Leonard Maltin. Moore (2006) selected a random sample of 100 movies from the 1996 edition of the Guide. He compiled the dataset movie_facts containing relevant information about the selected movies. One of the pieces of information provided is the number of listed cast members in each movie. Conduct the following analyses under the assumption that the population of the number of cast members per movie is symmetrically distributed about its median η.

(a) Obtain a point estimate of η and find an approximate 97% confidence interval for η.
(b) Find the P-value for a test of the null hypothesis H₀: η = 6.5 versus the one-sided alternative H_A: η > 6.5.

7.C. Activities

7.C.1. Coffee, Coffee, Coffee. How many cups of coffee does a typical college student drink in a given day? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.C.2. Beer, Beer, Beer. How many bottles of beer does a typical college student drink in a given week? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.C.3. Sleep, Sleep, Sleep. How many hours of sleep does a typical college student get during the “school nights” of Sunday through Thursday? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.C.4. Sleep, Sleep, Sleep? How many hours of sleep does a typical college student get during the weekend nights of Friday and Saturday? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data. Compare your findings with those you obtained for “school nights” in Exercise 7.C.3.

7.C.5. U.S. Pennies. How long do U.S. pennies remain in circulation? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data. Does your result depend on where the coins were minted (Denver or Philadelphia)?

7.C.6. U.S. Nickels. How long do U.S. nickels remain in circulation? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data. Does your result depend on where the coins were minted (Denver or Philadelphia)? Compare your findings with those you obtained for U.S. pennies in Exercise 7.C.5.

7.C.7. Smart Phones. How much daily time do college students spend on their smart phones? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.C.8. Exercise. How many hours per week do college students spend exercising? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.C.9. Coursework/Studying. How many hours per week (outside of the classroom) do college students spend on coursework, including studying for examinations? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.C.10. Solitary Time. How many hours per week (other than sleeping) do college students spend without conversing with another person, either face to face or through an electronic device? Design a study to collect data to address this question. Carry out an appropriate statistical analysis of your collected data.

7.D. Internet Archives

7.D.1. Social Issues. Search the Internet to find a published research article that uses data from a random sample to address a social issue of particular interest to you. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.2. Treating Disease. Search the Internet to find a published research article that uses data from a random sample to address ways to treat a potentially fatal disease. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.3. Income Inequality. Search the Internet to find a published research article that uses data from a random sample to address the issue of income inequality in the United States. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.4. Sports Injuries. Search the Internet to find a published research article that uses data from a random sample to address the prevalence of youth sports injuries. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.5. Student Loan Debt at Public Universities. Search the Internet to find a published research article that uses data from a random sample to address the issue of student loan debt for graduates of public universities. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.6. Student Loan Debt at Private Universities. Search the Internet to find a published research article that uses data from a random sample to address the issue of student loan debt for graduates of private universities. Discuss their statistical findings in the context of the one-sample setting of Chap. 7 and compare the findings with those in Exercise 7.D.5.

7.D.7. Global Warming. Search the Internet to find a published article that uses data to address an issue related to global warming. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.8. Trans Fat. Search the Internet to find a published article that uses data to address the health impact of trans fat in our diets. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.9. Fracking. Search the Internet to find a published article that uses sample data to address the impact of fracking on society. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.

7.D.10. Toxic Algae Bloom. Search the Internet to find a published article that uses sample data to address the causes of toxic algae blooms in a water system. Discuss their statistical findings in the context of the one-sample setting of Chap. 7.


About this chapter

Cite this chapter

Wolfe, D.A., Schneider, G. (2017). Statistical Inference for the Center of a Population. In: Intuitive Introductory Statistics. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-56072-4_7


DOI: https://doi.org/10.1007/978-3-319-56072-4_7
Published: 10 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56070-0
Online ISBN: 978-3-319-56072-4
eBook Packages: Mathematics and Statistics (R0)
