Part of the book series: SpringerBriefs in Statistics (JSSRES)

Abstract

We discuss a statistical method for the two-group classification problem with labels \(y=0\) and \(y=1\). We consider the situation in which the conditional distribution given \(y=0\) is well specified by a normal distribution, whereas the conditional distribution given \(y=1\) (the rare observations in an imbalanced data set) is not well modeled by any specific distribution. In a case-control study, for example, the distribution in the control group can typically be assumed normal after an appropriate data transformation, while the distribution in the case group may depart from normality. In this situation, the linear discriminant function that maximizes the t-statistic, or equivalently Fisher’s linear discriminant function, may not be optimal. We propose a class of generalized t-statistics and study their asymptotic consistency and normality. The generalized t-statistic that is optimal in the sense of asymptotic variance is derived in a semiparametric manner, and its statistical performance is confirmed in several numerical experiments.
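
The equivalence mentioned in the abstract, between maximizing the t-statistic over linear projections and Fisher's linear discriminant function, can be checked numerically. The following is a minimal sketch of that classical fact only; it does not implement the chapter's generalized t-statistic. The function names, sample sizes, and simulated data are assumptions made for illustration, and the statistic used is the standard pooled-variance two-sample t-statistic.

```python
import numpy as np


def pooled_t_statistic(w, x0, x1):
    """Pooled-variance two-sample t-statistic of the projected scores x @ w."""
    s0, s1 = x0 @ w, x1 @ w
    n0, n1 = len(s0), len(s1)
    pooled_var = ((n0 - 1) * s0.var(ddof=1) + (n1 - 1) * s1.var(ddof=1)) / (n0 + n1 - 2)
    return (s1.mean() - s0.mean()) / np.sqrt(pooled_var * (1 / n0 + 1 / n1))


def fisher_direction(x0, x1):
    """Fisher's direction: pooled covariance inverse times the mean difference."""
    n0, n1 = len(x0), len(x1)
    pooled_cov = (
        (n0 - 1) * np.cov(x0, rowvar=False) + (n1 - 1) * np.cov(x1, rowvar=False)
    ) / (n0 + n1 - 2)
    return np.linalg.solve(pooled_cov, x1.mean(axis=0) - x0.mean(axis=0))


# Simulated, imbalanced toy data (illustrative only, not from the chapter).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(500, 3))                      # controls, y = 0
x1 = rng.normal(loc=[1.0, 0.5, 0.0], size=(50, 3))  # rare cases, y = 1

w_fisher = fisher_direction(x0, x1)
t_fisher = pooled_t_statistic(w_fisher, x0, x1)

# No other direction should give a larger t-statistic on the same sample.
random_ts = [pooled_t_statistic(rng.normal(size=3), x0, x1) for _ in range(1000)]
print(t_fisher >= max(random_ts))
```

The t-statistic is invariant to rescaling the direction by a positive constant, so only the direction of `w_fisher` matters. The comparison against random directions is a sanity check rather than a proof; the maximizing property follows from the Cauchy-Schwarz inequality.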

Author information

Corresponding author

Correspondence to Osamu Komori.

Copyright information

© 2019 The Author(s), under exclusive licence to Springer Japan KK

About this chapter

Cite this chapter

Komori, O., Eguchi, S. (2019). Generalized T-Statistic. In: Statistical Methods for Imbalanced Data in Ecological and Biological Studies. SpringerBriefs in Statistics. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55570-4_4
