
Effect of low-complexity regions on protein structure determination

Journal of Structural and Functional Genomics

Abstract

Previous work has shown that protein sequences containing a quasi-repetitive assortment of amino acids are common in genomes and in sequence databases such as Swiss-Prot, but are under-represented in the structure-based Protein Data Bank (PDB). For several years, structural genomics groups have used the absence of these "low-complexity" sequences as a criterion for selecting proteins with a good chance of successful structure determination. In this study, we examine the data deposited in the PDB, together with the data available from structural genomics groups in TargetDB and PepcDB, to identify trends that should be taken into account when low-complexity sequence content is used as part of the target selection process.
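The abstract refers to "low-complexity" sequences without naming a specific measure. As a rough illustration only, the sketch below flags every residue that lies in a sliding window whose Shannon entropy of amino-acid composition falls below a threshold, then reports the flagged fraction of the sequence. This is a simplified entropy filter, not the SEG program of Wootton and Federhen and not the procedure used in this study. The window length (12 residues), the 2.2-bit threshold, all function names, and the `passes_filter` target-selection rule are hypothetical choices made for this example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of `window`."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> list[bool]:
    """Mark residues covered by any window whose entropy is below `threshold`.

    Simplified entropy filter (NOT SEG); `window` and `threshold` are
    illustrative assumptions. Sequences shorter than `window` are never flagged.
    """
    seq = seq.upper()
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def low_complexity_fraction(seq: str, window: int = 12, threshold: float = 2.2) -> float:
    """Fraction of residues flagged as low complexity (0.0 for an empty sequence)."""
    mask = low_complexity_mask(seq, window, threshold)
    return sum(mask) / len(mask) if mask else 0.0


def passes_filter(seq: str, max_fraction: float = 0.0) -> bool:
    """Hypothetical target-selection rule: keep a target only if its
    low-complexity fraction does not exceed `max_fraction`."""
    return low_complexity_fraction(seq) <= max_fraction


if __name__ == "__main__":
    varied = "ACDEFGHIKLMNPQRSTVWY"   # 20 distinct residues: no window is low complexity
    poly_q = "Q" * 20                 # homopolymeric glutamine run: every window has entropy 0
    mixed = varied + poly_q + varied  # low-complexity run flanked by varied sequence
    for name, s in [("varied", varied), ("poly_q", poly_q), ("mixed", mixed)]:
        print(name, round(low_complexity_fraction(s), 3), passes_filter(s))
```

A fraction-based cutoff is only one way to turn the mask into a selection decision; a real pipeline might instead look at the length of the longest flagged segment or at where it falls relative to predicted domains.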


Figs. 1–9 (images not included)


Abbreviations

CESG: Center for Eukaryotic Structural Genomics
PDB: Protein Data Bank
NMR: Nuclear magnetic resonance
HSQC: Heteronuclear single quantum coherence


Acknowledgments

The authors thank Dmitry A. Kondrashov and John L. Markley for helpful comments on this paper, and Sarah C. Cunningham for assistance with the statistical tests. R.M.B. was supported by NLM training grant T15LM007359 and DOE training grant DE-FG2-04ER25627. C.A.B. and G.N.P. were supported by the Center for Eukaryotic Structural Genomics (NIH/NIGMS grant numbers U54 GM074901-01 and P50 GM064598).

Author information

Correspondence to George N. Phillips Jr.


About this article

Cite this article

Bannen, R.M., Bingman, C.A. & Phillips, G.N. Effect of low-complexity regions on protein structure determination. J Struct Funct Genomics 8, 217–226 (2007). https://doi.org/10.1007/s10969-008-9039-6


  • DOI: https://doi.org/10.1007/s10969-008-9039-6
