NPD Statistics

Why is the sub-nuclear localisation of a protein important?

Many nuclear proteins that participate in related pathways or processes appear to concentrate in specific sub-nuclear regions of the mammalian nucleus. The importance of this organisation is underlined by the dysfunction often associated with the mis-localisation of nuclear proteins in human disease, including cancer. Determining the sub-nuclear localisation of proteins is therefore important for understanding genome regulation and function, and can provide important clues to the molecular function of novel proteins.

Can we predict the sub-nuclear localisation of proteins from primary sequence?

One important aspect of the NPD project is the development of algorithms for searching protein and sequence databases. By accumulating statistical data that correlate nuclear protein structure and localisation with molecular function, we hope to create algorithms that predict the sub-nuclear localisation of novel proteins directly from the primary sequences made available by the Human Genome Project and other sequencing projects.
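As a minimal illustration of a feature that can be read directly from primary sequence, the sketch below flags uninterrupted runs of arginine/serine (RS or SR) dipeptides, a crude proxy for the RS motif that the statistics on this page associate with splicing speckles. The function names, the regular expression and the MIN_DIPEPTIDES cut-off are hypothetical; this is not the NPD search algorithm. RS domains in real proteins are often interrupted by other residues, so this strict pattern will miss some of them.

```python
import re

# Hypothetical cut-off: the minimum number of consecutive RS or SR
# dipeptides required to report a region. Not taken from the NPD project.
MIN_DIPEPTIDES = 4

# Matches an uninterrupted run of RS/SR dipeptides, e.g. "RSRSRSRS".
RS_RUN = re.compile(r"(?:RS|SR){%d,}" % MIN_DIPEPTIDES)


def find_rs_regions(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of RS/SR runs."""
    return [(m.start(), m.end()) for m in RS_RUN.finditer(sequence.upper())]


def has_rs_region(sequence: str) -> bool:
    """True if the sequence contains at least one qualifying RS/SR run."""
    return bool(find_rs_regions(sequence))


print(find_rs_regions("MKRSRSRSRSLLL"))  # [(2, 10)]
print(has_rs_region("MKRSRSRLLL"))       # False (only two dipeptides in a row)
```

A regular expression keeps the sketch short; a practical predictor would combine many such sequence features rather than rely on a single motif.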

The following statistical information, derived from the literature and the NPD project, provides an overview of the complexity of various nuclear compartments.

Distribution of Nuclear Gene-Traps among Various Sub-Nuclear Compartments
Summarised from Sutherland et al. (2001).

Protein domains and motifs abundant in proteins that co-localise in the nucleus
Compartment         Motif/domain          Gene-traps (%)   Known proteins (%)   Abundance rank amongst all human proteins
Nucleolus           DEAD/H box helicase   20               8                    47th
Splicing speckles   RS                    40               23                   Not known
                    RRM                   30               30                   6th
Chromatin           PHD                   33               12                   75th
                    Bromo                 33                                    78th
                    Chromo                17                                    180th
Diffuse             C2H2 zinc finger      12               Not known            1st
                    KRAB+C2H2             7                                     43rd

Derived from Sutherland et al. (2001). The most frequent sequence motifs or domains in gene-trapped proteins, and in proteins from the literature known to localise to each sub-nuclear compartment, were identified using the InterPro or SMART online tools. These frequencies were compared with how often the same motifs are detected in the human genome sequence, expressed as each motif's abundance rank amongst all human proteins.
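The comparison described above can be sketched as two small functions: one computes, for a set of proteins from a single compartment, the percentage that carry each motif; the other gives a motif's abundance rank within a background set such as all human proteins. The input format (a dictionary from a protein identifier to the set of motif names found in it, collected for example from InterPro or SMART searches), the function names, the tie-handling rule and the toy data are all assumptions for illustration, not the procedure or data of Sutherland et al. (2001).

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def motif_percentages(annotations: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Percentage of proteins carrying each motif, most frequent first.

    `annotations` maps a protein identifier to the set of motif/domain names
    detected in it, so each motif is counted at most once per protein.
    Ties are broken alphabetically so the output order is deterministic.
    """
    if not annotations:
        return []
    counts = Counter(m for motifs in annotations.values() for m in motifs)
    n = len(annotations)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(motif, 100.0 * c / n) for motif, c in ranked]


def abundance_rank(motif: str, background: Dict[str, Set[str]]) -> Optional[int]:
    """1-based rank of `motif` by the number of background proteins carrying it.

    Tied motifs share a rank (1 + number of motifs found in more proteins).
    Returns None if the motif never occurs in the background set.
    """
    counts = Counter(m for motifs in background.values() for m in motifs)
    if motif not in counts:
        return None
    return 1 + sum(1 for c in counts.values() if c > counts[motif])


# Hypothetical toy annotations for illustration only; not NPD data.
speckle_proteins = {
    "protA": {"RS", "RRM"},
    "protB": {"RS"},
    "protC": {"RRM", "C2H2"},
    "protD": set(),
}
print(motif_percentages(speckle_proteins))
# [('RRM', 50.0), ('RS', 50.0), ('C2H2', 25.0)]
print(abundance_rank("C2H2", speckle_proteins))  # 3
```

Counting motifs per protein (not per occurrence) matches the way the table reports the share of proteins carrying a motif; a motif repeated several times within one protein still counts once.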

Biochemical Characteristics of Proteins found in Various Sub-Nuclear Compartments

Compartment          Average size   Average pI
Nucleolar            73 kDa         8.3
Splicing Speckles    92 kDa         9.3
Chromatin            119 kDa        7.6
Diffuse              81 kDa         7.3

General trend in nuclear protein size: Chromatin > Splicing Speckles > Diffuse > Nucleolar.
General trend in average pI, from more acidic to more basic: Diffuse < Chromatin < Nucleolar < Splicing Speckles.
Data from Sutherland et al. (2001).
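The averages and trends above summarise two properties that can be calculated from each protein's primary sequence. As a minimal sketch, and not necessarily the method used to produce these figures, molecular weight can be taken as the sum of average residue masses plus one water, and pI estimated by bisection as the pH at which a Henderson-Hasselbalch estimate of net charge reaches zero. The pKa values below are one illustrative set and are an assumption; different tools use different pKa sets, so their pI estimates vary. The function names are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Illustrative pKa values (an assumption; published tools differ).
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def molecular_weight_kda(sequence: str) -> float:
    """Average molecular weight in kDa; raises KeyError on non-standard letters."""
    seq = sequence.upper()
    return (WATER + sum(RESIDUE_MASS[aa] for aa in seq)) / 1000.0


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge at `ph`; expects an upper-case sequence."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - pka))
                    for aa, pka in PKA_BASIC.items())
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (pka - ph))
                    for aa, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH in 0-14 at which the estimated net charge is zero (bisection)."""
    seq = sequence.upper()
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at a higher pH
        else:
            high = mid  # neutral or negative: pI lies at a lower pH
    return (low + high) / 2


print(round(molecular_weight_kda("GG") * 1000, 2))  # 132.12 (glycylglycine, Da)
print(isoelectric_point("KKKKKKKKKK") > 10)          # True: lysine-rich, basic
print(isoelectric_point("DDDDDDDDDD") < 4)           # True: aspartate-rich, acidic
```

Bisection works here because the estimated net charge falls steadily as pH rises, from positive at pH 0 to negative at pH 14, so it crosses zero exactly once. Averaging these per-protein values over the proteins assigned to each compartment would give figures of the kind shown in the table.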

References

  1. Bickmore WA, Sutherland HGE. (2002) Addressing protein localization within the nucleus. EMBO J. 21(6):1248-1254.
  2. Sutherland HG, Mumford GK, Newton K, Ford LV, Farrall R, Dellaire G, Caceres JF, Bickmore WA. (2001) Large-scale identification of mammalian proteins localized to nuclear sub-compartments. Hum Mol Genet. 10(18):1995-2011.