<!------------Discussion----------->
The molecular weight of a protein in kilodaltons (kD) can be 
estimated by adding 10% to its length in amino acids, then dividing by 10. 
For example, a 400-residue protein is approximately 44 kD,
based on an average of 110 daltons per amino acid, assuming
random composition.
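<P>
The rule of thumb above can be written as a one-line calculation. This is a
minimal sketch only; the function name <em>estimate_kd</em> and the constant
name are illustrative assumptions, not part of any browser code.

```python
AVG_RESIDUE_DA = 110  # average daltons per amino acid, as assumed in the text


def estimate_kd(n_residues):
    """Estimate molecular weight in kD from length in amino acids.

    Equivalent to adding 10% to the length and dividing by 10.
    """
    return n_residues * AVG_RESIDUE_DA / 1000


print(estimate_kd(400))  # 44.0
```

The estimate ignores actual composition, so it is only suitable for quick
sanity checks.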
<P>
More accurately, as is done in the Protein Browser, a protein's weight 
can be calculated from the individual molecular weights of the amino 
acids. Processed signal peptides and covalent modifications 
(possibly transient or partial) must be taken into account, depending on 
the situation.  
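<P>
A composition-based calculation can be sketched as follows. This is an
illustration under stated assumptions, not the Protein Browser's or the
ExPASy tool's actual code: the table holds the standard average-isotopic
residue masses, the function name and the <em>signal_peptide_length</em>
parameter are hypothetical, and covalent modifications are not modeled.

```python
# Average (natural isotopic abundance) residue masses in daltons.
RESIDUE_MASS_DA = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.01528  # one water per chain accounts for the free N- and C-termini


def molecular_weight_da(sequence, signal_peptide_length=0):
    """Sum the residue masses of the mature chain and add one water.

    signal_peptide_length is a hypothetical parameter: the number of
    N-terminal residues assumed to be cleaved from the precursor.
    """
    mature = sequence.upper()[signal_peptide_length:]
    if not mature:
        raise ValueError("no residues left after removing the signal peptide")
    unknown = set(mature) - set(RESIDUE_MASS_DA)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS_DA[aa] for aa in mature) + WATER_DA


print(round(molecular_weight_da("GG"), 2))  # glycylglycine: 132.12
```

Dividing the result by 1000 and rounding gives a value in kD.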
<P>
The molecular weights displayed in the Protein Browser have been 
taken from UniProtKB when accession numbers were available; otherwise, 
they have been based on genomic sequence as processed with the UniProtKB 
<A HREF="http://www.expasy.ch/tools/pi_tool.html" TARGET=_blank>molecular 
weight tool</A>. These values reflect average isotopic 
abundances and the removal of predicted signal peptides, but not modified amino 
acids. Numbers have been rounded to the nearest kD, which 
suffices for gel migration (although clearly not for mass 
spectrometry). For genomics purposes, the amino acid count of a
full-length encoded protein is more useful than its molecular weight. 
<P>
It is likely that intrinsic membrane proteins and very long 
proteins are under-represented in this data set. The display calls 
attention to unusual molecular weight values that 
could suggest multiple domains or chain fusion, which can be 
lineage-specific events.  On predicted protein tracks, 
unusually low molecular weights may indicate an incompletely predicted protein.  
<P>
Mammalian molecular weights average 69 kD (627 aa) in the data 
set used. The distribution exhibits a very long tail of a few 
proteins with much higher molecular weights. Larger proteins are
under-represented in genomic predicted tracks because of 
incomplete cDNAs and long introns. Conversely, short proteins may be 
merged or missed altogether. At present, the average mammalian protein 
appears to be somewhat smaller than the average <em>Drosophila</em> protein (649 
aa), but approximately 80% larger than the average <em>E. coli</em> 
protein (350 aa). Larger size suggests increased domain content. 
Because about 14 kD can suffice for a stand-alone functioning 
enzyme (e.g. lysozyme), 3-5 domains should be expected for the 
average mammalian protein. 
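<P>
The length comparisons above reduce to simple ratios. The sketch below uses
only the average lengths quoted in the text; the dictionary and function
names are illustrative assumptions.

```python
# Average protein lengths in amino acids, as quoted in the text.
AVG_LENGTH_AA = {"mammal": 627, "Drosophila": 649, "E. coli": 350}


def percent_larger(a, b):
    """Return how much larger a is than b, in percent (negative if smaller)."""
    return (a / b - 1) * 100


mammal = AVG_LENGTH_AA["mammal"]
print(round(percent_larger(mammal, AVG_LENGTH_AA["E. coli"])))     # 79
print(round(percent_larger(mammal, AVG_LENGTH_AA["Drosophila"])))  # -3
```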
<P>
<CENTER>
<IMG height=256 width=288 src="mw.jpg"> 
</CENTER>
