Annotating the effect of missense mutations on
protein function using sequence and structural features

Discriminating between functionally neutral amino acid substitutions and deleterious mutations that harm protein function is of major importance to our understanding of disease at the molecular level. The rapidly growing amount of experimental data enables the development of computational tools to facilitate the annotation of these substitutions. To this end, we established MuD (Mutation Detector). MuD uses structure- and sequence-based features with a Random Forests classifier to assess the impact of a given substitution on the protein's function and/or structure. MuD was benchmarked by cross-validation on a subset of the Bromberg and Rost dataset(1) and achieved a Matthews correlation coefficient of 0.49, comparable to that of current tools.
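For readers unfamiliar with the Matthews correlation coefficient, the following is a minimal sketch of how it is computed from the four confusion-matrix counts of a binary classifier. The function name `mcc` and the example counts are hypothetical and chosen for illustration only; they are not MuD results and this is not the code used for the benchmark.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly predicted deleterious/neutral substitutions
    fp/fn: neutral predicted as deleterious, and vice versa
    Returns 0.0 when the coefficient is undefined (a zero denominator).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not taken from MuD).
example = mcc(tp=40, tn=35, fp=15, fn=10)
print(f"MCC = {example:.2f}")
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct).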

What sets MuD apart is its interactivity: the user can guide the classifier by supplying structural data to improve prediction accuracy. The strength of this semi-automatic scheme is demonstrated on three independent cases (Table 1) that were not included in the Bromberg and Rost dataset(1) used to develop the predictor. Following the scheme of our server, we first removed biologically irrelevant ligands from the crystal structures and then selected the correct oligomerization state according to the literature. As a result, the prediction improved (Table 1), surpassing the performance of SNAP(1), SIFT(2) and PolyPhen(3).
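The ligand-removal step can be illustrated with a short sketch that drops selected hetero-atom records from PDB-format text. This is an assumption about how such a step might be done by hand, not the server's implementation: the function name `strip_ligands`, the sample records, and the choice of glycerol (`GOL`) as the ligand to remove are all hypothetical. Which ligands count as biologically irrelevant remains the user's decision, so the sketch takes an explicit list rather than deleting every `HETATM` record.

```python
def strip_ligands(pdb_text: str, ligands_to_remove: set) -> str:
    """Drop HETATM records whose residue name is in ligands_to_remove.

    In the PDB format the residue name occupies columns 18-20,
    i.e. the slice [17:20] of each record line.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() in ligands_to_remove:
            continue  # skip a ligand the user judged irrelevant
        kept.append(line)
    return "\n".join(kept)


# Hypothetical, truncated records for illustration only.
sample = "\n".join([
    "ATOM      1  N   MET A   1",
    "HETATM 1000  O   HOH A 201",
    "HETATM 1001  C1  GOL A 301",
])
cleaned = strip_ligands(sample, {"GOL"})
print(cleaned)
```

Only the listed residue names are removed; protein atoms and any hetero groups not on the list are left untouched.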


The Sub-BR dataset

The Sub-BR dataset used for training and benchmarking MuD is a balanced set of non-neutral and neutral substitutions. It comprises 12,133 substitutions from 1,178 proteins. Of the 12,133 substitutions in Sub-BR, 10,253 originated from the PMD dataset (7) and 2,065 are evolutionary model-based substitutions. The structural classification of the Sub-BR proteins according to the Structural Classification Of Proteins database (SCOP) (8) can be found via the links below:

Families, Super-families, Folds, Classes


  1. Bromberg, Y. and Rost, B. (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic acids research, 35, 3823-3835.
  2. Ng, P.C. and Henikoff, S. (2001) Predicting deleterious amino acid substitutions. Genome research, 11, 863-874.
  3. Ramensky, V., Bork, P. and Sunyaev, S. (2002) Human non-synonymous SNPs: server and survey. Nucleic acids research, 30, 3894-3900.
  4. Markiewicz, P., Kleina, L.G., Cruz, C., Ehret, S. and Miller, J.H. (1994) Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence. Journal of molecular biology, 240, 421-433.
  5. Rennell, D., Bouvier, S.E., Hardy, L.W. and Poteete, A.R. (1991) Systematic mutation of bacteriophage T4 lysozyme. Journal of molecular biology, 222, 67-88.
  6. Loeb, D.D., Swanstrom, R., Everitt, L., Manchester, M., Stamper, S.E. and Hutchison, C.A., 3rd. (1989) Complete mutagenesis of the HIV-1 protease. Nature, 340, 397-400.
  7. Kawabata, T., Ota, M. and Nishikawa, K. (1999) The Protein Mutant Database. Nucleic acids research, 27, 355-357.
  8. Andreeva, A., Howorth, D., Chandonia, J.M., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic acids research, 36, D419-425.