====== SIMAP ======

{{http://boincsimap.org/boincsimap/img/simaplogo.png|SIMAP}}

===== Project Description =====

==== What is SIMAP? ====

Today, protein sequence comparison is one of the most powerful tools in computational biology. It makes it possible to characterize protein sequences based on information that is preserved in evolution. Many computational methods in biology and medicine are based on protein sequence analysis, e.g. to predict the function and structure of genes and proteins. SIMAP facilitates these methods by providing pre-calculated protein similarities and protein domains.

SIMAP is a database of protein similarities and protein domains. It contains nearly all currently published protein sequences and is continuously updated. Protein similarities are computed using the FASTA algorithm, which offers a favorable combination of speed and sensitivity. Protein domains are calculated using the InterPro methods and databases. To our knowledge, SIMAP is the only project that combines comprehensive coverage of all known proteins with incremental update capabilities.

==== What is SIMAP used for? ====

Because of the huge number of known protein sequences in public databases, it has become clear that most of them will not be experimentally characterized in the near future. Nevertheless, proteins that have evolved from a common ancestor (so-called orthologs) often share the same functions. It is therefore possible to infer the function of an uncharacterized protein from an ortholog with known function. A well-known example is research on mouse genes and proteins, whose results in many cases also hold for the orthologous human genes and proteins. Protein similarities provide information about relations between proteins and are necessary for the prediction of orthologs.
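
One common heuristic for predicting orthologs from pre-computed similarity scores is the "reciprocal best hit" rule: two proteins from different species are treated as candidate orthologs if each is the other's highest-scoring match. The sketch below only illustrates that idea. It is not taken from SIMAP, and the function names, protein identifiers and scores are hypothetical.

```python
# Illustrative sketch of the reciprocal-best-hit heuristic.
# Not SIMAP's actual method; all names and scores are hypothetical.

def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query protein to its highest-scoring target protein."""
    best: dict[str, tuple[str, float]] = {}
    for (query, target), score in scores.items():
        # Keep the first hit seen on ties; replace only on a strictly higher score.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> list[tuple[str, str]]:
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up similarity scores between mouse and human proteins.
mouse_vs_human = {
    ("mouse_P1", "human_Q1"): 950.0,
    ("mouse_P1", "human_Q2"): 310.0,
    ("mouse_P2", "human_Q2"): 720.0,
    ("mouse_P3", "human_Q2"): 400.0,
}
human_vs_mouse = {
    ("human_Q1", "mouse_P1"): 950.0,
    ("human_Q2", "mouse_P2"): 720.0,
    ("human_Q2", "mouse_P3"): 400.0,
}

candidate_orthologs = reciprocal_best_hits(mouse_vs_human, human_vs_mouse)
print(candidate_orthologs)
```

In this toy data, ''mouse_P3'' has ''human_Q2'' as its best hit, but ''human_Q2'' prefers ''mouse_P2'', so only two pairs are reported as candidates.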

Protein domains (often called functional domains) are the structural building blocks of proteins. They are responsible for the activities of a protein, e.g. binding small molecules, catalyzing reactions or binding other proteins in large complexes. Knowledge about protein domains is stored in large repositories such as the InterPro databases. The prediction of domains in newly sequenced proteins is based on those databases and provides a fully automatic functional annotation of these proteins. We therefore calculate protein domains for all proteins in SIMAP, thus providing the largest system for protein function prediction worldwide.

There are many more bioinformatics methods that rely on protein similarities and domains. Our protein similarity database provides pre-computed similarity and domain data and represents the known protein space. This opens completely new perspectives compared to the common practice of repeatedly re-calculating this kind of data. SIMAP is regularly updated: when new sequences appear, the similarity matrix is simply extended incrementally. The use of SIMAP is completely free for education and public research.

==== What do you do in BOINCSIMAP? ====

SIMAP contains nearly all currently published protein sequences and is continuously updated. In BOINCSIMAP, we calculate the similarities and domains of newly imported proteins every month in order to keep the SIMAP database up to date. Protein similarities are computed using the FASTA algorithm, which combines high speed with higher sensitivity than the popular BLAST. Protein domains are calculated using the InterPro methods and databases.

The computational cost of calculating the similarity data grows with the square of the number of sequences in the database, so the effort required to keep the matrix up to date is constantly increasing. Our internal resources, which have performed the calculations for SIMAP for years, are no longer sufficient to keep up with all new sequences. That is why we implemented a SIMAP client for the BOINC platform (Berkeley Open Infrastructure for Network Computing), which uses the FASTA algorithm to detect sequence similarities.
The situation for protein domains is different but of similar complexity. The computational cost is proportional to the number of sequences multiplied by the number of domain models. Due to the growth of the sequence space and the frequent updates of the domain databases, the computational effort for keeping the domain predictions up to date is constantly increasing.
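
The scaling described above can be made concrete with a little arithmetic. The following sketch counts comparisons only and assumes that each unordered pair of sequences is compared once; whether SIMAP actually counts pairs this way is an assumption, and all function names and numbers are illustrative.

```python
# Back-of-the-envelope cost model; names and numbers are illustrative assumptions.

def full_matrix_comparisons(n: int) -> int:
    """Pairwise comparisons for an all-against-all matrix of n sequences."""
    return n * (n - 1) // 2


def incremental_comparisons(n_existing: int, n_new: int) -> int:
    """Comparisons needed to extend an existing matrix by n_new sequences.

    Every new sequence is compared against all existing sequences and
    against the other new sequences; existing pairs are not recomputed.
    """
    return n_new * n_existing + n_new * (n_new - 1) // 2


def domain_search_cost(n_sequences: int, n_models: int) -> int:
    """Sequence-versus-model evaluations needed for domain prediction."""
    return n_sequences * n_models


existing, new = 1000, 100
print(full_matrix_comparisons(existing))        # 499500
print(incremental_comparisons(existing, new))   # 104950
print(full_matrix_comparisons(existing + new))  # 604450
print(domain_search_cost(existing, 50))         # 50000
```

Extending the matrix costs far less than rebuilding it (104950 versus 604450 comparisons in this example), but the cost of each update still grows with the size of the existing database, which is why the total effort keeps increasing.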

==== Why do it? ====

The SIMAP database is a huge bioinformatics resource that scientists use for very different purposes. Individual researchers use the SIMAP database via the public web portal, e.g. to investigate the evolution and function of individual proteins. Furthermore, many bioinformatics projects access SIMAP directly via the public web service or the large-scale download facilities. Your support for BOINCSIMAP keeps the SIMAP database running.

==== How to do it? ====

BOINC lets you contribute unused computing power on your home PC to projects doing research in many scientific areas. You can contribute to a single project (like BOINCSIMAP), or to any combination of them. It's easy to participate in a BOINC project: download and install BOINC. You will be asked to select a project and enter your email address and a password. That's it!

==== What's new? ====

SIMAP integrates the human microbiome. We are not alone: our body contains about as many microbial cells as human cells. Most of these microbes are concentrated in the intestinal tract, but microorganisms are also found on our skin and in many other locations of our body. These microbes are crucial for our health, as they support our organism and protect us from infections by pathogens. Due to the high diversity of the human microbiome, its biological and medical investigation has only begun in recent years. Modern research technologies, such as DNA sequencing, play a key role in this research. Sequences, e.g. from healthy and sick patients or from different body locations, are analyzed by sophisticated computational methods. SIMAP supports researchers studying the human microbiome by integrating the sequences from HMPDACC into its exhaustive similarity matrix. This will help to understand the evolution and function of the microbes living with us.

==== Author(s) ====

SIMAP is a joint project of the University of Vienna (Austria), the Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) and Technical University Munich, Center of Life and Food Science Weihenstephan (both in Germany). Please contact Thomas Rattei (University of Vienna).

==== Credentials ====

  * Website: **http://boincsimap.org/boincsimap/**
  * Join the team: **[[http://boincsimap.org/boincsimap/team_display.php?teamid=212|Ukraine]]**

==== Significant dates ====

  * First seen on: **2005-12-17**
  * Date of completion: <html><font color="green"><strong>Project is Active</strong></font></html>

===== Features =====

==== OSes & Applications ====

^                              ^  Windows x86       ^  Windows x86_64    ^  Linux x86         ^  Linux x86_64      ^  MacOS X (Intel)   ^  MacOS X (PowerPC)  ^
^BOINCSIMAP simap application  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}   |
^BOINCSIMAP hmmer application  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}  |  {{:ru:yes.gif|}}   |