General information about the core-ortholog sets
- The criterion of transitive closure is applied to compile groups of orthologous genes from pairwise orthology predictions made with InParanoid. In most cases, these predictions are taken from the InParanoid database; for the Plants core-ortholog set, InParanoid was run locally at the CIBIV. The number of sequences representing each proteome is given in parentheses.
- All protein sequences for a core-ortholog cluster are aligned with MAFFT (Katoh et al. 2005).
- Hidden Markov Models are generated for each group of orthologous proteins using the program hmmbuild with default options. HMMs are calibrated using hmmcalibrate.
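The transitive closure criterion from the first step can be illustrated with a minimal sketch. It assumes one reading of the criterion: a candidate group with one gene per species is kept only if every pair of its members was predicted as orthologs in the pairwise comparisons. The function name `is_transitively_closed` and all gene identifiers are hypothetical and serve illustration only; this is not the code used to build the sets.

```python
from itertools import combinations


def is_transitively_closed(group, ortholog_pairs):
    """Return True if every pair of genes in `group` is a predicted ortholog pair.

    `ortholog_pairs` is a set of frozensets, each holding two gene IDs
    (assumed representation of the pairwise orthology predictions).
    """
    return all(frozenset(pair) in ortholog_pairs for pair in combinations(group, 2))


# Hypothetical pairwise predictions between three species (gene IDs invented).
pairs = {
    frozenset(p)
    for p in [
        ("human|A", "fly|A"), ("human|A", "yeast|A"), ("fly|A", "yeast|A"),
        ("human|B", "fly|B"), ("fly|B", "yeast|B"),  # human|B / yeast|B is missing
    ]
}

group_a = ["human|A", "fly|A", "yeast|A"]
group_b = ["human|B", "fly|B", "yeast|B"]

# Only group_a is kept: group_b lacks the human/yeast prediction.
kept = [g for g in (group_a, group_b) if is_transitively_closed(g, pairs)]
print(kept)
```

In this toy input, the second group is discarded because one of the three pairwise predictions is absent, even though the other two link all of its genes.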
The list of core-ortholog sets is far from exhaustive and may not match exactly what you need for your analysis. We are currently working on a solution to generate core-ortholog sets on the fly. Until this is implemented, we offer to compile custom core-ortholog sets on a case-by-case basis.
Modelorganisms
This set of HMMs is compiled from genes for which orthologs are identified in the following 5 species:
- Homo sapiens (22,983)
- Ciona intestinalis (14,278)
- Drosophila melanogaster (13,854)
- Caenorhabditis elegans (20,084)
- Saccharomyces cerevisiae (5,788)
A total of 1,031 genes fulfill this criterion. This dataset should be suitable for analyzing deep splits in eukaryote evolution, since the corresponding genes must already have been present in the common ancestor of animals and fungi. Furthermore, orthology assignment is possible with our stringent criteria, indicating that these genes have presumably neither undergone massive duplication and deletion events nor diverged too much.
Vertebrates
This set of HMMs is compiled from genes for which orthologs are identified in the following 6 species:
- Homo sapiens (22,983)
- Mus musculus (23,132)
- Tetraodon nigroviridis (28,005)
- Gallus gallus (16,715)
- Xenopus tropicalis (18,473)
- Ciona intestinalis (14,278)
A total of 2,909 genes fulfill this criterion. This dataset should be particularly suitable for analyzing vertebrate evolution, since the corresponding genes must have been present already in the common ancestor of all vertebrates. Furthermore, orthology assignment is possible with our stringent criteria, indicating that these genes presumably have not undergone massive duplication and deletion events on the vertebrate lineage.
Insecta
This set of HMMs is compiled from genes for which orthologs are identified in the following 5 species:
- Homo sapiens (22,983)
- Anopheles gambiae (13,277)
- Apis mellifera (13,448)
- Drosophila melanogaster (13,854)
- Aedes aegypti (15,419)
A total of 3,210 genes fulfill this criterion. This dataset should be particularly suitable for analyzing insect evolution, since the corresponding genes must have been present already in the common ancestor of all insects. Furthermore, orthology assignment is possible with our stringent criteria, indicating that these genes presumably have not undergone massive duplication and deletion events on the insect lineage.
Metazoa_extended
This set of HMMs is compiled from genes for which orthologs are identified in the following 12 species:
- Vertebrates
  - Homo sapiens (22,983)
  - Mus musculus (23,132)
  - Canis familiaris (19,314)
  - Tetraodon nigroviridis (28,005)
  - Gallus gallus (16,715)
  - Xenopus tropicalis (18,473)
- Insects
  - Anopheles gambiae (13,227)
  - Apis mellifera (13,448)
  - Drosophila melanogaster (13,854)
- Nematodes
  - Caenorhabditis elegans (20,084)
  - Caenorhabditis briggsae (19,334)
  - Caenorhabditis remanei (25,595)
In this dataset we allow for genes having been lost in individual insect or nematode species. The requirements are the following:
- An ortholog must be identifiable according to our criteria in all vertebrate species.
- An ortholog must be identifiable in at least one of the insect species.
- An ortholog must be identifiable in at least one of the nematode species.
A total of 2,503 genes fulfill these criteria. These genes should be particularly suitable for analyzing invertebrate and vertebrate evolution. However, it should be kept in mind that the loss of a gene in certain subtrees of the metazoan tree is presumably more likely to have happened than with the other datasets. Please also note that only for the vertebrate species is it guaranteed that a gene corresponding to an ortholog cluster is represented. Therefore, only the vertebrate species are listed as 'reference species' in the hamstr search.
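The three requirements for this set can be written down as a small filter. This is a sketch only: the function name `qualifies` and the representation of a cluster as a set of species names are assumptions made for illustration, not part of the hamstr distribution.

```python
VERTEBRATES = {
    "Homo sapiens", "Mus musculus", "Canis familiaris",
    "Tetraodon nigroviridis", "Gallus gallus", "Xenopus tropicalis",
}
INSECTS = {"Anopheles gambiae", "Apis mellifera", "Drosophila melanogaster"}
NEMATODES = {
    "Caenorhabditis elegans", "Caenorhabditis briggsae", "Caenorhabditis remanei",
}


def qualifies(species_with_ortholog):
    """Check the three requirements for a cluster.

    `species_with_ortholog` is the collection of species in which an
    ortholog was identified (hypothetical input format).
    """
    present = set(species_with_ortholog)
    return (
        VERTEBRATES <= present           # ortholog in all vertebrate species
        and bool(INSECTS & present)      # ortholog in at least one insect
        and bool(NEMATODES & present)    # ortholog in at least one nematode
    )


# Invented example clusters.
cluster_ok = VERTEBRATES | {"Apis mellifera", "Caenorhabditis elegans"}
cluster_no_nematode = VERTEBRATES | {"Apis mellifera"}
cluster_missing_vertebrate = (VERTEBRATES - {"Gallus gallus"}) | INSECTS | NEMATODES

print(qualifies(cluster_ok))                  # True
print(qualifies(cluster_no_nematode))         # False
print(qualifies(cluster_missing_vertebrate))  # False
```

The asymmetry between the vertebrate test (subset) and the insect and nematode tests (non-empty intersection) is what makes only the vertebrates usable as reference species.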
Plants
This set of HMMs is compiled from genes for which orthologs are identified in the following five plant species:
- Arabidopsis thaliana (InParanoid, 26,819)
- Oryza sativa (InParanoid, 77,843)
- Physcomitrella patens (JGI v. 1.1, 35,937)
- Selaginella moellendorffii (JGI v. 1.0, 22,284)
- Chlamydomonas reinhardtii (JGI v. 1.0, 14,597)
A total of 1,242 genes fulfill the transitive closure criterion.