Features
We developed Alpha&ESMhFolds, a novel database that allows the direct comparison of AlphaFold2 and ESMFold predicted models for 42,942 proteins of the Reference Human Proteome and, when available, their comparison with 2,900 directly associated PDB structures that have a structure-to-sequence coverage of at least 70%.
How to browse
From the home page of the web server, the model database can be queried in the following ways:
- UniProt accession
- FASTA sequence
- Search page for advanced searches with different criteria (e.g. gene name, TM-score)
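As an illustration of the two input types, the sketch below decides whether a pasted query looks like a UniProt accession or a FASTA sequence before it is submitted. It is not part of the web server: the function name `classify_query` and the accepted residue alphabet are assumptions made for this example, while the accession pattern is the one published in the UniProt help pages.

```python
import re

# Accession pattern as published by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Assumption: the 20 standard residues plus the ambiguity/rare codes B, X, Z, U, O.
RESIDUES = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBXZUO]+")


def classify_query(text):
    """Return ("accession", value) or ("fasta", sequence).

    Hypothetical helper: raises ValueError if the input is neither.
    """
    stripped = text.strip()
    if UNIPROT_ACCESSION.fullmatch(stripped.upper()):
        return ("accession", stripped.upper())

    lines = [ln.strip() for ln in stripped.splitlines() if ln.strip()]
    # Drop an optional FASTA header line starting with ">".
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    sequence = "".join(lines).upper()
    if sequence and RESIDUES.fullmatch(sequence):
        return ("fasta", sequence)

    raise ValueError("query is neither a UniProt accession nor a protein sequence")


if __name__ == "__main__":
    print(classify_query("P04637"))
    print(classify_query(">example\nMKVLLA\nGGS"))
```

Accessions are checked first because they contain digits and therefore can never be mistaken for a protein sequence.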
Results
For every entry in the model database, the results page shows the following information.
- Protein information: At the top of the page, a table is displayed that contains general information on the selected protein, including:
  - Protein name, gene name and UniProt accession (with a cross-link to the corresponding UniProt page).
  - Sequence length.
  - Protein source (either SwissProt or TrEMBL).
  - Presence of a signal or transit peptide.
  - Highest-coverage PDB chain (with a link to the corresponding PDB page). This field is present only if the ATOM residues of the PDB chain cover at least 70% of the sequence.
  - Alternatively, if the sequence is highly similar (more than 50% similarity over a coverage of at least 70%) to an entry that has a PDB structure, a link to the putative template is shown.
- Comparing ESMFold and AlphaFold2 models: A tab displays the comparison between the AlphaFold2 and ESMFold computed structures. This includes:
  - The sequence alignment obtained from the superimposition of the two structural models. Residues are coloured according to model confidence (pLDDT), and a green bar highlights residues that match. A FASTA-like file containing the alignment is available for download. The file contains two gapped sequences with their corresponding IDs.
  - The structure superimposition of the two models. Two different colours distinguish ESMFold models (green) from AlphaFold2 models (purple). The graphical viewer is our implementation of PDBe Mol* and supports similar interactions (see the original documentation; some operations are not available in our viewer). Additionally, clicking a residue in the sequence alignment zooms in on the corresponding position. Both models, as well as their superimposition, are available for download in PDB format.
  - Different statistics, including:
    - Number and percentage of residues with pLDDT over different thresholds (50, 70 and 90), for each model individually and for the consensus.
    - Information regarding the sequence alignment (start and end position for each model, length of the alignment, number of matches, mismatches and gaps).
    - Scores of the structure alignment (TM-score, RMSD, GDT and TM-score obtained when considering only residues with pLDDT over different thresholds).
- Comparing predicted models and PDB chain: If the entry has an associated PDB chain, two similar tabs show the comparison between the experimental structure and each predicted model. In this case, PDB chains are coloured white in the graphical viewer, and the statistics displayed vary slightly.
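
The alignment and pLDDT statistics described above can be approximated offline from the downloadable files. The following is a minimal sketch, not the server's own code. It assumes that the alignment file is a two-record FASTA file with "-" as the gap character, that a "match" means identical letters in the same column, and that "over a threshold" means strictly greater. The function names and the record IDs in the usage example are hypothetical.

```python
def read_gapped_fasta(text):
    """Parse FASTA-formatted text into a list of (id, gapped_sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_stats(seq_a, seq_b, gap="-"):
    """Count matches, mismatches and gap columns of two gapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("gapped sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return {"length": len(seq_a), "matches": matches,
            "mismatches": mismatches, "gaps": gaps}


def plddt_summary(plddt, thresholds=(50, 70, 90)):
    """Map each threshold to (count, percentage) of residues with pLDDT above it."""
    total = len(plddt)
    summary = {}
    for t in thresholds:
        count = sum(1 for value in plddt if value > t)
        summary[t] = (count, 100.0 * count / total if total else 0.0)
    return summary


if __name__ == "__main__":
    # Toy alignment with made-up IDs; real files use the server's own IDs.
    example = ">model_A\nMKV-LA\n>model_B\nMKVQ-A\n"
    (_, a), (_, b) = read_gapped_fasta(example)
    print(alignment_stats(a, b))
    print(plddt_summary([40, 60, 80, 95]))
```

The per-residue pLDDT values are passed in as a plain list here, so the sketch does not depend on how they are extracted from the model files.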