Sweeties - E-pRSA

Features

The target sequence is first processed by two different and complementary protein language models (PLMs), ProtT5 and ESM2, to generate a concatenated vector of 1280+1024=2304 features for each residue. For each residue, the output is a single value between 0 and 1, representing its predicted relative solvent accessibility (RSA). A threshold of 20% is also adopted to distinguish Buried and Exposed residues.
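The per-residue feature size and the Buried/Exposed split described above can be sketched as follows. This is only an illustration, not the server's code: the function names are hypothetical, the assignment of 1280 dimensions to ESM2 and 1024 to ProtT5 and the concatenation order are assumptions, and so is the choice to label a residue lying exactly on the 20% threshold as Exposed.

```python
# Illustrative sketch only; names and conventions below are assumptions.
ESM2_DIM = 1280      # assumed ESM2 per-residue embedding size
PROTT5_DIM = 1024    # assumed ProtT5 per-residue embedding size
RSA_THRESHOLD = 0.20  # 20% threshold stated in the text


def concat_features(esm2_vec, prott5_vec):
    """Join the two per-residue embeddings into one 2304-feature vector."""
    if len(esm2_vec) != ESM2_DIM or len(prott5_vec) != PROTT5_DIM:
        raise ValueError("unexpected embedding size")
    # Concatenation order is an assumption; the text only gives the total.
    return list(esm2_vec) + list(prott5_vec)


def classify_rsa(rsa, threshold=RSA_THRESHOLD):
    """Map a predicted RSA in [0, 1] to 'Exposed' or 'Buried'."""
    if not 0.0 <= rsa <= 1.0:
        raise ValueError("RSA must lie between 0 and 1")
    # Assumption: a value exactly at the threshold counts as Exposed.
    return "Exposed" if rsa >= threshold else "Buried"
```

For example, under these assumptions classify_rsa(0.05) gives "Buried" and classify_rsa(0.75) gives "Exposed".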

Input

The required input is a protein sequence in FASTA format, with a length in the range of 50 to 5,000 valid residues (ARNDCQEGHILKMFPSTWYVUXZB).
The user may insert the FASTA in three different ways:

copying and pasting the sequence in FASTA format in the panel;
uploading a FASTA file;
activating "Batch Predictions" to upload a multiFASTA file containing up to 1,000 sequences (each no longer than 5,000 residues).
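The input constraints listed above can be checked locally before submitting. The following is a minimal sketch, not the server's own validation: the function names are hypothetical, and converting residues to upper case before checking them is an assumption (the text does not say whether lower-case letters are accepted).

```python
# Illustrative pre-submission check; not the server's actual validation code.
VALID_RESIDUES = set("ARNDCQEGHILKMFPSTWYVUXZB")
MIN_LEN, MAX_LEN, MAX_BATCH = 50, 5000, 1000


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Assumption: lower-case residues are treated as upper case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_input(text, batch=False):
    """Return a list of problems; an empty list means the input looks valid."""
    records = parse_fasta(text)
    problems = []
    if not records:
        problems.append("no FASTA record found")
    limit = MAX_BATCH if batch else 1
    if len(records) > limit:
        problems.append(f"{len(records)} sequences given, at most {limit} allowed")
    for header, seq in records:
        invalid = sorted(set(seq) - VALID_RESIDUES)
        if invalid:
            problems.append(f"{header}: invalid residues {''.join(invalid)}")
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            problems.append(f"{header}: length {len(seq)} outside {MIN_LEN}-{MAX_LEN}")
    return problems
```

Calling validate_input(text) covers the single-sequence case, while validate_input(text, batch=True) applies the limit of 1,000 sequences for a multiFASTA file.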

When the input has problems, the input box turns red and a message indicates what needs to be changed; when the box turns green, the "Submit" button can be pressed.

Output

The result page has three main sections:

Job Information:
General information about the job is shown, including the Job ID, the dates of submission and completion, the protein ID, the protein length, the counts and percentages of exposed vs buried predictions, and the count and percentage of predicted interaction sites obtained with ISPRED-SEQ (a previously developed method).
In the case of a Batch Job, the number of proteins and the total number of residues submitted are also shown, together with a button to download the results in a tab-separated format.
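The exposed vs buried counts and percentages reported in this section can be reproduced from a list of per-residue RSA values. The sketch below is only an assumption about how such a summary could be computed: the function name and dictionary keys are hypothetical, and residues exactly at the 20% threshold are counted as exposed.

```python
# Illustrative summary; key names and the >= convention are assumptions.
def summarize_rsa(rsa_values, threshold=0.20):
    """Count exposed and buried residues and express them as percentages."""
    total = len(rsa_values)
    if total == 0:
        raise ValueError("no residues to summarize")
    exposed = sum(1 for value in rsa_values if value >= threshold)
    buried = total - exposed
    return {
        "length": total,
        "exposed": exposed,
        "buried": buried,
        "exposed_pct": 100.0 * exposed / total,
        "buried_pct": 100.0 * buried / total,
    }
```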

Feature Viewer:
Results are visualized with the neXtProt feature viewer.
The first line displays the residues of the sequence; the second and third lines show the output of E-pRSA, namely the predicted RSA and the binary classification (Exposed or Buried), respectively. The last line shows the predictions made with ISPRED-SEQ on the residues predicted as exposed, highlighting those that are likely Protein-Protein Interaction Sites.

Data Tables:
At the bottom of the page, a table reports the information related to the job.
On the left side of the table there is a list of filters that can be selected and combined for statistics.
Pressing "Download TSV", on the top-right of the table, downloads the results displayed in the table as a Tab-Separated-Value file, with an additional column on the left side containing the protein ID. This can be useful when combining multiple files.
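Because each downloaded file carries the protein ID in its first column, files from several jobs can be stacked into one table. The following is a minimal sketch of such a merge using only the standard library; the function name is hypothetical, the column names in the demo are invented for illustration, and the sketch assumes that all files share the same header line.

```python
# Illustrative merge of several downloaded TSV files; not part of the server.
import csv
import io


def merge_tsv(streams, out):
    """Append the rows of several TSV streams to `out`, writing one header.

    Returns the number of data rows written. Assumes identical headers.
    """
    header, writer, rows = None, None, 0
    for stream in streams:
        reader = csv.reader(stream, delimiter="\t")
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty files
        if header is None:
            header = file_header
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            writer.writerow(header)
        elif file_header != header:
            raise ValueError("TSV files have different columns")
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical column names, used only for this demo.
    first = io.StringIO("protein_id\tposition\trsa\nP1\t1\t0.10\n")
    second = io.StringIO("protein_id\tposition\trsa\nP2\t1\t0.40\n")
    merged = io.StringIO()
    merge_tsv([first, second], merged)
    print(merged.getvalue(), end="")
```

With real downloads, the streams would be files opened in text mode (for instance with newline=""), and the protein ID column keeps the rows of different proteins distinguishable after the merge.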

The Feature Viewer and the Data Tables are only shown for single sequence predictions.