DDGEmb

DDGEmb is a method for predicting the impact of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on protein stability from the protein sequence alone, using protein language model embeddings. DDGEmb supports both single-point and multi-point variations as input.

Features

DDGemb uses the ESM2 protein language model (Lin et al., 2023) for protein and variant representation, in combination with a deep-learning architecture based on a Transformer encoder (Vaswani et al., 2017), to predict the ΔΔG for single- and multi-point variations.
The input variation is first encoded as the difference between the ESM2 embeddings of the wild-type and variant sequences. The difference matrix is then passed as input to a deep architecture that includes convolutional and Transformer encoder layers.
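
The encoding step described above can be illustrated with a minimal Python sketch. It is not the DDGemb implementation: toy_embed is a hypothetical stand-in for ESM2 that assigns a fixed pseudo-random vector to each (residue, position) pair, the embedding width of 8 is arbitrary, the (wild-type residue, 1-based position, variant residue) tuples are an assumed representation, and the sign convention (variant minus wild-type) is also an assumption, since the text above does not state the direction of the difference.

```python
import random

EMB_DIM = 8  # toy width, chosen only to keep the example small


def toy_embed(sequence, dim=EMB_DIM):
    """Hypothetical stand-in for a per-residue embedding (one row per residue)."""
    rows = []
    for pos, residue in enumerate(sequence):
        # Seed on (residue, position) so the output is reproducible.
        rng = random.Random(ord(residue) * 100_003 + pos)
        rows.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return rows


def apply_variations(sequence, variations):
    """Apply substitutions given as (wild_type, 1-based position, variant)."""
    residues = list(sequence)
    for wild_type, position, variant in variations:
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = variant
    return "".join(residues)


def difference_matrix(wild_type, variant, embed=toy_embed):
    """Element-wise difference of the two embeddings (assumed: variant - wild-type)."""
    if len(wild_type) != len(variant):
        raise ValueError("substitutions must preserve the sequence length")
    wt_rows, var_rows = embed(wild_type), embed(variant)
    return [
        [v - w for v, w in zip(var_row, wt_row)]
        for var_row, wt_row in zip(var_rows, wt_rows)
    ]
```

A multi-point variation is handled the same way as a single-point one: apply_variations("MKTAYIAKQR", [("K", 2, "E"), ("A", 4, "G")]) builds the variant sequence, and difference_matrix returns one row per residue. With this toy embedding only the substituted rows are non-zero; a real language model is contextual, so a substitution would generally change the rows of other positions as well.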


Input

From the Home Page, a standard DDGemb job including up to 100 variations on a single protein sequence can be submitted. This requires:

From the Batch job Page, it is possible to analyze up to 2000 variations (either single- or multi-point) occurring on at most 500 protein sequences. This requires:

All variations are checked against the protein sequence: positions must be within the protein length, and the wild-type residues must match the residues found at those positions. Once validated, the job is submitted.
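
The checks described above can be reproduced locally before submitting a job. The following Python sketch is only an approximation of what the server does: the "K2E" notation (wild-type residue, 1-based position, variant residue) is an assumed input format, and check_variation and check_job_size are hypothetical helper names. The size limits are the ones stated in this section.

```python
import re

# Assumed notation: wild-type residue, 1-based position, variant residue (e.g. "K2E").
VARIATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Limits stated in the Input section.
MAX_VARIATIONS = {"standard": 100, "batch": 2000}
MAX_SEQUENCES = {"standard": 1, "batch": 500}


def check_variation(sequence, variation):
    """Return a list of problems for one substitution; an empty list means it is valid."""
    match = VARIATION_RE.match(variation)
    if match is None:
        return [f"{variation}: not in the assumed <wild-type><position><variant> form"]
    wild_type, position = match.group(1), int(match.group(2))
    if not 1 <= position <= len(sequence):
        return [f"{variation}: position {position} is outside the protein length ({len(sequence)})"]
    found = sequence[position - 1]
    if found != wild_type:
        return [f"{variation}: wild-type residue is {found} at position {position}, not {wild_type}"]
    return []


def check_job_size(n_sequences, n_variations, mode="standard"):
    """Return a list of problems if the job exceeds the limits for the given mode."""
    problems = []
    if n_sequences > MAX_SEQUENCES[mode]:
        problems.append(f"too many sequences: {n_sequences} > {MAX_SEQUENCES[mode]}")
    if n_variations > MAX_VARIATIONS[mode]:
        problems.append(f"too many variations: {n_variations} > {MAX_VARIATIONS[mode]}")
    return problems
```

For a multi-point variation, check_variation can be called once per substitution and the resulting problem lists concatenated; a job is ready to submit only when both helpers return empty lists.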


Output

There are four main sections in the result page of a standard DDGemb job:

For Batch jobs, the result page will instead show two buttons to download results in JSON and TSV format.