In order to access the job launch pages, you need an account with a registered OpenID provider, and you must also request access to the EVfold server. There is a "Request access" link on the login page. When you fill in the resulting form, your access request is registered with us. We review and approve these requests regularly, and you will receive an email when access is available to you.
Your job name can be any text that helps you remember and distinguish the job from others that you launch. This job name will appear in your
result file names. Remember that the server removes spaces and special characters and shortens long job names.
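The exact characters the server strips and the length limit it applies are not documented here. As a rough way to anticipate how a job name might appear in result file names, the Python sketch below assumes that only letters, digits, underscores and hyphens are kept and that names are cut to a hypothetical 30 characters. `sanitize_job_name` and `MAX_JOB_NAME_LENGTH` are illustrative names, not part of EVfold.

```python
import re

# Hypothetical limit: the server's real maximum length is not documented here.
MAX_JOB_NAME_LENGTH = 30

def sanitize_job_name(name: str, max_length: int = MAX_JOB_NAME_LENGTH) -> str:
    """Assumed rule: drop spaces and every character other than A-Z, a-z, 0-9,
    '_' and '-', then truncate the result to max_length characters."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", name)
    return cleaned[:max_length]

print(sanitize_job_name("My job #1 (test)"))  # Myjob1test
```

If your job names differ only in spaces or punctuation, they may collapse to the same string under rules like these, so it is safer to make them differ in letters or digits.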
No. If you leave this blank, the job name is used instead. This setting is only used for titles on plots, such as contact maps.
If the default E-value (-3) produces a poor-quality alignment because too many remote members were added, you may need to rerun your job with a more negative E-value. If your multiple sequence alignment is too small (too few members), try a less negative value. Some exploration may be needed.
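To see why a more negative value gives a smaller alignment, the sketch below treats the setting as a base-10 exponent (so -3 would mean an E-value cutoff of 10^-3). That reading is an assumption based on the default shown above. It filters a made-up list of search hits; real iterative searches such as jackhmmer recompute E-values on every round, so this only illustrates the direction of the effect. `Hit` and `filter_hits` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    evalue: float

def filter_hits(hits, exponent):
    """Keep hits whose E-value is at or below 10**exponent (assumed meaning of the setting)."""
    threshold = 10.0 ** exponent
    return [h for h in hits if h.evalue <= threshold]

# Made-up hits, from a close homolog (A) to a remote one (D).
hits = [Hit("A", 1e-30), Hit("B", 1e-8), Hit("C", 1e-4), Hit("D", 1e-2)]

for exponent in (-1, -3, -5):
    print(exponent, [h.name for h in filter_hits(hits, exponent)])
# -1 ['A', 'B', 'C', 'D']
# -3 ['A', 'B', 'C']
# -5 ['A', 'B']
```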
It may be that your alignment contains too many poorly matched members, which introduces "gappy" rows. If you see this, you may need to make the E-value more negative (more stringent) to exclude more poorly matched members. You can also adjust the setting called "Match Column Gap Pct Limit" under the Generate Alignment settings. A higher threshold makes more columns become upper-case "match columns", but be aware that a high proportion of gaps in many columns is a warning sign that the alignment quality is poor, and prediction quality may suffer.
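The effect of a gap-percentage limit can be sketched as follows. This Python example assumes that a column counts as a match column when its percentage of gap characters (`-` or `.`) is at or below the limit, and writes the result in the A2M convention (upper case and `-` in match columns, lower case and `.` in insert columns); the server's exact rule may differ. It also reports per-row gap percentages, one way to spot the "gappy" rows mentioned above. The function names and the four-row alignment are hypothetical.

```python
GAP_CHARS = "-."

def column_gap_pct(rows):
    """Percentage of gap characters in each column of an aligned block."""
    n_rows = len(rows)
    return [100.0 * sum(row[j] in GAP_CHARS for row in rows) / n_rows
            for j in range(len(rows[0]))]

def mark_match_columns(rows, gap_pct_limit):
    """Upper-case columns at or under the gap limit (match); lower-case the rest (insert)."""
    is_match = [pct <= gap_pct_limit for pct in column_gap_pct(rows)]
    marked = []
    for row in rows:
        chars = []
        for ch, match in zip(row, is_match):
            if ch in GAP_CHARS:
                chars.append("-" if match else ".")
            else:
                chars.append(ch.upper() if match else ch.lower())
        marked.append("".join(chars))
    return marked

def row_gap_pct(row):
    """Percentage of gap characters in one aligned sequence."""
    return 100.0 * sum(ch in GAP_CHARS for ch in row) / len(row)

alignment = ["MKV-LA", "MKVQLA", "M--Q-A", "MKV-LA"]
print(column_gap_pct(alignment))            # [0.0, 25.0, 25.0, 50.0, 25.0, 0.0]
print(mark_match_columns(alignment, 30))    # ['MKV.LA', 'MKVqLA', 'M--q-A', 'MKV.LA']
print(mark_match_columns(alignment, 50))    # ['MKV-LA', 'MKVQLA', 'M--Q-A', 'MKV-LA']
print([round(row_gap_pct(r), 1) for r in alignment])  # [16.7, 0.0, 50.0, 16.7]
```

Raising the limit from 30 to 50 turns the fourth column into a match column, but half of that column is gaps, which is the kind of warning sign described above.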
Currently, our server requires user-created alignments to follow certain conventions. Please read the related topic on the Tutorial page and
make sure your alignment is properly constructed.
The EVfold server needs to find the exact member of the family alignment that corresponds to the complete protein sequence you entered in the "Protein:" box in the Basic Settings. Sometimes a Pfam alignment contains several members with the same protein name, so you can specify which multiple sequence alignment member should be used by giving its complete FASTA identifier from the alignment. For example, F1N151_BOVIN/454-553 is one of four members for F1N151_BOVIN in PF00085.
Only Pfam-A accession numbers are retrieved automatically by the EVfold server at this time. If your accession number comes from a different
set, you may try to upload the alignment yourself using the "Upload Alignment" function.
Currently, this section lets you visualize computed EC pairings in the context of a known structure. If you are working with a protein of unknown structure, please use the "Predict 3D Structure From Sequences" option. In the future, we plan to offer visualization of EC pairings without a known structure, but for now a known structure is required.
We use the residue numbers of the complete protein sequence as the frame of reference for generating EC pairings and making plots such as contact maps. To compare EC pairings with the geometric locations of residues in a Known Structure PDB file, the corresponding residues must first be located in that file. To find them, a Smith-Waterman alignment is performed between the two sequences. This alignment is fairly tolerant of small gaps and mismatches in PDB structure files, but the two sequences cannot differ greatly. It is therefore important that the PDB file used as the known structure contain a sequence that is nearly identical to the sequence of the domain in the multiple sequence alignment. If mismatches are present, their residue numbers (relative to the sequence specified in the "Protein:" box) can be entered in the "Free Mismatch Offset List" setting to help form a good-quality alignment.
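The residue-mapping step can be illustrated with a textbook Smith-Waterman local alignment. The Python sketch below uses a simple match/mismatch score and a linear gap penalty (the server's actual scoring scheme is not stated here), then turns aligned positions into a dictionary from full-sequence residue numbers to PDB residue numbers. The function names, scores and toy sequences are hypothetical. The toy PDB chain carries one mismatch (W in place of Q) to show that a small difference does not break the mapping.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b with a linear gap penalty.
    Returns the 0-based index pairs (i, j) of the best-scoring local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score marks the local start.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # residue of a aligned to a gap in b
        else:
            j -= 1  # residue of b aligned to a gap in a
    pairs.reverse()
    return pairs

def map_residue_numbers(full_seq, full_start, pdb_seq, pdb_numbers):
    """Map full-sequence residue numbers to PDB residue numbers via the local alignment."""
    return {full_start + i: pdb_numbers[j] for i, j in smith_waterman(full_seq, pdb_seq)}

full_seq = "MKTAYIAKQRQISFVKSHFSRQ"  # toy full protein sequence, numbered from 1
pdb_seq = "AYIAKQRWISFVKSH"          # toy PDB chain with one mismatch (W for Q)
pdb_numbers = list(range(101, 116))   # toy PDB residue numbers 101..115
mapping = map_residue_numbers(full_seq, 1, pdb_seq, pdb_numbers)
print(mapping[4], mapping[11], mapping[18])  # 101 108 115
```

PDB residue labels are passed as an explicit list rather than computed from an offset, because real chains can have missing residues or numbering jumps.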
While a job is running, the status page shows a button titled "Stop This Job". If that button is clicked, the job is stopped before completion and you will see these messages. You will also receive these messages if your job runs for longer than three days; this time limit is our initial approach to preventing congestion on our limited server processing capacity. If your job ran for three days and was then stopped, try alternative settings that take less time (for example, DI scoring instead of PLM).
Sequences are retrieved by name or accession number from UniProtKB release 2013_08. Alignments are retrieved from Pfam release 27.0, or are constructed using HHblits with the uniprot20_2013_03 database or HMMER jackhmmer with UniRef100 release 2013_08. PSIPRED secondary structure predictions are based on UniRef90 release 2013_08. MEMSAT-SVM transmembrane topology predictions are based on the collection of models included with release memsat-svm1.2. Structures are retrieved from a local mirror of the RCSB PDB, which is updated nightly.

SEQUENCE ALIGNMENT

HHblits

Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2011 Dec 25;9(2):173-5.

Jackhmmer

Johnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics. 2010 Aug 18;11:431.

CONTACT PREDICTION

PLM (also known as plmDCA)

Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E Stat Nonlin Soft Matter Phys. 2013 Jan;87(1):012707.

DI (also known as mf-DCA)

Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One. 2011;6(12):e28766.
 
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 2011 Dec 6;108(49):E1293-301.

SECONDARY STRUCTURE AND TRANSMEMBRANE TOPOLOGY PREDICTION

PSIPRED

Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999 Sep 17;292(2):195-202.

MEMSAT-SVM

Nugent T, Jones DT. Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics. 2009 May 26;10:159.

FOLDING

CNS

Brunger AT. Version 1.2 of the crystallography and NMR system. Nat Protoc. 2007;2(11):2728-33.

DATABASES

UNIPROT

UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015 Jan 28;43(Database issue):D204-12.

PFAM

Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Res. 2014 Jan;42(Database issue):D222-30.

PDB

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000 Jan 1;28(1):235-42.

The results tarfile contains the information available through the job status web page, plus a complete set of structure predictions with their evaluations and many intermediate files from the EVfold pipeline, such as the details of the secondary structure prediction or the mapping of residues between the UniProt sequence and the comparison PDB structure. The results are organized into the following subdirectories:

  • alignment
  • contact_maps
  • ev_couplings
  • job_config
  • residue_numbering
  • sequence
  • models_compared_to_known_structure
  • structure_inputs
  • structure_outputs

Details of the files found in each directory are given in a README.txt file.
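After downloading, a quick check that the expected subdirectories are present can be scripted with Python's standard `tarfile` module. In this sketch the archive file name is hypothetical, and it does not assume whether the subdirectories sit at the top level or inside a job folder: it looks for the listed names anywhere in each member path. Reading the archive and checking the names are kept in separate functions so the check can be tried on a plain list of paths.

```python
import tarfile

# Subdirectory names as listed for the EVfold results tarfile.
EXPECTED_SUBDIRS = {
    "alignment", "contact_maps", "ev_couplings", "job_config",
    "residue_numbering", "sequence", "models_compared_to_known_structure",
    "structure_inputs", "structure_outputs",
}

def archive_member_names(tar_path):
    """List member paths; mode 'r:*' lets tarfile detect gzip/bz2/xz compression."""
    with tarfile.open(tar_path, "r:*") as tar:
        return tar.getnames()

def result_subdirs(member_names):
    """Return which expected subdirectory names occur as path components."""
    found = set()
    for name in member_names:
        found.update(part for part in name.split("/") if part in EXPECTED_SUBDIRS)
    return found

# Usage with a hypothetical file name:
# names = archive_member_names("myjob_results.tar.gz")
# print(sorted(EXPECTED_SUBDIRS - result_subdirs(names)))  # subdirectories not found
```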