The Basics

  • Choosing a Protein:

    In the "Protein" box at the top of the form, enter a UniProt sequence identifier (such as P01112 or RASH_HUMAN), or paste a sequence into the box. In general, this should be the complete protein sequence, even if you intend to compute the structure of only part of the protein (a domain). If you know that your protein is a transmembrane protein, set "Is Membrane" to "Yes".

  • Is there a known structure in the protein data bank (PDB) that you want to compare to?

    If not, leave this blank. Otherwise, enter its PDB identifier in "Known Structure", for example "5p21". The known structure is used only for comparison; it does not guide the structure calculations.

  • Setting the Job Name

    Give your job a name. This name is attached to result filenames and is used as a default name in some cases (for instance, if you pasted in a protein sequence, the job name becomes the sequence identifier).

  • Entering your Email Address

    Jobs may take a long time to complete. We will email you a notification when the job completes, or if it stops before completion.

  • Starting the job

    When you are ready, click the Launch button at the bottom of the page. If nothing is missing or wrong, the job will be queued for processing. If this is the first job you have launched on our server, you will first receive an email containing a URL to click to validate your email address. Once the job is queued, you will receive an email notification that the job has been submitted; this email contains a link to the job status page, which shows how far the processing has progressed. When the job is complete, you will receive a second email with links to the results.
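
The "Protein" box accepts either a UniProt identifier or a pasted sequence. As a rough illustration of how these inputs differ, the sketch below uses the accession pattern published in UniProt's documentation, plus a loose pattern for entry names like RASH_HUMAN and an amino-acid alphabet that are both assumptions. The server's own parsing rules are not described here, so treat classify_protein_input() as a hypothetical helper, not the form's actual logic.

```python
"""Sketch: guess what kind of input was typed into the "Protein" box."""
import re

# Accession pattern as published in the UniProt documentation.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Loose approximation of entry names such as RASH_HUMAN (an assumption).
ENTRY_NAME = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")
# 20 standard amino acids plus common ambiguity/rare codes (assumed alphabet).
RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def classify_protein_input(text):
    """Return 'accession', 'entry name', 'sequence', or 'unknown'."""
    s = text.strip()
    if ACCESSION.fullmatch(s):
        return "accession"
    if ENTRY_NAME.fullmatch(s):
        return "entry name"
    # Otherwise treat the input as a pasted sequence, skipping FASTA header lines.
    lines = [line.strip() for line in s.splitlines()]
    seq = "".join(line for line in lines if not line.startswith(">"))
    if seq and set(seq.upper()) <= RESIDUES:
        return "sequence"
    return "unknown"
```

For example, "P01112" is classified as an accession and "RASH_HUMAN" as an entry name, while a pasted residue string (with or without a ">" header line) is classified as a sequence.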

Beyond the Basics

  • Domains

    By default, we generate a multiple sequence alignment for your protein using the complete protein sequence. If the structure computation should focus on a particular domain within the protein sequence, you will need to guide the alignment generation stage (HHblits/jackhmmer) by entering the residue offsets of the beginning and end of the domain, relative to the complete protein sequence. This is done in the advanced settings section "Alignment Generation / Selection".

  • Alignment quality

    Multiple sequence alignments are generated on the server using HHblits or jackhmmer. The generated alignment may have too few members, or too many members that are only weakly related to your sequence of interest. You can tune how inclusive the generated alignment is with the E-Value parameter in the "Alignment Generation / Selection" section (in HHblits and jackhmmer, a smaller E-value threshold admits fewer, more closely related sequences).

  • Using existing multiple sequence alignments

    If you want to use an existing multiple sequence alignment (either your own, or a retrieved one) instead of a generated one, you can do that under the "Alignment Generation / Selection" section.

  • Adjusting the EC set, and the plots of contact maps

    By default, the number of top-ranking ECs applied during structure calculations is set to a few values near the length of the sequence in residues. Contact maps are plotted with these default EC counts as well. You can change the EC counts and adjust the contact map plotting parameters under the sections "Number of Evolutionary Constraints / Number of Structure Variants" and "Contact Maps / Comparing Computed Structure to PDB Known Structure".

  • Other settings

    There are quite a few other settings, but if you don't run into problems and are not interested in exploring the effects of manipulating various low level details, you can ignore them.
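
To make the domain offsets concrete, here is a minimal sketch of how a pair of offsets selects a domain from the complete protein sequence. It assumes the offsets are 1-based and inclusive (UniProt-style residue numbering); domain_subsequence() and domain_label() are hypothetical helpers for illustration, not server functions. The label format mirrors the NAME/start-end identifiers seen in PFAM alignments, such as PCBP1_HUMAN/281-343.

```python
"""Sketch of how domain offsets select part of the complete sequence.

Assumes 1-based, inclusive offsets; both helpers are hypothetical.
"""


def domain_subsequence(full_sequence, start, end):
    """Return residues start..end (1-based, inclusive) of full_sequence."""
    if not 1 <= start <= end <= len(full_sequence):
        raise ValueError(
            f"invalid domain bounds {start}-{end} for length {len(full_sequence)}"
        )
    # Convert 1-based inclusive offsets to a 0-based half-open slice.
    return full_sequence[start - 1:end]


def domain_label(name, start, end):
    """Format a PFAM-style member label such as NAME/start-end."""
    return f"{name}/{start}-{end}"
```

Keeping the conversion to Python's 0-based slices in one place avoids the off-by-one errors that are easy to make when copying residue numbers from UniProt.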

When your job is complete, you will be able to download the results as a gzipped tar file. Inside that file will be:

  • the sequence of the whole protein for which 3D structure is being computed (e.g. YES_HUMAN.fa)
  • the sequence of the protein domain for which 3D structure is being computed (e.g. YES_HUMAN_domain_region.fa)
  • the family alignment in FASTA and possibly Stockholm format (e.g. PF00018_v25.0.fa)
  • a table of weights for each member in the family (e.g. PF00018_v25.0.weight_table)
  • a table of residue pair scores, from which constraints are chosen (e.g. PF00018_P07947_MI_DI.txt)
  • secondary structure predictions for each residue (e.g. YES_HUMAN_psipred.txt)
  • a basic residue offset mapping table (UniProt, alignment, secondary structure) (e.g. YES_HUMAN.indextable)
  • an extended residue offset mapping table, adding the PDB atom locations (e.g. YES_HUMAN.indextableplus)
  • a ranked and flagged table of evolutionary constraint pairs (e.g. PF00018_P07947_DIScores.csv)
  • a PDB structure used for comparison/evaluation (e.g. pdb2hda.ent)
  • the un-minimized calculated structures output from the cns_solve distance geometry step (e.g. PF00018_P07947_10_1.pdb, PF00018_P07947_10_2.pdb, PF00018_P07947_10_3.pdb, ... PF00018_P07947_200_1.pdb, PF00018_P07947_200_2.pdb, PF00018_P07947_200_3.pdb)
  • the energy minimized calculated structures output from cns_solve (e.g. PF00018_P07947_10_1_hMIN.pdb, PF00018_P07947_10_2_hMIN.pdb, PF00018_P07947_10_3_hMIN.pdb, ... PF00018_P07947_200_1_hMIN.pdb, PF00018_P07947_200_2_hMIN.pdb, PF00018_P07947_200_3_hMIN.pdb)
  • the individual RMSD comparisons to the PDB structure, computed with PyMOL (e.g. PF00018_P07947_10_pymol.txt ... PF00018_P07947_200_pymol.txt)
  • a table of evolutionary constraint pairs with added RMSD values (e.g. PF00018_P07947_DIScoresCompared.csv)
  • a plot of RMSD versus evolutionary constraint score rank for the top ranking constraints (e.g. YES_HUMAN_FP_Plot.pdf)
  • contact maps showing the evolutionary constraints used, overlaid on the actual residue pair contacts from the PDB structure (e.g. YES_HUMAN_ContactMap_10.pdf ... YES_HUMAN_ContactMap_200.pdf)
  • a list of all files included in the results download (e.g. PF00018_P07947.manifest)

The results download also contains the files used as input to cns_solve, including:

  • a set of constraint selections (e.g. PF00018_P07947_10_DIs.tbl ... PF00018_P07947_200_DIs.tbl)
  • secondary structure constraints (e.g. YES_HUMAN_SS_distance.tbl and YES_HUMAN_SS_angle.tbl)
  • the residue sequence (e.g. PF00018_P07947.seq)
  • the molecular topology files (mtf) specifying atom connectivity (e.g. PF00018_P07947_10.mtf ... PF00018_P07947_200.mtf)
  • the starting point 3D structures (e.g. PF00018_P07947_10_extended.pdb ... PF00018_P07947_200_extended.pdb)

Additional files may be present, such as lists of calculated structures, degree of energy minimization, and MATLAB figure formatted contact maps.
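
With hundreds of files in a typical download, it can help to group them by type. The sketch below opens the gzipped tar file with Python's standard tarfile module and sorts files by filename suffix. The suffix rules are inferred only from the example filenames listed above and may not cover every file the server produces; categorize() and list_results() are hypothetical helpers.

```python
"""Sketch: group files from the results tar by the name patterns listed above.

The suffix rules are inferred from the example filenames on this help page.
"""
import tarfile
from collections import defaultdict

# Checked in order, so more specific suffixes come before generic ones.
RULES = [
    ("_hMIN.pdb", "energy-minimized structures"),
    ("_extended.pdb", "starting structures (cns_solve input)"),
    (".pdb", "un-minimized structures"),
    (".ent", "known PDB structure"),
    ("_pymol.txt", "RMSD comparisons"),
    (".tbl", "constraint tables (cns_solve input)"),
    (".mtf", "molecular topology files"),
    (".pdf", "plots and contact maps"),
]


def categorize(names):
    """Map each category label to the base filenames that fall under it."""
    groups = defaultdict(list)
    for name in names:
        base = name.rsplit("/", 1)[-1]  # drop any directory prefix
        label = next((lab for suffix, lab in RULES if base.endswith(suffix)), "other")
        groups[label].append(base)
    return dict(groups)


def list_results(path):
    """Open a .tar.gz download and categorize the regular files inside it."""
    with tarfile.open(path, "r:gz") as tar:
        return categorize(m.name for m in tar.getmembers() if m.isfile())
```

The rule order matters: "_hMIN.pdb" and "_extended.pdb" are checked before the generic ".pdb" suffix so that minimized and starting structures are not lumped in with the un-minimized ones.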

My job stopped during the processing step 'constraint pair scoring'.
One possible reason that jobs stop during this step is that the protein sequence identifier typed into the "Enter Your Protein" box occurs multiple times in a PFAM alignment file. In that case, you can specify which copy of the domain to focus on with the "PFAM Member Selector" setting, located immediately after the PFAM Accession setting in the Alignment Generation / Selection section. For example, if the PFAM alignment contains:
>PCBP1_HUMAN/99-162
>PCBP1_HUMAN/281-343
and you wish to calculate the structure of PCBP1_HUMAN/281-343, enter "PCBP1_HUMAN/281-343" in "PFAM Member Selector" so that the sequence of this particular copy of the domain becomes the basis of the processing. When uploading your own alignments, you can avoid problems by adjusting sequence identifiers where necessary so that the name of your protein of interest is not duplicated.
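
Before launching, you can check an alignment for this situation yourself. The sketch below lists every PFAM-style identifier (NAME/start-end) whose name part matches your protein; any one of the returned full identifiers is the kind of string that goes into "PFAM Member Selector". It also shows one way to make identifiers in your own alignment unique by suffixing repeats. Both helpers are hypothetical, and the "_2", "_3" renaming scheme is an arbitrary choice, not a server convention.

```python
"""Sketch: find PFAM-style copies of one protein and make identifiers unique.

Assumes identifiers look like NAME/start-end; the helpers are hypothetical.
"""


def copies_of(identifiers, name):
    """Return identifiers whose part before the first '/' equals name."""
    return [rid for rid in identifiers if rid.split("/", 1)[0] == name]


def dedupe_identifiers(identifiers):
    """Keep the first occurrence of each identifier; suffix repeats with _2, _3, ..."""
    used = set()
    result = []
    for rid in identifiers:
        new_id, k = rid, 2
        # Keep incrementing until the candidate collides with nothing seen so far.
        while new_id in used:
            new_id = f"{rid}_{k}"
            k += 1
        used.add(new_id)
        result.append(new_id)
    return result
```

If copies_of() returns more than one identifier for your protein, the job needs a PFAM Member Selector value to choose among them.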
I want to upload my own Multiple Sequence Alignment.
You can do this by selecting "2. Upload Alignment" in the Alignment Generation / Selection section, and choosing the alignment file you want to upload. To avoid problems, make sure that:
  • The file format is FASTA
  • The first sequence in the alignment file is the sequence you wish to compute Evolutionary Couplings for. If your structure of interest is a domain within the complete protein sequence, only the domain subsequence should be present in the alignment.
  • With gaps removed and capitalization ignored, the sequence of interest exactly matches the complete protein sequence or a subsequence of it.
  • The FASTA identifier (up to the first space) for the sequence of interest is not duplicated elsewhere in the alignment
  • All sequences in the alignment file have the same number of text characters (total sequence width including gaps)
  • Each column has been marked as either a "Match" column (by using capital letters for all residue codes and "-" for gaps) or a "Non-Match" column (by using lower case letters for all residue codes and "." for gaps). Any column marked as a non-match column will not be considered for participation in an EC pairing.
  • The alignment meets the minimum requirements: at least 300 members, and at least 30% of its columns marked as Match columns.
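
The checklist above can be automated before uploading. The following is a minimal sketch of such a pre-upload check, not the server's own validation code. Its assumptions: the file is plain FASTA, "-" and "." are the only gap characters, "members" counts every record including the sequence of interest, and the 300-member and 30% thresholds are passed in as parameters. check_alignment() is a hypothetical helper.

```python
"""Sketch of a pre-upload check for the alignment requirements listed above.

Assumptions (not the server's code): plain FASTA input, '-' marks gaps in
match columns and '.' in non-match columns, and "members" counts every
record, including the sequence of interest.
"""


def read_fasta(text):
    """Parse FASTA text into (header, sequence) tuples; text before the first '>' is ignored."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def column_kind(chars):
    """Return 'match', 'non-match', or None if the column mixes conventions."""
    if all(c == "-" or c.isupper() for c in chars):
        return "match"
    if all(c == "." or c.islower() for c in chars):
        return "non-match"
    return None


def check_alignment(text, full_sequence, min_members=300, min_match_fraction=0.30):
    """Return a list of problems; an empty list means every check passed."""
    records = read_fasta(text)
    if not records:
        return ["no FASTA records found"]
    problems = []

    # The identifier is the header text up to the first space.
    ids = [(header.split() or [""])[0] for header, _ in records]
    if ids.count(ids[0]) > 1:
        problems.append(f"query identifier {ids[0]!r} is duplicated")

    # With gaps removed and case ignored, the query must lie within the full sequence.
    query = records[0][1].replace("-", "").replace(".", "").upper()
    if query not in full_sequence.upper():
        problems.append("query does not match the complete protein sequence")

    widths = {len(seq) for _, seq in records}
    if len(widths) != 1:
        problems.append(f"sequences have different widths: {sorted(widths)}")
    else:
        # Transpose the rows into columns and classify each column.
        kinds = [column_kind(col) for col in zip(*(seq for _, seq in records))]
        mixed = kinds.count(None)
        if mixed:
            problems.append(f"{mixed} column(s) mix match and non-match marking")
        width = widths.pop()
        if width == 0 or kinds.count("match") / width < min_match_fraction:
            problems.append("too few match columns")

    if len(records) < min_members:
        problems.append(f"only {len(records)} members (need {min_members})")
    return problems
```

Returning a list of all problems, rather than stopping at the first failure, lets you fix everything in one pass before uploading.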