How can I tell if a designed sequence is a good one?

This is a very common question. In general, there are two ways to tell in Bench whether a sequence is good: 1) checking the structure's score and other metrics in the output of a design, where lower scores are better overall, and 2) computational stability testing with Repack, Relax, and Bench Homology. Both approaches are explored in detail below.

1) SCORE METRICS

A straightforward way to see whether a structure has improved after running Design is to look at its score. If the score is lower after running Design, that is a good indicator that the overall protein structure has improved.

Some protocols allow partial scores to be calculated, e.g. only at a protein/ligand interface or a protein/protein interface instead of over the entire protein. Lower partial scores are very frequently used to identify the best output sequences and to tell whether a design is improving.

If you are designing for overall stability, then total score is a good metric. If you are designing for a certain function, then a combination of the best partial score and the improved total score should be used to assess the structure.
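As a minimal sketch of how these two goals could be applied when sorting a batch of design outputs, the Python below ranks designs by total score for stability, and for function first keeps only designs whose total score improved on the starting structure, then ranks them by the partial (interface) score. The `DesignScore` record, its field names, the `rank_designs` helper, and the scores in the example are hypothetical and made up for illustration; they are not a Bench output format or API, and "improved total score" is assumed here to mean "lower than the starting structure's total score".

```python
from dataclasses import dataclass


@dataclass
class DesignScore:
    """Scores for one design output (hypothetical record, illustrative fields)."""
    name: str
    total_score: float      # whole-protein score; lower is better
    interface_score: float  # partial score, e.g. at a protein/ligand interface; lower is better


def rank_designs(designs, start_total, goal="stability"):
    """Return designs ordered best-first.

    goal="stability": sort by total score alone.
    goal="function": keep only designs whose total score is lower than the
    starting structure's, then sort by partial score (ties broken by total).
    """
    if goal == "stability":
        return sorted(designs, key=lambda d: d.total_score)
    if goal == "function":
        improved = [d for d in designs if d.total_score < start_total]
        return sorted(improved, key=lambda d: (d.interface_score, d.total_score))
    raise ValueError(f"unknown goal: {goal!r}")


# Made-up scores, for illustration only.
designs = [
    DesignScore("d1", total_score=-812.4, interface_score=-35.2),
    DesignScore("d2", total_score=-798.0, interface_score=-41.7),
    DesignScore("d3", total_score=-820.9, interface_score=-30.1),
]
print([d.name for d in rank_designs(designs, start_total=-805.0)])                   # ['d3', 'd1', 'd2']
print([d.name for d in rank_designs(designs, start_total=-805.0, goal="function")])  # ['d1', 'd3']
```

Note that `d2` has the best interface score but is dropped for the function goal because its total score got worse than the starting structure's, which is the point of combining the two metrics.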

2) STRUCTURE COMPUTATIONAL STABILITY METRICS

A designed protein really consists of a sequence and its full-atom structure. A good sequence is one where the sequence and structure are at an energy minimum, in other words where the designed structure is biophysically stable in solution. Cyrus Bench contains several methods to test the stability of a structure: Repack & Minimize, Relax, and Homology (not in CAD).

The type of stability testing you should do to test a design depends on how much you have changed the sequence and structure, as follows:
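As a rough sketch, the criteria that Tests A, B, and C below use to decide how much testing a design needs could be written as a small Python helper. The function names are hypothetical, and two choices are assumptions rather than rules from this article: the lower end of each Test C range (15% of residues, 1 Ångstrom) is used as the cutoff, and 10 or more scattered mutations, which Test A excludes but Test B does not explicitly mention, are sent to Test B. The returned list is cumulative, since a design that needs Test C is also run through A and B.

```python
def longest_run(positions):
    """Length of the longest stretch of consecutive mutated residue numbers."""
    best = run = 0
    prev = None
    for pos in sorted(set(positions)):
        run = run + 1 if prev is not None and pos == prev + 1 else 1
        best = max(best, run)
        prev = pos
    return best


def stability_tests(mutated_positions, n_residues, max_backbone_shift):
    """Suggest which stability tests to run on one design (hypothetical helper).

    mutated_positions: residue numbers that differ from the starting sequence.
    n_residues: total number of residues in the protein.
    max_backbone_shift: largest backbone change in the design, in Ångstroms.
    """
    n_mut = len(set(mutated_positions))
    # Test C: >15% of residues mutated or backbone change >1 Å (low end of ranges, assumed).
    needs_c = n_mut / n_residues > 0.15 or max_backbone_shift > 1.0
    # Test B: 4+ mutations in a row or backbone change >0.1 Å.
    # Assumption: 10+ scattered mutations, which fall outside Test A, also go here.
    needs_b = (
        needs_c
        or longest_run(mutated_positions) >= 4
        or max_backbone_shift > 0.1
        or n_mut >= 10
    )
    tests = ["A: Repack & Minimize"]  # every design starts with Test A
    if needs_b:
        tests.append("B: Relax")
    if needs_c:
        tests.append("C: Homology")
    return tests


print(stability_tests([5, 30, 77], n_residues=120, max_backbone_shift=0.0))
# ['A: Repack & Minimize']
print(stability_tests([10, 11, 12, 13, 40], n_residues=120, max_backbone_shift=0.05))
# ['A: Repack & Minimize', 'B: Relax']
```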

Test A: Small number of mutations (fewer than 10) on a fixed backbone or only lightly minimized backbone

Run Repack & Minimize on the final designs you want to evaluate. If side chains shift significantly, especially in or near the protein core, the designs are not as good. Side chains on the surface, however, may shift, especially if they have few contacts.
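One way to make "shift significantly" concrete is to compare each core residue's side-chain atoms before and after Repack & Minimize. The Python sketch below does this with a per-residue side-chain RMSD. It assumes coordinates have already been extracted into plain dictionaries (residue id to atom name to `(x, y, z)`), contain heavy atoms only, and are in the same frame; it also assumes you supply the set of core or near-core residues yourself. The 1.0 Ångstrom cutoff and the helper names are hypothetical choices, not values from this article or Bench.

```python
import math

BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def sidechain_rmsd(before, after):
    """Side-chain RMSD for one residue.

    before/after map atom name -> (x, y, z), heavy atoms only, same frame.
    Returns 0.0 for residues with no side-chain atoms (e.g. glycine).
    """
    names = [a for a in before if a not in BACKBONE_ATOMS and a in after]
    if not names:
        return 0.0
    sq = sum(math.dist(before[a], after[a]) ** 2 for a in names)
    return math.sqrt(sq / len(names))


def flag_core_shifts(design, repacked, core_residues, cutoff=1.0):
    """Return {residue id: shift} for core residues that moved more than `cutoff` Å.

    Surface residues are deliberately not checked: their side chains may
    shift, especially when they have few contacts.
    """
    flagged = {}
    for res_id in core_residues:
        shift = sidechain_rmsd(design[res_id], repacked[res_id])
        if shift > cutoff:
            flagged[res_id] = round(shift, 2)
    return flagged


# Toy coordinates: residue 10's CG flips to the other side, residue 11's CB barely moves.
design = {
    10: {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (2.0, 1.4, 0.0)},
    11: {"CA": (3.8, 0.0, 0.0), "CB": (5.3, 0.0, 0.0)},
}
repacked = {
    10: {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (2.0, -1.4, 0.0)},
    11: {"CA": (3.8, 0.0, 0.0), "CB": (5.4, 0.0, 0.0)},
}
flagged = flag_core_shifts(design, repacked, core_residues={10, 11})
print(flagged)  # {10: 1.98}
```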

Test B: More aggressive design of one stretch (e.g. 4 or more mutations in a row), or significant (more than 0.1 Ångstrom) changes to the backbone

In this aggressive design scenario, if the results pass Test A, also try running Relax. If the structures drift less than 1 Ångstrom in the designed region after Relax, these are good designs; if they drift less than 0.5 Ångstroms, they are very good designs. If they drift more than 1.5 Ångstroms, you may consider further redesign, for example by using the output from Relax as input for the next stage of design.
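The drift thresholds above can be sketched in Python as an RMSD over the designed region followed by a simple grading step. This assumes you have already extracted matching atoms (for example CA atoms) of the designed region from the design and from the Relax output, and that the two structures are already superimposed, for instance on the undesigned part of the protein; no fitting is done here. The helper names are hypothetical, and the "borderline" label for 1.0 to 1.5 Ångstroms is an assumption, since this article gives no verdict for that range.

```python
import math


def region_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points (pre-superimposed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def grade_relax_drift(rmsd):
    """Grade designed-region drift after Relax, in Ångstroms."""
    if rmsd < 0.5:
        return "very good"
    if rmsd < 1.0:
        return "good"
    if rmsd <= 1.5:
        return "borderline"  # assumption: the article does not grade 1.0-1.5 Å
    return "consider further redesign"


# Toy CA coordinates for a three-residue designed region, before and after Relax.
before = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
after = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 0.0, 0.0)]
drift = region_rmsd(before, after)
print(round(drift, 3), grade_relax_drift(drift))  # 0.289 very good
```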

Test C: Mutations to more than 15-20% of the residues in the protein, or designed (intended) backbone changes of more than 1-2 Ångstroms

These designs can be tested as in Test B, and it is a very positive sign if they pass. For an even more stringent test, consider running them as input to Cyrus Bench Homology. This assumes that you have sequence homologs, which you will by definition if you began your design from a published structure in the PDB or a privately held structure, since that starting structure is itself a homolog of the designed sequence.
If the modeled structures come back within 2 Ångstroms of the designs, especially in the designed region, then these are very high-quality, self-consistent designs. Achieving this type of design can require multiple cycles of design and homology modeling to converge on a sequence/structure pair.
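The cycle of design and homology modeling described above can be sketched as a loop that stops once the homology model agrees with the design to within 2 Ångstroms, or after a fixed number of rounds. In this Python sketch, `design_step` and `homology_rmsd` are hypothetical placeholder callables standing in for a round of Design and for a Homology run plus an RMSD comparison over the designed region; they are not Bench APIs, and the `max_cycles` limit is an assumed safeguard.

```python
def converge_design(sequence, design_step, homology_rmsd, cutoff=2.0, max_cycles=5):
    """Alternate design and homology modeling until the model matches the design.

    design_step(seq) -> new sequence (placeholder for a round of Design).
    homology_rmsd(seq) -> RMSD in Å between the homology model of `seq` and
    the designed structure over the designed region (placeholder).
    Returns (sequence, rmsd, cycles_used, converged).
    """
    rmsd = homology_rmsd(sequence)
    cycles = 0
    while rmsd > cutoff and cycles < max_cycles:
        sequence = design_step(sequence)
        rmsd = homology_rmsd(sequence)
        cycles += 1
    return sequence, rmsd, cycles, rmsd <= cutoff


# Toy stand-ins: each "design round" bumps a version label, and the
# RMSD for each version is looked up from a made-up table.
fake_rmsd = {"v0": 3.1, "v1": 2.4, "v2": 1.7}


def fake_design(seq):
    return "v" + str(int(seq[1:]) + 1)


print(converge_design("v0", fake_design, fake_rmsd.get))  # ('v2', 1.7, 2, True)
```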
