One of the most effective ways to improve protein stability and solubility is to use evolutionary information to guide mutation selection. This workflow shows how you can create a multiple sequence alignment from homologs in the NCBI database, summarize it as a position-specific scoring matrix (PSSM), then sample the mutations that are most prevalent in the alignment with one of our design tools. This tests the point mutations found in homologs to see if any of them improve the structure. It can often give you better design improvements than many other design approaches.
- Run PSSM
- Split protein into subregions
- Run Design with each subregion individually with the PSSM as a guide
- Pool best results from subregions for a full protein Design
- Select top hits for wet lab verification
Step One: Run PSSM
You can create a PSSM on your own or send us the sequence so we can do it for you at email@example.com. Instructions for creating your own PSSM can be found here.
Step Two: Split protein into subregions
Our Design tools usually can’t run Design for all positions simultaneously. We limit the number of permutations that you can run to 10^11. Select subregions that have interacting side chains so co-mutating side chains can be sampled in the same run. You also may want to have some overlap between subregions in order to improve sampling at the subregion edges.
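To check whether a candidate subregion fits under the limit before submitting it, you can multiply the number of residues allowed at each position. The snippet below is a minimal sketch of that arithmetic only; the function name, the position numbers, and the residue choices are made up for illustration and are not part of our tools.

```python
from math import prod

# Permutation limit stated in the text above.
MAX_PERMUTATIONS = 10**11


def count_permutations(allowed):
    """Multiply the number of distinct residues allowed at each position.

    `allowed` maps a position number to the residues permitted there
    (hypothetical input format chosen for this sketch).
    """
    return prod(len(set(residues)) for residues in allowed.values())


# Hypothetical subregion with four designable positions.
subregion = {
    12: "AVIL",  # 4 choices
    15: "FYW",   # 3 choices
    19: "DEKR",  # 4 choices
    23: "ST",    # 2 choices
}

n = count_permutations(subregion)
print(n, n <= MAX_PERMUTATIONS)  # 96 True

# Allowing all 20 amino acids everywhere hits the limit quickly:
print(20**8 <= MAX_PERMUTATIONS)  # True  (about 2.6 x 10^10)
print(20**9 <= MAX_PERMUTATIONS)  # False (about 5.1 x 10^11)
```

The last two lines show why subregions are needed: with every amino acid allowed, only about eight positions fit in one run, so restricting each position to the residues suggested by the PSSM is what makes larger subregions possible.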
In the following example, I am running Design for a small protein with the NMR structure of the Internal UBA Domain of HHR23A, 1iFY.pdb. I loaded the structure into CAD, then ran 10 repeats of Relax in order to optimize the structure. I did a final Minimize on the best Relax. This optimized the structure from an initial score of 559 REU down to -65 REU, which is an excellent improvement.
I created three subregions. One subregion only contained the core (green). The surface residues were split into two subregions (purple and blue).
Step Three: Run Design with each subregion
I selected Relax Design in order to allow full structure conformational sampling which is often necessary to accommodate more aggressive mutation sampling. You must first choose a Selector, then you can Upload a PSSM. If you do it in the reverse order, the PSSM will have to be reloaded. Then you should see the mutations for this subregion from the PSSM like below.
By default, the PSSM score threshold is 0, but you can raise or lower it to control how obscure the mutations sampled from homologs in the PSSM can be. The score is a log-scale representation of the point mutations found in a multiple sequence alignment. Be sure to select a large number of repeats, because there are many potential trajectories available when a lot of mutations are possible. People often choose 100 or 500 repeats depending on how large a mutation library they wish to test. A larger library is often desired for a challenging target.
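To get a feel for how the threshold changes the mutation library, here is a minimal sketch of threshold filtering. It assumes the threshold acts as a minimum score and that the wild-type residue is always kept; both are assumptions for illustration and may not match how the design tool applies the threshold. The scores, positions, and function name are hypothetical.

```python
def allowed_mutations(pssm, wild_type, threshold=0):
    """Return, per position, the residues scoring at or above `threshold`.

    Assumptions of this sketch (not confirmed tool behavior):
    - the threshold is an inclusive minimum score;
    - the wild-type residue is always retained.
    """
    allowed = {}
    for position, scores in pssm.items():
        keep = {aa for aa, score in scores.items() if score >= threshold}
        keep.add(wild_type[position])
        allowed[position] = sorted(keep)
    return allowed


# Hypothetical PSSM excerpt: position -> {residue: score}.
pssm = {
    5: {"L": 4, "I": 2, "V": 1, "M": 0, "F": -1, "A": -2},
    9: {"E": 5, "D": 2, "Q": 0, "K": -1, "R": -3},
}
wild_type = {5: "L", 9: "E"}

for threshold in (2, 0, -1):
    print(threshold, allowed_mutations(pssm, wild_type, threshold))
```

Under these assumptions, a threshold of 2 leaves two residues per position, the default of 0 leaves four and three, and -1 leaves five and four, so the library grows quickly as the threshold drops.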
Step Four: Pool best results from subregions for Design on the whole protein
Once the Design has run, you can look at the Sequence Logo in order to quickly see what mutations were favored at each position. For the above design, the Sequence Logo is shown below.
Once you have the results from all subregions, you can set up a single design run that uses the favored mutations from all of them. Here are the preferred mutations for the three subregions in this example:
The preferred residues found during the three design runs are listed above. We can calculate the number of permutations that would result if all of these residues were allowed in a single design run on the whole protein. In this example, the final number of permutations is 3.9 x 10^9, which is under our threshold of 10^11. So I can run all these residues in one Relax Design with 100 repeats.
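The pooling step can be sketched as follows: merge the preferred residues from each subregion, taking the union wherever subregions overlap, then multiply the per-position counts. The residues and positions below are invented for illustration and are not the results from this example; the function name is also hypothetical.

```python
from math import prod

# Permutation limit stated earlier in this workflow.
MAX_PERMUTATIONS = 10**11


def pool_preferred(*subregion_results):
    """Merge per-position preferred residues from several design runs.

    Positions shared by overlapping subregions get the union of the
    residues preferred in each run.
    """
    pooled = {}
    for result in subregion_results:
        for position, residues in result.items():
            pooled.setdefault(position, set()).update(residues)
    return pooled


# Hypothetical preferred residues from three subregion runs.
core = {8: "LI", 12: "VA"}
surface_a = {3: "EKQ", 12: "VI"}  # position 12 overlaps with the core run
surface_b = {20: "DN", 25: "RKS"}

pooled = pool_preferred(core, surface_a, surface_b)
total = prod(len(residues) for residues in pooled.values())

print(sorted(pooled[12]))         # ['A', 'I', 'V']
print(total)                      # 108 = 3 * 2 * 3 * 2 * 3
print(total <= MAX_PERMUTATIONS)  # True
```

If the total comes out above the limit, you can drop the least-favored residues at the positions with the most choices and recount before submitting the whole-protein run.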
Once this design run is finished, we see further narrowing of the preferred mutations.
Step Five: Select top hits for wet lab verification
You may want to test all the unique sequences from this design run. You could also narrow down the results by selecting the best designs and running a few repeats of Relax. This can often further improve some mutants if full conformational sampling wasn’t done during the Relax Design trajectory. Some mutants will not improve with Relax. In the example below, the results of the last Relax Design were ranked by score. Then the top 5 sequences were run with 10 repeats of Relax. Of all these, the most favorable sequence is shown below.
Above, we show the best mutant in blue aligned to the optimized wild type in green. The design achieved 16 REU of improvement, which is significant for such a small protein. Mutated residues are shown in a darker shade.
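If you export the design results, the ranking step is easy to reproduce yourself. The sketch below keeps the best (lowest) score for each unique sequence and returns the top few; lower REU is treated as better, consistent with the scores quoted earlier. The sequences, scores, and function name are hypothetical.

```python
def top_unique_designs(designs, n=5):
    """Rank unique sequences by their best (lowest) score and keep the top n.

    `designs` is an iterable of (sequence, score) pairs, one per repeat.
    """
    best = {}
    for sequence, score in designs:
        if sequence not in best or score < best[sequence]:
            best[sequence] = score
    return sorted(best.items(), key=lambda item: item[1])[:n]


# Hypothetical results from a design run (sequence, score in REU).
designs = [
    ("MKLV", -70.2),
    ("MKIV", -78.5),
    ("MKLV", -72.0),  # same sequence found again with a better score
    ("MRIV", -81.0),
    ("MKIA", -66.3),
]

for sequence, score in top_unique_designs(designs, n=3):
    print(sequence, score)
```

The sequences returned here are the ones you would carry forward into the extra Relax repeats described above.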