Team:Calgary/Modelling/MATLAB

From 2010.igem.org

MATLAB

Matrix Laboratory Software

The protein production simulation was built with MATLAB (Matrix Laboratory), a software package produced by MathWorks. Specifically, this project uses the SimBiology application, a collection of computational tools for simulating biological processes. The power of this tool lies in its ability to build models with multiple species and reactions and then simulate how the species interact over time.


Protein Production Abstraction

Our proposed model relies on an abstraction of the process of protein production that takes into account the formation of incorrectly folded intermediates and the presence of aggregated misfolded protein. The model is shown below.

Figure 1. The proposed pathway consists of a number of species. The first is the "natal peptide", which represents the initial amount of amino acid chain produced by the ribosome. It is the "starting" amount of potential protein, misfolded protein or inclusion body present in the system.

The second species is the "unstable protein". This refers to proteins that have not yet reached their fully stable conformation or that have been destabilized by environmental conditions. Unstable protein can be degraded by cellular proteases, which produces the "degraded protein" species. Unstable proteins can also clump together to form the "inclusion body" species. Inclusion bodies are themselves degraded by proteases, so degraded protein is a possible result of that route as well.

The final species is the stable functional protein. This stable form is also degraded by proteases, but only in very small amounts.


Relationship Equations

To begin investigating the factors that may affect the final state of the natal peptide, it is first necessary to define how the species in the system interact. This is where MATLAB becomes useful: the proposed model pathway can be represented in the MATLAB SimBiology toolbox.

Figure 2. Each of the blue circles corresponds to a species identified in the previous section, while the yellow circles define the reactions that occur between species. SimBiology uses these defined reactions to determine the amount of each species present in the system as the species interact over time. The results of the simulation are described in a later section.

The reactions present in the system are described by the following equations:


natal peptide -> unstable protein

This is an irreversible reaction in which all of the natal peptide present becomes unstable protein with a rate constant of 1. This reflects the assumption that all of the initial amino acid sequence will at some point be present as unstable protein capable of being degraded or of forming inclusion bodies.

unstable protein + inclusion body <-> 2 inclusion body

This is a reversible reaction that defines how unstable proteins form inclusion bodies. The simulation assumes that a very small baseline concentration of inclusion body is present. This baseline amount can interact with unstable protein to form more inclusion body, which creates a positive feedback that drives the inclusion body concentration higher. The rate constant for this process depends on a number of factors discussed in a later section. Inclusion body formation represents one potential outcome of the protein expression system.
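The positive feedback described above can be illustrated with a minimal sketch of this single reaction in isolation, assuming simple mass-action kinetics (forward rate kf·[unstable]·[inclusion body], reverse rate kr·[inclusion body]²). The function name, the rate constants and the Euler integration are illustrative assumptions and are not taken from the SimBiology model.

```python
def simulate_seeded_aggregation(unstable0, seed, kf=1.0, kr=0.1,
                                dt=0.001, t_end=5.0):
    """Euler integration of  unstable + IB <-> 2 IB  (mass action).

    kf, kr, dt and t_end are hypothetical values chosen for illustration.
    Returns the final (unstable, inclusion_body) amounts.
    """
    unstable, inclusion = unstable0, seed
    for _ in range(int(round(t_end / dt))):
        # Net conversion of unstable protein into inclusion body.
        net = kf * unstable * inclusion - kr * inclusion * inclusion
        unstable -= net * dt
        inclusion += net * dt
    return unstable, inclusion


# Without a seed the autocatalytic reaction never starts.
no_seed = simulate_seeded_aggregation(10.0, 0.0)
# With a very small seed the feedback carries the system to equilibrium.
with_seed = simulate_seeded_aggregation(10.0, 0.01)
```

Under these assumptions the reaction cannot start from zero inclusion body, which is why the model needs a small baseline concentration. With a seed, the isolated reaction settles where kf·[unstable] = kr·[inclusion body].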

unstable protein -> degraded protein

This reaction represents unstable protein being degraded by proteases. It follows Henri-Michaelis-Menten kinetics. In this process the pseudo-equilibrium is treated as one-way, because degraded protein does not return to its pre-degraded state. The constant for this reaction is a set value that depends on the particular protease. For simplicity a single value was selected to represent this effect. Once the unstable protein has been degraded it is removed from the simulation system and cannot form inclusion bodies or functional protein. As a result, degraded protein represents one of the potential "outcomes" of a protein expression experiment.
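As a hedged illustration of the irreversible Henri-Michaelis-Menten rate law, the sketch below computes the degradation rate v = vmax·S / (km + S) for a given amount of unstable protein S. The parameter names vmax and km and the numbers used are assumptions for illustration; the text only states that a single constant was chosen for the protease.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Irreversible Henri-Michaelis-Menten rate: v = vmax * S / (km + S).

    vmax (maximum rate) and km (half-saturation amount) are hypothetical
    parameters, not values from the original model.
    """
    if substrate <= 0.0:
        return 0.0  # nothing left to degrade
    return vmax * substrate / (km + substrate)


# Hypothetical protease constants.
VMAX = 0.5
KM = 1.0

# Rate at low, half-saturating and very high unstable protein levels.
rates = [michaelis_menten_rate(s, VMAX, KM) for s in (0.01, 1.0, 1000.0)]
```

The rate grows almost linearly while unstable protein is scarce and saturates near vmax when it is abundant, so a burst of unstable protein cannot be cleared arbitrarily fast by the protease.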

inclusion body -> degraded protein

Our system assumes that inclusion bodies are degraded by the same process as unstable proteins, but at a slower rate than a single unstable protein. We propose that this is because the protease cannot access the individual proteins inside the inclusion body. The difference is reflected in the rate constants of the two reactions.

unstable protein <-> functional protein

This reaction represents the process of the unstable protein becoming stable functional protein. The reaction is reversible as different environmental conditions, discussed later on, can determine whether or not the protein is in a stable state or unstable state.
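The five reactions above can be combined into one system of rate equations. The following is a minimal Python sketch of such a system, not the SimBiology model itself. It assumes mass-action kinetics for the conversion and aggregation steps, Michaelis-Menten kinetics for both degradation steps, and a fixed-step Runge-Kutta integrator. Apart from the natal peptide rate constant of 1, every constant and name is hypothetical.

```python
from dataclasses import dataclass

# State order: natal, unstable, functional, inclusion body, degraded.


@dataclass
class Rates:
    k_natal: float = 1.0    # natal -> unstable (value stated in the text)
    k_agg: float = 0.05     # hypothetical: unstable + IB -> 2 IB
    k_dis: float = 0.005    # hypothetical: 2 IB -> unstable + IB
    vmax_u: float = 0.5     # hypothetical: protease on unstable protein
    vmax_ib: float = 0.05   # hypothetical: slower protease on IB
    km: float = 1.0         # hypothetical half-saturation amount
    k_fold: float = 2.0     # hypothetical: unstable -> functional
    k_unfold: float = 0.1   # hypothetical: functional -> unstable


def derivatives(state, r):
    """Net rate of change of each species for the five reactions."""
    n, u, f, ib, _ = state
    v_natal = r.k_natal * n
    v_agg = r.k_agg * u * ib - r.k_dis * ib * ib
    v_deg_u = r.vmax_u * u / (r.km + u)
    v_deg_ib = r.vmax_ib * ib / (r.km + ib)
    v_fold = r.k_fold * u - r.k_unfold * f
    return (
        -v_natal,
        v_natal - v_agg - v_deg_u - v_fold,
        v_fold,
        v_agg - v_deg_ib,
        v_deg_u + v_deg_ib,
    )


def rk4_step(state, r, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = derivatives(state, r)
    k2 = derivatives(shifted(k1, dt / 2), r)
    k3 = derivatives(shifted(k2, dt / 2), r)
    k4 = derivatives(shifted(k3, dt), r)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(r, natal0=10.0, seed=0.001, dt=0.01, t_end=20.0):
    """Return the list of states from t = 0 to t_end in steps of dt."""
    state = (natal0, 0.0, 0.0, seed, 0.0)
    history = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, r, dt)
        history.append(state)
    return history
```

Calling `simulate(Rates())` returns one tuple per time step. Because every reaction only converts one species into another, the total amount across the five species stays constant, which is a useful sanity check on any implementation of the pathway. Different scenarios could be explored by passing a `Rates` instance with other constants.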


Factors Under Investigation

For the purpose of our model we investigated two categories of factors: environmental factors and sequence factors. It is important to note that these divisions are arbitrary and do not necessarily exist in reality. The categories were created for convenience in organizing the factors being investigated and in determining how best to evaluate their impact.

Environmental factors

This collection of factors refers to those features of the cell's environment that affect the stability of the proteins being produced. By environment we mean the external temperature at which the cell is growing, the pH of its surroundings and the concentration of protein within cellular compartments. These factors are not determined by features of the protein being produced, but they still affect the likelihood of inclusion body formation and/or protein instability.

Defining environmental impact on protein stability

We have assumed that environmental conditions affect inclusion body formation by altering the equilibrium of the following equation:

Functional Protein <-> Unstable Protein <-> Inclusion Body

A publication by Brandt et al supports this concept: they showed that isolated protein in solution can be converted back and forth between its stable form and its inclusion body form based on temperature and pH. Specifically, high temperatures and strongly basic pH encourage the formation of inclusion bodies. Under those conditions the equilibrium constant for the conversion toward inclusion bodies increases, because inclusion body formation has become more favourable.

Critical Assumption

From this information we have assumed that altering the environmental conditions of temperature, pH and protein concentration will have a quantifiable effect on the rate constants for the above equation. Currently the exact impact of these factors hasn't been determined. The results section only describes rate constants determined for convenience based on qualitative understanding of the processes.

Sequence Dependent Factors

For these factors we looked at features of the mRNA and amino acid sequence that could affect the likelihood of inclusion body formation. As a general rule, sequence factors were selected based on how the feature affects the time taken for the sequence to reach its fully folded, stable conformation. This is based on the assumption that if the protein has more intermediate stages, or is more thermodynamically stable in a non-folded conformation, then there is more time for unstable proteins to interact and begin forming inclusion bodies.

Critical Assumption

Sequence features such as scarce amino acids (tryptophan), mRNA structural features that slow translation through the ribosome, and the ratio of hydrophobic amino acids to charged amino acids all have a quantifiable effect on the rate constant at which unstable protein becomes stable protein.

This assumption is indirectly supported by the literature and by discussions with researchers in the field. However, the precise impact of these factors is still under investigation. As such, only hypothesized relationships have been used for the results section.
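Two of the sequence features named above can be computed directly from an amino acid sequence. The sketch below counts hydrophobic and charged residues in a one-letter sequence and reports their ratio, along with the tryptophan fraction. The residue classification and the function names are assumptions made for illustration; the text does not state which residues were counted as hydrophobic or charged, and no mapping from these numbers to rate constants is implied.

```python
# Assumed residue classes (one-letter codes); other classifications exist.
HYDROPHOBIC = frozenset("AVILMFWC")
CHARGED = frozenset("DEKR")


def hydrophobic_to_charged_ratio(sequence):
    """Ratio of hydrophobic to charged residues in a protein sequence.

    Returns infinity when hydrophobic residues are present but no charged
    ones are, and 0.0 when neither class is present.
    """
    seq = sequence.upper()
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    charged = sum(1 for aa in seq if aa in CHARGED)
    if charged == 0:
        return float("inf") if hydrophobic else 0.0
    return hydrophobic / charged


def tryptophan_fraction(sequence):
    """Fraction of residues that are tryptophan (W)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return seq.count("W") / len(seq)
```

For a hypothetical sequence such as "WWLK", the ratio is 3 hydrophobic residues to 1 charged residue and half of the residues are tryptophan. A protein with these values would fall into the category the model treats as more prone to inclusion body formation.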


Result Cases

The result cases represent the preliminary testing of the model to see if the model simulation can be used to analyze the protein production process under different conditions. The rate constants used for each of the cases were selected for convenience in order to determine if relevant results could be obtained. This means that the rate constants used in the models do not directly correspond to biological data. However the relationships between the different rate constants are representative of biological data. It is important to note that this data is very preliminary and only demonstrates that the modelling approach we have taken can be used to investigate the factors affecting protein misfolding.

Result Case 1: Successful Expression of Stable Protein

In this case the peptide produced is very stable and the equilibrium favours the fast formation of stable, correctly folded protein (red line). Unstable protein is still produced in this scenario, but it is quickly degraded by proteases and does not form inclusion bodies. Additionally, the unstable protein does not expose a significant number of hydrophobic amino acids, which also helps prevent the formation of inclusion bodies.


Figure 3. In this figure the concentration units are arbitrarily mM and the time value is in seconds. These units were selected for convenience and are not assumed to be accurate for the process being modelled. This issue is discussed in more detail in the conclusion section.

Natal Peptide (blue line)

An initial arbitrary amount of natal peptide (10 mM) is produced in the cell.

Unstable Protein (green line)

Unstable protein still shows a peak, because this species is always formed from the natal peptide, but it is cleared quickly.

Functional Protein (red line)

In this scenario the natal peptide is quickly driven towards stable protein.

Inclusion Body (light blue line)

In this scenario very few inclusion bodies are produced, as there is not enough unstable protein present to induce nucleation.

Degraded Protein (purple line)

Unstable protein and inclusion bodies are degraded. In this model unstable protein is degraded faster than inclusion bodies because of the size difference.


Result Case 2: High hydrophobic to charged amino acid ratio with a protein having many unstable intermediates

In this case the produced peptide is highly unstable and is composed of significantly more hydrophobic amino acids than charged amino acids. This indicates that in the unstable form a significant amount of the exposed amino acids will be hydrophobic. This state of unstable protein will strongly drive towards inclusion body formation.

Figure 4. In this figure the concentration units are arbitrarily mM and the time value is in seconds. These units were selected for convenience and are not assumed to be accurate for the process being modelled. This issue is discussed in more detail in the conclusion section.

Natal Peptide (blue line)

An initial arbitrary amount of natal peptide (10 mM) is produced in the cell.

Unstable Protein (green line)

A high hydropathy value increases the time required for the protein to fold into its correct shape, so more unstable protein is present in the equilibrium. The green line on the graph represents this amount. Its peak marks the point of nucleation: the concentration of unstable protein at which the drive towards inclusion bodies increases rapidly.

Functional Protein (red line)

The increased hydropathy content decreases the presence of functional protein.

Inclusion Body (light blue line)

The concentration of inclusion body increases rapidly when the concentration of unstable protein reaches the point of nucleation (peak of the green line).

Degraded Protein (purple line)

Unstable protein and inclusion bodies are degraded. In this model unstable protein is degraded faster than inclusion bodies because of the size difference.


Result Case 3: Over Produced Protein at High Temperature

In this scenario the protein is highly stable and drives strongly towards its stable functional conformation. However, the overproduction of the peptide means that high concentrations of unstable protein are present, and the higher temperature increases the kinetic movement of the unstable protein. This causes the unstable proteins to collide more frequently and form inclusion bodies before they have a chance to become stable protein. Since the protein is highly stable, the nucleation point (the peak of the green line) occurs at a higher concentration and a later time than that of the unstable protein in the previous case.



Figure 5. In this figure the concentration units are arbitrarily mM and the time value is in seconds. These units were selected for convenience and are not assumed to be accurate for the process being modelled. This issue is discussed in more detail in the conclusion section.

Natal Peptide (blue line)

An initial arbitrary amount of natal peptide (10 mM) is produced in the cell.

Unstable Protein (green line)

The highly stable protein produced in this scenario spends very little time in the unstable form. It is quickly converted to stable protein or driven into an inclusion body. In this scenario the concentration and temperature cause the equilibrium to favour inclusion bodies over the functional protein.

Functional Protein (red line)

In this scenario the functional protein is very stable and favoured. However, the high concentration of unstable intermediate causes intermediate proteins to be caught up in inclusion bodies before they can fold correctly.

Inclusion Body (light blue line)

The concentration of inclusion body increases rapidly when the concentration of unstable protein reaches the point of nucleation ( peak of the green line ). This point is reached more slowly than in the previous scenario as the protein is more stable, and a certain concentration must be met before nucleation occurs.

Degraded Protein (purple line)

Unstable protein and inclusion bodies are degraded. In this model unstable protein is degraded faster than inclusion bodies because of the size difference.


Conclusions

There are a number of immediate problems with this modelling approach that affect the accuracy of the results. The first and most significant is that the values for the amounts of species and the kinetic constants were selected for convenience. They were chosen so that their relative relationships would allow us to determine whether the model could be made to match the behaviour seen in the literature. As a result, the outputs of the simulation cannot be taken as "real" because they do not use true values. The second major issue is that the equilibrium equations proposed for each of the reactions are abstractions and may not be the best ways of relating the species. The third major issue is that there are no clearly defined relationships between each of the factors being investigated and misfolding. Therefore factors such as temperature, and sequence features such as the ratio of hydrophobic to charged amino acids, are accounted for in a qualitative way only.

Even with these caveats, the model was a success in a number of ways. The results demonstrate that the MATLAB SimBiology software can be used to simulate the process of inclusion body formation. The graphs obtained closely match, albeit qualitatively, the process of inclusion body formation as it is described in the literature. Lastly, this approach has provided us with a framework that can be used to study the factors affecting protein misfolding and aggregation.


Future Directions

The most necessary future direction is to find a "test" protein and apply the principles of the model to determine whether the simulation results match literature results. If this can be shown, more test proteins can be evaluated with the model and the model can be made more general. The second future direction is to explore the concept of "cut-off" values. From the equilibrium graphs in the results section it is clear that some amount of inclusion body, some amount of unstable protein and some amount of stable functional protein are always present. This implies that there will always be a ratio between the three species, and that even when inclusion body is present some functional protein will be too. A cut-off value would be the ratio of functional to non-functional protein at which functional protein can still be obtained. The final future direction is to develop a way to account for the four different categories of inclusion bodies that are described in the literature.
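The cut-off idea can be made concrete with a short sketch that computes the ratio of functional to non-functional protein from end-of-simulation amounts and compares it against a threshold. It assumes that "non-functional" means unstable protein plus inclusion body, which is one reading of the text. The function names and any threshold passed in are hypothetical.

```python
def functional_ratio(functional, unstable, inclusion):
    """Ratio of functional protein to non-functional protein.

    Non-functional is assumed here to mean unstable protein plus
    inclusion body. Degraded protein is left out of the ratio.
    """
    non_functional = unstable + inclusion
    if non_functional == 0.0:
        return float("inf") if functional > 0.0 else 0.0
    return functional / non_functional


def meets_cutoff(functional, unstable, inclusion, cutoff):
    """True when the functional ratio reaches a chosen cut-off value."""
    return functional_ratio(functional, unstable, inclusion) >= cutoff
```

For example, 6 units of functional protein against 1 unit of unstable protein and 2 units of inclusion body give a ratio of 2. An appropriate cut-off would have to come from experimental data rather than from the model.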

References

Ashbaugh, H. S., & Hatch, H. W. (2008). Natively unfolded protein stability as a coil-to-globule transition in charge/hydropathy space. Journal of the American Chemical Society, 130(29), 9536-9542.

Roberts, C. J. (2007). Non-native protein aggregation kinetics. Biotechnology and Bioengineering, 98(5), 927-938.

Wang, W., Nema, S., & Teagarden, D. (2010). Protein aggregation--pathways and influencing factors. International Journal of Pharmaceutics, 390(2), 89-99.