# Team:USTC/Modeling/b

### From 2010.igem.org

## Overview

In this section, we mainly focus on the co-evolution relationship between the **Pdu-Pdu proteins**.

Based on this modeling, we want to learn more about the evolutionary status of the Pdu shell microcompartment. Furthermore, we want to show that our artificial organelle is open to a variety of signal peptides, which makes it somewhat more **prone to contain disparate reactions**.

## Results

**Co-evolution of pdu-pdu proteins.**

Using an established method for detecting co-evolution [1], we examined whether the Pdu shell microcompartment proteins behave as interaction partners *in vivo*.

## Method

**1. Sequence analysis**
Sequences related to each component of the Pdu microcompartment were retrieved using PSI-BLAST [2], limited to 100 alignments, with all other parameters left at their defaults. Multiple sequence alignments were constructed directly from the PSI-BLAST alignments. The ClustalW phylogeny program was then used to calculate a distance matrix with the neighbor-joining method [3].
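As a rough illustration of what a pairwise distance matrix over an alignment looks like, the sketch below uses a simple uncorrected p-distance (fraction of mismatched residues over columns where neither sequence has a gap). This is an assumption standing in for ClustalW's own distance calculation, which may apply corrections this sketch omits; the function names are hypothetical, and the neighbor-joining tree itself is not built here.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatches over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gapped columns are skipped, not counted as mismatches
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def distance_matrix(alignment):
    """Build a symmetric distance matrix from a {name: aligned_sequence} mapping."""
    names = sorted(alignment)  # fixed species order, reused for every matrix
    n = len(names)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(alignment[names[i]], alignment[names[j]])
    return names, d


# Toy alignment (hypothetical sequences, not Pdu data).
names, d = distance_matrix({"s1": "MKV-LA", "s2": "MKVILA", "s3": "MRV-LS"})
```

Sorting the names gives every matrix the same row order, which matters later when two matrices are compared entry by entry.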

**2. Correlation analysis**
Distance matrices were generated from the multiple alignments using ClustalW [4]. We then used a linear regression analysis to measure the correlation between the pairwise evolutionary distances of all proteins in the multiple sequence alignments.
We computed the linear correlation coefficient r (Pearson's correlation coefficient), defined as:

$$
r = \frac{\sum_{i<j}\left(X_{ij}-\bar{X}\right)\left(Y_{ij}-\bar{Y}\right)}{\sqrt{\sum_{i<j}\left(X_{ij}-\bar{X}\right)^{2}}\;\sqrt{\sum_{i<j}\left(Y_{ij}-\bar{Y}\right)^{2}}}
$$

where the sums run over all pairs i < j, r ranges from -1 to 1, $\bar{X}$ is the mean of all $X_{ij}$ values and $\bar{Y}$ is the mean of all $Y_{ij}$ values. In our context, $X_{ij}$ and $Y_{ij}$ are pairwise sequence distances among the shell-inside components and among the shell components, respectively. Positive values of r indicate positive co-evolution, i.e. receptors that appear evolutionarily close have ligands that are, in turn, more closely related than other pairs of ligands. By contrast, values of r around zero indicate no correlation, and negative values of r indicate anti-correlation.
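The formula above can be sketched as a Pearson correlation over the upper triangles (pairs i < j) of two distance matrices. A minimal sketch, assuming both matrices list the same species in the same row order (for example, the shell protein and the inside protein taken from the same organism); the function names are hypothetical.

```python
import math


def upper_triangle(m):
    """Flatten the strict upper triangle (pairs i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n - 1) for j in range(i + 1, n)]


def pearson_r(dx, dy):
    """Pearson r between the X_ij and Y_ij distances of two same-sized matrices."""
    if len(dx) != len(dy):
        raise ValueError("matrices must cover the same species in the same order")
    x, y = upper_triangle(dx), upper_triangle(dy)
    if len(x) < 2:
        raise ValueError("need at least three species (two or more pairs)")
    mx, my = sum(x) / len(x), sum(y) / len(y)  # X-bar and Y-bar
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all distances are equal")
    return sxy / math.sqrt(sxx * syy)


# Toy matrices (hypothetical): dy scales dx, dz reverses its ordering.
dx = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dy = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
dz = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
```

Here `pearson_r(dx, dy)` gives 1.0 (perfect co-variation of distances) and `pearson_r(dx, dz)` gives -1.0 (anti-correlation).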

**3. Estimation of statistical significance of correlation**
The significance of the computed value r was assessed by calculating r between two randomized distance matrices. We repeated this process 10000 times to obtain a distribution of r, as shown in the figure. From the resulting values $r_{\mathrm{rand}}$, a z-score for the observed value r was calculated as

$$
z = \frac{r - \bar{r}_{\mathrm{rand}}}{s}
$$

where s is the standard deviation of $r_{\mathrm{rand}}$ and $\bar{r}_{\mathrm{rand}}$ is its mean (effectively zero for truly random data).
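The randomization step can be sketched as a permutation test. The text does not say how the matrices were randomized, so this sketch assumes one common choice: shuffling the species labels of one matrix (rows and columns together) while keeping the other fixed, then computing z from the resulting $r_{\mathrm{rand}}$ values. The helper names and the `seed` parameter are hypothetical.

```python
import math
import random
import statistics


def upper_triangle(m):
    n = len(m)
    return [m[i][j] for i in range(n - 1) for j in range(i + 1, n)]


def pearson_r(dx, dy):
    x, y = upper_triangle(dx), upper_triangle(dy)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all distances are equal")
    return sxy / math.sqrt(sxx * syy)


def permute_matrix(m, order):
    """Reorder rows and columns together (assumed randomization: shuffled species labels)."""
    return [[m[i][j] for j in order] for i in order]


def permutation_z_score(dx, dy, iterations=10000, seed=0):
    """Return (r, z) with z = (r - mean(r_rand)) / sd(r_rand)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same distribution
    r_obs = pearson_r(dx, dy)
    order = list(range(len(dy)))
    r_rand = []
    for _ in range(iterations):
        rng.shuffle(order)
        r_rand.append(pearson_r(dx, permute_matrix(dy, order)))
    s = statistics.stdev(r_rand)
    if s == 0:
        raise ValueError("randomized r values have zero spread; z is undefined")
    return r_obs, (r_obs - statistics.mean(r_rand)) / s


# Toy example (hypothetical): distances between points on a line, and a scaled copy.
p = [0, 1, 3, 6, 10, 15]
dx = [[abs(a - b) for b in p] for a in p]
dy = [[2 * v for v in row] for row in dx]
r, z = permutation_z_score(dx, dy, iterations=2000, seed=1)
```

Because shuffling labels keeps each matrix's set of distances intact, only the pairing between the two matrices is destroyed, which is the null hypothesis of "no co-evolution" this test targets.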

## Conclusion

We calculated the correlation coefficient between the shell components and the inside components. We also took the 20 **N-terminal** amino acids of each inside component to test whether this region is more strongly correlated with the shell components. Two correlation coefficients, Pearson and Spearman (see Supplementary), were used. To make the results easier to read, we calculated the z-score of each value and plotted the scores as a heat map. The results show that the shell components and the inside components have co-evolved, as we expected. In addition, the scores of PduK are relatively low, suggesting that it may be less likely to interact with the N-termini of the inside components; this may be consistent with its low copy number in the microcompartment.
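Spearman's coefficient is Pearson's r computed on ranks, with tied values given the average of the ranks they occupy. A minimal sketch, assuming it is applied to the same upper-triangle distances as the Pearson analysis; the function names are hypothetical.

```python
import math


def upper_triangle(m):
    n = len(m)
    return [m[i][j] for i in range(n - 1) for j in range(i + 1, n)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(dx, dy):
    """Spearman correlation between the pairwise distances of two matrices."""
    return pearson(average_ranks(upper_triangle(dx)), average_ranks(upper_triangle(dy)))


# Toy matrices (hypothetical): dy grows monotonically but nonlinearly with dx.
dx = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dy = [[0, 1, 4], [1, 0, 9], [4, 9, 0]]
```

Since `dy` is a monotone but nonlinear function of `dx`, `spearman_rho(dx, dy)` is 1.0, whereas Pearson on the raw values would be below 1; rank-based correlation is less sensitive to outlying distances.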

## Supplementary

1. The original data for the Pdu-Pdu co-evolution analysis are available at [5].

2. The results obtained with the Spearman correlation coefficient:

## Reference

[1] Chern-Sing Goh et al., Co-evolution of proteins with their interaction partners, *Journal of Molecular Biology*, 2000 Jun 2; **299**(2):283-93.

[2] Stephen F. Altschul et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, *Nucleic Acids Research*, 1997; **25**(17):3389-3402.

[3] N Saitou and M Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, *Molecular Biology and Evolution*, 1987 Jul; **4**(4):406-25.

[4] Julie D. Thompson, Desmond G. Higgins, and Toby J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, *Nucleic Acids Research*, 1994 Nov 11; **22**(22):4673-80.