Team:UC Davis/Projects
From 2010.igem.org
(Difference between revisions)
(8 intermediate revisions not shown) | |||
Line 39: | Line 39: | ||
function tabFunction(tab) | function tabFunction(tab) | ||
{ | { | ||
- | var crosstalkContent = "<br/><p class='header'>THE PROBLEM</p>In synthetic biology, the issue of crosstalk acts as a substantial barrier against developing fully-controlled biological systems. Much like in the development of electrical systems where crosstalk causes harmful interference and unpredictable behavior, crosstalk prevents us from completely understanding how our biological constructs function, and quite often can affect the efficacy of these systems. As such, it is clear that a method to computationally predict crosstalk in a given biological system would be a valuable scientific resource, and would effectively help minimize the negative effects of crosstalk. This is where our computational tool, CPOTATo, comes in.<br/><br/><div style='text-align: center;'><img src='https://static.igem.org/mediawiki/2010/c/c3/CPPaperScreen1.jpg'></div><br/>Crosstalk can be attributed to several aspects of biological systems, one of which is the interaction between proteins from the synthetic circuit and proteins from the host organism. CPOTATo (Crosstalk Predictive Organism-Targeted Analysis Tool) takes advantage of this fact in an attempt to predict protein combinations that may cause crosstalk in various biological systems.<br/><br/>The primary reason for the crosstalk between proteins is usually the homology between them. Consider the following abstract example: In the chassis' system, Protein A naturally interacts with Protein B; however, Protein C, a protein produced by the synthetic circuit we wish to implant, is very homologous to Protein A. Since Protein A and Protein C are very similar, there is a certain degree of probability that Protein C will interact with Protein B. 
Therefore, unless Protein C's original purpose was to interact with Protein B, this is an undesirable interaction that may lead to unpredictable crosstalk.<br/><br/><p class='header'>APPROACH</p>CPOTATo executes several consecutive queries against existing databases of genomic information in order to infer possible instances of crosstalk from protein-protein relationships. The tool relies on two inputs: the amino acid sequence of the target protein, and the target organism or chassis. We query three databases to obtain the information we need: Interpro, UniprotKB and String.<br/><br/>From the <a href='http://www.ebi.ac.uk/interpro/'>Interpro</a> database, we obtain one or more Interpro entries, each of which consists of a group of proteins homologous to our target protein. Interpro contains entries that correspond to various \"families\" of proteins based on whether they share some conserved protein sequence signature; these conserved signatures imply that the proteins within a given entry are similar in some way (e.g. functionally or structurally). Although Interpro is a metadatabase that takes information from several other databases, each of which has its own way of evaluating protein similarity, we only take into account entries from PFAM and TIGRFAM, two databases that deal exclusively with protein signature similarity related to protein function. We hope to incorporate more databases as the project progresses, in the hope that this will make our results more accurate.<br/><br/>After querying Interpro, we then use these Interpro entries as the query parameter to the <a href='http://www.uniprot.org/help/uniprotkb'>UniprotKB</a> database in order to filter out all proteins that are not produced within the chassis. 
The UniprotKB database contains information on individual proteins, and as such, we can acquire a complete list of all proteins within a given Interpro entry, and furthermore, we can ignore all proteins not produced within the target organism.<br/><br/>And finally, we query the <a href='http://string-db.org/'>String</a> database. This particular database contains documented protein-protein interaction information and when queried with a given protein, will return a list of all documented interactions that protein participates in within a given organism. For our purposes, this is the perfect source of interaction information, as we simply query the String database (many, many, MANY times) to obtain all the known interactions involving any of our homologous proteins.<br/><br/><div style='text-align:center'><img src='https://static.igem.org/mediawiki/2010/c/c6/CPPaperScreen2.jpg'></div><p class='header'>SCORES</p>The finalization of the numerical score analysis is still pending. | + | var crosstalkContent = "<br/><p class='header'>THE PROBLEM</p>In synthetic biology, the issue of crosstalk acts as a substantial barrier against developing fully-controlled biological systems. Much like in the development of electrical systems where crosstalk causes harmful interference and unpredictable behavior, crosstalk prevents us from completely understanding how our biological constructs function, and quite often can affect the efficacy of these systems. As such, it is clear that a method to computationally predict crosstalk in a given biological system would be a valuable scientific resource, and would effectively help minimize the negative effects of crosstalk. 
This is where our computational tool, CPOTATo, comes in.<br/><br/><div style='text-align: center;'><img src='https://static.igem.org/mediawiki/2010/c/c3/CPPaperScreen1.jpg'></div><br/>Crosstalk can be attributed to several aspects of biological systems, one of which is the interaction between proteins from the synthetic circuit and proteins from the host organism. CPOTATo (Crosstalk Predictive Organism-Targeted Analysis Tool) takes advantage of this fact in an attempt to predict protein combinations that may cause crosstalk in various biological systems, and furthermore, to derive a numerical representation of the strength of this prediction, whether that be in the form of a probability or some other value on a specifically designed scoring scale.<br/><br/>The primary reason for the crosstalk between proteins is usually the homology between them. Consider the following abstract example: In the chassis' system, Protein A naturally interacts with Protein B; however, Protein C, a protein produced by the synthetic circuit we wish to implant, is very homologous to Protein A. Since Protein A and Protein C are very similar, there is a certain degree of probability that Protein C will interact with Protein B. Therefore, unless Protein C's original purpose was to interact with Protein B, this is an undesirable interaction that may lead to unpredictable crosstalk.<br/><br/><p class='header'>APPROACH</p>CPOTATo executes several consecutive database queries to different existing databases of genomic information in order to infer any instances of crosstalk based on protein-protein relationships. The tool relies on two inputs: the amino acid sequence of the target protein, and the target organism or chassis. 
We query three databases to obtain the information we need: Interpro, UniprotKB and String.<br/><br/>From the <a href='http://www.ebi.ac.uk/interpro/'>Interpro</a> database, we obtain one or more Interpro entries, each of which consists of a group of proteins homologous to our target protein. Interpro contains entries that correspond to various \"families\" of proteins based on whether they share some conserved protein sequence signature; these conserved signatures imply that the proteins within a given entry are similar in some way (e.g. functionally or structurally). Although Interpro is a metadatabase that takes information from several other databases, each of which has its own way of evaluating protein similarity, we only take into account entries from PFAM and TIGRFAM, two databases that deal exclusively with protein signature similarity related to protein function. We hope to incorporate more databases as the project progresses, in the hope that this will make our results more accurate.<br/><br/>After querying Interpro, we then use these Interpro entries as the query parameter to the <a href='http://www.uniprot.org/help/uniprotkb'>UniprotKB</a> database in order to filter out all proteins that are not produced within the chassis. The UniprotKB database contains information on individual proteins, and as such, we can acquire a complete list of all proteins within a given Interpro entry, and furthermore, we can ignore all proteins not produced within the target organism.<br/><br/>And finally, we query the <a href='http://string-db.org/'>String</a> database. This particular database contains documented protein-protein interaction information and when queried with a given protein, will return a list of all documented interactions that protein participates in within a given organism. 
For our purposes, this is the perfect source of interaction information, as we simply query the String database (many, many, MANY times) to obtain all the known interactions involving any of our homologous proteins.<br/><br/>The following is a screenshot of the results page of the tool.<div style='text-align:center'><img src='https://static.igem.org/mediawiki/2010/c/c6/CPPaperScreen2.jpg'></div><br/><br/>The results screen has several important fields. The first field is a combobox at the top of the form that contains all associated Interpro entries. Upon selecting an Interpro entry, the Homologues list changes to show the proteins in the selected entry that are homologous to the target protein. The user can also select one of these homologues to update the Interactions list so that it shows only the interactions associated with that homologue.<br/><br/>For example, in this screenshot we see that the protein phoQ is produced in E. coli, belongs to Interpro entry IPR003594, and is documented to interact with phoP, crcA, ugd, rcsF and others.<br/><br/>The final function to note is the Interactor Search. It lets the user search for a given interactor (say, \"phoB\"), and the tool lists every instance of that interactor across all Interpro entries in the box below.<br/><br/><p class='header'>SCORES</p>In order to derive the final score/probability that will be reported by the tool, we must take into account the various scores already reported by the databases we are querying. The score reported by Interpro is a probabilistic significance score describing how well the target protein \"fits\" in that Interpro group (an e-value). 
After consulting with a UniprotKB representative, we will probably not include the score reported by UniprotKB in the final calculation, because it simply represents an Apache Lucene full-text match score for the search string (in other words, it gives no insight into the biological relationships between the proteins). Finally, the score reported by String combines several statistical values that the database calculates in the background and describes the overall probability of interaction. The finalization of the numerical score analysis is still pending. Please see the Changelog for the latest developments. <br/><br/><p class='header'>In Development:</p><div class='version'>v0.2.0</div><ul><li>Designing scoring models that will take into account all homologues' Interpro e-values and String interaction probabilities to generate the final reaction probability.</li><li>Creating statistical significance models in order to evaluate the accuracy of current results</li><li>Hoping to utilize Flash to create a graphical network view of all related interactions for the current query</li></ul><br/><br/><p class='header'>Changelog</p><div class='version'>v0.1.0:</div><ul><li>Queries completed. Some documented cases of crosstalk have been run through the tool, and the results are consistent with what's in the literature.</li><li>Score system is still in development. Currently trying to devise a \"ranking\" system for each Interpro entry that ranks each protein according to its individual e-value. Unfortunately, Interpro does not provide an easy way to acquire these numbers through one query. The only feasible alternative seems to be to re-query every homologue against the Interpro database to acquire its individual e-value. Asynchronous queries, anyone?</li></ul>"; |
Line 69: | Line 69: | ||
</script> | </script> | ||
- | <body onload="tabFunction()"> | + | <body onload="tabFunction(4)"> |
<br/> | <br/> | ||
<table class="pikachu"> | <table class="pikachu"> |
Latest revision as of 03:30, 28 October 2010
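The three-step lookup described in the APPROACH text of this revision (Interpro entries, then UniprotKB filtering by chassis, then String interactions) can be sketched roughly as follows. This is only a minimal illustration and not CPOTATo's actual code: the three lookup tables are hypothetical in-memory stand-ins for the real database queries, the names ENTRY_HITS, ENTRY_MEMBERS, INTERACTIONS, predictCrosstalk and searchInteractor are all assumptions, and the only data taken from the page is the phoQ example (E. coli, IPR003594, interactors phoP, crcA, ugd, rcsF). No scoring is attempted, since the scoring model is still pending.

```javascript
// Toy stand-ins for the three databases. Every name and data shape here is an
// illustrative assumption; only the phoQ example comes from the page text.

// Interpro step: target protein -> Interpro entries it matches.
const ENTRY_HITS = { targetProtein: ["IPR003594"] };

// UniprotKB step: Interpro entry -> member proteins with their organism.
const ENTRY_MEMBERS = {
  IPR003594: [
    { protein: "phoQ", organism: "E. coli" },
    { protein: "hypotheticalKinase", organism: "other organism" }, // made up
  ],
};

// String step: organism -> protein -> documented interactors.
const INTERACTIONS = {
  "E. coli": { phoQ: ["phoP", "crcA", "ugd", "rcsF"] },
};

// Keep only the homologues of an entry that are produced in the chassis.
function homologuesInChassis(entryId, chassis) {
  return (ENTRY_MEMBERS[entryId] || [])
    .filter((member) => member.organism === chassis)
    .map((member) => member.protein);
}

// For each matching entry and each chassis homologue, list the homologue's
// documented interactors. Each result is a candidate crosstalk partner for
// the target protein.
function predictCrosstalk(target, chassis) {
  const candidates = [];
  for (const entryId of ENTRY_HITS[target] || []) {
    for (const homologue of homologuesInChassis(entryId, chassis)) {
      const partners = (INTERACTIONS[chassis] || {})[homologue] || [];
      for (const interactor of partners) {
        candidates.push({ entryId, homologue, interactor });
      }
    }
  }
  return candidates;
}

// Rough analogue of the Interactor Search: find every candidate that
// involves the named interactor, across all Interpro entries.
function searchInteractor(candidates, name) {
  return candidates.filter((c) => c.interactor === name);
}

const candidates = predictCrosstalk("targetProtein", "E. coli");
console.log(candidates.map((c) => `${c.homologue} -> ${c.interactor}`).join(", "));
```

In the real tool each table lookup would be a remote query, which is why the String step ends up being repeated once per homologue.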