http://2010.igem.org/wiki/index.php?title=Special:Contributions&feed=atom&limit=20&target=AlejandroHD2010.igem.org - User contributions [en]2024-03-29T06:22:02ZFrom 2010.igem.orgMediaWiki 1.16.5http://2010.igem.org/Team:Heidelberg/Project/Mouse_InfectionTeam:Heidelberg/Project/Mouse Infection2010-10-28T03:54:48Z<p>AlejandroHD: /* 1. Viral capsid / Gene delivery */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|Project_Mouse_Infection}}<br />
<br />
{{:Team:Heidelberg/Side_Top}}<br />
__TOC__<br />
<br/><br />
<br/><br />
=== construct schemes ===<br />
<br/><br />
[[Image:PBS SV40 luc2.png|frameless|center|250px|'''Figure 1: Positive control construct.''']]<br />
<br/><br />
[[Image:PBS SV40 luc2 miR122.png|frameless|center|250px|'''Figure 2: Off-targeting construct.''']]<br />
<br/><br />
[[Image:PBS_H1_shR_hAAT.png|frameless|center|250px|'''Figure 3: Tuning construct.''']]<br />
<br/><br />
[[Image:PBS SV40 luc2 hAAT.png|frameless|center|250px|'''Figure 4: Tuned construct.''']]<br />
<br/><br />
[[Image:PBS TetR miR122 4x.png|frameless|center|250px|'''Figure 5: Repressor construct.''']]<br />
<br/><br />
[[Image:PBS SV40 TetO2 luc2.png|frameless|center|250px|'''Figure 6: Operator construct.''']]<br />
<br/><br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
<html><br />
<a name="top"></a><br />
<style type="text/css"><br />
.tocnumber {display:none;}<br />
#toc ul ul, .toc ul ul {<br />
margin:0 0 0 0em;<br />
}<br />
#toctitle {display:none;}<br />
</style><br />
</html><br />
<br />
=<i>in vivo</i> study=<br />
==Abstract==<br />
Gene therapy offers a powerful tool for the treatment of various human diseases and has shown tremendous promise and success in many recent clinical trials. Nevertheless, the two most critical hurdles that remain are 1) the development of delivery vectors that allow for the highly cell- and tissue-specific expression of a gene of interest (GOI), and 2) the implementation of novel systems that permit the tightly controlled and fine-tuned expression of this gene in human patients. To fill these two essential gaps, the iGEM Team 2010 of Heidelberg is introducing molecular evolution of viral gene transfer vectors as well as mi(cro)RNA-based gene regulation technology into the realm of synthetic biology. By combining these two powerful new methodologies, we have succeeded in engineering synthetic viral miRNA vectors that provide potent, specific and regulated gene expression in cultured mouse and human hepatocytes. Most importantly, we also document the successful use of our novel approach to achieve tightly controlled and liver-specific expression of a vector-encoded recombinant DNA in intact livers of adult mice. Our work and data thus pave the way for the further expansion and use of our unique miRNA/vector kits in numerous biotechnological and especially biomedical applications.<br />
<br />
==Introduction==<br />
Gene therapy offers a unique tool for the efficient treatment of various human diseases with a genetic cause, ranging from metabolic disorders and cancers to infections with different viral pathogens (Zincarelli, Soltys et al. 2008). In principle, the underlying concept comprises the initial identification of a certain genetic defect, such as a deletion of a critical tumor suppressor gene which then causes cancer, or a mutation in a gene that is essential for a given metabolic pathway and whose loss or malfunction accordingly triggers symptoms of a disease. Once the defect is identified, gene therapists attempt to correct it by introducing a correct copy of the missing or mutated gene back into the affected cell, hoping that this newly delivered gene will behave just like the original that it replaces. For transfer of this therapeutic gene sequence, one can choose between physical carriers, such as liposomes or nanoparticles, or alternatively, many groups engineer naturally occurring viruses, such as Adenoviruses, Lentiviruses or Adeno-associated viruses (AAV), as gene delivery vehicles. The latter hold the particular advantage that viruses have evolved to potently and, at least in many cases, specifically transfer their genetic cargo to cells, implying that a recombinant therapeutic DNA (or RNA) replacing the viral genes will be delivered in just the same way. <br><br />
<br />
However, as one can easily imagine from the complexity of gene expression networks in all cells, especially in those of higher organisms, this whole process of transferring a foreign gene into a cell and expressing it there is anything but trivial, and success is not guaranteed. In essence, there are several main challenges that still need to be overcome before gene therapy, or more generally gene transfer, in humans can become a routine application, namely, improvements in 1) specificity, 2) potency, 3) control and 4) related to all this, safety. Unsurprisingly, there have been a plethora of attempts in the past to tackle these challenges, and a great deal of success has been reported in the literature. What is curious about these findings and advances, however, is that the vast majority of them have focused merely on the recombinant DNA template itself and have tried to (and often managed to) improve its particular properties, such as stability or encapsidation into the shell of a recombinant virus for delivery to the target cell.<br><br />
<br />
Still, these particular approaches to improve the therapeutic DNA have taught us an important lesson, namely, that for successful, specific and potent gene transfer into human cells, especially into human patients, DNA is not enough! Instead, there are two additional levels that are at least equally important and that have therefore been at the center of our team's focus: 1) the carrier used for gene delivery and 2) the control of gene expression inside the cell. More specifically, we have decided to study and improve the aforementioned Adeno-associated virus (a large family of over 100 different naturally occurring serotypes) as a gene delivery vehicle due to a number of outstanding advantages of this particular virus for use in cell and tissue engineering, including the complete lack of pathogenicity of the parental wildtype viruses plus their capability to infect a wide range of target cells and tissues ([https://2010.igem.org/Team:Heidelberg/Project/Mouse_Infection#References Grimm, 2002]). On the other hand, this broad tropism can also be a disadvantage in cases where one would like to target a very specific cell type and concurrently avoid off-target gene expression elsewhere in the body. Therefore, several research groups (including that of our instructor Dr. Grimm) have very recently begun to devise novel methods to molecularly engineer synthetic "designer" AAVs based on naturally occurring capsids, which ideally combine and enhance their best properties. For instance, it is now possible to create entire libraries of AAV capsids by inserting short targeting peptides into exposed regions of the virus surface, or alternatively, by randomly point mutating single amino acids in the capsid proteins. The latest and most potent approach is, however, to shuffle entire capsid genes by first fragmenting them and subsequently re-assembling the parts into hybrid particles. 
Ideally, the resulting chimeras will combine the assets of all parental viruses, such as high potency in a given cell type coupled with evasion of neutralization by human anti-AAV antibodies ([https://2010.igem.org/Team:Heidelberg/Project/Mouse_Infection#References Grimm et al., 2008]). <br><br />
<br />
Regarding the second critical level, regulation of gene expression following DNA delivery, it would be particularly crucial in the context of many gene transfer/therapy applications to be able to fine-regulate the level of the resulting protein expression, considering that uncontrolled over-production of certain proteins can be highly toxic or even lethal to cells and tissues. Unfortunately, as mentioned above, most approaches in this direction thus far have focused mainly on the DNA itself and accordingly dealt mostly with the development of improved promoter sequences that mediate controlled gene expression under certain conditions. What has been largely ignored until very recently, however, is the fact that gene expression can also be regulated on the RNA level, namely, via RNAi technologies. In this respect, we reasoned that the novel and broadly applicable miRNA tools engineered by our iGEM team should prove extremely useful in combination with gene therapy vectors, considering that they have been devised to permit very tight control over the expression of a given gene on the RNA level (by controlling translation and/or mRNA stability, which is the natural mechanism of miRNA action). Specifically, we wanted to explore the feasibility of fusing a gene of interest (GOI) with binding sites for either a naturally occurring miRNA or a synthetic miRNA that is co-expressed from a second "tuning" vector. In either case, the binding site can be designed to match the mi/shRNA perfectly, which will result in nearly complete knockdown of gene expression, or imperfectly, which will yield a more subtle reduction in gene expression (for details, see our synthetic miRNA kit). <br><br />
<br />
Accordingly, we used our various novel tools to establish an entirely new system for fine-regulated gene expression from viral gene therapy vectors, in which control is no longer exerted on the DNA level alone, but additionally at the steps of 1) gene delivery and 2) fine-tuning. <br><br />
<br />
Most importantly, as with any novel system for regulated gene expression, there is no guarantee that results obtained in cultured cells are relevant for, and will translate straightforwardly into, identical findings in living animals or ultimately human patients. Therefore, we reasoned that, following the design and in vitro validation of our novel constructs, it would be mandatory to also assay and ideally confirm their functionality in living organisms in vivo, i.e., in adult mice. <br />
In the following, we will summarize our results from the in vivo evaluation of the two main strategies outlined above in the livers of adult mice and will demonstrate how modifying the viral gene vehicle capsid can drastically affect the specificity and level of gene expression in cultured cells versus living animals, and how engineering a given gene expression cassette by adding perfect or imperfect mi/shRNA binding sites can further fine-tune the strength of gene/protein expression at will. <br />
<br><br />
<br />
==Results==<br />
===Strategies===<br />
As outlined, we have pursued two fundamentally different strategies to achieve regulated gene expression in mice, which are control via the viral capsid as well as via RNAi/miRNAs. Generally, to measure the success of each individual strategy, it was mandatory to be able to detect and quantify gene expression in living adult mice to begin with. Moreover, we had to define a standard based on which we would evaluate the specificity and efficiency of each new construct and strategy that we wanted to test. As we were going to target the liver of mice (since it is one of the most therapeutically relevant organs plus concurrently very susceptible to gene transfer), we decided to use the particular AAV serotype 8 (AAV8) as our standard, based on previous findings by Dr. Grimm and others that it mediates maximum transduction of livers in adult mice. Additionally, we picked Firefly luciferase as our reporter gene as it can be easily detected and quantified in living mice following a simple intraperitoneal injection of luciferin substrate and using a sensitive camera system. <br />
Hence, in our very first animal experiment, we infused adult white mice with an AAV8 vector expressing Firefly luciferase from the universally active SV40 promoter and then measured luciferase levels in the liver 5 days after injection. As shown in the center of Figure 1, we were indeed able to detect a very strong expression in all livers of three independent mice, with an average photon count of 2.77x10<sup>8</sup> per mouse. In all subsequent experiments, this value served as our reference based on which we aimed to achieve a fine-regulation of in vivo gene expression. <br />
<br />
===1. Viral capsid / Gene delivery===<br />
Next, we asked how delivering the exact same luciferase vector to mice from a different viral capsid would alter the expression levels in the liver. For this purpose, we used our lead candidate from the virus shuffling and selection procedure described in the homology based capsid shuffling section. As noted there, this one particular capsid was curious in that it was the only one capable of potently transducing Huh7 and HepG2 cells, strongly suggesting that it might also yield reasonable gene expression in vivo. Indeed, when we performed the experiment with this synthetic capsid under the exact same conditions as with our standard AAV8 capsid, we again noted effective luciferase gene expression in the livers of three independently injected mice. <br><br />
Three findings were particularly remarkable in this experiment:<br> <br />
1) The average photon counts per mouse were 1.01x10<sup>7</sup> and thus about 27-fold lower than with the AAV8 standard vector. This clearly supports our view that simply replacing the viral capsid can dramatically affect the expression of a given gene in vivo, and validates our hypothesis that DNA is not enough, but that the vector for DNA delivery is equally important!<br><br />
2) Intriguingly, when we normalized the luciferase measurement settings for both viruses (wildtype AAV8 and synthetic iGEM vector) to achieve equal expression data in the liver, we realized that the wildtype virus showed a dramatic off-targeting activity outside the liver. In fact, basically the entire mouse was lighting up positive in the camera at these sensitive settings, confirming published findings that AAV8 has an extremely broad in vivo tropism that includes the brain. In striking contrast, although our own virus gave virtually identical luciferase values in the liver, it was completely inactive anywhere outside of this organ. This clearly validated our strategy to select and test a viral clone that was positive in cell culture on two independent liver cell lines and thus demonstrates the power of our virus evolution/selection strategy to enrich for synthetic capsids with desired gene transduction capabilities. <br><br />
3) Curiously, we then found that our synthetic clone clearly outperformed the corresponding AAV8 counterpart on cultured cells, despite the fact that it was about 27-fold less efficient in the liver, as noted above. In fact, it is a well-known phenomenon for wildtype AAV viruses that they function either in cell culture or in mice, but not both. Hence, our finding that our new synthetic clone gave reasonable expression in vivo AND in vitro is further support for our concept to molecularly evolve viral gene transfer vectors and then select synthetic chimeras exhibiting properties that do not exist in nature.<br><br />
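
The fold difference quoted in finding 1 follows directly from the two average photon counts reported above. A minimal Python sketch of this arithmetic is given below; the function name <code>fold_reduction</code> is hypothetical and was chosen only for illustration, and the only inputs are the two averages stated in the text.

```python
# Average photon counts per mouse, as reported in the text above.
AAV8_PHOTONS = 2.77e8      # standard AAV8 capsid (reference)
SHUFFLED_PHOTONS = 1.01e7  # synthetic shuffled capsid


def fold_reduction(reference, sample):
    """Return how many times lower `sample` is than `reference`."""
    if reference <= 0 or sample <= 0:
        raise ValueError("photon counts must be positive")
    return reference / sample


if __name__ == "__main__":
    fold = fold_reduction(AAV8_PHOTONS, SHUFFLED_PHOTONS)
    print(f"Shuffled capsid signal is about {fold:.0f}-fold lower than AAV8")
```

Note that this compares group averages only; it says nothing about the mouse-to-mouse variation within each group of three animals.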
<br />
===2. miRNA-based OFF targeting===<br />
In the above described first set of experiments, we aimed to fine-regulate gene expression by modifying the viral capsids delivering the same gene as was originally expressed from our standard AAV8 vector. In the next step, we asked whether it was also possible to exploit the endogenous expression of cellular miRNAs in order to further control gene expression levels from a viral vector. Generally, this strategy can be exploited in at least two ways: 1) a GOI can be fused with a perfectly matching binding site for such a miRNA, which will typically result in nearly complete inhibition of gene expression, or 2) this site can be imperfect which will yield a gradual reduction of expression. <br><br />
A particularly useful application of the first (perfect site) strategy would be to purposely de-target the expression of a viral vector or a gene, respectively, from a certain tissue in which gene expression is not supposed to occur. In order to test this possibility with our system, we engineered our luciferase vector to contain four perfect binding sites for miR-122, which is specific for liver and highly abundant in hepatocytes. We then again packaged the resulting vector construct into the potent AAV8 capsid and injected this virus into adult mice. Impressively, and exactly as predicted and hoped for, we found that luciferase expression from this vector was reduced more than 50-fold as compared to the unmodified standard AAV8 vector. In fact, average photon counts per liver (again three independent mice) dropped to 4.95x10<sup>6</sup>, or 1.8% of the reference value, even though the luciferase gene was still delivered by the same highly potent capsid. <br><br />
<br />
This validated our hypothesis that cellular miRNAs can be exploited to substantially regulate the expression of a vector-encoded recombinant DNA and is thus further support for our concept that neither the DNA nor the vector alone is sufficient to achieve the maximum possible level of control over in vivo gene expression.<br />
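
The residual expression of the miR-122-tagged vector can be re-derived from the two average photon counts given in the text. The following is a minimal sketch in Python; the helper name <code>percent_of_reference</code> is an illustrative assumption and is not taken from the original analysis.

```python
# Average photon counts per liver, as reported in the text.
REFERENCE_PHOTONS = 2.77e8  # unmodified AAV8 luciferase vector
MIR122_PHOTONS = 4.95e6     # AAV8 vector carrying miR-122 binding sites


def percent_of_reference(sample, reference):
    """Express `sample` as a percentage of `reference`."""
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * sample / reference


if __name__ == "__main__":
    remaining = percent_of_reference(MIR122_PHOTONS, REFERENCE_PHOTONS)
    fold = REFERENCE_PHOTONS / MIR122_PHOTONS
    print(f"Residual expression: {remaining:.1f}% of reference")
    print(f"Equivalent reduction: about {fold:.0f}-fold")
```

Expressing the same result both as a percentage and as a fold reduction makes it easier to compare with the values reported for the other constructs.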
<br><br />
<br />
===3. miRNA-based fine-tuning===<br />
As just indicated, we predicted that fusing our luciferase gene with an imperfect binding site for a miRNA would result in a more gradual reduction in gene expression as compared to the perfect site. To test this hypothesis, we generated a series of expression vectors in which the luciferase gene was tagged with perfect or imperfect binding sites for an artificial miRNA (an shRNA directed against the hAAT gene). The expression cassette for this miRNA was then packaged into and co-expressed from a second AAV8 vector that we generated in parallel. The results we obtained are shown in Figure 1.<br><br />
<br />
Three important conclusions can be drawn from this experiment:<br><br />
1) Modifying the luciferase gene with different binding sites has no effect on basal gene expression levels (at least not in this example in our hands) when compared to the standard unmodified AAV8 vector (see figure above), which is remarkable as it suggests that our strategy should be broadly applicable for many other recombinant DNA vectors as well.<br><br />
2) Tagging the luciferase gene with a perfect binding site and then co-expressing the corresponding artificial miRNA from a second vector resulted in the expected potent down-/fine-regulation of gene expression to about 30% of starting levels from the unmodified standard vector. Note that the effect is less pronounced than what we observed with the perfect sites for miR-122 above, which can however be easily explained by the fact that the miR-122 vector contained four perfect binding sites versus only one in the present vector for the artificial miRNA. Moreover, one has to take into account that the artificial miRNA was co-expressed from a second vector in the current setting, which certainly resulted in vastly different kinetics as compared to an endogenous miRNA.<br><br />
3) Tagging luciferase with an imperfect miRNA binding site had the expected intermediate effect on gene expression and resulted in a fine-regulation to about 58% of the levels observed with the unmodified control vector. This is an extremely remarkable result as it clearly shows that using our miTuner strategy, it is possible to achieve highly similar extents of fine-regulation of gene expression in cultured cells as well as in living adult animals. It thereby strongly validates our overall strategy to combine synthetic miRNAs and corresponding binding sites with pre-existing gene expression cassettes in order to fine-tune the latter in a very deliberate, controlled and predictable (from straight-forward in vitro experiments) fashion. <br> <br />
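
To compare the three binding-site configurations on a common scale, the residual expression values reported above can be converted into fold repression. The sketch below assumes only the percentages stated in the text (1.8%, 30% and 58% of the unmodified control); the dictionary labels and the function name <code>percent_to_fold</code> are hypothetical.

```python
# Residual luciferase expression as a percentage of the unmodified AAV8
# control vector, taken from the values reported in the text.
RESIDUAL_PERCENT = {
    "miR-122, perfect sites (endogenous miRNA)": 1.8,
    "hAAT shRNA, perfect site": 30.0,
    "hAAT shRNA, imperfect site": 58.0,
}


def percent_to_fold(residual_percent):
    """Convert residual expression (% of control) into fold repression."""
    if not 0 < residual_percent <= 100:
        raise ValueError("residual expression must be in (0, 100]")
    return 100.0 / residual_percent


if __name__ == "__main__":
    # List the constructs from strongest to weakest repression.
    for name, pct in sorted(RESIDUAL_PERCENT.items(), key=lambda kv: kv[1]):
        print(f"{name}: {pct:.1f}% remaining, "
              f"{percent_to_fold(pct):.1f}-fold repression")
```

Listing the constructs this way illustrates the graded range of expression levels that the different binding sites span, from near-complete knockdown to a mild reduction.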
<br />
===Bioluminescence imaging===<br />
<br />
<html><br />
<map name="2" id="2"><br />
<area shape="rect" coords="260,25,353,91" href="#AAVcomparison" alt="" /><br />
<area shape="rect" coords="252,198,364,277" href="#AAVcomparison" alt="" /><br />
<area shape="rect" coords="85,339,180,407" href="#miR122" alt="" /><br />
<area shape="rect" coords="491,287,586,357" href="#hAAT" alt="" /><br />
<area shape="rect" coords="380,367,474,437" href="#hAAT" alt="" /><br />
</map><br />
<img src="https://static.igem.org/mediawiki/2010/3/3d/Mouse_data_for_Wiki.png" width="618" height="464" border="0" alt="" title="" usemap="#2" /><br><br />
<p>Figure 1: Scheme of luciferase <br><br><br><br><br><br />
<br />
<a name="miR122"></a></html><br />
[[Image:Mir122_off_targetting.jpg|thumb|300px|right|miR-122 OFF-targeting]]<br />
<html><a name="hAAT"></a></html><br/><br />
[[Image:HAAT_tuning.jpg|thumb|300px|right|hAAT tuning]]<br />
<html><a name="AAVcomparison"></a></html>[[Image:20101028_Helatransduction.jpg|thumb|300px|right|HeLa Transfection]]<br />
<br />
==Discussion==<br />
<br />
==Methods==<br />
===Constructs===<br />
The <i>in vivo</i> analysis was designed to test our gene therapy approach, which uses AAV tropism as well as miRNA binding sites to control expression. The following constructs were subcloned separately into the AAV context for this purpose: <br/><br />
# positive control, <br />
# off-targeting construct, <br />
# synthetic tuning construct and <br />
# on-targeting construct. <br />
<br/><br />
All but one of the viruses were packaged using the AAV rep and cap genes together with an Adenovirus 5 (Ad5) helper plasmid. The remaining virus construct was packaged using a shuffled cap gene from our [https://2010.igem.org/Team:Heidelberg/Project/Capsid_Shuffling/Homology_Based homology based capsid shuffling] approach. <br />
<br/><br />
# The positive control (see sidebar, fig. 1) consisted of the SV40 promoter driving a firefly luciferase (luc2) gene, thereby leading to an unspecific expression of the luciferase protein in all mouse tissues. In addition to packaging this construct into a wild type AAV virus, the positive control was also packaged as a transgene into our [https://2010.igem.org/Team:Heidelberg/Project/Capsid_Shuffling/Homology_Based shuffled capsid] which after random selection <!--selection pressure--> was already able to [https://2010.igem.org/Team:Heidelberg/Notebook/Homology_Based/October#16/10/2010 positively transduce Huh7 and HepG2 cells] <i>in vitro</i>. <br />
# The off-targeting construct (see sidebar, fig. 2) was composed of an SV40 promoter driving a firefly luciferase (luc2) gene with binding sites for miR-122 behind it. To achieve the highest expression in all mouse cells except liver cells, a single perfect binding site for miR-122 was used for the <i>in vivo</i> study. <br />
# The synthetic tuning construct (see sidebar, fig. 3) consisted of two viruses injected into the mice at the same time. The first virus packaged the expression construct of the shRNA against hAAT driven by the H1 promoter ("tuning" construct, see sidebar, fig. 3). The second virus packaged the following transgene: SV40 promoter driving luc2 with a binding site for the hAAT shRNA behind it ("tuned" construct, see sidebar, fig. 4). To demonstrate a synthetic tuning effect, two binding sites were used in the <i>in vivo</i> experiments: a perfect one and one with a bulge introduced at positions 9-12. Compared to the positive control, these two binding sites should lead to a significant knockdown in the first case and a slight repression of luciferase expression in the second.<br />
# The on-targeting construct likewise consisted of two independent viruses which were co-injected into mice. One of these viruses packaged the Tet Repressor (TetR) driven by an SV40 promoter ("repressor" construct, see sidebar, fig. 5). The expression of TetR is under the control of miR-122, as four binding sites for this miRNA were cloned into the 3' UTR of the gene. The second virus carried an SV40 promoter combined with the Tet operator (TetO<sub>2</sub>), which controls the expression of luc2 ("operator" construct, see sidebar, fig. 6). With this setup, luc2 expression should be inhibited by TetR in all mouse tissues except for liver cells, where TetR is down-regulated by miR-122.<br />
<br />
===Production of recombinant virus===<br />
The viruses were produced in HEK 293-T cells and purified on an iodixanol gradient according to [https://2010.igem.org/Team:Heidelberg/Notebook/Methods#Virus_Production the virus production protocol]. <br />
<br />
Before infection, the titer of the viruses was quantified using [https://2010.igem.org/Team:Heidelberg/Notebook/Methods#Quantitative_Realtime_PCR quantitative realtime PCR].<br />
<br />
===Procedure involving animals===<br />
The mouse experiments were conducted in accordance with the rules of the animal facility of the [https://2010.igem.org/Team:Heidelberg/Team/Institutes German Cancer Research Center in Heidelberg]. Female NMRI mice were obtained through a collaboration with Dr. Oliver Müller. At 8-10 weeks of age, the animals were injected in the tail vein (TV) with <nowiki>~</nowiki> 1x10<sup>11</sup> particles of AAV-SV40-luciferase in 200 µl of 1x phosphate-buffered saline. For injection, the mice were transferred to a holding device that restrains the animal while allowing access to the tail vein. The tails were warmed before the injections, which were carried out using 27-gauge needles. All mice recovered from the injection quickly without loss of mobility or interruption of grooming activity {{HDref|Zincarelli et al., 2008}}.<br />
<br />
===in vivo animal imaging===<br />
Mice were anesthetized in an isoflurane chamber and injected intraperitoneally with 200 µl of D-luciferin at a concentration of 30 mg/ml. This injection supplies the substrate for the luminescence reaction of luc2. Mice were measured for one to seven minutes post injection in the in vivo bioluminometer.<br />
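
The amount of substrate each mouse receives follows from the injected volume and the concentration given above. A minimal sketch in Python, with a hypothetical function name, is:

```python
def injected_mass_mg(volume_ul, concentration_mg_per_ml):
    """Mass (mg) delivered by injecting `volume_ul` microliters of solution."""
    if volume_ul < 0 or concentration_mg_per_ml < 0:
        raise ValueError("volume and concentration must be non-negative")
    # Convert microliters to milliliters before multiplying.
    return volume_ul / 1000.0 * concentration_mg_per_ml


if __name__ == "__main__":
    # Values from the imaging protocol: 200 µl at 30 mg/ml.
    luciferin_mg = injected_mass_mg(200, 30)
    print(f"D-luciferin per mouse: {luciferin_mg:.1f} mg")
```

The same helper can be reused if the injection volume or stock concentration is changed; the body weight of the animals is not stated in the text, so no per-weight dose is computed here.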
<br />
==References==<br />
Grimm, D. (2002). "Production methods for gene transfer vectors based on adeno-associated virus serotypes." Methods 28(2): 146-157.<br><br />
Grimm, D., J. S. Lee, et al. (2008). "In vitro and in vivo gene therapy vector evolution via multispecies interbreeding and retargeting of adeno-associated viruses." J Virol 82(12): 5887-5911.<br><br />
Zincarelli, C., S. Soltys, et al. (2008). "Analysis of AAV serotypes 1-9 mediated gene expression and tropism in mice after systemic injection." Mol Ther 16(6): 1073-1080.<br><br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Project/Mouse_InfectionTeam:Heidelberg/Project/Mouse Infection2010-10-28T03:54:22Z<p>AlejandroHD: /* Introduction */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|Project_Mouse_Infection}}<br />
<br />
{{:Team:Heidelberg/Side_Top}}<br />
__TOC__<br />
<br/><br />
<br/><br />
=== construct schemes ===<br />
<br/><br />
[[Image:PBS SV40 luc2.png|frameless|center|250px|'''Figure 1: Positive control construct.''']]<br />
<br/><br />
[[Image:PBS SV40 luc2 miR122.png|frameless|center|250px|'''Figure 2: Off-targeting construct.''']]<br />
<br/><br />
[[Image:PBS_H1_shR_hAAT.png|frameless|center|250px|'''Figure 3: Tuning construct.''']]<br />
<br/><br />
[[Image:PBS SV40 luc2 hAAT.png|frameless|center|250px|'''Figure 4: Tuned construct.''']]<br />
<br/><br />
[[Image:PBS TetR miR122 4x.png|frameless|center|250px|'''Figure 5: Repressor construct.''']]<br />
<br/><br />
[[Image:PBS SV40 TetO2 luc2.png|frameless|center|250px|'''Figure 6: Operator construct.''']]<br />
<br/><br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
<html><br />
<a name="top"></a><br />
<style type="text/css"><br />
.tocnumber {display:none;}<br />
#toc ul ul, .toc ul ul {<br />
margin:0 0 0 0em;<br />
}<br />
#toctitle {display:none;}<br />
</style><br />
</html><br />
<br />
=<i>in vivo</i> study=<br />
==Abstract==<br />
Gene therapy offers a powerful tool for the treatment of various human diseases and has shown tremendous promise and success in many recent clinical trials. Nevertheless, the two most critical hurdles that remain are 1) the development of delivery vectors that allow for the highly cell- and tissue-specific expression of a gene of interest (GOI), and 2) the implementation of novel systems that permit the tightly controlled and fine-tuned expression of this gene in human patients. To fill these two essential gaps, the iGEM Team 2010 of Heidelberg is introducing molecular evolution of viral gene transfer vectors as well as mi(cro)RNA-based gene regulation technology into the realm of synthetic biology. By combining these two powerful new methodologies, we have succeeded in engineering synthetic viral miRNA vectors that provide potent, specific and regulated gene expression in cultured mouse and human hepatocytes. Most importantly, we also document the successful use of our novel approach to achieve tightly controlled and liver-specific expression of a vector-encoded recombinant DNA in intact livers of adult mice. Our work and data thus pave the way for the further expansion and use of our unique miRNA/vector kits in numerous biotechnological and especially biomedical applications.<br />
<br />
==Introduction==<br />
Gene therapy offers a unique tool for the efficient treatment of various human diseases with a genetic cause, ranging from metabolic disorders and cancers to infections with different viral pathogens (Zincarelli, Soltys et al. 2008). In principle, the underlying concept comprises the initial identification of a certain genetic defect, such as a deletion of a critical tumor suppressor gene which then causes cancer, or a mutation in a gene that is essential for a given metabolic pathway and whose loss or malfunction accordingly triggers symptoms of a disease. Once the defect is identified, gene therapists attempt to correct it by introducing a correct copy of the missing or mutated gene back into the affected cell, hoping that this newly delivered gene will behave just like the original that it replaces. For transfer of this therapeutic gene sequence, one can choose between physical carriers, such as liposomes or nanoparticles, or alternatively, many groups engineer naturally occurring viruses, such as Adenoviruses, Lentiviruses or Adeno-associated viruses (AAV), as gene delivery vehicles. The latter hold the particular advantage that viruses have evolved to potently and, at least in many cases, specifically transfer their genetic cargo to cells, implying that a recombinant therapeutic DNA (or RNA) replacing the viral genes will be delivered in just the same way. <br><br />
<br />
However, as one can easily imagine from the complexity of gene expression networks in all cells, especially in those of higher organisms, this whole process of transferring a foreign gene into a cell and expressing it there is anything but trivial, and success is not guaranteed. In essence, there are several main challenges that still need to be overcome before gene therapy, or more generally gene transfer, in humans can become a routine application, namely, improvements in 1) specificity, 2) potency, 3) control and 4) related to all this, safety. Unsurprisingly, there have been a plethora of attempts in the past to tackle these challenges, and a great deal of success has been reported in the literature. What is curious about these findings and advances, however, is that the vast majority of them have focused merely on the recombinant DNA template itself and have tried to (and often managed to) improve its particular properties, such as stability or encapsidation into the shell of a recombinant virus for delivery to the target cell.<br><br />
<br />
Still, these particular approaches to improve the therapeutic DNA have taught us an important lesson, namely, that for successful, specific and potent gene transfer into human cells, especially into human patients, DNA is not enough! Instead, there are two additional levels that are at least equally important and that have therefore been at the center of our team's focus: 1) the carrier used for gene delivery and 2) the control of gene expression inside the cell. More specifically, we have decided to study and improve the aforementioned Adeno-associated virus (a large family of over 100 different naturally occurring serotypes) as a gene delivery vehicle due to a number of outstanding advantages of this particular virus for use in cell and tissue engineering, including the complete lack of pathogenicity of the parental wildtype viruses plus their capability to infect a wide range of target cells and tissues ([https://2010.igem.org/Team:Heidelberg/Project/Mouse_Infection#References Grimm, 2002]). On the other hand, this broad tropism can also be a disadvantage in cases where one would like to target a very specific cell type and concurrently avoid off-target gene expression elsewhere in the body. Therefore, several research groups (including that of our instructor Dr. Grimm) have very recently begun to devise novel methods to molecularly engineer synthetic "designer" AAVs based on naturally occurring capsids, which ideally combine and enhance their best properties. For instance, it is now possible to create entire libraries of AAV capsids by inserting short targeting peptides into exposed regions of the virus surface, or alternatively, by randomly point mutating single amino acids in the capsid proteins. The latest and most potent approach is, however, to shuffle entire capsid genes by first fragmenting them and subsequently re-assembling the parts into hybrid particles. 
Ideally, the resulting chimeras will combine the assets of all parental viruses, such as high potency in a given cell type coupled with evasion of neutralization by human anti-AAV antibodies ([https://2010.igem.org/Team:Heidelberg/Project/Mouse_Infection#References Grimm et al., 2008]). <br><br />
<br />
Regarding the second critical level, regulation of gene expression following DNA delivery, it would be particularly crucial in the context of many gene transfer/therapy applications to be able to fine-regulate the level of the resulting protein expression, considering that uncontrolled over-production of certain proteins can be highly toxic or even lethal to cells and tissues. Unfortunately, as mentioned above, most approaches in this direction have so far focused on the DNA itself and accordingly dealt mostly with the development of improved promoter sequences that mediate controlled gene expression under certain conditions. What has been largely ignored until very recently, however, is the fact that gene expression can also be regulated on the RNA level, namely via RNAi technologies. In this respect, we reasoned that the novel and broadly applicable miRNA tools engineered by our iGEM team should prove extremely useful in combination with gene therapy vectors, considering that they have been devised to permit very tight control over the expression of a given gene on the RNA level (by controlling translation and/or mRNA stability, which is the natural mechanism of miRNA action). Specifically, we wanted to explore the feasibility of fusing a gene of interest (GOI) with binding sites for either a naturally occurring miRNA or a synthetic miRNA that is co-expressed from a second "tuning" vector. In either case, the binding site can be designed to match the mi/shRNA perfectly, which will result in nearly complete knockdown of gene expression, or imperfectly, which will yield a more subtle reduction in gene expression (for details, see our synthetic miRNA kit). <br><br />
<br />
Accordingly, we utilized our various novel tools to establish an entirely new system for fine-regulated gene expression from viral gene therapy vectors, in which control is exerted not only on the DNA level, but additionally at the steps of 1) gene delivery and 2) fine-tuning. <br><br />
<br />
Most importantly, as with any novel system for regulated gene expression, there is no guarantee that results obtained in cultured cells are relevant for, and will straightforwardly translate into, identical findings in living animals or ultimately human patients. Therefore, we reasoned that it would be mandatory, following the design and in vitro validation of our novel constructs, to also assay and ideally confirm their functionality in living organisms in vivo, i.e., in adult mice. <br />
In the following, we will summarize our results from the in vivo evaluation of the two main strategies outlined above in the livers of adult mice and will demonstrate how modifying the viral gene vehicle capsid can drastically affect the specificity and level of gene expression in cultured cells versus living animals, and how engineering a given gene expression cassette by adding perfect or imperfect mi/shRNA binding sites can further fine-tune the strength of gene/protein expression at will. <br />
<br><br />
<br />
==Results==<br />
===Strategies===<br />
As outlined, we have pursued two fundamentally different strategies to achieve regulated gene expression in mice, which are control via the viral capsid as well as via RNAi/miRNAs. Generally, to measure the success of each individual strategy, it was mandatory to be able to detect and quantify gene expression in living adult mice to begin with. Moreover, we had to define a standard based on which we would evaluate the specificity and efficiency of each new construct and strategy that we wanted to test. As we were going to target the liver of mice (since it is one of the most therapeutically relevant organs plus concurrently very susceptible to gene transfer), we decided to use the particular AAV serotype 8 (AAV8) as our standard, based on previous findings by Dr. Grimm and others that it mediates maximum transduction of livers in adult mice. Additionally, we picked Firefly luciferase as our reporter gene as it can be easily detected and quantified in living mice following a simple intraperitoneal injection of luciferin substrate and using a sensitive camera system. <br />
Hence, in our very first animal experiment, we infused adult white mice with an AAV8 vector expressing Firefly luciferase from the universally active SV40 promoter and then measured luciferase levels in the liver 5 days after injection. As shown in the center of Figure 1, we were indeed able to detect a very strong expression in all livers of three independent mice, with an average photon count of 2.77x10<sup>8</sup> per mouse. In all subsequent experiments, this value served as our reference based on which we aimed to achieve a fine-regulation of in vivo gene expression. <br />
<br />
===1. Viral capsid / Gene delivery===<br />
Next, we asked how delivering the exact same luciferase vector to mice from a different viral capsid would alter the expression levels in the liver. For this purpose, we used our lead candidate from the virus shuffling and selection procedure described in the homology based capsid shuffling section. As noted there, this one particular capsid was curious in that it was the only one capable of potently transducing Huh7 and HepG2 cells, strongly suggesting that it might also yield reasonable gene expression in vivo. Indeed, when we performed the experiment with this synthetic capsid under the exact same conditions as with our standard AAV8 capsid, we again noted effective luciferase gene expression in the livers of three independently injected mice. <br><br />
Three findings were particularly remarkable in this experiment:<br> <br />
1) The average photon counts per mouse were 1.01x10<sup>7</sup> and thus about 27-fold lower than with the AAV8 standard vector. This clearly and impressively proved our theory that simply replacing the viral capsid can dramatically affect the expression of a given gene in vivo and validates our hypothesis that DNA is not enough, but that the vector for DNA delivery is equally important!<br><br />
2) Intriguingly, when we normalized the luciferase measurement settings for both viruses (wildtype AAV8 and synthetic iGEM vector) to achieve equal expression data in the liver, we realized that the wildtype virus showed a dramatic off-targeting activity outside the liver. In fact, basically the entire mouse lit up in the camera at these sensitive settings, confirming published findings that AAV8 has an extremely broad in vivo tropism that includes the brain. In striking contrast, although our own virus gave virtually identical luciferase values in the liver, it was completely inactive anywhere outside of this organ. This clearly validated our strategy to select and test a viral clone that was positive in cell culture on two independent liver cell lines and thus proves the power of our virus evolution/selection strategy to enrich for synthetic capsids with desired gene transduction capabilities. <br><br />
3) Curiously, we then found that our synthetic clone clearly outperformed the corresponding AAV8 counterpart on cultured cells, despite the fact that it was about 27-fold less efficient in the liver, as noted above. (BECKY DATA HERE). In fact, it is a well-known phenomenon for all wildtype AAV viruses that they function either in cell culture or in mice, but not both. Hence, our finding that our new synthetic clone gave reasonable expression in vivo AND in vitro is further proof for our concept to molecularly evolve viral gene transfer vectors and then select synthetic chimeras exhibiting properties that do not exist in nature.<br><br />
<br />
===2. miRNA-based OFF targeting===<br />
In the above described first set of experiments, we aimed to fine-regulate gene expression by modifying the viral capsids delivering the same gene as was originally expressed from our standard AAV8 vector. In the next step, we asked whether it was also possible to exploit the endogenous expression of cellular miRNAs in order to further control gene expression levels from a viral vector. Generally, this strategy can be exploited in at least two ways: 1) a GOI can be fused with a perfectly matching binding site for such a miRNA, which will typically result in nearly complete inhibition of gene expression, or 2) this site can be imperfect which will yield a gradual reduction of expression. <br><br />
A particularly useful application of the first (perfect site) strategy would be to purposely de-target the expression of a viral vector or a gene, respectively, from a certain tissue in which gene expression is not supposed to occur. In order to test this possibility with our system, we engineered our luciferase vector to contain four perfect binding sites for miR-122, which is specific for liver and highly abundant in hepatocytes. We then again packaged the resulting vector construct into the potent AAV8 capsid and also injected this virus into adult mice. Impressively, and exactly as predicted and hoped for, we found that luciferase expression from this vector was reduced more than 50-fold as compared to the unmodified standard AAV8 vector. In fact, average photon counts per liver (again three independent mice) dropped to 4.95x10<sup>6</sup>, or 1.8% of the reference value, even though the luciferase gene was still delivered by the same highly potent capsid. <br><br />
<br />
This obviously validated our hypothesis that cellular miRNAs can be exploited to substantially regulate the expression of a vector-encoded recombinant DNA and is thus further proof for our concept that neither DNA nor the vector are sufficient to achieve the maximum possible level of control over in vivo gene expression.<br />
<br><br />
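The relative values quoted in sections 1 and 2 follow directly from the average photon counts. The following minimal sketch recomputes them; the dictionary keys and the function name are our own labels chosen for illustration, and only the three photon counts are taken from the text.

```python
# Average photon counts per mouse as reported in the text above.
photon_counts = {
    "AAV8 standard": 2.77e8,
    "shuffled capsid": 1.01e7,
    "AAV8 + miR-122 sites": 4.95e6,
}

def relative_expression(counts, reference):
    """Return {name: (percent of reference, fold reduction)} for each vector."""
    ref = counts[reference]
    return {name: (100.0 * value / ref, ref / value)
            for name, value in counts.items()}

summary = relative_expression(photon_counts, "AAV8 standard")
for name, (percent, fold) in summary.items():
    print(f"{name}: {percent:.1f}% of reference, {fold:.1f}-fold reduction")
```

Expressing every vector relative to the AAV8 standard keeps the comparison independent of the absolute camera settings, as long as all animals are imaged under the same conditions.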
<br />
===3. miRNA-based fine-tuning===<br />
As just indicated, we predicted that fusing our luciferase gene with an imperfect binding site for a miRNA would result in a more gradual reduction in gene expression as compared to the perfect site. To test this hypothesis, we generated a series of expression vectors in which the luciferase gene was tagged with perfect or imperfect binding sites for an artificial miRNA (an shRNA directed against the hAAT gene). This miRNA was then packaged into and co-expressed from a second AAV8 vector that we generated in parallel. The results we obtained are shown in Figure 1.<br><br />
<br />
Three important conclusions can be drawn from this experiment:<br><br />
1) Modifying the luciferase gene with different binding sites has no effect on basal gene expression levels (at least not in this example in our hands) when compared to the standard unmodified AAV8 vector (see figure above), which is remarkable as it suggests that our strategy should be broadly applicable for many other recombinant DNA vectors as well.<br><br />
2) Tagging the luciferase gene with a perfect binding site and then co-expressing the corresponding artificial miRNA from a second vector resulted in the expected potent down-/fine-regulation of gene expression to about 30% of starting levels from the unmodified standard vector. Note that the effect is less pronounced than what we observed with the perfect sites for miR-122 above. This can be easily explained by the fact that the miR-122 vector contained four perfect binding sites versus only one in the present vector for the artificial miRNA. Moreover, one has to take into account that the artificial miRNA was co-expressed from a second vector in the current setting, which certainly resulted in vastly different kinetics as compared to an endogenous miRNA.<br><br />
3) Tagging luciferase with an imperfect miRNA binding site had the expected intermediate effect on gene expression and resulted in a fine-regulation to about 58% of the levels observed with the unmodified control vector. This is an extremely remarkable result as it clearly shows that using our miTuner strategy, it is possible to achieve highly similar extents of fine-regulation of gene expression in cultured cells as well as in living adult animals. It thereby strongly validates our overall strategy to combine synthetic miRNAs and corresponding binding sites with pre-existing gene expression cassettes in order to fine-tune the latter in a very deliberate, controlled and predictable (from straight-forward in vitro experiments) fashion. <br> <br />
<br />
===Bioluminescence imaging===<br />
<br />
<html><br />
<map name="2" id="2"><br />
<area shape="rect" coords="260,25,353,91" href="#AAVcomparison" alt="" /><br />
<area shape="rect" coords="252,198,364,277" href="#AAVcomparison" alt="" /><br />
<area shape="rect" coords="85,339,180,407" href="#miR122" alt="" /><br />
<area shape="rect" coords="491,287,586,357" href="#hAAT" alt="" /><br />
<area shape="rect" coords="380,367,474,437" href="#hAAT" alt="" /><br />
</map><br />
<img src="https://static.igem.org/mediawiki/2010/3/3d/Mouse_data_for_Wiki.png" width="618" height="464" border="0" alt="" title="" usemap="#2" /><br><br />
<p>Figure 1: Scheme of luciferase <br><br><br><br><br><br />
<br />
<a name="miR122"></a></html><br />
[[Image:Mir122_off_targetting.jpg|thumb|300px|right|miR-122 OFF-targeting]]<br />
<html><a name="hAAT"></a></html><br/><br />
[[Image:HAAT_tuning.jpg|thumb|300px|right|hAAT tuning]]<br />
<html><a name="AAVcomparison"></a></html>[[Image:20101028_Helatransduction.jpg|thumb|300px|right|HeLa Transfection]]<br />
<br />
==Discussion==<br />
<br />
==Methods==<br />
===Constructs===<br />
The <i>in vivo</i> analysis was designed to test our gene therapy approach, which uses AAV tropism as well as miRNA binding sites to control expression. The following constructs were subcloned separately into the AAV context for this purpose: <br/><br />
# positive control, <br />
# off-targeting construct, <br />
# synthetic tuning construct and <br />
# on-targeting construct. <br />
<br/><br />
All but one virus were packaged using the AAV rep and cap genes together with an Adenovirus 5 (Ad5) helper plasmid. The remaining virus was packaged using a shuffled cap gene from our [https://2010.igem.org/Team:Heidelberg/Project/Capsid_Shuffling/Homology_Based homology based capsid shuffling] approach. <br />
<br/><br />
# The positive control (see sidebar, fig. 1) consisted of the SV40 promoter driving a firefly luciferase (luc2) gene, thereby leading to an unspecific expression of the luciferase protein in all mouse tissues. In addition to packaging this construct into a wildtype AAV capsid, the positive control was also packaged as a transgene into our [https://2010.igem.org/Team:Heidelberg/Project/Capsid_Shuffling/Homology_Based shuffled capsid] which after random selection <!--selection pressure--> was already able to [https://2010.igem.org/Team:Heidelberg/Notebook/Homology_Based/October#16/10/2010 positively transduce Huh7 and HepG2 cells] <i>in vitro</i>. <br />
# The off-targeting construct (see sidebar, fig. 2) was composed of an SV40 promoter driving a firefly luciferase (luc2) gene followed by binding sites for miR-122. In order to achieve the highest expression in all mouse cells except liver cells, a single perfect binding site for miR-122 was used for the <i>in vivo</i> study. <br />
# The synthetic tuning construct (see sidebar, fig. 3) consisted of two viruses injected into the mice at the same time. One virus packaged the expression construct of shRNA hAAT driven by the H1 promoter ("tuning" construct, see sidebar, fig. 3). The second virus packaged the following transgene: SV40 promoter driving luc2, followed by a binding site for shRNA hAAT ("tuned" construct, see sidebar, fig. 4). In order to obtain a synthetic tuning effect, a perfect binding site and one with a bulge introduced at position 9-12 were used for the <i>in vivo</i> experiments. Compared to the positive control, these two binding sites should lead to a significant knockdown in the first case and a slight repression of luciferase expression in the second.<br />
# The on-targeting construct also consisted of two independent viruses which were co-injected into mice. One of these viruses packaged the Tet Repressor (TetR) driven by an SV40 promoter ("repressor" construct, see sidebar, fig. 5). The expression of TetR is under the control of miR-122, as four binding sites for this miRNA were cloned into the 3’UTR of the gene. The second virus was composed of an SV40 promoter followed by the Tet operator (TetO<sub>2</sub>), which controls the expression of luc2 ("operator" construct, see sidebar, fig. 6). With this setup, luc2 expression should be inhibited by TetR in all mouse tissues except for liver cells, where TetR is down-regulated by miR-122.<br />
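The difference between the perfect and the bulged binding site used in the tuning construct can be sketched in a few lines. This is only an illustration under stated assumptions: the guide sequence below is a made-up placeholder (not the actual shRNA hAAT sequence), the function names are hypothetical, and the "bulge" is modeled simply as four mismatched bases opposite guide positions 9-12.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def perfect_site(guide):
    """Reverse complement of the guide (5'->3'): a fully matching binding site."""
    return "".join(COMPLEMENT[base] for base in reversed(guide))

def bulged_site(guide, start=9, end=12):
    """Binding site mismatched opposite guide positions start..end (1-based)."""
    site = list(perfect_site(guide))
    n = len(guide)
    for pos in range(start, end + 1):
        # Site index n - pos pairs with guide position pos.
        # Writing the guide's own base there guarantees a mismatch.
        site[n - pos] = guide[pos - 1]
    return "".join(site)

# Placeholder 21-nt guide, for illustration only.
guide = "AUGGCUAGCUUACGGAUCCGA"
perfect = perfect_site(guide)
bulged = bulged_site(guide)
print(perfect)
print(bulged)
```

The seed region (guide positions 2-8) stays fully paired in both variants; only the central region differs, which is the idea behind obtaining a weaker, intermediate knockdown from the bulged site.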
<br />
===Production of recombinant virus===<br />
The viruses were produced in HEK 293-T cells and purified on an iodixanol gradient according to [https://2010.igem.org/Team:Heidelberg/Notebook/Methods#Virus_Production the virus production protocol]. <br />
<br />
Before infection, the titer of the viruses was quantified using [https://2010.igem.org/Team:Heidelberg/Notebook/Methods#Quantitative_Realtime_PCR quantitative realtime PCR].<br />
<br />
===Procedure involving animals===<br />
The mouse experiments were conducted in accordance with the animal facility of the [https://2010.igem.org/Team:Heidelberg/Team/Institutes German Cancer Research Center in Heidelberg]. Female NMRI mice were obtained from a collaboration with Dr. Oliver Müller. At 8-10 weeks of age, the animals were injected in the tail vein (TV) with <nowiki>~</nowiki> 1x10<sup>11</sup> particles of AAV-SV40-luciferase in 200µl of 1x phosphate-buffered saline. For the injection, the mice were transferred to a holding device which restrains the animal while allowing access to the tail vein. The tails were warmed before the injections, which were carried out using 27-gauge needles. All mice recovered from the injection quickly without loss of mobility or interruption of grooming activity {{HDref|Zincarelli et al., 2008}}.<br />
<br />
===in vivo animal imaging===<br />
Mice were anesthetized in an isoflurane chamber and then injected intraperitoneally with 200µl of D-luciferin at a concentration of 30 mg/ml. This injection starts the luminescence of luc2. Mice were measured for one to seven minutes post injection under the in vivo bioluminometer.<br />
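As a quick plausibility check of the injected amounts, the volumes and concentrations given in the two sections above can be converted into absolute doses. The helper names are our own; the input numbers are those stated in the text.

```python
def luciferin_dose_mg(volume_ul, concentration_mg_per_ml):
    """Absolute amount of substrate (mg) in the injected volume."""
    return volume_ul * concentration_mg_per_ml / 1000.0

def particles_per_ul(total_particles, volume_ul):
    """Particle concentration of the injected virus suspension."""
    return total_particles / volume_ul

dose = luciferin_dose_mg(200, 30)        # D-luciferin per mouse
titer = particles_per_ul(1e11, 200)      # AAV particles per microliter
print(f"D-luciferin per mouse: {dose:.1f} mg")
print(f"Virus suspension: {titer:.1e} particles/ul")
```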
<br />
==References==<br />
Grimm, D. (2002). "Production methods for gene transfer vectors based on adeno-associated virus serotypes." Methods 28(2): 146-157.<br><br />
Grimm, D., J. S. Lee, et al. (2008). "In vitro and in vivo gene therapy vector evolution via multispecies interbreeding and retargeting of adeno-associated viruses." J Virol 82(12): 5887-5911.<br><br />
Zincarelli, C., S. Soltys, et al. (2008). "Analysis of AAV serotypes 1-9 mediated gene expression and tropism in mice after systemic injection." Mol Ther 16(6): 1073-1080.<br><br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/ModelingTeam:Heidelberg/Modeling2010-10-28T03:48:56Z<p>AlejandroHD: /* Modeling approach to the project */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|mode_over}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
=Modeling approach to the project=<br />
As the title of our project states, '''“DNA is not enough”'''. There are several upper-level regulation systems in higher organisms. Our main idea was to use one of them to tune the expression of genes and devise a tissue-specific gene therapy approach. <br />
The miBricks project consists basically of two ideas. The first is tuning of gene expression using shRNAs/miRNAs and the second is specific targeting of tissues. We intend to tune the expression of a gene by manipulating the binding affinity of a miRNA/shRNA towards the transcript of this gene which results in different expression levels. In order to do this, different binding sites for the miRNA/shRNA are introduced in the 3'UTR of the gene of interest. These binding sites differ from each other in terms of certain sequence-based features. By computational methods, we predict the binding site that should be inserted to achieve the level of expression desired. Targeting of specific tissues is achieved by introducing binding sites for tissue-specific endogenously-expressed miRNA into the construct, thus causing knockdown of a gene based on its presence or absence in the tissue. <br />
<br />
Apart from several bioinformatic tools, our team developed two independent models:<br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Neural_Network_Model Neural Network Model] takes its inspiration from the biological nervous system. It is an appropriate strategy for modeling complex processes and is able to learn from experience. Neural networks generally require a large amount of data to be fully trained. Even though the experimental data were limited, the results agree with the experimental values and the model was able to determine the importance of the bulge size for knockdown. <br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Fuzzy_Logic_Model Fuzzy Logic Model] combines the strength of intuitive integration of prior knowledge with a sophisticated global genetic optimization algorithm. After training, the model was able to reproduce the experimental data, especially the correlation in a 3-dimensional space of the AU content score and 3' pairing score to the knockdown percentage.<br />
<br><br />
<br><br />
To create a software tool that supports the project, we looked at the problem from three different perspectives:<br />
<br />
- Adjusting gene expression in specific tissues.<br />
<br />
- Tuning the expression level accurately.<br />
<br />
- Predefining a construct to be used.<br />
<br />
These three paths were the inspiration behind creating miBEAT (miRNA Binding Site Engineering and Assembly Tool), which provides a strategy that makes it possible to control the expression of genes in a tissue-specific way. <br />
<br />
To make this work, we tried to match the functionalities of the tool to our experimental project. Additionally, we provided a strategy that guides the user through the cloning process and allows them to use characterized standard parts sent to the MIT parts registry. <br />
<br />
Our complete work is available in the form of a graphical user interface called [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. This tool combines and connects the output of the different models and scripts and then generates a suitable miTuner construct that will tune the expression of the gene of interest, miGENE, up or down to the desired level. miBEAT consists of three subparts: miRockdown, miBS designer and mUTING. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] is the subpart which contains two computational models that work on different concepts: Neural Network and Fuzzy Logic plus the experimentally obtained data. <br />
The models are sequentially associated with a script based on the [http://www.targetscan.org/ TargetScan] algorithm. miRockdown takes as input the desired knockdown percentage and the sequence of the shRNAmir and returns binding site parameters that are then compared with model predictions to finally generate the appropriate binding site. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miBSdesigner miBS designer] is available as a stand-alone tool for generating customized binding sites, but a modified version of it is also a part of miBEAT, where it is in charge of generating more than 2000 different binding sites for every miRNA sequence, following more than 135 combinations of regions. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miSpec mUTING] provides the tissue specific targeting function to the GUI. It uses literature data for miRNA expression in various tissues and can output miRNA binding sites that could be used to differentiate between target and off target tissues.<br />
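The selection step performed by miRockdown (picking the binding site whose predicted effect is closest to the desired level) can be illustrated with a minimal sketch. This is not the actual miBEAT code: the function name is hypothetical, and instead of model predictions the candidate table uses the approximate in vivo values reported for the hAAT tuning experiment on the Mouse Infection page (expression remaining relative to the unmodified vector).

```python
# Remaining expression (% of unmodified control) per binding site type.
# Values taken from the in vivo hAAT tuning results; a real run would
# use the predictions of the Neural Network or Fuzzy Logic model instead.
candidates = {
    "perfect site": 30.0,
    "imperfect (bulged) site": 58.0,
    "no binding site": 100.0,
}

def pick_site(predicted_expression, desired_percent):
    """Return the candidate whose predicted expression is closest to the target."""
    return min(predicted_expression,
               key=lambda name: abs(predicted_expression[name] - desired_percent))

print(pick_site(candidates, 50))   # closest candidate for 50% remaining expression
```

With a larger candidate pool, such as the binding sites generated by miBS designer, the same nearest-match rule would allow a finer resolution of expression levels.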
<br />
=miRNA binding site features=<br />
<br />
miRNAs are non-coding regulatory RNAs that function as post-transcriptional gene silencers. After processing, they are usually 22 nucleotides long and typically bind to the 3’UTR of the mRNA (although they can also bind to the ORF or to the 5'UTR), either triggering degradation of the mRNA or repressing its translation [Bartel, 2004].<br />
<br />
In plants, miRNAs usually bind to the mRNA with extensive complementarity. In animals, the interactions are less exact, which creates a lot of uncertainty in the in silico prediction of targets[?].<br />
<br />
The seed of the miRNA is usually defined as the region centered on nucleotides 2-7 from the 5’ end of the miRNA. For an efficient binding site, extensive pairing is usually required between the seed and the corresponding part of the mRNA. The seed and the corresponding pairing sequence of the mRNA are located inside the AGO protein. <br />
<br />
Common types of miRNA seeds: <br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA. <br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1. <br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA. <br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1. <br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA (Friedman et al. 2008)<br />
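The four seed types listed above can be told apart with a short classification routine. The sketch below is a simplified illustration, not the TargetScan implementation: it assumes that the target site is given 5'->3' with its last nucleotide lying opposite miRNA position 1, it only accepts Watson-Crick pairs, and the example sequence and function name are chosen for demonstration.

```python
# Watson-Crick pairs as (miRNA base, target base).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def seed_type(mirna, site):
    """Classify a site as 8mer, 7mer-m8, 7mer-A1, 6mer or none.

    mirna: miRNA sequence, 5'->3'.
    site:  target sequence, 5'->3', at least 8 nt, aligned so that
           site[-1] lies opposite miRNA position 1.
    """
    def paired(pos):  # pos is the 1-based miRNA position
        return (mirna[pos - 1], site[-pos]) in PAIRS

    if not all(paired(p) for p in range(2, 8)):   # positions 2-7
        return "none"
    match_8 = paired(8)
    a_at_1 = site[-1] == "A"
    if match_8 and a_at_1:
        return "8mer"
    if match_8:
        return "7mer-m8"
    if a_at_1:
        return "7mer-A1"
    return "6mer"

mirna = "UGGAGUGUGACAAUGGUGUUUG"   # example 22-nt guide sequence
for site in ("ACACUCCA", "ACACUCCG", "GCACUCCA", "GCACUCCG", "GCACUGCG"):
    print(site, seed_type(mirna, site))
```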
<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800 px]]<br><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites. Notice the different types of seeds.</center><br />
<br />
Outside the seed, the existence of supplemental pairing (at least 3 contiguous nucleotides and at best centered in nucleotides 13-16 of the miRNA) stabilizes the bound complex and increases the efficacy of the binding site.<br />
<br />
Binding sites with a high local AU content around the binding site have proven to be more effective (possibly because of the destabilization of the mRNA secondary structure around the site).<br />
An adenine at position one of the binding site supposedly binds to a different protein of the RISC complex [Bartel, 2009], thus increasing the binding site efficiency significantly.<br />
<br />
Binding sites near the end or the beginning of the 3'UTR are more efficient. However, binding sites within the first 15 nucleotides after the stop codon are not effective, since this region of the mRNA is inside the ribosome when translation stops. A RISC complex bound in this region will therefore dissociate after every round of translation and cannot follow its usual mode of action [Grimson et al., 2007].<br />
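Two of the context features described above, local AU content and the distance from the stop codon, are easy to compute for a candidate site. The following is a minimal sketch under stated assumptions: the flank length of 30 nt and the unweighted AU fraction are simplifications chosen for illustration (the scoring actually used by our models may differ), and both function names are hypothetical.

```python
def au_content(utr, site_start, site_end, flank=30):
    """Fraction of A/U among the nucleotides flanking utr[site_start:site_end].

    utr is the 3'UTR sequence (5'->3'), indices are 0-based, site_end exclusive.
    """
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    window = upstream + downstream
    if not window:
        return 0.0
    return sum(base in "AU" for base in window) / len(window)

def outside_ribosome_shadow(site_start, min_distance=15):
    """True if the site starts after the first min_distance nt of the 3'UTR.

    site_start is the 0-based offset from the first nucleotide
    following the stop codon.
    """
    return site_start >= min_distance

utr = "A" * 20 + "GCGCGCGC" + "U" * 20     # toy 3'UTR with an AU-rich context
print(au_content(utr, 20, 28, flank=10))
print(outside_ribosome_shadow(20))
```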
<br />
<br />
=Tissue specific miRNAs=<br />
<br />
A useful supplement to achieving tuned gene expression in cells is the ability to specifically target the tissues where this should be carried out. Tissue targeting has long been an important field of research and is central to gene therapy. The [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT] tool not only allows generation of binding sites that regulate the level of expression of a desired gene, but also employs strategies that help target the right tissue and exclude expression in others. This functionality is based on the principle of using tissue-specific miRNA binding sites, which can be introduced easily into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner] construct. <br />
<br />
A smart way to specifically target tissues is to exploit the presence and absence of tissue-specific endogenous miRNAs in order to specifically express, or exclude expression of, the gene of interest in the target. We make use of two strategies based on this principle, namely on-targeting and off-targeting. The off-targeting concept has been applied previously {{HDref|Wenfang Shi et.al., 2008}}: an endogenous miRNA is selected such that it is absent from the target tissue (therefore the gene is expressed) and present in all off-target tissues (knockdown of the transcripts). Thus the gene is specifically expressed in the target.<br />
<br />
In addition to the off-targeting strategy, we designed a new strategy, on-targeting. In this case, the miRNA is present in the target tissue and absent from the off-target tissues. The binding site for this miRNA is placed within the 3'UTR of a repressor gene construct (in our case TET/O2). The operator for the repressor in turn precedes the gene of interest (miGene) in the miTuner or miMeasure constructs. Therefore, in the presence of the miRNA in the cell, the repressor transcript is degraded and miGene is expressed, while in off-target tissues the repressor is translated and represses the expression of miGene.<br />
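The logic of the two strategies can be summarized as a small truth table. The sketch below is a deliberately idealized, all-or-nothing model (real knockdown is gradual); the function names and the tissue table are illustrative assumptions, with miR-122 taken as an example of a miRNA that is present in liver.

```python
def off_targeting_expression(mirna_present):
    """miGene carries the miRNA binding sites: expressed only where the miRNA is absent."""
    return not mirna_present

def on_targeting_expression(mirna_present):
    """The repressor carries the binding sites and in turn blocks miGene."""
    repressor_active = not mirna_present   # miRNA knocks down the repressor
    return not repressor_active            # miGene is on only without repressor

# Illustrative presence/absence of a liver-specific miRNA such as miR-122.
tissues = {"liver": True, "other tissue": False}

for tissue, present in tissues.items():
    print(tissue,
          "| off-targeting:", off_targeting_expression(present),
          "| on-targeting:", on_targeting_expression(present))
```

The two strategies are logical inverses of each other: with the same miRNA, off-targeting silences miGene in the tissue expressing the miRNA, whereas on-targeting restricts miGene expression to exactly that tissue.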
<br />
<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/ModelingTeam:Heidelberg/Modeling2010-10-28T03:40:48Z<p>AlejandroHD: /* Modeling approach to the project */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|mode_over}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
=Modeling approach to the project=<br />
As the title of our project states, '''“DNA is not enough”'''. There are several upper-level regulation systems in higher organisms. Our main idea was to use one of them to tune the expression of genes and devise a tissue-specific gene therapy approach. <br />
The miBricks project consists basically of two ideas. The first is tuning of gene expression using shRNAs/miRNAs and the second is specific targeting of tissues. We intend to tune the expression of a gene by manipulating the binding affinity of a miRNA/shRNA towards the transcript of this gene which results in different expression levels. In order to do this, different binding sites for the miRNA/shRNA are introduced in the 3'UTR of the gene of interest. These binding sites differ from each other in terms of certain sequence-based features. By computational methods, we predict the binding site that should be inserted to achieve the level of expression desired. Targeting of specific tissues is achieved by introducing binding sites for tissue-specific endogenously-expressed miRNA into the construct, thus causing knockdown of a gene based on its presence or absence in the tissue. <br />
<br />
Apart from several bioinformatic tools, our team developed two independent models:<br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Neural_Network_Model Neural Network Model] takes its inspiration from the biological nervous system. It is an appropriate strategy for modeling complex processes and is able to learn from experience. Even if the experimental data were not enough to fully train the model, the results agree with the experimental values and the model was able to determine the importance of the bulge size for the knockdown. <br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Fuzzy_Logic_Model Fuzzy Logic Model] combines the strength of intuitive integration of prior knowledge with a sophisticated global genetic optimization algorithm. After training, the model was able to reproduce the experimental data, especially the correlation in a 3-dimensional space of the AU content score and 3' pairing score to the knockdown percentage.<br />
<br><br />
<br><br />
To create the software tool that supports the project, we looked at the problem from three different perspectives:<br />
<br />
- Adjusting gene expression in specific tissues.<br />
<br />
- Tuning the expression level accurately.<br />
<br />
- Predefining a construct to be used.<br />
<br />
These three perspectives were the inspiration behind miBEAT (miRNA Binding Site Engineering and Assembly Tool), which provides a strategy that makes it possible to control the expression of genes differentially between tissues.<br />
<br />
To make this work, we tried to match the functionalities of the tool to our experimental project. Additionally, we provided a strategy that guides the user through the cloning process and allows them to use characterized standard parts sent to the MIT parts registry. <br />
<br />
Our complete work is available as a graphical user interface called [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. This tool combines and connects the output of the different models and scripts and then generates a suitable miTuner construct that tunes the expression of the gene of interest, miGENE, up or down to the desired level. miBEAT consists of three subparts: miRockdown, miBS designer and mUTING.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] is the subpart that contains two computational models based on different concepts, a Neural Network and Fuzzy Logic, plus the experimentally obtained data.<br />
The models are coupled sequentially to a script based on the [http://www.targetscan.org/ TargetScan] algorithm. miRockdown takes as input the desired knockdown percentage and the sequence of the shRNAmir and outputs binding site parameters, which are then compared with the model predictions to finally generate the appropriate binding site.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miBSdesigner miBS designer] is available as a stand-alone tool for generating customized binding sites, but a modified version of it is also part of miBEAT, where it generates more than 2000 different binding sites for every miRNA sequence, following more than 135 combinations of regions.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miSpec mUTING] provides the tissue-specific targeting function to the GUI. It uses literature data on miRNA expression in various tissues and can output miRNA binding sites that can be used to differentiate between target and off-target tissues.<br />
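<br />
The idea of picking miRNAs that separate a target tissue from off-target tissues can be sketched as follows. This is only a minimal illustration, not the mUTING implementation: the function name, the thresholds <code>low</code> and <code>high</code>, and the normalized expression table are all hypothetical.<br />

```python
def discriminating_mirnas(expression, target, off_targets, low=0.1, high=0.5):
    """Find miRNAs whose expression separates the target from all off-targets.

    expression:  {mirna_name: {tissue: normalized_level}} (hypothetical data)
    target:      name of the target tissue
    off_targets: iterable of off-target tissue names
    low, high:   illustrative thresholds for "absent" and "present"
    Returns {mirna_name: "low_in_target" | "high_in_target"}.
    """
    result = {}
    for mirna, levels in expression.items():
        in_target = levels.get(target, 0.0)
        elsewhere = [levels.get(tissue, 0.0) for tissue in off_targets]
        if in_target <= low and all(v >= high for v in elsewhere):
            # absent in the target, present in every off-target tissue
            result[mirna] = "low_in_target"
        elif in_target >= high and all(v <= low for v in elsewhere):
            # present in the target, absent in every off-target tissue
            result[mirna] = "high_in_target"
    return result


# Made-up expression levels, for illustration only.
example = {
    "miR-A": {"liver": 0.90, "heart": 0.00, "kidney": 0.05},
    "miR-B": {"liver": 0.02, "heart": 0.80, "kidney": 0.70},
    "miR-C": {"liver": 0.60, "heart": 0.60, "kidney": 0.60},
}
print(discriminating_mirnas(example, "liver", ["heart", "kidney"]))
```

A miRNA expressed at similar levels everywhere (miR-C in the made-up table) is not returned, because its binding site could not distinguish the tissues.<br />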
<br />
=miRNA binding site features=<br />
<br />
miRNAs are non-coding regulatory RNAs that function as post-transcriptional gene silencers. After processing, they are usually about 22 nucleotides long and usually bind to the 3’UTR of the mRNA (although they can also bind to the ORF or to the 5'UTR), either triggering degradation of the mRNA or just repressing its translation [Bartel, 2004].<br />
<br />
In plants, miRNAs usually bind to the mRNA with extensive complementarity. In animals, pairing is less exact, which creates a lot of uncertainty in the in silico prediction of targets[?].<br />
<br />
The seed of the miRNA is usually defined as the region centered on nucleotides 2-7 from the 5’ end of the miRNA. An efficient binding site usually requires extensive pairing between the seed and the corresponding part of the mRNA. The seed and the corresponding pairing sequence of the mRNA are located inside the AGO protein.<br />
<br />
Common types of seed matches:<br />
<br />
- 6mer (abundance 21.5%): only nucleotides 2-7 of the miRNA pair with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): nucleotides 2-7 pair with the mRNA, and the mRNA has an adenine opposite position 1 of the miRNA.<br />
<br />
- 7merm8 (abundance 25%): nucleotides 2-8 pair with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): nucleotides 2-8 pair with the mRNA, and the mRNA has an adenine opposite position 1 of the miRNA.<br />
<br />
The abundance percentages were calculated among conserved mammalian sites for a highly conserved miRNA (Friedman et al. 2008).<br />
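<br />
The four site types listed above can be told apart with a few string comparisons. The following is a minimal sketch, not the code used in miBS designer: it assumes the site is given as an 8-nt mRNA window written 5'->3' whose last nucleotide lies opposite miRNA position 1 (a hypothetical input convention), and it considers only Watson-Crick pairs. The miR-122 sequence is used purely as an example input.<br />

```python
# Watson-Crick complements only; G:U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_match(mirna, site):
    """Classify an 8-nt mRNA window as 8mer, 7merm8, 7merA1, 6mer or None.

    mirna: miRNA sequence, 5'->3'.
    site:  8 nt of mRNA, 5'->3'; site[7] lies opposite miRNA position 1
           and site[0] opposite miRNA position 8.
    """
    if len(site) != 8 or len(mirna) < 8:
        raise ValueError("need an 8-nt site and a miRNA of at least 8 nt")
    # site[1:7] faces miRNA positions 7..2, i.e. the core seed.
    if site[1:7] != reverse_complement(mirna[1:7]):
        return None
    pairs_position_8 = site[0] == COMPLEMENT[mirna[7]]
    adenine_at_1 = site[7] == "A"
    if pairs_position_8 and adenine_at_1:
        return "8mer"
    if pairs_position_8:
        return "7merm8"
    if adenine_at_1:
        return "7merA1"
    return "6mer"


MIR122 = "UGGAGUGUGACAAUGGUGUUUG"  # example miRNA, 5'->3'
for window in ("ACACUCCA", "ACACUCCG", "GCACUCCA", "GCACUCCG", "GCACUGCG"):
    print(window, classify_seed_match(MIR122, window))
```

Sliding such a window along a 3'UTR would list candidate sites; ranking them still requires the additional features described in this section.<br />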
<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800 px]]<br><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites. Notice the different types of seeds.</center><br />
<br />
Outside the seed, supplemental pairing (at least 3 contiguous nucleotides, ideally centered on nucleotides 13-16 of the miRNA) stabilizes the bound complex and increases the efficacy of the binding site.<br />
<br />
Binding sites with a high local AU content in their surroundings have proven to be more effective (possibly because the mRNA secondary structure around the site is destabilized).<br />
An adenine at position one of the binding site supposedly binds to a different protein of the RISC complex [Bartel, 2009], thus significantly increasing the binding site efficiency.<br />
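<br />
A simple way to quantify the local AU content is the fraction of A and U nucleotides flanking the site. The sketch below makes several assumptions: the function name, the 0-based end-exclusive coordinates and the default flank of 30 nt are illustrative choices, and this unweighted fraction is not the AU content score used by our models.<br />

```python
def local_au_content(utr, site_start, site_end, flank=30):
    """Fraction of A/U among the nucleotides flanking a binding site.

    utr:        3'UTR sequence as RNA, 5'->3'
    site_start: 0-based index of the first site nucleotide
    site_end:   0-based index one past the last site nucleotide
    flank:      number of nucleotides inspected on each side (assumption)
    """
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    context = (upstream + downstream).upper()
    if not context:
        return 0.0  # no flanking sequence available
    return sum(base in "AU" for base in context) / len(context)


# Toy 3'UTR: GC-rich upstream flank, site, mixed downstream flank.
toy_utr = "GGGG" + "ACACUCCA" + "CCAU"
print(local_au_content(toy_utr, 4, 12, flank=4))
```

The site itself is excluded from the count, so the value reflects only the sequence context into which a binding site is placed.<br />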
<br />
Binding sites near the beginning or the end of the 3'UTR are more efficient. However, binding sites within the first 15 nucleotides after the stop codon are not effective, since this region of the mRNA is inside the ribosome when translation stops. A RISC complex bound in this region will therefore dissociate after every round of translation and cannot follow its usual mode of action [Grimson et al., 2007].<br />
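<br />
The positional rules can be turned into two simple features per candidate site. This is a hedged sketch: the function and key names are hypothetical, coordinates are 0-based offsets from the first nucleotide after the stop codon, and only the 15-nt cutoff is taken from the text above.<br />

```python
def site_position_features(utr_length, site_start, site_end, ribosome_shadow=15):
    """Describe where a binding site lies within the 3'UTR.

    utr_length:      length of the 3'UTR in nucleotides
    site_start:      0-based offset of the first site nucleotide
    site_end:        0-based offset one past the last site nucleotide
    ribosome_shadow: nt after the stop codon still covered by the ribosome
    """
    if not 0 <= site_start < site_end <= utr_length:
        raise ValueError("site must lie inside the 3'UTR")
    return {
        # True if the site starts within the first 15 nt (expected to be ineffective)
        "in_ribosome_shadow": site_start < ribosome_shadow,
        # smaller values mean the site is closer to one end of the 3'UTR
        "distance_to_nearest_end": min(site_start, utr_length - site_end),
    }


print(site_position_features(500, 5, 13))     # too close to the stop codon
print(site_position_features(500, 480, 488))  # near the 3' end of the UTR
```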
<br />
<br />
=Tissue specific miRNAs=<br />
<br />
A useful supplement to tuned gene expression is the ability to specifically target the tissues where this tuning should take place. Tissue targeting has long been an important field of research and is central to gene therapy. The [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT] tool not only allows the generation of binding sites that regulate the expression level of a desired gene, but also employs strategies that help target the right tissue and exclude expression in others. This functionality is based on tissue-specific miRNA binding sites, which can easily be introduced into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner] construct.<br />
<br />
A smart way to target tissues specifically is to exploit the presence or absence of tissue-specific endogenous miRNAs, so that the gene of interest is expressed in the target tissue and its expression is excluded elsewhere. We make use of two strategies based on this principle, namely on-targeting and off-targeting. The off-targeting concept has been applied previously {{HDref|Wenfang Shi et.al., 2008}}: an endogenous miRNA is selected such that it is absent from the target tissue (so the gene is expressed) and present in all off-target tissues (where the transcripts are knocked down). Thus the gene is specifically expressed in the target.<br />
<br />
In addition to the off-targeting strategy, we designed a new strategy, on-targeting. In this case, the miRNA is present in the target tissue and absent from the off-targets. The binding site for this miRNA is placed within the 3'UTR of a repressor gene construct (in our case TET/O2). The operator for the repressor in turn precedes the gene of interest (miGene) in the miTuner or miMeasure constructs. Therefore, when the miRNA is present in the cell, the repressor transcript is degraded and miGene is expressed, while in off-target tissues the repressor is translated and represses the expression of miGene.<br />
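<br />
The logic of the two strategies can be summarized as a small truth table. The sketch below idealizes knockdown and repression as all-or-nothing, which real constructs are not; the function name and strategy labels are illustrative only.<br />

```python
def migene_expressed(strategy, mirna_present):
    """Idealized on/off outcome for miGene under the two targeting strategies.

    strategy:      "off-targeting" or "on-targeting"
    mirna_present: True if the tissue-specific miRNA is expressed in the cell
    """
    if strategy == "off-targeting":
        # The binding site sits in the 3'UTR of miGene itself:
        # the miRNA knocks the transcript down.
        return not mirna_present
    if strategy == "on-targeting":
        # The binding site sits in the 3'UTR of the repressor:
        # the miRNA removes the repressor, which frees the operator.
        repressor_made = not mirna_present
        return not repressor_made
    raise ValueError("unknown strategy: " + strategy)


for strategy in ("off-targeting", "on-targeting"):
    for present in (True, False):
        print(strategy, "miRNA present:", present, "->", migene_expressed(strategy, present))
```

The two strategies are mirror images: off-targeting needs a miRNA that is missing only in the target tissue, on-targeting one that is present only there.<br />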
<br />
<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/ModelingTeam:Heidelberg/Modeling2010-10-28T03:29:55Z<p>AlejandroHD: /* Modeling approach to the project */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|mode_over}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
=Modeling approach to the project=<br />
As the title of our project states, '''“DNA is not enough”'''. There are several upper-level regulation systems in higher organisms. Our main idea was to use one of them to tune down the expression of genes, with tissue-specific, precisely tuned gene therapy as the objective.<br />
The miBricks project rests on two ideas. The first is tuning gene expression using shRNAs/miRNAs; the second is targeting specific tissues. We tune the expression of a gene by manipulating the binding affinity of a miRNA/shRNA for the transcript of this gene, which results in different expression levels. To do this, different binding sites for the miRNA/shRNA are introduced into the 3'UTR of the gene of interest. These binding sites differ from each other in certain sequence-based features. Using computational methods, we predict which binding site should be inserted to achieve the desired level of expression. Tissue targeting is achieved by introducing binding sites for tissue-specific, endogenously expressed miRNAs into the construct, so that the gene is knocked down depending on the presence or absence of the miRNA in the tissue.<br />
<br />
Apart from several bioinformatic tools, our team developed two independent models:<br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Neural_Network_Model Neural Network Model] takes inspiration from the biological nervous system to make its predictions. It is a suitable strategy for modeling complex processes and is able to learn from experience. Although the experimental data were not sufficient to fully train the model, its results agree with the experimental values, and the model was able to determine the importance of the bulge size for the knockdown.<br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Fuzzy_Logic_Model Fuzzy Logic Model] combines the intuitive integration of prior knowledge with a sophisticated global genetic optimization algorithm. After training, the model was able to reproduce the experimental data, especially the relationship, in a 3-dimensional space, of the AU content score and the 3' pairing score to the knockdown percentage.<br />
<br><br />
<br><br />
To create the software tool that supports the project, we looked at the problem from three different perspectives:<br />
<br />
- Adjusting gene expression in specific tissues.<br />
<br />
- Tuning the expression level accurately.<br />
<br />
- Predefining a construct to be used.<br />
<br />
These three perspectives were the inspiration behind miBEAT (miRNA Binding Site Engineering and Assembly Tool), which provides a strategy that makes it possible to control the expression of genes differentially between tissues.<br />
<br />
To make this work, we tried to match the functionalities of the tool to our experimental project. Additionally, we provided a strategy that guides the user through the cloning process and allows them to use characterized standard parts sent to the MIT parts registry. <br />
<br />
Our complete work is present in the form of a graphical user interface called [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. This tool combines and connects the output of the different models and scripts and then generates a suitable miTuner construct that will express the gene of interest, miGENE, up or down to the desired level. miBEAT consists of three subparts; miRockdown, miBS designer and mUTING. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] is the subpart which contains two computational models that work on different concepts: Neural Network and Fuzzy Logic plus the experimentally obtained data. <br />
The models are sequentially associated with a script based on [http://www.targetscan.org/ Target Scan] algorithm. miRockdown takes as an input the desired knockdown percentage and the sequence of shRNAmir and gives out binding site parameters that are then compared with model predictions to finally generate the appropriate binding site. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miBSdesigner miBS designer] is available as a stand-alone tool for generating customized binding sites, but a modified version of it is also part of miBEAT, where it generates more than 2000 different binding sites for every miRNA sequence, following more than 135 combinations of regions.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miSpec mUTING] provides the tissue specific targeting function to the GUI. It uses literature data for miRNA expression in various tissues and can output miRNA binding sites that could be used to differentiate between target and off target tissues.<br />
<br />
=miRNA binding site features=<br />
<br />
miRNA are non-coding regulatory RNAs functioning as post-transcriptional gene silencers. After they are processed, they are usually 22 nucleotides long and they usually bind to the 3’UTR region of the mRNA (although they can also bind to the ORF or to the 5'UTR), forcing the mRNA into degradation or just repressing translation [Bartel, 2004].<br />
<br />
In plants, miRNAs usually bind to the mRNA with extensive complementarity. In animals, pairing is less exact, which creates a lot of uncertainty in the in silico prediction of targets[?].<br />
<br />
The seed of the miRNA is usually defined as the region centered in the nucleotides 2-7 in the 5’ end of the miRNA. For an efficient binding site extensive pairing is usually required between the seed and the corresponding part of the mRNA. The seed, and the corresponding pairing sequence of the mRNA are located inside the AGO protein. <br />
<br />
Common types of seed matches:<br />
<br />
- 6mer (abundance 21.5%): only nucleotides 2-7 of the miRNA pair with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): nucleotides 2-7 pair with the mRNA, and the mRNA has an adenine opposite position 1 of the miRNA.<br />
<br />
- 7merm8 (abundance 25%): nucleotides 2-8 pair with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): nucleotides 2-8 pair with the mRNA, and the mRNA has an adenine opposite position 1 of the miRNA.<br />
<br />
The abundance percentages were calculated among conserved mammalian sites for a highly conserved miRNA (Friedman et al. 2008).<br />
<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800 px]]<br><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites. Notice the different types of seeds.</center><br />
<br />
Outside the seed, the existence of supplemental pairing (at least 3 contiguous nucleotides and at best centered in nucleotides 13-16 of the miRNA) stabilizes the bound complex and increases the efficacy of the binding site.<br />
<br />
Binding sites with a high local AU content in their surroundings have proven to be more effective (possibly because the mRNA secondary structure around the site is destabilized).<br />
An adenine at position one of the binding site supposedly binds to a different protein of the RISC complex [Bartel, 2009], thus significantly increasing the binding site efficiency.<br />
<br />
Binding sites near the beginning or the end of the 3'UTR are more efficient. However, binding sites within the first 15 nucleotides after the stop codon are not effective, since this region of the mRNA is inside the ribosome when translation stops. A RISC complex bound in this region will therefore dissociate after every round of translation and cannot follow its usual mode of action [Grimson et al., 2007].<br />
<br />
<br />
<br />
=Tissue specific miRNAs=<br />
<br />
A useful supplement to achieving tuned gene expression in cells is the ability to specifically target tissues where this should be carried out. Tissue targeting has, for quite some time, been an important field of research that has drawn much attention and is central to gene therapy. [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT] tool not only allows generation of binding sites that regulate the level of expression of a desired gene, but also employs strategies that help target the right tissue and exclude expression in others. This functionality is based on the principle of using tissue specific miRNA binding sites which can be introduced in the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner] construct easily. <br />
<br />
A smart way to target tissues specifically is to exploit the presence or absence of tissue-specific endogenous miRNAs, so that the gene of interest is expressed in the target tissue and its expression is excluded elsewhere. We make use of two strategies based on this principle, namely on-targeting and off-targeting. The off-targeting concept has been applied previously {{HDref|Wenfang Shi et.al., 2008}}: an endogenous miRNA is selected such that it is absent from the target tissue (so the gene is expressed) and present in all off-target tissues (where the transcripts are knocked down). Thus the gene is specifically expressed in the target.<br />
<br />
In addition to the off-targeting strategy, we designed a new strategy, on-targeting. In this case, the miRNA is present in the target tissue and absent from the off-targets. The binding site for this miRNA is placed within the 3'UTR of a repressor gene construct (in our case TET/O2). The operator for the repressor in turn precedes the gene of interest (miGene) in the miTuner or miMeasure constructs. Therefore, when the miRNA is present in the cell, the repressor transcript is degraded and miGene is expressed, while in off-target tissues the repressor is translated and represses the expression of miGene.<br />
<br />
==References==<br />
<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/ModelingTeam:Heidelberg/Modeling2010-10-28T03:10:37Z<p>AlejandroHD: /* Modeling approach to the project */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|mode_over}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
=Modeling approach to the project=<br />
As the title of our project states, '''“DNA is not enough”'''. There are several upper-level regulation systems in higher organisms. Our main idea was to use one of them to tune down the expression of genes, with tissue-specific, precisely tuned gene therapy as the objective.<br />
<br />
To create the software tool that supports the project, we looked at the problem from three different angles:<br />
<br />
- Adjusting gene expression in specific tissues.<br />
<br />
- Tuning the expression level accurately.<br />
<br />
- Predefining a construct to be used.<br />
<br />
These three angles were the inspiration behind miBEAT (miRNA Binding Site Engineering and Assembly Tool), which provides a strategy that makes it possible to control the expression of genes differentially between tissues.<br />
<br />
To make this work, we tried to match the functionalities of the tool to our experimental project. Additionally, we provided a strategy that guides the user through the cloning process and allows them to use characterized standard parts sent to the MIT parts registry. <br />
<br />
The miBricks project rests on two ideas. The first is tuning gene expression using shRNAs/miRNAs; the second is targeting specific tissues. We tune the expression of a gene by manipulating the binding affinity of a miRNA/shRNA for the transcript of this gene, which results in different expression levels. This is done by introducing different binding sites for the miRNA/shRNA into the 3'UTR of the gene. These binding sites differ from each other in certain sequence-based features. Using computational methods, we can predict which binding site should be inserted to achieve a desired level of expression. Tissue targeting is achieved by introducing binding sites for tissue-specific, endogenously expressed miRNAs into the construct, so that the gene is knocked down depending on the presence or absence of the miRNA in the tissue.<br />
<br />
Our complete work is present in the form of a graphical user interface called [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. This tool combines and connects the output of the different models and scripts and then generates a suitable miTuner construct that will express the gene of interest, miGENE, up or down to the desired level. miBEAT consists of three subparts; miRockdown, miBS designer and mUTING. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] is the subpart which contains two computational models that work on different concepts: Neural Network and Fuzzy Logic plus the experimentally obtained data. <br />
The models are sequentially associated with a script based on [http://www.targetscan.org/ Target Scan] algorithm. miRockdown takes as an input the desired knockdown percentage and the sequence of shRNAmir and gives out binding site parameters that are then compared with model predictions to finally generate the appropriate binding site. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miBSdesigner miBS designer] is available as a stand-alone tool for generating customized binding sites, but a modified version of it is also part of miBEAT, where it generates more than 2000 different binding sites for every miRNA sequence, following more than 135 combinations of regions.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miSpec mUTING] provides the tissue specific targeting function to the GUI. It uses literature data for miRNA expression in various tissues and can output miRNA binding sites that could be used to differentiate between target and off target tissues.<br />
<br />
Apart from all of these tools, our team also developed two independent models:<br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Neural_Network_Model Neural Network Model] takes inspiration from the biological nervous system to make its predictions. It is a suitable strategy for modeling complex processes and is able to learn from experience. Although the experimental data were not sufficient to fully train the model, its results agree with the experimental values, and the model was able to determine the importance of the bulge size for the knockdown.<br />
<br />
The [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#Fuzzy_Logic_Model Fuzzy Logic Model] is combining the strength of intuitive integration of prior knowledge with a sophisticated Global Genetic optimization Algorithm. The results<br />
<br />
=miRNA binding site features=<br />
<br />
miRNA are non-coding regulatory RNAs functioning as post-transcriptional gene silencers. After they are processed, they are usually 22 nucleotides long and they usually bind to the 3’UTR region of the mRNA (although they can also bind to the ORF or to the 5'UTR), forcing the mRNA into degradation or just repressing translation [Bartel, 2004].<br />
<br />
In plants, miRNAs usually bind to the mRNA with extensive complementarity. In animals, pairing is less exact, which creates a lot of uncertainty in the in silico prediction of targets[?].<br />
<br />
The seed of the miRNA is usually defined as the region centered in the nucleotides 2-7 in the 5’ end of the miRNA. For an efficient binding site extensive pairing is usually required between the seed and the corresponding part of the mRNA. The seed, and the corresponding pairing sequence of the mRNA are located inside the AGO protein. <br />
<br />
Common types of seed matches:<br />
<br />
- 6mer (abundance 21.5%): only nucleotides 2-7 of the miRNA pair with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): nucleotides 2-7 pair with the mRNA, and the mRNA has an adenine opposite position 1 of the miRNA.<br />
<br />
- 7merm8 (abundance 25%): nucleotides 2-8 pair with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): nucleotides 2-8 pair with the mRNA, and the mRNA has an adenine opposite position 1 of the miRNA.<br />
<br />
The abundance percentages were calculated among conserved mammalian sites for a highly conserved miRNA (Friedman et al. 2008).<br />
<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800 px]]<br><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites. Notice the different types of seeds.</center><br />
<br />
Outside the seed, the existence of supplemental pairing (at least 3 contiguous nucleotides and at best centered in nucleotides 13-16 of the miRNA) stabilizes the bound complex and increases the efficacy of the binding site.<br />
<br />
Binding sites with a high local AU content in their surroundings have proven to be more effective (possibly because the mRNA secondary structure around the site is destabilized).<br />
An adenine at position one of the binding site supposedly binds to a different protein of the RISC complex [Bartel, 2009], thus significantly increasing the binding site efficiency.<br />
<br />
Binding sites near the beginning or the end of the 3'UTR are more efficient. However, binding sites within the first 15 nucleotides after the stop codon are not effective, since this region of the mRNA is inside the ribosome when translation stops. A RISC complex bound in this region will therefore dissociate after every round of translation and cannot follow its usual mode of action [Grimson et al., 2007].<br />
<br />
<br />
<br />
=Tissue specific miRNAs=<br />
<br />
A useful supplement to achieving tuned gene expression in cells is the ability to specifically target tissues where this should be carried out. Tissue targeting has, for quite some time, been an important field of research that has drawn much attention and is central to gene therapy. [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT] tool not only allows generation of binding sites that regulate the level of expression of a desired gene, but also employs strategies that help target the right tissue and exclude expression in others. This functionality is based on the principle of using tissue specific miRNA binding sites which can be introduced in the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner] construct easily. <br />
<br />
A smart way to target tissues specifically is to exploit the presence or absence of tissue-specific endogenous miRNAs, so that the gene of interest is expressed in the target tissue and its expression is excluded elsewhere. We make use of two strategies based on this principle, namely on-targeting and off-targeting. The off-targeting concept has been applied previously {{HDref|Wenfang Shi et.al., 2008}}: an endogenous miRNA is selected such that it is absent from the target tissue (so the gene is expressed) and present in all off-target tissues (where the transcripts are knocked down). Thus the gene is specifically expressed in the target.<br />
<br />
In addition to the off-targeting strategy, we designed a new strategy, the on-targeting. In this case, the miRNA is present in the target tissue and excluded from the off-targets. The binding site for this miRNA is present within the 3'UTR of a repressor gene (in our case TET/O2) construct. The operator for the repressor in turn precedes the gene of interest (miGene) in the miTuner or miMeasure constructs. Therefore in the presence of miRNA in the cell, repressor is degraded and miGene is expressed while in off-targets repressor is translated and represses the expression of miGene.<br />
<br />
==References==<br />
<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/ModelingTeam:Heidelberg/Modeling2010-10-28T02:34:06Z<p>AlejandroHD: </p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|mode_over}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
=Modeling approach to the project=<br />
As the title of our project states, '''“DNA is not enough”'''. There are several upper-level regulation systems in higher organisms. Our main idea was to use one of them to tune down the expression of genes, with tissue-specific, precisely tuned gene therapy as the objective.<br />
<br />
To create our informatic tool to support the project, we looked at the problem from three different angles: <br />
- Adjustment of gene expression in specific tissues.<br />
<br />
- Tune the expression level accurately.<br />
<br />
- Predefine a construct to be used.<br />
<br />
These three angles were the inspiration behind miBEAT (miRNA Binding Site Engineering and Assembly Tool), which provides a strategy that makes it possible to control the expression of genes differentially between tissues.<br />
<br />
To make this work, we tried to match the functionalities of the tool to our experimental project. Additionally, we provided a strategy that guides the user through the cloning process and allows them to use characterized standard parts available in the MIT parts registry.<br />
<br />
The miBricks project consists roughly of two ideas. The first being tuning of gene expression using shRNAs/miRNAs and the second being specific targeting of tissues. We intend to tune the expression of a gene by manipulating the binding affinity of a miRNA/shRNA towards the transcript of this gene which results in different expression levels. This is brought about by introducing different binding sites for the miRNA/shRNA in the 3'UTR of the gene. These binding sites differ from each other based on certain sequence based features. By computational methods, we can predict binding site that should be inserted so as to achieve a desired level of expression. Targeting of specific tissues is achieved by introducing binding sites for tissue specific endogenously expressed miRNA into the construct that brings about knockdown of a gene based on its presence or absence in the tissue. <br />
<br />
Our complete work is present in the form of a graphical user interface called [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. This tool combines and connects the output of different models and scripts and then generates a suitable miTuner construct that expresses the gene of interest, miGENE, up to the desired level. miBEAT consists of three subparts; miRockdown, miBS designer and mUTING. <br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] is the subpart that contains two computational models based on different concepts: Neural Network and Fuzzy Logic. These models are also coupled to a script based on the [http://www.targetscan.org/ TargetScan] algorithm. miRockdown takes as input the desired knockdown percentage and the sequence of the shRNAmir and outputs binding site parameters, which are then compared with the model predictions to finally generate the appropriate binding site.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miBSdesigner miBS designer] is incorporated within miBEAT and is also available as a stand-alone tool for generating customized binding sites.<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/miSpec mUTING] provides the tissue-specific targeting function to the GUI. It uses literature data on miRNA expression in various tissues and can output miRNA binding sites that can be used to differentiate between target and off-target tissues.<br />
<br />
=miRNA binding site features=<br />
<br />
miRNAs are non-coding regulatory RNAs that function as post-transcriptional gene silencers. After processing, they are usually 22 nucleotides long and usually bind to the 3’UTR of the mRNA (although they can also bind to the ORF or to the 5'UTR), either triggering degradation of the mRNA or just repressing its translation [Bartel, 2004].<br />
<br />
In plants, miRNAs usually bind to the mRNA with extensive complementarity. In animals, the interactions are less exact, which creates a lot of uncertainty in the in silico prediction of targets[?].<br />
<br />
The seed of the miRNA is usually defined as the region at nucleotides 2-7 from the 5’ end of the miRNA. An efficient binding site usually requires extensive pairing between the seed and the corresponding part of the mRNA. The seed and the corresponding pairing sequence of the mRNA are located inside the AGO protein.<br />
<br />
Common types of miRNA seeds: <br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA. <br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1. <br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA. <br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1. <br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA (Friedman et al. 2008).<br />
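
The four seed types listed above can be told apart mechanically. The following Python sketch is an illustration only and not part of miBEAT: the function names are hypothetical, only Watson-Crick pairs are counted (no G:U wobble), and the site is assumed to be given 5' to 3' with its last nucleotide lying opposite miRNA position 1.

```python
# Illustrative sketch (hypothetical helper names, not the miBEAT code).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the perfectly complementary site (5'->3') for an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed(mirna, site):
    """Classify a site as 6mer, 7merA1, 7merm8 or 8mer (None if no seed match).

    mirna: miRNA sequence, 5'->3' (DNA or RNA letters).
    site:  mRNA site, 5'->3'; its LAST nucleotide faces miRNA position 1
           (assumption of this sketch). Must be at least 8 nt long.
    """
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")

    def pairs(position):
        # miRNA positions are 1-based; site[-position] faces that position.
        return COMPLEMENT[mirna[position - 1]] == site[-position]

    # Every seed type requires pairing of miRNA nucleotides 2-7.
    if not all(pairs(p) for p in range(2, 8)):
        return None
    has_a1 = site[-1] == "A"   # adenine in position 1 of the site
    has_m8 = pairs(8)          # additional match at miRNA nucleotide 8
    if has_m8 and has_a1:
        return "8mer"
    if has_m8:
        return "7merm8"
    if has_a1:
        return "7merA1"
    return "6mer"
```

For example, `classify_seed(mirna, reverse_complement(mirna))` classifies the perfectly complementary site of a miRNA; replacing the last nucleotide of that site or the nucleotide facing position 8 moves it into one of the weaker classes.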
<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800 px]]<br><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites. Notice the different types of seeds.</center><br />
<br />
Outside the seed, the existence of supplemental pairing (at least 3 contiguous nucleotides and at best centered in nucleotides 13-16 of the miRNA) stabilizes the bound complex and increases the efficacy of the binding site.<br />
<br />
Binding sites with a high local AU content around the binding site have proven to be more effective (possibly because of the destabilization of the mRNA secondary structure around the site).<br />
An adenine at position one of the binding site supposedly binds to a different protein of the RISC complex [Bartel, 2009], thus increasing the binding site efficiency significantly.<br />
<br />
Binding sites near either end of the 3'UTR are more efficient. Binding sites within the first 15 nucleotides after the stop codon, however, are not effective, since this region of the mRNA is inside the ribosome when translation stops. A RISC complex bound in this region will therefore dissociate after every round of translation and cannot follow its usual mode of action [Grimson et al., 2007].<br />
<br />
<br />
<br />
<br />
=Tissue specific miRNAs=<br />
<br />
A useful supplement to tuned gene expression is the ability to restrict it to specific tissues. Tissue targeting has long been an important field of research and is central to gene therapy. The [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT] tool not only generates binding sites that regulate the expression level of a desired gene, but also employs strategies that help target the right tissue and exclude expression in others. This functionality is based on tissue-specific miRNA binding sites, which can easily be introduced into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner] construct.<br />
<br />
A smart way to target tissues specifically is to exploit the presence or absence of tissue-specific endogenous miRNAs to express the gene of interest in the target, or to exclude its expression elsewhere. We make use of two strategies based on this principle, namely on-targeting and off-targeting. The off-targeting concept has been applied previously {{HDref|Wenfang Shi et.al., 2008}}: an endogenous miRNA is selected that is absent from the target tissue (so the gene is expressed there) and present in all off-target tissues (where the transcripts are knocked down). Thus the gene is expressed specifically in the target.<br />
<br />
In addition to the off-targeting strategy, we designed a new strategy, on-targeting. In this case, the miRNA is present in the target tissue and absent from the off-targets. The binding site for this miRNA is placed within the 3'UTR of a repressor gene construct (in our case TET/O2). The operator for the repressor in turn precedes the gene of interest (miGene) in the miTuner or miMeasure constructs. Therefore, when the miRNA is present in the cell, the repressor transcript is degraded and miGene is expressed, while in off-targets the repressor is translated and represses the expression of miGene.<br />
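
The logic of the two strategies can be written down as a small truth function. This Python sketch is an idealization for illustration: it treats knockdown and repression as all-or-nothing, and the function name `migene_expressed` is hypothetical.

```python
def migene_expressed(strategy, mirna_present):
    """Idealized ON/OFF view of the two targeting strategies.

    strategy: "off" (binding site in the 3'UTR of miGene itself) or
              "on"  (binding site in the 3'UTR of the repressor).
    mirna_present: whether the tissue-specific miRNA is present in the cell.
    """
    if strategy == "off":
        # The miRNA knocks down the miGene transcript directly.
        return not mirna_present
    if strategy == "on":
        # The miRNA knocks down the repressor; without repressor the
        # operator is free and miGene is expressed.
        repressor_active = not mirna_present
        return not repressor_active
    raise ValueError("strategy must be 'on' or 'off'")
```

With off-targeting the gene is expressed only where the miRNA is absent; with on-targeting it is expressed only where the miRNA is present.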
<br />
==References==<br />
<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T02:03:12Z<p>AlejandroHD: /* Results */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site with which the user can modify protein levels.<br />
Several tools can predict mRNA knockdown, but our approach aims at the final objective: protein levels (especially for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use.<br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when there are not close enough experimentally verified binding sites.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BS on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3' pairing and the AU content score of the generated BS are characterized by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown, so that no files have to be generated.<br><br />
Now that the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
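
The final comparison step can be pictured as a nearest-neighbour search in parameter space. The Python sketch below is an assumption about how such a comparison might look, not the miRockdown script itself: the text does not state which distance measure is used, so an unweighted Euclidean distance is assumed here, and the parameter names are hypothetical.

```python
import math

# Parameter names are hypothetical labels for this sketch.
PARAMETERS = ("seed_type", "three_prime", "au_score", "bulge_size")


def closest_binding_site(model_bs, generated_bs):
    """Return the generated BS whose parameters lie closest to the model BS.

    model_bs:     dict with the parameters of the suitable model BS.
    generated_bs: list of dicts, one per generated and characterized BS.
    Distance: unweighted Euclidean distance (assumption of this sketch).
    """
    def distance(candidate):
        return math.sqrt(sum((candidate[p] - model_bs[p]) ** 2
                             for p in PARAMETERS))

    return min(generated_bs, key=distance)
```

In practice the parameters live on different scales (a 3' score of several points against an AU score below 1), so some weighting or normalization would probably be needed; that choice is left open here.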
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to input a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge at positions 9 to 12).<br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
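
The two starting options can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the miBSdesigner code: the function names are hypothetical, the output is written as DNA, and the bulge is created by placing the miRNA's own base opposite positions 9 to 12 (a base never pairs with itself), which is only one of several ways to break the pairing.

```python
import re

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean_mirna(raw):
    """Strip extra characters, accept DNA or RNA, and enforce 22 nt."""
    sequence = re.sub(r"[^ACGTU]", "", raw.upper()).replace("U", "T")
    if len(sequence) != 22:
        raise ValueError("the miRNA sequence must be 22 nucleotides long")
    return sequence


def perfect_site(mirna):
    """Binding site matching all 22 nucleotides (reverse complement, DNA)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(mirna))


def bulged_site(mirna, first=9, last=12):
    """Perfect site except for unpaired miRNA positions first..last."""
    site = list(perfect_site(mirna))
    for position in range(first, last + 1):
        # site[-position] faces miRNA position `position`; writing the
        # miRNA's own base there guarantees a mismatch.
        site[-position] = mirna[position - 1]
    return "".join(site)
```

A call such as `bulged_site(clean_mirna("5'-UGGAGUGUGACAAUGGUGUUUG-3'"))` returns a site that differs from the perfect site at exactly four positions.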
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed containing one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially until 8 (13-20), and then total matching (from 13-22, leaving a bulge){{HDref|Grimson A et al(2007)}}.<br />
If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation, or process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool developed to generate binding sites for miRNAs that can be used for tissue targeting with either the on- or the off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in relative miRNA expression (within a tissue) between target and off-target tissue. The program searches a database of expression levels and returns a list of possible miRNAs. From these, the desired miRNA can be selected, and the final output is generated as sense and anti-sense oligomers with overhangs that can be used to place binding sites in tandem or into a vector.<br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from miRBase Sequence Database Release 16 {{HDref|Griffiths-Jones S. et al.(2008)}}.<br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types {{HDref|Landgraf et al.(2007)}}. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above are submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
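
The normalization and selection steps can be sketched as follows. The original script is written in Perl and is not reproduced here; this Python version is one possible reading of the description, with hypothetical function names. In particular it assumes that off-targeting requires zero expression in the target and at least the threshold in every off-target tissue, and that on-targeting requires the reverse.

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}


def candidate_mirnas(profiles, target, off_targets, strategy, threshold=0.001):
    """List miRNAs usable for the chosen strategy (reading assumed above).

    profiles: dict mapping tissue name -> {miRNA name: relative expression}.
    """
    tissues = [target, *off_targets]
    mirnas = set().union(*(profiles[tissue] for tissue in tissues))
    hits = []
    for mirna in sorted(mirnas):
        in_target = profiles[target].get(mirna, 0.0)
        in_off = [profiles[tissue].get(mirna, 0.0) for tissue in off_targets]
        if strategy == "off":
            # Absent from the target, clearly present in every off-target.
            ok = in_target == 0.0 and all(x >= threshold for x in in_off)
        elif strategy == "on":
            # Absent from every off-target, clearly present in the target.
            ok = all(x == 0.0 for x in in_off) and in_target >= threshold
        else:
            raise ValueError("strategy must be 'on' or 'off'")
        if ok:
            hits.append(mirna)
    return hits
```

The default threshold of 0.001 is the one named in the Input section; everything else, including the tissue and miRNA names used to try the functions out, would be supplied by the expression database.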
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database, so that the binding sites output by [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] have a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept unites a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_scores_50-algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence starting from a given seed position such that the highest possible 3'-score is reached. Pairing at miRNA nucleotides 13-16 adds 1 to the score, while pairings outside this region add 0.5. Offsets between bound miRNA and mRNA are also allowed, but there is a penalty of 0.5 points for an offset higher than 2 nucleotides. The AU-content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type. The impact of the nucleotides decreases with their distance from the seed. The scoring system is based on regressions applied to datasets from human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
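
To make the point values concrete, the following Python sketch applies the 3'-scoring rule exactly as worded above to a given set of paired miRNA positions. It is a deliberate simplification and not the TargetScan implementation: it does not search over alignments, it takes the list of paired positions and the offset as given, and the function name and the cut-off at position 9 are assumptions of this sketch.

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score for a fixed alignment.

    paired_positions: 1-based miRNA positions that pair with the mRNA.
    offset: shift between bound miRNA and mRNA, in nucleotides.
    """
    score = 0.0
    for position in paired_positions:
        if position < 9:
            # Seed region; not part of the 3' score in this sketch.
            continue
        # Pairing at nucleotides 13-16 counts 1, elsewhere 0.5.
        score += 1.0 if 13 <= position <= 16 else 0.5
    if abs(offset) > 2:
        # Penalty for an offset higher than 2 nucleotides.
        score -= 0.5
    return score
```

Under this simplified rule, pairing at positions 13 to 16 alone scores 4, and extending the pairing through position 22 adds six times 0.5.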
<br />
Since all major prior modeling approaches used mRNA levels as their training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An artificial Neural Network (NN) is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections, called weights, between the artificial neurons. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent the strength of the connection between an input and an artificial neuron; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function has been used; it can constrain the output to between 0 and 1 or between -1 and 1 {{HDref|Kröse et al, 1996}}.<br><br />
<center>[[Image:NeuralNetwork_HD2010_image2.png|400px]]<br><br />
Figure 2: representation of the mathematical model of a biological neuron.</center><br><br />
<br><br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean-squared error, and there are several algorithms that can be used to minimise it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]<br><br />
Figure 3: Training of a Neural Network.</center><br />
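
The three components named in the theory section (weights, adder, activation function) and the mean-squared error can be written out in a few lines. This Python sketch is a generic textbook illustration rather than the MATLAB model; the bias term and the function names are additions made for this example.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, mapping any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias=0.0):
    """Single artificial neuron: weighted sum (adder) followed by a sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


def mean_squared_error(outputs, targets):
    """Cost: average squared difference between network outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)
```

Training then amounts to repeatedly adjusting `weights` and `bias` so that `mean_squared_error` over the input/target pairs decreases.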
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and their knockdown strength calculated.<br><br />
Each input was represented by a vector in which each element is a score value for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU-content. Each input/target pair represented the relationship between a particular binding site and the corresponding knockdown percentage.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with experimental data. The latter network had 4 inputs instead of 3; the fourth input represented the size of the bulge in base pairs. Afterwards both networks were used to predict knockdown percentages for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprises two layers (multilayer feedforward network). The first layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The algorithm used for minimizing the cost function (sum squared error) was Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm. The algorithm updates the weight and bias values according to Levenberg-Marquardt optimization and overcomes the problem of interpolating noisy data {{HDref|MacKay, 1992}} by applying a Bayesian framework to the NN learning problem.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
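
The architecture described above (one hidden layer of 15 sigmoid neurons feeding one linear output neuron) can be sketched as a forward pass in plain Python. This is not the trained MATLAB network: the weights below are random placeholders, training with Bayesian regularization is not shown, and the use of `tanh` as the sigmoid (output between -1 and 1) is an assumption of this sketch.

```python
import math
import random


def make_network(n_inputs, n_hidden=15, seed=0):
    """Create placeholder weights and biases for a two-layer network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Forward pass: sigmoid (tanh) hidden layer, then a linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
```

A network for the experimental data would be created with `make_network(4)` (four inputs including the bulge size), and one for the literature data with `make_network(3)`.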
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with literature data was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line for the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental knockdown percentages. The results make clear that the bulge size does have an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is 3 or 4 nt, but not when it is 1 or 0. It is important to underline here that the network was trained with literature values that did not take the bulge size into consideration as a key factor; in fact, TargetScan does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
Table 1: Simulated and experimental knockdown for binding sites with the given features. The highlighted values underline the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and within the standard deviation of the experimental values (between 10-25%).<br></center><br />
<br><br />
<center>[[Image:regression.png|300px]] <br><br />
Figure 4: Regression of the training section, line showing the correlation between the NN output and the respective target value.<br></center><br />
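
The dependence of the prediction error on the bulge size can be checked directly from Table 1. The Python sketch below copies the bulge size, experimental KD and simulated KD columns from the table and computes the mean absolute error per bulge size; the choice of mean absolute error as the summary measure is ours for this illustration and is not taken from the original analysis.

```python
from collections import defaultdict

# (bulge size, KD experimental, KD simulated), copied row by row from Table 1.
TABLE_1 = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]


def mean_abs_error_by_bulge(rows):
    """Mean absolute difference between experiment and simulation per bulge size."""
    errors = defaultdict(list)
    for bulge_size, experimental, simulated in rows:
        errors[bulge_size].append(abs(experimental - simulated))
    return {size: sum(errs) / len(errs) for size, errs in sorted(errors.items())}
```

For the rows above, the error is smallest for a bulge size of 4 and largest for a bulge size of 1, in line with the observation that the literature-trained network handles large bulges well but not small ones.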
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train another Neural Network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why using a fuzzy inference system to model binding site efficiency?===<br />
<br />
To be able to evaluate the complex features of an shRNA or miRNA binding site and predict a resulting knockdown percentage of the protein we developed a fuzzy inference system (fis). The parameterized properties of the binding sites serve as input and will be processed into the knockdown percentage as the single output. Thus our fuzzy inference system is characterized as a multiple input, single output fuzzy inference system (MISO).<br />
<br />
Fuzzy Logic is a rule-based approximate artificial reasoning method developed by Lotfi Zadeh in 1965. Its motivation is the observation that humans often think and communicate in a vague way, and yet can make precise decisions {{HDref|Nelles O (2000)}}. It has been widely used in engineering and Artificial Intelligence approaches such as Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used for the modeling of biological pathways {{HDref|Bosl et al (2007)}} and to analyze gene regulatory networks {{HDref|Laschov et al(2009)}}. Key advantages of Fuzzy logic-based approaches are (i) the ability to construct models based on prior knowledge of the system and experimental data and (ii) encode intermediate states for inputs and outputs, thus improving other logic-approaches that can only deal with ON/OFF states such as Boolean models {{HDref|Aldridge et al (2009)}} and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, FL constitutes a powerful approach for the understanding of heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates how well an input parameter satisfies a criterion on a scale from 0 to 1. There can be one or multiple MFs per input parameter, corresponding to different criteria applied to the same input. The height of a person, for example, can be evaluated with one MF: how much the person satisfies being tall. Alternatively, there could be 3 MFs, one evaluating membership of small people, the second of medium-sized people and the third of tall people. Changing the shape of the MF makes it possible to have either functional dependencies, allowing intermediate membership values, or simple ON/OFF states, where the membership value can only be 0 or 1. Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model could thus differentiate between young and grown-up persons.<br />
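
The height and age example can be expressed as two small membership functions in Python. The breakpoints of 160 cm and 190 cm are invented for this illustration, as are the function names; only the shapes (a gradual ramp and an ON/OFF step at 18) follow the text.

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Gradual MF: 0 below `low`, 1 above `high`, linear in between."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def adult_membership(age_years):
    """ON/OFF MF: 0 until the age of 18, 1 from then on."""
    return 1.0 if age_years >= 18 else 0.0
```

A person of 175 cm is then "tall" to a degree of 0.5, whereas the age MF only ever returns 0 or 1.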
<br />
Simple if-then rules can then be used to combine the input MF to an output MF. The satisfaction of a rule by an object (set of input parameters) is defined by the degree of membership of the object to the different MFs. The higher the satisfaction of the rule, the higher is the membership to the output MF.<br />
The output MF can be a function like the input MF. This is the case in Mamdani-type fuzzy inference systems {{HDref|Mamdani et al, (1975)}}. We are using a Sugeno-type fuzzy inference system {{HDref|Sugeno(1985)}}, where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno fuzzy inference system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MF. Because the 3'-pairing and AU-content scores combine non-intuitively, our fuzzy inference system needs to be optimized computationally.<br />
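
A Sugeno system with constant outputs reduces to a weighted average of the rule outputs, weighted by how strongly each rule is satisfied. The Python sketch below shows this for a toy rule base; it is not our optimized model. The rule format, the use of the minimum as the AND operator, and the two example rules with outputs 0.8 and 0.2 are assumptions made for illustration.

```python
def sugeno_output(rules, inputs):
    """Zero-order Sugeno inference: weighted average of constant rule outputs.

    rules:  list of (antecedents, constant_output), where antecedents maps
            an input name to a membership function.
    inputs: dict mapping input names to their crisp values.
    """
    numerator = 0.0
    denominator = 0.0
    for antecedents, constant_output in rules:
        # Degree to which the rule is satisfied (AND realized as minimum).
        strength = min(mf(inputs[name]) for name, mf in antecedents.items())
        numerator += strength * constant_output
        denominator += strength
    if denominator == 0.0:
        raise ValueError("no rule is satisfied by these inputs")
    return numerator / denominator


def high_score(score):
    """Toy MF: membership of 'high 3' score', saturating at a score of 8."""
    return max(0.0, min(1.0, score / 8.0))


def low_score(score):
    """Toy MF: complement of high_score."""
    return 1.0 - high_score(score)


# Toy rule base: IF 3' score is high THEN kd = 0.8; IF low THEN kd = 0.2.
TOY_RULES = [
    ({"three_prime": high_score}, 0.8),
    ({"three_prime": low_score}, 0.2),
]
```

Calling `sugeno_output(TOY_RULES, {"three_prime": 4.0})` blends the two rule outputs equally, because a score of 4 belongs to both toy MFs to the same degree.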
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is implemented by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge {{HDref|Bartel(2009)}}<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts of which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU content, returns a score that seems well suited to distinguishing between binding sites, especially endogenous miRNA-like ones. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
The data used for training the models can be accessed [https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview here]. It comprises miRNA sequences and the corresponding binding site sequences, along with the input parameters that describe them for the models.<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Laschov D, Margaliot M. Mathematical modeling of the lambda switch:a fuzzy logic approach. J Theor Biol. 21:475-89 (2009).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, (2000).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4(3):448-472(1992)<br />
<br />
- Landgraf P., Rusu M., Sheridan R., Sewer A., Iovino N., Aravin A., Pfeffer S., Rice A., Kamphorst A.O., Landthaler M., Lin C., Socci N.D., Hermida L., Fulci V., Chiaretti S., Foa R., Schliwka J., Fuchs U., Novosel A., Muller R.U., Schermer B., Bissels U., Inman J., Phan Q., Chien M., A mammalian microRNA expression atlas based on small RNA library sequencing, Cell. 129:1401-1414 (2007).<br />
<br />
- Griffiths-Jones S., Saini H.K., van Dongen S., Enright A.J. miRBase: tools for microRNA genomics. Nucleic Acid Research. 36:D154-D158 (2008).<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T02:00:35Z<p>AlejandroHD: /* Results */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site with which the user can adjust protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective: protein levels (especially for medical applications like gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough to the desired knockdown.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing and AU-content scores of the generated BSs are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits these parameters is selected as the output BS of miRockdown.<br />
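<br />
The final comparison step can be illustrated with a short Python sketch. It is not the miRockdown script itself: the dictionary keys, the example values and the unweighted squared distance are assumptions chosen for illustration.<br />
<br />
```python
def closest_binding_site(model_bs, generated_bs,
                         keys=("three_prime_score", "au_score")):
    """Return the generated BS whose parameters are nearest to the model BS.

    Assumption: closeness is an unweighted squared distance over the
    numeric parameters named in `keys`.
    """
    def distance(candidate):
        return sum((candidate[k] - model_bs[k]) ** 2 for k in keys)

    return min(generated_bs, key=distance)


# Hypothetical parameter sets, for illustration only
model_bs = {"three_prime_score": 5.0, "au_score": 0.60}
generated_bs = [
    {"name": "bs_a", "three_prime_score": 2.5, "au_score": 0.31},
    {"name": "bs_b", "three_prime_score": 5.5, "au_score": 0.62},
    {"name": "bs_c", "three_prime_score": 7.5, "au_score": 0.60},
]
best = closest_binding_site(model_bs, generated_bs)
```
<br />
In practice the parameters would probably need to be scaled before comparison, since the 3' score and the AU-content score cover different numeric ranges.<br />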
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site be at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
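<br />
The two basic options can be sketched in Python as follows. This is an illustrative sketch, not the miBSdesigner code: the function names are hypothetical, the output is written as DNA, and the bulge is created here by placing the miRNA's own base opposite positions 9 to 12 (an assumption; any non-pairing base would do).<br />
<br />
```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def clean(seq):
    """Upper-case, convert RNA to DNA and drop any extra characters."""
    seq = seq.upper().replace("U", "T")
    return "".join(base for base in seq if base in COMPLEMENT)


def perfect_site(mirna):
    """Binding site (5' to 3') that pairs with every miRNA nucleotide."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(mirna)))


def bulged_site(mirna, first=9, last=12):
    """Like perfect_site, but unpaired opposite miRNA positions first..last.

    Assumption: the mismatch is made by using the miRNA base itself,
    which can neither Watson-Crick pair nor G:U wobble pair with itself.
    """
    m = clean(mirna)
    opposite = [
        m[i] if first <= i + 1 <= last else COMPLEMENT[m[i]]
        for i in range(len(m))
    ]
    return "".join(reversed(opposite))


# Hypothetical 22-nt sequence, not a real miRNA
example = "ACGUACGUACGUACGUACGUAC"
perfect = perfect_site(example)
bulged = bulged_site(example)
```
<br />
Because the site is the reverse complement of the miRNA, miRNA positions 9 to 12 end up at positions 11 to 14 of a 22-nucleotide site when counted from its 5' end.<br />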
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBSdesigner, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a position (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
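<br />
The seed definitions listed above translate directly into a small classifier. The following Python sketch is only an illustration; the function name and its two arguments (the set of paired miRNA positions and a flag for the adenine at position 1) are assumptions.<br />
<br />
```python
def seed_type(matched_positions, a_at_position_1):
    """Classify a seed from the paired miRNA positions.

    matched_positions: miRNA positions (1-based) that pair with the mRNA.
    a_at_position_1:   True if there is an adenine in position 1.
    Returns None if positions 2-7 are not all paired.
    """
    matched = set(matched_positions)
    if not set(range(2, 8)) <= matched:
        return None
    pairs_position_8 = 8 in matched
    if pairs_position_8 and a_at_position_1:
        return "8mer"
    if pairs_position_8:
        return "7merm8"
    if a_at_position_1:
        return "7merA1"
    return "6mer"
```
<br />
A customized seed with one mismatch between positions 2 and 7 returns None here, since it does not belong to any of the four canonical types.<br />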
<br />
===Supplementary Region===<br />
In miBSdesigner, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge) {{HDref|Grimson A et al(2007)}}.<br />
If some other specific supplementary region is needed, the user can customize the sequence by entering the desired matching nucleotide positions (numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
To allow the user to improve the efficiency of their binding sites, miBSdesigner offers options to increase the AU content by adding adenine or uracil at positions around the matches (specifically -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBSdesigner generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool for generating binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in relative miRNA expression (within a tissue) between the target and off-target tissues. The program searches a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated as sense and anti-sense oligomers with overhangs that can be used to put binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – Mature miRNA sequences were obtained from miRBase Sequence Database Release 16 {{HDref|Griffiths-Jones S. et al.(2008)}}. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types {{HDref|Landgraf et al.(2007)}}. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above are submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in the case of off-targeting) or in the off-targets (in the case of on-targeting). The expression values in the off-targets and in the target, respectively, are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
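<br />
The normalization and filtering logic can be sketched in a few lines of Python (the actual tool is a Perl script and may differ in detail). The function names, the tissue dictionaries and the clone counts below are hypothetical; only the default threshold of 0.001 is taken from the input description above.<br />
<br />
```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in clone_counts.items()}


def candidate_mirnas(expressed_in, absent_in, threshold=0.001):
    """List miRNAs that are absent from one tissue and expressed in the other.

    Both arguments map miRNA names to relative expression levels.
    A miRNA qualifies if its level in `absent_in` is zero and the
    difference between the two levels reaches `threshold`.
    """
    hits = []
    for name, level in expressed_in.items():
        other = absent_in.get(name, 0.0)
        if other == 0.0 and level - other >= threshold:
            hits.append((name, level))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up clone counts for two tissues, for illustration only
tissue_a = relative_expression({"miR-a": 50, "miR-b": 50})
tissue_b = relative_expression({"miR-b": 10, "miR-c": 90})
choices = candidate_mirnas(tissue_a, tissue_b)
```
<br />
Which tissue plays the role of `expressed_in` and which of `absent_in` depends on whether the on- or off-targeting strategy is selected.<br />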
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BSs with the more advanced 3' scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to their 3'-pairing and AU-content scores. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3' score is reached. Each pairing of miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between the bound miRNA and mRNA are also allowed, but there is a penalty of 0.5 points for an offset larger than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
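<br />
The 3' scoring rule described above can be illustrated with a simplified Python sketch. It only applies the point values stated in the text to an already aligned site; it is not the TargetScan implementation, which additionally searches for the best alignment. The function name and the assumption that positions below 9 belong to the seed region and are ignored are ours.<br />
<br />
```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score for an already aligned binding site.

    paired_positions: miRNA positions (1-based) that pair with the mRNA.
    offset:           shift between bound miRNA and mRNA, in nucleotides.
    """
    score = 0.0
    for position in set(paired_positions):
        if position < 9:
            continue  # assumed seed region, not part of the 3' score
        score += 1.0 if 13 <= position <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5  # flat penalty for an offset larger than 2 nt
    return score
```
<br />
For example, pairing of positions 13 to 18 without offset gives 4 × 1 + 2 × 0.5 = 5 points under this simplified rule.<br />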
<br />
Since all major prior modeling approaches used mRNA levels as their training set [], our approach aims to give new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An artificial neural network (NN) is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows through the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections, called weights, between the artificial neurons. Neural networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function was used; it limits the output to a range between 0 and 1 or between -1 and 1 {{HDref|Kröse et al, 1996}}.<br><br />
<center>[[Image:NeuralNetwork_HD2010_image2.png|400px]]<br><br />
Figure 2: representation of the mathematical model of a biological neuron.</center><br><br />
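<br />
The three components (weights, adder, activation function) can be written as a minimal Python sketch of a single artificial neuron. The bias term and the example numbers are assumptions added for illustration; the logistic variant of the sigmoid is used, which limits the output to the range between 0 and 1.<br />
<br />
```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias=0.0):
    """Single artificial neuron: weighted sum (adder) followed by activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# Two inputs whose weighted contributions cancel out (0.5 - 0.5 = 0)
output = neuron([1.0, 2.0], [0.5, -0.25])
```
<br />
Training a network means adjusting the values in `weights` (and `bias`) until the outputs match the targets as closely as possible.<br />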
<br><br />
During the learning process, the difference between the desired output (target) and the network output is minimized. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimize it. The following figure displays such a training loop.<br />
<br />
<center>[[Image:Neural_Network.png]]<br><br />
Figure 3: Training of a Neural Network.</center><br />
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strengths calculated.<br><br />
Each input was represented by a vector in which each element is a score value for a specific feature of the binding site (as described in the previous section, "Parameterization Concept"). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU content; the network trained on experimental data used a fourth element, as described below. Each input/target pair represented the relationship between a particular binding site and the corresponding knockdown percentage.<br />
Two neural networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with experimental data and had 4 inputs instead of 3; the fourth input represented the size of the bulge in nucleotides. Both networks were then used to predict knockdown percentages for given inputs. The predictions were validated experimentally and compared between the two networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (multilayer feedforward network). The first layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The algorithm used to minimize the cost function (sum squared error) was Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and the problem of interpolating noisy data is overcome by applying a Bayesian framework to the NN learning problem {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
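<br />
The structure of the network (not its training) can be sketched in plain Python. This is an illustration only: the model itself was built with the MATLAB NN toolbox, the weights below are random rather than trained, the logistic function stands in for the sigmoid, and the input vector is a hypothetical example.<br />
<br />
```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feedforward pass: sigmoid hidden layer, linear output layer."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_network(n_inputs, n_hidden=15, seed=0):
    """Untrained network with random weights (w) and biases (b)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


# Hypothetical input vector: 3' score, AU score, seed type, bulge size
params = random_network(n_inputs=4)
prediction = forward([7.5, 0.6, 3, 4], *params)
```
<br />
Because the output layer is linear, the prediction of an untrained network is not restricted to the range between 0 and 1; only training against the measured knockdown values brings it into that range.<br />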
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with data from the literature was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental knockdown percentages. The results make clear that the bulge size indeed has an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to underline here that the network was trained with literature values that did not take the bulge size into consideration as a key factor; TargetScan, in fact, does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
<br><br />
Table 1: Simulated and experimental knockdown for binding sites with the given features. The highlighted values underline the discrepancy that occurs between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and within the standard deviation of the experimental values (between 10-25%).<br></center><br />
<br><br />
<center>[[Image:regression.png|300px]] <br><br />
Figure 4: Regression of the training section, line showing the correlation between the NN output and the respective target value.<br></center><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train another neural network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To be able to evaluate the complex features of an shRNA or miRNA binding site and predict a resulting knockdown percentage of the protein we developed a fuzzy inference system (fis). The parameterized properties of the binding sites serve as input and will be processed into the knockdown percentage as the single output. Thus our fuzzy inference system is characterized as a multiple input, single output fuzzy inference system (MISO).<br />
<br />
Fuzzy Logic is a rule-based approximate artificial reasoning method developed by Lotfi Zadeh in 1965. Its motivation is the observation that humans often think and communicate in a vague way, and yet can make precise decisions {{HDref|Nelles O (2000)}}. It has been widely used in engineering and Artificial Intelligence approaches such as Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used for the modeling of biological pathways {{HDref|Bosl et al (2007)}} and to analyze gene regulatory networks {{HDref|Laschov et al(2009)}}. Key advantages of fuzzy-logic-based approaches are that (i) models can be constructed from prior knowledge of the system and experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over other logic approaches such as Boolean models that can only deal with ON/OFF states {{HDref|Aldridge et al (2009)}}, and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, fuzzy logic (FL) constitutes a powerful approach for the understanding of heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. Each input parameter can have one or several MFs, each representing a different criterion applied to that input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one evaluating membership in the group of small people, a second for medium-sized people and a third for big people. The shape of the MF determines whether it expresses a functional dependency, which allows intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons, so that the model could differentiate between young and grown-up persons.<br />
<br />
Simple IF-THEN rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership in the MFs the rule refers to. The better the rule is satisfied, the higher the membership in the output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems {{HDref|Mamdani et al, (1975)}}. We use a Sugeno-type fuzzy inference system {{HDref|Sugeno(1985)}}, in which the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that its simpler output MFs make it computationally more efficient and easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into account that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge {{HDref|Bartel(2009)}}<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts of which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, outputs a score that seems appropriate for distinguishing between binding sites, especially endogenous miRNA-like ones. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
The data used for training the models can be accessed [https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview here]. It comprises miRNA and corresponding binding site sequences along with the input parameters that describe them for the models.<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Laschov D, Margaliot M. Mathematical modeling of the lambda switch:a fuzzy logic approach. J Theor Biol. 21:475-89 (2009).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, (2000).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4(3):448-472(1992)<br />
<br />
- Landgraf P., Rusu M., Sheridan R., Sewer A., Iovino N., Aravin A., Pfeffer S., Rice A., Kamphorst A.O., Landthaler M., Lin C., Socci N.D., Hermida L., Fulci V., Chiaretti S., Foa R., Schliwka J., Fuchs U., Novosel A., Muller R.U., Schermer B., Bissels U., Inman J., Phan Q., Chien M., A mammalian microRNA expression atlas based on small RNA library sequencing, Cell. 129:1401-1414 (2007).<br />
<br />
- Griffiths-Jones S., Saini H.K., van Dongen S., Enright A.J. miRBase: tools for microRNA genomics. Nucleic Acids Research. 36:D154-D158 (2008).<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T01:56:43Z<p>AlejandroHD: /* Neural Network Model */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective: protein levels (especially relevant for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough to the desired knockdown.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing and AU-content scores of the generated BSs are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown, so that no files have to be generated.<br><br />
Now that the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
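<br />
The final comparison step can be illustrated with a minimal Python sketch. It assumes that each BS is described by a dictionary of numeric parameters and that closeness is measured by an unweighted Euclidean distance. Both are assumptions made for illustration; the function name, the parameter names and the distance measure are hypothetical and not taken from the miRockdown script.<br />

```python
import math


def closest_binding_site(model_bs, generated_bs):
    """Return the generated BS whose parameters are nearest to the model BS.

    Only the keys present in `model_bs` are compared (unweighted Euclidean
    distance, an assumption of this sketch).
    """
    def distance(site):
        return math.sqrt(sum((site[key] - model_bs[key]) ** 2 for key in model_bs))

    return min(generated_bs, key=distance)


# Hypothetical parameter values for three generated binding sites.
model = {"three_prime_score": 5.0, "au_score": 0.60}
generated = [
    {"name": "bs_a", "three_prime_score": 7.5, "au_score": 0.62},
    {"name": "bs_b", "three_prime_score": 5.5, "au_score": 0.80},
    {"name": "bs_c", "three_prime_score": 2.5, "au_score": 0.31},
]
print(closest_binding_site(model, generated)["name"])
```

Because the two scores live on different scales, a real implementation would probably weight or normalise the parameters before comparing them.<br />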
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
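<br />
The input rules for the miRNA sequence can be sketched in a few lines of Python. The function name is hypothetical and the behaviour shown (converting DNA to RNA, dropping every character that is not A, C, G or U, rejecting other lengths) is our reading of the description above rather than the actual miBSdesigner code. The let-7a sequence serves only as an example.<br />

```python
def clean_mirna(raw):
    """Normalise a user-supplied miRNA sequence to upper-case RNA (5' to 3')."""
    seq = raw.upper().replace("T", "U")                    # accept DNA input
    seq = "".join(base for base in seq if base in "ACGU")  # drop extra characters
    if len(seq) != 22:
        raise ValueError(f"expected 22 nucleotides, got {len(seq)}")
    return seq


print(clean_mirna("5'-tgaggtagtaggttgtatagtt-3'"))
```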
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
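<br />
The four standard seed types can be turned into mRNA-side seed matches with a short Python sketch. The function names are hypothetical, and the sketch assumes, following {{HDref|Bartel (2009)}}, that the adenine of the 7merA1 and 8mer sites sits in the mRNA opposite miRNA position 1. The let-7a sequence is used only as an example.<br />

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna, seed_type):
    """mRNA-side seed match (5' to 3') for a miRNA given 5' to 3'."""
    last_paired = {"6mer": 7, "7merA1": 7, "7merm8": 8, "8mer": 8}[seed_type]
    site = reverse_complement(mirna[1:last_paired])  # miRNA nucleotides 2..last_paired
    if seed_type in ("7merA1", "8mer"):
        site += "A"  # adenine opposite miRNA position 1
    return site


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
for kind in ("6mer", "7merA1", "7merm8", "8mer"):
    print(kind, seed_site(let7a, kind))
```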
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially until 8 (13-20), and then total matching (from 13-22, leaving a bulge){{HDref|Grimson A et al(2007)}}.<br />
If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between target and off-target tissue. The program searches through a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to put binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – The tissues from which gene expression has to be excluded can be selected here; multiple off-targets can be chosen from the list.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – Mature miRNA sequences were obtained from miRBase Sequence Database Release 16 {{HDref|Griffiths-Jones S. et al.(2008)}}. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types {{HDref|Landgraf et al.(2007)}}. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above are submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
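<br />
The normalisation of clone counts and the threshold filter can be sketched as follows. This is a Python illustration under stated assumptions, not the Perl script itself: the function names are hypothetical, the clone counts are made up, and the strategy-specific requirement of zero expression is not reproduced here.<br />

```python
def relative_expression(clone_counts):
    """Normalise clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}


def candidate_mirnas(tissue_a_counts, tissue_b_counts, threshold=0.001):
    """miRNAs whose relative expression in tissue A exceeds that in tissue B
    by more than `threshold`, sorted by decreasing difference."""
    rel_a = relative_expression(tissue_a_counts)
    rel_b = relative_expression(tissue_b_counts)
    hits = []
    for mirna, level_a in rel_a.items():
        difference = level_a - rel_b.get(mirna, 0.0)
        if difference > threshold:
            hits.append((mirna, difference))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up clone counts for two small RNA libraries.
tissue_a = {"miR-122": 900, "miR-21": 100}
tissue_b = {"miR-122": 0, "miR-21": 50, "miR-1": 50}
print(candidate_mirnas(tissue_a, tissue_b))
```

The default threshold of 0.001 matches the default value of the Threshold input field.<br />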
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database, which enables [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept unites a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Each pairing within miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5 each. Offsets between the paired miRNA and mRNA segments are also allowed, but there is a penalty of 0.5 points for an offset larger than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
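<br />
The 3'-pairing rule described above can be approximated by a short Python sketch. It is a deliberate simplification: the function name is hypothetical, the paired positions and the offset are taken as given instead of being found by alignment, and every pairing is counted individually. The actual TargetScan script searches over alignments and is more detailed.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score.

    paired_positions: miRNA positions (1-based) that pair with the mRNA.
    offset: shift between the miRNA and mRNA segments, in nucleotides.
    """
    score = 0.0
    for position in paired_positions:
        # Pairings at positions 13-16 count fully, all others count half.
        score += 1.0 if 13 <= position <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5  # penalty for an offset larger than 2 nucleotides
    return score


print(three_prime_score(range(13, 17)))            # core region only
print(three_prime_score(range(13, 19), offset=3))  # longer pairing, offset penalty
```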
<br />
Since all major prior modeling approaches used mRNA levels as training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows through the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. Our model uses the sigmoid function, which confines the output to a range between 0 and 1 or between -1 and 1 {{HDref|Kröse et al, 1996}}.<br><br />
<center>[[Image:NeuralNetwork_HD2010_image2.png|400px]]<br><br />
Figure 2: representation of the mathematical model of a biological neuron.</center><br><br />
<br><br />
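The three components (weights, adder, activation function) can be sketched in Python for a single artificial neuron. The function names are hypothetical, and the logistic sigmoid with an output range of 0 to 1 is assumed as the activation function.<br />

```python
import math


def sigmoid(x):
    """Logistic sigmoid, mapping any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs (adder) followed by the activation function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(activation)


print(neuron_output([0.5, -1.0], [0.8, 0.3], bias=0.1))
```
<br />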
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function is a measure of how far the network output is from the desired value. A common cost function is the mean-squared error, and several algorithms can be used to minimise it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]<br><br />
Figure 3: Training of a Neural Network.</center><br />
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strength calculated.<br><br />
Each input was represented by a vector of up to four elements, each of which corresponded to a score value for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU-content; the network trained on experimental data used the bulge size as a fourth element. The input/target pair represented the relationship between a particular binding site and the related percentage of knockdown.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points from the literature, the other with experimental data. The latter network had 4 inputs instead of 3; the fourth input represented the size of the bulge in nucleotides. Afterwards both networks were used to predict knockdown percentages for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (multilayer feedforward network). The first layer was connected to the network input and comprised 15 artificial neurons. The second layer was connected to the first one and produced the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The algorithm used for minimizing the cost function (sum squared error) was Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm. The algorithm updates the weight and bias values according to Levenberg-Marquardt optimization and overcomes the problem of interpolating noisy data by applying a Bayesian framework to the NN learning problem {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprised 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
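<br />
The architecture can be illustrated with a plain-Python forward pass. This is a sketch only: the weights are random and untrained, the logistic sigmoid is assumed for the hidden layer, the order of the four input elements is an assumption, and all function names are hypothetical. The actual model was built and trained with the MATLAB NN-toolbox; Bayesian regularization is not shown.<br />

```python
import math
import random


def make_network(n_inputs=4, n_hidden=15, seed=0):
    """Random (untrained) weights and biases for a two-layer network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Sigmoid hidden layer followed by a linear output neuron."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


def sum_squared_error(net, inputs, targets):
    """Cost function: sum of squared differences between output and target."""
    return sum((forward(net, x) - t) ** 2 for x, t in zip(inputs, targets))


net = make_network()
# Assumed element order: 3' score, AU score, seed type, bulge size.
x = [7.5, 0.624, 3, 4]
print(forward(net, x), sum_squared_error(net, [x], [0.85]))
```

Training would consist of adjusting the entries of `w_hidden`, `b_hidden`, `w_out` and `b_out` until the cost is minimal.<br />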
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with data from the literature was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental knockdown percentages. The results make clear that the bulge size does have an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to underline that the network was trained with literature values that did not take the bulge size into consideration as a key factor; TargetScan, in fact, does not evaluate this binding site feature in the scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated data and experimental results for the given binding site features. The highlighted values underline the discrepancy that occurs between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and within the standard deviation of the experimental values (between 10-25%).<br><br />
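<br />
The agreement between simulation and experiment can be summarised per bulge size with a short Python sketch. The triples below (bulge size, experimental KD, simulated KD) are copied from Table 1. The function name and the choice of the mean absolute error as summary measure are our own illustration, not part of the original analysis.<br />

```python
from collections import defaultdict

# (bulge size, KD experimental, KD simulated), rows of Table 1 in order.
ROWS = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80),
    (0, 0.69, 0.56), (0, 0.08, 0.49), (0, 0.72, 0.42),
    (0, 0.28, 0.44), (0, 0.58, 0.46), (0, 0.34, 0.28),
    (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81),
    (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27),
    (4, 0.21, 0.27),
]


def mean_abs_error_by_bulge(rows):
    """Mean absolute difference between experiment and simulation per bulge size."""
    errors = defaultdict(list)
    for bulge, experimental, simulated in rows:
        errors[bulge].append(abs(experimental - simulated))
    return {bulge: sum(errs) / len(errs) for bulge, errs in errors.items()}


for bulge, error in sorted(mean_abs_error_by_bulge(ROWS).items()):
    print(f"bulge size {bulge}: mean absolute error {error:.3f}")
```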
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression on the training data. The line shows the correlation between the NN output and the respective target value.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train another Neural Network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is thus characterized as a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy Logic (FL) is a rule-based method for approximate reasoning introduced by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in vague terms and yet can make precise decisions {{HDref|Nelles O (2000)}}. It has been widely used in engineering and Artificial Intelligence, for example in Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used to model biological pathways {{HDref|Bosl et al (2007)}} and to analyze gene regulatory networks {{HDref|Laschov et al(2009)}}. Key advantages of Fuzzy Logic-based approaches are that (i) models can be constructed from prior knowledge of the system as well as from experimental data, (ii) intermediate states of inputs and outputs can be encoded, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states {{HDref|Aldridge et al (2009)}}, and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, FL constitutes a powerful approach for the understanding of heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, each acting as a different criterion applied to the same input. A person's height, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs rating membership in the groups of small, medium-sized and tall people. The shape of an MF determines whether it describes a functional dependency, which allows intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. A fuzzy inference system can therefore evaluate different kinds of input parameters. In the height example, the person's age could be taken as a second input and evaluated by an MF that is 0 up to the age of 18 and 1 for older persons, so that the model could differentiate between young and grown-up persons.<br />
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership in the MFs the rule refers to. The better the rule is satisfied, the higher the membership in the output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems {{HDref|Mamdani et al, (1975)}}. We use a Sugeno-type fuzzy inference system {{HDref|Sugeno(1985)}}, where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that its simpler output MFs make it computationally more efficient and easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration, that binding sites with a 3'-score under 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge {{HDref|Bartel(2009)}}<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, yields a score that appears suitable for distinguishing between binding sites, especially endogenous miRNA-like ones. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here if you are interested in more recent model optimization results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Laschov D, Margaliot M. Mathematical modeling of the lambda switch:a fuzzy logic approach. J Theor Biol. 21:475-89 (2009).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, (2000).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4(3):448-472(1992)<br />
<br />
- Landgraf P., Rusu M., Sheridan R., Sewer A., Iovino N., Aravin A., Pfeffer S., Rice A., Kamphorst A.O., Landthaler M., Lin C., Socci N.D., Hermida L., Fulci V., Chiaretti S., Foa R., Schliwka J., Fuchs U., Novosel A., Muller R.U., Schermer B., Bissels U., Inman J., Phan Q., Chien M., A mammalian microRNA expression atlas based on small RNA library sequencing, Cell. 129:1401-1414 (2007).<br />
<br />
- Griffiths-Jones S., Saini H.K., van Dongen S., Enright A.J. miRBase: tools for microRNA genomics. Nucleic Acid Research. 36:D154-D158 (2008).<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T01:20:23Z<p>AlejandroHD: /* Neural Network theory */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site with which the user can adjust protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective, protein levels, which is especially relevant for medical applications such as gene therapy.<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough to the desired knockdown.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3' pairing and AU content scores of the generated BSs are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
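The final matching step can be sketched as a nearest-neighbour search over parameter sets. This is a minimal illustration, not the miRockdown script itself: the function name, the parameter names and the use of an unscaled Euclidean distance are all assumptions.<br />

```python
import math

def closest_binding_site(target, candidates):
    """Return the candidate whose parameters are nearest to the target.

    target     -- dict of parameter name -> value for the model binding site
    candidates -- list of (name, parameter dict) for the generated sites
    Distance is plain Euclidean distance over the target's parameters
    (an assumption; the parameters are not rescaled here).
    """
    def distance(params):
        return math.sqrt(sum((params[k] - target[k]) ** 2 for k in target))
    return min(candidates, key=lambda c: distance(c[1]))

# Hypothetical parameter sets, for illustration only.
target = {"three_prime": 5.0, "au": 0.60, "bulge": 4}
candidates = [
    ("site_A", {"three_prime": 2.5, "au": 0.31, "bulge": 0}),
    ("site_B", {"three_prime": 5.5, "au": 0.62, "bulge": 4}),
    ("site_C", {"three_prime": 7.5, "au": 0.80, "bulge": 3}),
]
best_name, best_params = closest_binding_site(target, candidates)
```

In practice the parameters would probably need to be rescaled before comparison, since the 3' score and the AU score live on different ranges.<br />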
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along the 3’UTR (it is recommended that the binding site be at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
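The input handling described above can be sketched as follows. The function names are hypothetical and this is not the miBSdesigner source; it only illustrates stripping extra characters, accepting DNA or RNA, checking the length, and deriving a perfect site as the reverse complement of the miRNA.<br />

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def clean_mirna(raw, length=22):
    """Normalise a miRNA sequence given 5'->3' as DNA or RNA.

    Characters other than A, C, G, T, U are dropped and U is written as T.
    Raises ValueError if the cleaned sequence does not have `length` bases.
    """
    seq = "".join(ch for ch in raw.upper() if ch in "ACGTU").replace("U", "T")
    if len(seq) != length:
        raise ValueError(f"expected {length} nt, got {len(seq)}")
    return seq

def perfect_binding_site(mirna_dna):
    """Reverse complement of the miRNA: the fully matching site, 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(mirna_dna))

# Example 22-nt sequence with extra characters that get stripped.
mirna = clean_mirna("5'-UGGAGUGUGA CAAUGGUGUU UG-3'")
site = perfect_binding_site(mirna)
```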
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed containing one mismatch by entering a number between 2 and 7 in the "Customized mismatch position" textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
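The seed definitions listed above can be turned into a small classifier. This is a sketch under stated assumptions: both sequences are RNA written 5' to 3', the last base of the site lies opposite miRNA position 1, and only Watson-Crick pairs count as matches. The function name is hypothetical.<br />

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def seed_type(mirna, site):
    """Classify the seed of `site` for `mirna` (both RNA, written 5'->3').

    The last base of `site` is assumed to lie opposite miRNA position 1,
    so the site is read backwards while walking along the miRNA.
    Returns None if positions 2-7 are not all paired.
    """
    opposite = site[::-1]                      # opposite[i] faces mirna[i]
    paired = [(m, t) in PAIRS for m, t in zip(mirna, opposite)]
    if not all(paired[1:7]):                   # positions 2-7 must match
        return None
    match8 = paired[7]                         # miRNA position 8
    a1 = opposite[0] == "A"                    # adenine opposite position 1
    if match8 and a1:
        return "8mer"
    if match8:
        return "7merm8"
    if a1:
        return "7merA1"
    return "6mer"

mirna = "UGGAGUGUGACAAUGGUGUUUG"
kind = seed_type(mirna, "ACACUCCA")
```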
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially until 8 (13-20), and then total matching (from 13-22, leaving a bulge){{HDref|Grimson A et al(2007)}}.<br />
If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To make it easier to introduce the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences. In the last case, the output sequences are not directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them, removing the extra nucleotides to create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in relative miRNA expression (within a tissue) between the target and off-target tissues. The program searches a database of expression levels and returns a list of miRNAs that could be used. From this list the user selects the desired miRNA, for which the final output is generated as sense and anti-sense oligomers with overhangs that can be used to place binding sites in tandem or to insert them into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression and sequence data used by the tool were taken from pre-existing data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from miRBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types {{HDref|Landgraf et al.(2007)}}. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
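The normalization and filtering steps can be illustrated in a few lines. The original tool is a Perl script that is not shown here, so this Python sketch is only an approximation: the function names, the data layout and the example counts are assumptions, and which tissue plays the "high" or "low" role depends on the chosen targeting strategy.<br />

```python
def relative_expression(clone_counts):
    """Normalise clone counts by the library total (fractions summing to 1)."""
    total = sum(clone_counts.values())
    return {mirna: n / total for mirna, n in clone_counts.items()}

def candidates(high_tissue, low_tissue, threshold=0.001):
    """miRNAs absent from `low_tissue` whose relative level in `high_tissue`
    is at least `threshold`, strongest first."""
    high = relative_expression(high_tissue)
    low = relative_expression(low_tissue)
    hits = [(m, high[m]) for m in high
            if low.get(m, 0.0) == 0.0 and high[m] >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical clone counts for two small RNA libraries.
tissue_1 = {"miR-a": 600, "miR-b": 390, "miR-c": 10}
tissue_2 = {"miR-b": 500, "miR-d": 500}
selected = candidates(tissue_1, tissue_2)
```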
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database, which allows [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BSs with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to their 3'-pairing and AU-content scores. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Each pairing within miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between the bound miRNA and the mRNA are allowed, but an offset of more than 2 nucleotides incurs a penalty of 0.5 points. The AU-content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets from human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
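The 3'-scoring rules stated above can be written down for a single alignment. This is a simplified sketch, not the TargetScan implementation: it scores one given alignment instead of searching for the best one, it ignores seed positions 1-8 (an assumption), and the function name is hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score for one alignment.

    paired_positions -- miRNA positions (1-based) that pair with the mRNA
    offset           -- shift between miRNA and mRNA in the 3' region
    Positions 13-16 score 1 each, other positions 0.5 each; an offset of
    more than 2 nt costs 0.5. Seed positions (1-8) are ignored here.
    """
    score = sum(1.0 if 13 <= p <= 16 else 0.5
                for p in paired_positions if p > 8)
    if abs(offset) > 2:
        score -= 0.5
    return score

core_only = three_prime_score(range(13, 17))            # pairs at 13-16
extended = three_prime_score(range(13, 21))             # pairs at 13-20
shifted = three_prime_score(range(13, 21), offset=3)    # same, large offset
```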
<br />
Since all major prior modeling approaches used mRNA levels as their training set [], our approach, which is based on protein levels, will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons, which are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, a single-layer network is described by three basic components: the synapses of the artificial neurons, which are modeled as weights and represent the strength of the connection between an input and an artificial neuron; an adder, which sums all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. There are generally three types of activation function: threshold, sigmoid and piecewise linear. Our model uses a sigmoid function, which can constrain the output to lie between 0 and 1 or between -1 and 1 {{HDref|Kröse et al, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
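The three components (weights, adder, activation function) map directly onto a few lines of code. The following is a generic textbook sketch with made-up weights, not an excerpt from our MATLAB model; the logistic function gives outputs between 0 and 1, the hyperbolic tangent between -1 and 1.<br />

```python
import math

def logistic(x):
    """Sigmoid activation with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation=logistic):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)

y01 = neuron([1.0, 0.5], [0.4, -0.2], bias=0.1)                   # in (0, 1)
y11 = neuron([1.0, 0.5], [0.4, -0.2], 0.1, activation=math.tanh)  # in (-1, 1)
```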
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean-squared error, and several algorithms can be used to minimise it. The following figure displays such a training loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
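A training loop of this kind can be illustrated with the mean-squared error and plain gradient descent on a single linear neuron. Gradient descent is used here only because it is the simplest minimiser to show; it is not the algorithm used for our model, and the toy data and learning rate are assumptions.<br />

```python
def mse(outputs, targets):
    """Mean-squared error between network outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def train_step(w, b, xs, ts, rate=0.1):
    """One gradient-descent update of a linear neuron y = w*x + b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - t) * x for x, t in zip(xs, ts)) / n
    grad_b = sum(2 * (w * x + b - t) for x, t in zip(xs, ts)) / n
    return w - rate * grad_w, b - rate * grad_b

xs, ts = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]     # toy data on the line t = 2x + 1
w, b = 0.0, 0.0
history = []                                   # cost before each update
for _ in range(200):
    history.append(mse([w * x + b for x in xs], ts))
    w, b = train_step(w, b, xs, ts)
```

The recorded cost shrinks from one iteration to the next while the weight and bias approach the values that generated the toy data.<br />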
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by using a luciferase assay to measure the strength of the knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and their knockdown strengths calculated.<br><br />
Each input was represented by a vector of up to four elements, each corresponding to a score for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features common to both networks were the seed type, the 3’ pairing contribution and the AU-content. Each input/target pair represented the relationship between a particular binding site and its percentage of knockdown.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points taken from the literature. The other was trained with our experimental data and had 4 inputs instead of 3; the fourth input represented the size of the bulge in nucleotides. Both networks were then used to predict knockdown percentages for given inputs. The predictions were validated experimentally and compared between the two networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (a multilayer feedforward network). The first layer was connected to the network input and comprised 15 artificial neurons. The second layer was connected to the first one and produced the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum-squared error) was minimized using Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and a Bayesian framework is applied to the NN learning problem to overcome the difficulty of interpolating noisy data {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprised 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” denotes the weights and “b” the biases.<br />
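The forward pass through such an architecture (15 sigmoid hidden neurons, one linear output) can be sketched as follows. The weights are random and untrained, the logistic form of the sigmoid and the order of the four inputs are assumptions, and the Bayesian-regularized training itself is not reproduced here.<br />

```python
import math
import random

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feedforward pass: sigmoid hidden layer, linear output."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

random.seed(0)                       # untrained, random weights for illustration
n_in, n_hidden = 4, 15
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = random.uniform(-1, 1)

# Input order (an assumption): seed type, 3' score, AU score, bulge size.
prediction = forward([3, 7.5, 0.624, 4], w_hidden, b_hidden, w_out, b_out)
```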
<br />
===Results===<br />
Two batches of experiments were performed. The network trained only with literature data was used to predict the outcome of the first batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental knockdown percentages. The results make clear that the bulge size does affect the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is 3 or 4 nt, but not when it is 1 or 0. It is important to emphasize that the network was trained with literature values that did not consider the bulge size as a key factor; TargetScan, in fact, does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated and experimental knockdown for binding sites with the given features. The highlighted values underline the discrepancy that occurs between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (between 10-25%).<br><br />
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression of the training section, line showing the correlation between the NN output and the respective target value.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. We therefore decided to train a second Neural Network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding site serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy Logic (FL) is a rule-based method of approximate reasoning introduced by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions {{HDref|Nelles O (2000)}}. It has been widely used in engineering and Artificial Intelligence, for example in Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used to model biological pathways {{HDref|Bosl et al (2007)}} and to analyze gene regulatory networks {{HDref|Laschov et al(2009)}}. Key advantages of Fuzzy Logic-based approaches are that (i) models can be constructed from prior knowledge of the system as well as from experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states {{HDref|Aldridge et al (2009)}}, and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. FL thus constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, each applying a different criterion to the same input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one evaluating membership of the group of small people, a second for medium-sized people and a third for tall people. The shape of the MF determines whether it expresses a functional dependency, allowing intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Different kinds of input parameters can thus be evaluated with one fuzzy inference system. In the height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model could then differentiate between young and grown-up persons.<br />
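<br />
The height and age example can be written down in a few lines of Python. This is only an illustrative sketch: the function names and the breakpoints (160 cm, 190 cm, 18 years) are assumptions chosen for the example and are not part of our model.<br />

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Graded MF: 0 below `low`, 1 above `high`, linear in between.

    The breakpoints are illustrative assumptions.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def grown_up_membership(age_years, threshold=18):
    """ON/OFF MF: 0 until the age of 18, 1 for older persons."""
    return 1.0 if age_years > threshold else 0.0


if __name__ == "__main__":
    for height, age in [(150, 12), (175, 18), (195, 40)]:
        print(height, age, tall_membership(height), grown_up_membership(age))
```

The first function allows intermediate membership values, while the second can only return 0 or 1.<br />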
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership in the MFs that the rule refers to. The better the rule is satisfied, the higher the membership in the output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems {{HDref|Mamdani et al, (1975)}}. We use a Sugeno-type fuzzy inference system {{HDref|Sugeno(1985)}}, in which the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that its simpler output MFs make it computationally more efficient and easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system has to be optimized computationally.<br />
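<br />
To make the Sugeno idea concrete, the following Python sketch evaluates a zero-order Sugeno system, in which every rule has a constant output and the result is the average of these constants weighted by how strongly each rule fires. The membership breakpoints, the three rules and their output constants are hypothetical values picked for illustration; they are not the rules or optimized parameters of our model, and AND is assumed to be the minimum of the memberships.<br />

```python
def ramp_up(x, low, high):
    """Input MF rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def sugeno_output(rules):
    """Zero-order Sugeno defuzzification.

    `rules` is a list of (firing_strength, constant_output) pairs; the
    result is the firing-strength-weighted average of the constants.
    """
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        raise ValueError("no rule fired")
    return sum(strength * out for strength, out in rules) / total


def predict_knockdown(three_prime_score, au_score):
    # Hypothetical MFs and output constants, for illustration only.
    strong_pairing = ramp_up(three_prime_score, 3.0, 7.5)
    high_au = ramp_up(au_score, 0.2, 0.8)
    rules = [
        # IF pairing is strong AND AU content is high THEN knockdown = 0.9
        (min(strong_pairing, high_au), 0.9),
        # IF pairing is NOT strong THEN knockdown = 0.3
        (1.0 - strong_pairing, 0.3),
        # IF AU content is NOT high THEN knockdown = 0.4
        (1.0 - high_au, 0.4),
    ]
    return sugeno_output(rules)


if __name__ == "__main__":
    print(predict_knockdown(7.5, 0.8))
    print(predict_knockdown(5.0, 0.5))
```

Because the rule outputs are plain constants, an optimizer only has to tune a handful of numbers, which is the practical reason for preferring this form when the system must be fitted to data.<br />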
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration, that binding sites with a 3'-score under 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge {{HDref|Bartel(2009)}}<br />
<br />
Our fuzzy inference system can deal with three different kinds of shRNA binding sites: perfect, bulged and endogenous-like. They are treated separately because of the differences in their biological mechanisms, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU content, returns a score that seems appropriate for distinguishing binding sites, especially those resembling endogenous miRNA binding sites. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Laschov D, Margaliot M. Mathematical modeling of the lambda switch:a fuzzy logic approach. J Theor Biol. 21:475-89 (2009).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, (2000).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4(3):448-472(1992)<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T01:17:37Z<p>AlejandroHD: </p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective, protein levels, which matters especially for medical applications like gene therapy.<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of three different approaches and can choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough to the desired knockdown.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing and AU-content scores of the generated BSs are calculated by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, their parameters are compared with those of the suitable model BS. The generated BS whose parameters best fit those of the model BS is selected as the output of miRockdown.<br />
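<br />
The final selection step amounts to a nearest-neighbour search in parameter space. The Python sketch below illustrates the idea under stated assumptions: the parameter names, the example values and the use of an unscaled Euclidean distance are all hypothetical and do not reproduce the actual miRockdown script.<br />

```python
import math


def closest_binding_site(target_params, candidates):
    """Return the candidate whose parameters are nearest to `target_params`.

    Each candidate is a dict with a "params" dict holding the same numeric
    keys as `target_params`. Plain Euclidean distance is an assumption; the
    parameters are not rescaled here although they live on different scales.
    """
    keys = sorted(target_params)

    def distance(candidate):
        return math.sqrt(
            sum((candidate["params"][k] - target_params[k]) ** 2 for k in keys)
        )

    return min(candidates, key=distance)


# Hypothetical model binding site and three generated candidates.
model_bs = {"three_prime": 5.5, "au": 0.60, "bulge": 3}
generated = [
    {"name": "bs_a", "params": {"three_prime": 7.5, "au": 0.62, "bulge": 4}},
    {"name": "bs_b", "params": {"three_prime": 5.5, "au": 0.75, "bulge": 3}},
    {"name": "bs_c", "params": {"three_prime": 2.5, "au": 0.31, "bulge": 0}},
]
best = closest_binding_site(model_bs, generated)
print(best["name"])
```

In practice the parameters would probably need to be normalized before computing distances, since a change of 1 in the 3' score is not comparable to a change of 1 in the AU-content score.<br />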
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence to place the binding site further along in the 3’UTR (it is recommended that the binding site be at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge at positions 9 to 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
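<br />
The input handling and the two basic binding site types can be sketched in Python as follows. This is an illustration rather than the miBSdesigner code: the function names are hypothetical, the example sequence is only a 22-nt test input, and the way the bulge is created (placing the miRNA's own base opposite positions 9 to 12, since identical bases do not pair) is an assumption.<br />

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def clean_mirna(raw):
    """Upper-case, convert DNA to RNA and drop any non-nucleotide characters."""
    seq = "".join(ch for ch in raw.upper().replace("T", "U") if ch in COMPLEMENT)
    if len(seq) != 22:
        raise ValueError("expected 22 nucleotides, got %d" % len(seq))
    return seq


def perfect_site(mirna):
    """Reverse complement of the miRNA: pairs with all 22 nucleotides."""
    return "".join(COMPLEMENT[nt] for nt in reversed(mirna))


def bulged_site(mirna, first=9, last=12):
    """Perfect site with miRNA positions `first`..`last` left unpaired.

    A mismatch is forced by placing the miRNA's own base opposite itself,
    because identical bases (A-A, C-C, G-G, U-U) do not pair.
    """
    opposite = [COMPLEMENT[nt] for nt in mirna]  # index i faces miRNA position i+1
    for pos in range(first, last + 1):
        opposite[pos - 1] = mirna[pos - 1]
    return "".join(reversed(opposite))           # written 5' to 3'


mirna = clean_mirna("tggagtgtga caatggtgtt tg")  # DNA input with extra spaces
print(perfect_site(mirna))
print(bulged_site(mirna))
```

Both sites are written 5' to 3', so the nucleotide facing miRNA position 1 is the last character of the returned string.<br />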
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- In addition to these options, the user can create a customized seed containing one mismatch by entering a position (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
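<br />
The four seed definitions above translate directly into a small classifier. The Python sketch below is an illustration under two assumptions that are not part of miBSdesigner: the last nucleotide of the site is taken to lie opposite miRNA position 1, and only Watson-Crick pairs (no G:U wobble) count as matches.<br />

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def seed_type(mirna, site):
    """Classify the seed match between a miRNA and a target site.

    Both sequences are RNA written 5' to 3'. The last nucleotide of `site`
    is assumed to lie opposite miRNA position 1, so site[-i] faces mirna[i-1].
    """
    def paired(pos):
        return (mirna[pos - 1], site[-pos]) in PAIRS

    if not all(paired(p) for p in range(2, 8)):
        return None                       # nucleotides 2-7 must match
    match8 = paired(8)
    a1 = site[-1] == "A"                  # adenine in position 1
    if match8 and a1:
        return "8mer"
    if match8:
        return "7merm8"
    if a1:
        return "7merA1"
    return "6mer"


mirna = "UGGAGUGUGACAAUGGUGUUUG"
for site in ["ACACUCCA", "ACACUCCC", "CCACUCCA", "CCACUCCC"]:
    print(site, seed_type(mirna, site))
```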
<br />
===Supplementary Region===<br />
In miBSdesigner, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially to 8 (13-20), and finally total matching (13-22, leaving a bulge) {{HDref|Grimson A et al(2007)}}.<br />
If the user needs some other specific supplementary region, they can customize it by entering the desired matching nucleotide positions (numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction sites used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences of their own. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or edit the primers before ordering them, removing the extra nucleotides to create the overhangs.<br />
<br />
===Output===<br />
miBSdesigner generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the relative expression level (within a tissue) of a miRNA between the target and off-target tissues. The program searches a database of expression levels and returns a list of miRNAs that could be used. From this list the user selects the desired miRNA, for which the final output is generated in the form of sense and antisense oligomers with overhangs that can be used to put binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected. These are the tissues in which gene expression should be avoided.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from mirBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types {{HDref|Landgraf et al.(2007)}}. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
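<br />
The normalization and threshold comparison can be illustrated with a short Python sketch. The actual tool is a Perl script whose details are not shown here; the function names and the clone counts below are invented for the example, and only the off-targeting direction (miRNA more abundant in the off-target tissue than in the target tissue) is shown.<br />

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}


def off_targeting_candidates(target_lib, off_target_lib, threshold=0.001):
    """miRNAs whose relative level in the off-target tissue exceeds the
    level in the target tissue by more than `threshold`."""
    target = relative_expression(target_lib)
    off = relative_expression(off_target_lib)
    hits = {}
    for mirna, level in off.items():
        diff = level - target.get(mirna, 0.0)
        if diff > threshold:
            hits[mirna] = diff
    return hits


# Invented clone counts, for illustration only.
heart = {"miR-1": 500, "miR-21": 250, "let-7a": 250}
liver = {"miR-122": 700, "miR-21": 200, "let-7a": 100}

# Express the construct in heart (target) and silence it in liver (off-target).
hits = off_targeting_candidates(target_lib=heart, off_target_lib=liver)
print(hits)
```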
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database, which enables [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BSs with the more advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to their 3'-pairing and AU-content scores. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3' score is reached. Each pairing of miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between the bound miRNA and the mRNA are allowed, but an offset of more than 2 nucleotides incurs a penalty of 0.5 points. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
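<br />
The 3'-score arithmetic described above can be illustrated with a deliberately simplified Python sketch. It only applies the stated point values to a given set of paired positions; it does not reproduce the TargetScan script, which also searches over alignments. The function name and its arguments are hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score following the point values in the text.

    `paired_positions` are the 1-based miRNA positions outside the seed
    that pair with the mRNA; `offset` is the shift between miRNA and mRNA.
    """
    score = 0.0
    for pos in paired_positions:
        # Pairings at miRNA nucleotides 13-16 count 1, all others 0.5.
        score += 1.0 if 13 <= pos <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5                      # penalty for a large offset
    return score


print(three_prime_score([13, 14, 15, 16]))
print(three_prime_score([13, 14, 15, 16, 17, 18], offset=3))
```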
<br />
Since all major prior modeling approaches used mRNA levels as training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An artificial neural network, usually just called a neural network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that changes its structure depending on the internal or external information that flows through the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. There are generally three types of activation function: threshold, sigmoid and piecewise linear. Our model uses the sigmoid function, which restricts the output to the range between 0 and 1 or between -1 and 1 {{HDref|Kröse et al, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
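<br />
The three components of an artificial neuron (weights, adder, activation function) fit into a few lines of Python. This is a generic textbook sketch rather than code from our MATLAB model; the bias term and the function names are assumptions added for completeness.<br />

```python
import math


def sigmoid(x):
    """Logistic sigmoid, output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs (adder) followed by the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


print(neuron([0.5, 0.2], [1.0, -2.0]))                        # output in (0, 1)
print(neuron([0.5, 0.2], [1.0, -2.0], activation=math.tanh))  # output in (-1, 1)
```

Passing `math.tanh` as the activation gives the sigmoid variant whose output lies between -1 and 1.<br />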
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimise it. The following figure displays such a training loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
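<br />
The training loop can be illustrated with the smallest possible case: a single weight fitted by gradient descent on the mean squared error. Plain gradient descent is used here only because it is short; it is a stand-in for, not a description of, the algorithm used for our network. The data, learning rate and function names are made up for the example.<br />

```python
def mean_squared_error(outputs, targets):
    """Average of the squared differences between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def train_weight(inputs, targets, rate=0.1, epochs=200):
    """Fit output = w * input by gradient descent on the MSE."""
    w = 0.0
    for _ in range(epochs):
        # Derivative of the MSE with respect to w.
        grad = sum(2 * (w * x - t) * x for x, t in zip(inputs, targets)) / len(inputs)
        w -= rate * grad                  # move against the gradient
    return w


inputs = [0.0, 1.0, 2.0, 3.0]
targets = [0.0, 0.5, 1.0, 1.5]            # generated with w = 0.5
w = train_weight(inputs, targets)
print(w, mean_squared_error([w * x for x in inputs], targets))
```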
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and their respective knockdown strengths calculated.<br><br />
Each input was represented by a vector in which each element corresponded to a score value for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’-pairing contribution and the AU content. Each input/target pair represented the relationship between a particular binding site and its percentage of knockdown.<br />
Two neural networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with our experimental data and had 4 inputs instead of 3, the fourth input being the size of the bulge in nucleotides. Both networks were then used to predict percentages of knockdown for given inputs. The predictions were validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (a multilayer feedforward network). The first layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum squared error) was minimized by Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and the problem of interpolating noisy data is overcome by applying a Bayesian framework to the NN learning problem {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, comprising 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” denotes the weights and “b” the biases.<br />
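<br />
The architecture in the figure corresponds to the following forward pass, sketched here in plain Python with random, untrained weights. The logistic form of the sigmoid, the order of the four inputs and the example input values are assumptions. The second function shows the general form of the objective minimized in Bayesian regularization (a weighted sum of the squared errors and the squared weights, up to constant factors); it is a sketch of the idea, not the MATLAB implementation.<br />

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Hidden layer with sigmoid units, then a single linear output unit."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def regularized_cost(errors, weights, alpha, beta):
    """beta * (sum of squared errors) + alpha * (sum of squared weights)."""
    return beta * sum(e * e for e in errors) + alpha * sum(w * w for w in weights)


random.seed(0)
N_INPUTS, N_HIDDEN = 4, 15
w_hidden = [[random.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = 0.0

# Assumed input order: seed type, 3' score, AU score, bulge size.
prediction = forward([3, 7.5, 0.624, 4], w_hidden, b_hidden, w_out, b_out)
print(prediction)
```

With untrained weights the printed value is meaningless; the sketch only shows how 4 inputs flow through 15 sigmoid units into one linear output.<br />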
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with literature data was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental percentages of knockdown. The results make clear that the bulge size does have an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is 3 or 4 nt, but not when it is 1 or 0. It is important to underline that this network was trained with literature values that did not take the bulge size into consideration as a key factor; TargetScan does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated and experimental knockdown for binding sites with the given features. The highlighted values underline the discrepancy that occurs between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and within the standard deviation of the experimental values (between 10-25%).<br><br />
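<br />
One way to summarize Table 1 is to group the absolute difference between experimental and simulated knockdown by bulge size. The Python sketch below does this with the values copied from the table; the choice of the mean absolute error as summary statistic is ours for this illustration and was not part of the original analysis.<br />

```python
from collections import defaultdict

# (bulge size, KD% experimental, KD% simulated), copied from Table 1.
ROWS = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]


def mean_abs_error_by_bulge(rows):
    """Mean |experimental - simulated| for each bulge size."""
    grouped = defaultdict(list)
    for bulge, experimental, simulated in rows:
        grouped[bulge].append(abs(experimental - simulated))
    return {bulge: sum(errs) / len(errs) for bulge, errs in sorted(grouped.items())}


errors = mean_abs_error_by_bulge(ROWS)
for bulge, err in errors.items():
    print("bulge size %d: mean absolute error %.3f" % (bulge, err))
```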
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression on the training data. The line shows the correlation between the NN outputs and the respective target values.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. We therefore decided to train a second neural network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs, and the knockdown percentage is the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic is a rule-based method for approximate reasoning, introduced by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions {{HDref|Nelles O (2000)}}. It has been widely used in engineering and artificial intelligence, for example in fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways {{HDref|Bosl et al (2007)}} and to analyze gene regulatory networks {{HDref|Laschov et al(2009)}}. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system as well as from experimental data, (ii) intermediate states of inputs and outputs can be encoded, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states {{HDref|Aldridge et al (2009)}}, and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Fuzzy logic (FL) is thus a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, each corresponding to a different criterion applied to that input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one evaluating membership in the group of small people, the second in the group of medium-sized people and the third in the group of tall people. Changing the shape of an MF allows either graded dependencies, with intermediate membership values, or simple ON/OFF states, where the membership value can only be 0 or 1. Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 up to the age of 18 and 1 for older persons. The model could then differentiate between young and grown-up persons.<br />
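The height and age example can be written down in a few lines. This is only a minimal sketch: the function names and the 160 cm and 190 cm bounds of the graded MF are assumptions chosen for illustration and are not taken from our model.<br />

```python
def tall_membership(height_cm, lower=160.0, upper=190.0):
    """Graded MF: 0 below `lower`, 1 above `upper`, linear in between.

    The bounds are illustrative assumptions, not fitted values.
    """
    if height_cm <= lower:
        return 0.0
    if height_cm >= upper:
        return 1.0
    return (height_cm - lower) / (upper - lower)


def grown_up_membership(age_years, threshold=18):
    """ON/OFF MF: 0 up to the age of 18, 1 for older persons."""
    return 1.0 if age_years > threshold else 0.0


# Each person is rated by both membership functions.
for height, age in [(150, 12), (175, 18), (195, 40)]:
    print(height, age, tall_membership(height), grown_up_membership(age))
```

The graded MF returns intermediate values between its two bounds, while the ON/OFF MF can only return 0 or 1.<br />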
<br />
Simple IF-THEN rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership in the MFs that the rule refers to. The better the rule is satisfied, the higher the membership in the rule's output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems {{HDref|Mamdani et al, (1975)}}. We use a Sugeno-type fuzzy inference system {{HDref|Sugeno(1985)}}, in which the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that it is computationally more efficient and, because of its simpler output MFs, easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
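To make the Sugeno idea concrete, the following sketch evaluates a zero-order Sugeno system (constant output MFs) with two inputs. Everything specific in it is an assumption for illustration: the ramp-shaped input MFs, their bounds, the three rules and the output constants are hypothetical and are not the rules or optimized values of our model.<br />

```python
def ramp(x, lo, hi):
    """Graded membership function rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def sugeno_knockdown(three_prime_score, au_score):
    """Zero-order Sugeno inference: weighted average of constant rule outputs."""
    high_3p = ramp(three_prime_score, 3.0, 7.0)   # hypothetical bounds
    high_au = ramp(au_score, 0.2, 0.8)            # hypothetical bounds
    low_3p, low_au = 1.0 - high_3p, 1.0 - high_au

    # (firing strength, constant output); AND is taken as min, OR as max.
    rules = [
        (min(high_3p, high_au), 0.9),   # both scores high -> strong knockdown
        (min(low_3p, low_au), 0.2),     # both scores low  -> weak knockdown
        (max(min(high_3p, low_au), min(low_3p, high_au)), 0.5),  # mixed case
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * value for strength, value in rules) / total


print(sugeno_knockdown(7.0, 0.8))
print(sugeno_knockdown(5.0, 0.5))
```

Because each rule output is a single constant, optimizing the system amounts to adjusting a handful of numbers, which is the property that makes the Sugeno form convenient for computational optimization.<br />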
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration, that binding sites with a 3'-score under 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge {{HDref|Bartel(2009)}}<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts of which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various BS characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU content, yields scores that seem well suited to distinguishing between endogenous-miRNA-like binding sites in particular. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here if you are interested in more recent model optimization results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Laschov D, Margaliot M. Mathematical modeling of the lambda switch:a fuzzy logic approach. J Theor Biol. 21:475-89 (2009).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, (2000).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4(3):448-472(1992)<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:28:50Z<p>AlejandroHD: /* References */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective: protein levels (especially relevant for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough to the desired knockdown.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing score and the AU-content score of the generated BSs are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
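The final comparison step can be sketched as a nearest-neighbour search in parameter space. This is a minimal sketch under assumptions: the text above does not state how "best fit" is measured, so the unscaled Euclidean distance, the dictionary keys and the example values below are all hypothetical.<br />

```python
import math


def closest_binding_site(model_bs, generated_bs,
                         keys=("three_prime", "au_score", "bulge_size")):
    """Return the generated BS whose parameters are closest to the model BS.

    Assumption: closeness is the plain Euclidean distance over `keys`.
    """
    def distance(candidate):
        return math.sqrt(sum((candidate[k] - model_bs[k]) ** 2 for k in keys))

    return min(generated_bs, key=distance)


# Hypothetical parameter sets, for illustration only.
model = {"three_prime": 5.0, "au_score": 0.60, "bulge_size": 0}
generated = [
    {"name": "bs_a", "three_prime": 7.5, "au_score": 0.62, "bulge_size": 4},
    {"name": "bs_b", "three_prime": 5.5, "au_score": 0.58, "bulge_size": 0},
    {"name": "bs_c", "three_prime": 2.0, "au_score": 0.33, "bulge_size": 0},
]
print(closest_binding_site(model, generated)["name"])
```

In practice the parameters live on very different scales (a 3' score of several points versus an AU score below 1), so some form of rescaling before computing the distance would probably be needed.<br />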
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site be at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
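The input handling and the two basic options can be sketched as follows. This is not the miBSdesigner source: the function names are hypothetical, the example sequence is an arbitrary 22-nt input, and the way the bulge is created (placing the miRNA's own base opposite positions 9 to 12, so that neither a Watson-Crick nor a G:U pair can form) is an assumption made for illustration.<br />

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize_mirna(raw):
    """Upper-case the input, map RNA to DNA letters and drop extra characters."""
    seq = "".join(ch for ch in raw.upper().replace("U", "T") if ch in COMPLEMENT)
    if len(seq) != 22:
        raise ValueError("expected 22 nucleotides, got %d" % len(seq))
    return seq


def perfect_site(mirna):
    """Binding site (DNA, 5'->3') that is fully complementary to the miRNA."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna))


def bulged_site(mirna, first=9, last=12):
    """Perfect site with the bases facing miRNA positions first..last unpaired.

    Assumption: the facing base is replaced by the miRNA's own base.
    """
    site = list(perfect_site(mirna))
    for pos in range(first, last + 1):         # 1-based miRNA positions
        site[len(mirna) - pos] = mirna[pos - 1]
    return "".join(site)


mirna = normalize_mirna("ugg agu gug aca aug gug uuu g")
print(perfect_site(mirna))
print(bulged_site(mirna))
```

The two printed sequences differ only in the four bases facing miRNA positions 9 to 12.<br />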
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBSdesigner, the user can choose between several seed types for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
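The four standard seed types listed above can be told apart with a short classifier. This is a minimal sketch, assuming the site is given as mRNA in 5'->3' direction with its last base facing miRNA position 1; the function name and the example sequences are hypothetical, and customized seeds with a mismatch are not handled.<br />

```python
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def seed_type(mirna, site):
    """Classify a site as 6mer, 7merA1, 7merm8 or 8mer (None if no seed match).

    `mirna` and `site` are RNA strings written 5'->3'; site[-1] faces
    miRNA position 1, site[-2] faces position 2, and so on.
    """
    def pairs(pos):                            # 1-based miRNA position
        return site[-pos] == RNA_COMPLEMENT[mirna[pos - 1]]

    if not all(pairs(pos) for pos in range(2, 8)):
        return None                            # nucleotides 2-7 must match
    a1 = site[-1] == "A"                       # adenine opposite position 1
    m8 = pairs(8)                              # match at position 8
    if a1 and m8:
        return "8mer"
    if m8:
        return "7merm8"
    if a1:
        return "7merA1"
    return "6mer"


mirna = "UGGAGUGUGACAAUGGUGUUUG"               # arbitrary example sequence
for site in ["ACACUCCA", "ACACUCCG", "GCACUCCA", "GCACUCCG", "ACACUGCA"]:
    print(site, seed_type(mirna, site))
```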
<br />
===Supplementary Region===<br />
In miBSdesigner, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge){{HDref|Grimson A et al(2007)}}.<br />
If some other specific supplementary region is needed, the user can customize the sequence by entering the desired matching nucleotide positions (numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
To let the user improve the efficiency of their binding sites, miBSdesigner offers options to increase the AU content by placing adenine or uracil at positions around the matches (specifically -1, 0, 1, 8, 9 and 10). The function is designed to vary the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To make it easier to introduce the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences are not directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them, removing the extra nucleotides to create the overhangs.<br />
<br />
===Output===<br />
miBSdesigner generates the primer needed to integrate the desired binding site into a plasmid, together with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between target and off-target tissue. The program searches a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to place binding sites in tandem or to insert them into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected. Here, the tissues from which gene expression has to be excluded are chosen.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from mirBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
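The normalization and filtering can be illustrated with a short sketch. It is written in Python rather than Perl and is not the mUTING script. The function names, the tissue and miRNA names and the clone counts are hypothetical, and the selection rule for off-targeting (zero relative expression in the target tissue, and a difference above the threshold in every off-target tissue) is our reading of the description above, stated here as an assumption.<br />

```python
def relative_expression(clone_counts):
    """Normalize the clone counts of one library by the library's total count."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}


def off_targeting_candidates(libraries, target, off_targets, threshold=0.001):
    """List miRNAs usable for off-targeting under the assumed selection rule."""
    rel = {tissue: relative_expression(counts)
           for tissue, counts in libraries.items()}
    all_mirnas = sorted(set().union(*rel.values()))
    hits = []
    for mirna in all_mirnas:
        in_target = rel[target].get(mirna, 0.0)
        if in_target != 0.0:
            continue                      # must be absent from the target tissue
        differences = [rel[tissue].get(mirna, 0.0) - in_target
                       for tissue in off_targets]
        if all(diff > threshold for diff in differences):
            hits.append(mirna)
    return hits


# Hypothetical clone counts, for illustration only.
libraries = {
    "liver": {"miR-a": 0, "miR-b": 900, "miR-c": 100},
    "heart": {"miR-a": 200, "miR-b": 0, "miR-c": 800},
    "brain": {"miR-a": 50, "miR-b": 950, "miR-c": 0},
}
print(off_targeting_candidates(libraries, "liver", ["heart", "brain"]))
```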
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and allow [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous-miRNA-like BSs with the advanced 3'-scoring and AU-content evaluation. The endogenous-miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Each pairing of miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between bound miRNA and mRNA are also allowed, but there is a penalty of 0.5 points for an offset larger than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type. The impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets from human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
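The 3'-scoring rule described above can be sketched as a small function. This is a deliberately simplified reading of the description, not the TargetScan script itself: it takes the paired miRNA positions as given instead of searching for the best alignment, and the function name and parameters are hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Score 3' pairing as described in the text (simplified sketch).

    paired_positions: 1-based miRNA positions that pair with the mRNA.
    offset: shift between bound miRNA and mRNA, in nucleotides.
    """
    score = 0.0
    for pos in paired_positions:
        # Pairings at miRNA nucleotides 13-16 count 1, all others 0.5.
        score += 1.0 if 13 <= pos <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5                      # penalty for an offset larger than 2 nt
    return score


print(three_prime_score(range(13, 17)))            # pairing at 13-16 only
print(three_prime_score(range(13, 21)))            # pairing at 13-20
print(three_prime_score(range(13, 21), offset=3))  # same, with a large offset
```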
<br />
Since all major prior modeling approaches used mRNA levels as training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows through the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent the strength of the connection between an input and an artificial neuron; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function was used; it restricts the output to the range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
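The three components named above (weights, adder, activation function) fit into a few lines of code. This is a generic sketch of a single artificial neuron with a logistic sigmoid, not code from our MATLAB model; the function names and the example numbers are arbitrary.<br />

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias=0.0):
    """Single artificial neuron: weighted sum (adder) followed by a sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# Arbitrary example values.
print(neuron_output([1.0, 0.5, -0.2], [0.4, -0.6, 0.9], bias=0.1))
```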
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean-squared error, and several algorithms can be used to minimise it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strengths calculated.<br><br />
Each input was represented by a vector in which every element is a score for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU content. Each input/target pair represents the relationship between a particular binding site and the corresponding knockdown percentage.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with our experimental data and had 4 inputs instead of 3; the fourth input represents the size of the bulge in nucleotides. Both networks were then used to predict knockdown percentages for given inputs. The predictions were validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprises two layers (multilayer feedforward network). The first layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function is used for the first layer and a linear activation function for the second. The cost function (sum squared error) was minimized by Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and the problem of interpolating noisy data is overcome by applying a Bayesian framework to the NN learning problem {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” denotes the weights and “b” the biases.<br />
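The forward pass through such a network (15 sigmoid neurons in the hidden layer, one linear output neuron) can be sketched as follows. This is not our MATLAB model: the function names are hypothetical, the weights are random and untrained, so the printed number carries no meaning, and the order of the four inputs is an assumption. No training (Levenberg-Marquardt with Bayesian regularization) is attempted here.<br />

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden=15, seed=0):
    """Create random (untrained) weights and biases for a two-layer network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Hidden layer with sigmoid activation, then a linear output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


net = init_network(n_inputs=4)
# Assumed input order: seed type, 3' score, AU score, bulge size.
print(forward(net, [3, 7.5, 0.624, 4]))
```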
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with literature data was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line for the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental knockdown percentages. The results make clear that the bulge size does have an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to emphasize that the network was trained with literature values that did not take the bulge size into consideration as a key factor; TargetScan, in fact, does not evaluate this binding site feature in the scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: simulated data and experimental results for the given binding site features. The highlighted values underline the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (between 10-25%).<br><br />
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression for the training set; the line shows the correlation between the NN output and the respective target value.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train a second Neural Network using only our experimental data, with the bulge size included in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy Logic (FL) is a rule-based method for approximate reasoning introduced by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and Artificial Intelligence, for example in Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch:a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of Fuzzy Logic-based approaches are that (i) models can be constructed from both prior knowledge of the system and experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, FL constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, corresponding to different criteria applied to the same input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be 3 MFs: one evaluating the membership to small people, the second to medium-sized people and the third to big people. The shape of the MF determines whether it expresses a functional dependency, allowing intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Different kinds of input parameters can therefore be evaluated with a fuzzy inference system. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model could then differentiate between young and grown-up persons.<br />
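<br />
The height and age example can be written down as a minimal sketch. The function names and the ramp limits (160 cm and 190 cm) are illustrative assumptions and are not part of our model; treating exactly 18 years as grown-up is likewise a choice made only for this sketch.<br />

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Graded MF: degree (0..1) to which a height counts as 'tall'.

    The limits `low` and `high` are hypothetical values chosen for illustration.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    # Linear ramp between the two limits gives the intermediate states.
    return (height_cm - low) / (high - low)


def grown_up_membership(age_years):
    """ON/OFF MF: 0 below the age of 18, 1 from 18 onwards (assumed cut-off)."""
    return 1.0 if age_years >= 18 else 0.0


if __name__ == "__main__":
    for height, age in [(150, 12), (175, 17), (195, 40)]:
        print(height, age, tall_membership(height), grown_up_membership(age))
```

The first function allows intermediate membership values, while the second can only return 0 or 1.<br />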
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by the object's degree of membership to the different MFs. The better the rule is satisfied, the higher the membership to the output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], in which the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno fuzzy inference system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MFs. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into account that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We developed several concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various BS characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU content, yields a score that seems particularly suited to distinguishing between endogenous miRNA-like binding sites. A more detailed description of the concept of binding site parameterization can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4(3):448-472(1992)<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:26:41Z<p>AlejandroHD: /* References */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site with which the user can tune protein levels. <br />
Several tools can predict mRNA knockdown, but our approach aims at the final objective: protein levels (especially for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We designed it to be as user-friendly as possible: the user only needs to enter the desired knockdown percentage (kd%) and choose an sh/miRNA sequence to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when users want to use their own sh/miRNA or when no experimentally verified binding site is close enough.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BS on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing and AU-content scores of the generated BS are computed by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits these parameters is selected as the output BS of miRockdown.<br />
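<br />
The final comparison step can be illustrated with a minimal sketch. It assumes that each binding site is described by a dictionary of numeric parameters and that "closest" means the smallest Euclidean distance on the unscaled parameters; both the data layout and the distance measure are assumptions made for illustration, not a description of the actual miRockdown script.<br />

```python
import math


def closest_binding_site(target, candidates):
    """Return the candidate whose parameters lie closest to the target.

    target:     dict mapping parameter name -> value (the model BS)
    candidates: list of dicts with the same parameter keys plus 'sequence'
    """
    keys = sorted(target)

    def distance(candidate):
        # Euclidean distance over the shared parameters (hypothetical choice).
        return math.sqrt(sum((candidate[k] - target[k]) ** 2 for k in keys))

    return min(candidates, key=distance)


if __name__ == "__main__":
    model_bs = {"three_prime_score": 5.0, "au_score": 0.60}
    generated = [
        {"sequence": "site_A", "three_prime_score": 7.5, "au_score": 0.60},
        {"sequence": "site_B", "three_prime_score": 5.5, "au_score": 0.58},
    ]
    print(closest_binding_site(model_bs, generated)["sequence"])
```

In practice the parameters would probably need to be scaled to comparable ranges before distances are computed, since the 3'-score spans a much wider range than the AU-content score.<br />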
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge at positions 9 to 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
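<br />
The input handling and the two basic site types can be sketched as follows. This is an illustration under stated assumptions, not the miBSdesigner code: the function names are hypothetical, the site is written as RNA in 5' to 3' direction, and the bulge is created by copying the miRNA nucleotide itself into the site at positions 9 to 12, which is only one possible way to prevent pairing.<br />

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def clean_mirna(raw):
    """Upper-case, convert DNA to RNA, drop extra characters, check length."""
    seq = raw.upper().replace("T", "U")
    seq = "".join(ch for ch in seq if ch in "ACGU")
    if len(seq) != 22:
        raise ValueError("miRNA sequence must be 22 nucleotides long")
    return seq


def perfect_binding_site(mirna):
    """Reverse complement of the miRNA: pairs with all 22 nucleotides."""
    return "".join(COMPLEMENT[n] for n in reversed(mirna))


def bulged_binding_site(mirna, start=9, end=12):
    """Perfect site with miRNA positions start..end (1-based) left unpaired."""
    site = list(perfect_binding_site(mirna))
    for pos in range(start, end + 1):
        # miRNA position `pos` faces site index len(mirna) - pos.
        # An identical nucleotide never forms a Watson-Crick or G-U pair.
        site[len(mirna) - pos] = mirna[pos - 1]
    return "".join(site)


if __name__ == "__main__":
    mirna = clean_mirna("tgga gtgt gaca atgg tgtt tg")  # example 22-nt input
    print(mirna)
    print(perfect_binding_site(mirna))
    print(bulged_binding_site(mirna))
```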
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed with one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
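<br />
The four standard seed types in the list above can be told apart with a few lines of code. The sketch below assumes that the site is given as RNA in 5' to 3' direction with its last nucleotide facing miRNA position 1, and it counts only Watson-Crick pairs; the function name and this alignment convention are assumptions made for illustration.<br />

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def seed_type(mirna, site):
    """Classify a site as '6mer', '7merA1', '7merm8' or '8mer' (None if no seed)."""

    def paired(pos):
        # miRNA position `pos` (1-based) faces site index len(site) - pos.
        return (mirna[pos - 1], site[len(site) - pos]) in WATSON_CRICK

    if not all(paired(pos) for pos in range(2, 8)):
        return None  # nucleotides 2-7 must match for every seed type
    match_8 = paired(8)
    adenine_1 = site[-1] == "A"  # adenine opposite miRNA position 1
    if match_8 and adenine_1:
        return "8mer"
    if match_8:
        return "7merm8"
    if adenine_1:
        return "7merA1"
    return "6mer"


if __name__ == "__main__":
    mirna = "UGGAGUGUGACAAUGGUGUUUG"  # example 22-nt sequence
    print(seed_type(mirna, "CAAACACCAUUGUCACACUCCA"))
```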
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge) {{HDref|Grimson A et al(2007)}}.<br />
If some other specific supplementary region is needed, the user can customize the sequence by entering the desired matching nucleotide positions (numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction sites used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences. In the last case, the output sequences are not directly ready for cloning: the user has to either digest the construct prior to ligation or edit the primers before ordering them, removing the extra nucleotides to create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in relative miRNA expression (within a tissue) between target and off-target tissue. The program searches a database of expression levels and returns a list of candidate miRNAs. From these, the user selects the desired miRNA, for which the final output is generated as sense and antisense oligomers with overhangs that can be used to place binding sites in tandem or to insert them into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression and sequence data used by the tool were taken from preexisting data sources.<br />
<br />
'''Sequences''' – Mature miRNA sequences were obtained from the miRBase Sequence Database, Release 16 [cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
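<br />
The normalization described above amounts to dividing each clone count by the library total. A minimal sketch, assuming each library is available as a dictionary of clone counts (the function names and the example counts are made up for illustration):<br />

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    if total == 0:
        return {mirna: 0.0 for mirna in clone_counts}
    return {mirna: count / total for mirna, count in clone_counts.items()}


def expression_difference(target_library, off_target_library, mirna):
    """Difference in relative miRNA level between two normalized libraries."""
    return target_library.get(mirna, 0.0) - off_target_library.get(mirna, 0.0)


if __name__ == "__main__":
    # Hypothetical clone counts, not values from the published libraries.
    tissue_a = relative_expression({"miR-a": 30, "miR-b": 70})
    tissue_b = relative_expression({"miR-a": 5, "miR-c": 95})
    print(tissue_a, tissue_b, expression_difference(tissue_a, tissue_b, "miR-a"))
```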
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above are submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
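<br />
The selection criterion can be sketched in Python (the tool itself is a Perl script, so this is a re-expression for illustration only). The sketch assumes normalized expression dictionaries per tissue. It keeps miRNAs that are absent from every tissue in which they must be silent and whose relative expression in the remaining tissue exceeds the threshold. Which tissues play which role depends on the chosen targeting strategy and is left to the caller; the function name is hypothetical.<br />

```python
def candidate_mirnas(expressing, silent_tissues, threshold=0.001):
    """List (miRNA, level) pairs that satisfy the selection criterion.

    expressing:     dict miRNA -> relative expression in the tissue where
                    the miRNA should be present
    silent_tissues: list of dicts for tissues where it must be absent
    """
    hits = []
    for mirna, level in expressing.items():
        absent = all(t.get(mirna, 0.0) == 0.0 for t in silent_tissues)
        # With zero expression elsewhere, the difference equals `level`.
        if absent and level > threshold:
            hits.append((mirna, level))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical relative expression values.
    present = {"miR-a": 0.7, "miR-b": 0.1, "miR-c": 0.0005}
    absent_in = [{"miR-b": 0.05}]
    print(candidate_mirnas(present, absent_in))
```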
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the more advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Pairing at miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between bound miRNA and mRNA are also allowed, but there is a penalty of 0.5 points for an offset higher than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
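<br />
The 3'-scoring rule stated above can be illustrated with a simplified sketch. It assumes that the set of paired miRNA positions and the offset are already known and that every paired nucleotide contributes individually; the real TargetScan script additionally searches over alignments to maximize the score, which is not reproduced here. The function name is hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score for a given set of paired miRNA positions.

    paired_positions: iterable of 1-based miRNA positions that pair with the mRNA
    offset:           shift between bound miRNA and mRNA, in nucleotides
    """
    score = 0.0
    for pos in paired_positions:
        # Pairs at positions 13-16 count 1, all other pairs count 0.5.
        score += 1.0 if 13 <= pos <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5  # penalty for an offset higher than 2 nucleotides
    return score


if __name__ == "__main__":
    print(three_prime_score(range(13, 17)))            # pairing at 13-16 only
    print(three_prime_score(range(13, 21)))            # pairing at 13-20
    print(three_prime_score(range(13, 21), offset=3))  # same, with offset penalty
```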
<br />
Since all major prior modeling approaches used mRNA levels as the training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An artificial neural network (NN) is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid, and piecewise linear. For our model the sigmoid function was used; it limits the output to the range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
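<br />
The three components of an artificial neuron (weights, adder, activation function) can be written as a minimal sketch. The function names are illustrative; the logistic function gives outputs between 0 and 1, while the hyperbolic tangent gives outputs between -1 and 1.<br />

```python
import math


def logistic(x):
    """Sigmoid activation with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation=logistic):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    w = [0.8, 0.2, -0.4]
    print(neuron(x, w, bias=0.1))                        # between 0 and 1
    print(neuron(x, w, bias=0.1, activation=math.tanh))  # between -1 and 1
```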
<br />
During the learning process, the difference between the desired output (target) and the network output is minimized. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimize it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
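<br />
Such a training loop can be sketched for the simplest possible case, a single linear neuron with one input. Plain gradient descent on the mean squared error is used here purely for illustration; it is not the algorithm used for our network, and the learning rate, epoch count and toy data are arbitrary assumptions.<br />

```python
def mean_squared_error(outputs, targets):
    """Cost: average squared difference between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def train_linear_neuron(samples, targets, rate=0.1, epochs=500):
    """Fit output = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(samples, targets):
            error = (w * x + b) - t
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Move weight and bias against the gradient of the cost.
        w -= rate * grad_w
        b -= rate * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ts = [2.0 * x + 1.0 for x in xs]  # toy targets
    w, b = train_linear_neuron(xs, ts)
    print(w, b, mean_squared_error([w * x + b for x in xs], ts))
```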
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strengths calculated.<br><br />
Each input was represented by a vector of three elements, or four for the network trained on experimental data, as described below. Each element corresponded to a score value for a specific feature of the binding site (as described in the previous section, "Parameterization Concept"). The three features used to describe the binding site were the seed type, the 3’-pairing contribution and the AU content. Each input/target pair represented the relationship between a particular binding site and its percentage of knockdown.<br />
Two neural networks were trained. One was trained with a pool of 45 data points from the literature; the other was trained with experimental data. The latter network had 4 inputs instead of 3, with the fourth input representing the size of the bulge in base pairs. Afterwards both networks were used to predict percentages of knockdown for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (multilayer feedforward network). The first layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The algorithm used for minimizing the cost function (sum squared error) was Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm. The algorithm updates the weight and bias values according to Levenberg-Marquardt optimization and overcomes the problem of interpolating noisy data {{HDref|MacKay, 1992}} by applying a Bayesian framework to the NN learning problem.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
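<br />
The forward pass through a network of this shape (15 sigmoid hidden neurons followed by one linear output neuron) can be sketched as follows. The weights are random placeholders rather than trained values, tanh is assumed as the sigmoid, the example input is an arbitrary four-element vector, and the function names are hypothetical; the actual model was built and trained in MATLAB.<br />

```python
import math
import random


def random_network(n_inputs, n_hidden=15, seed=0):
    """Create untrained placeholder weights and biases for both layers."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Hidden layer with tanh activation, then a linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    net = random_network(n_inputs=4)
    # Arbitrary example input: 3' score, AU score, seed type, bulge size.
    print(forward([7.5, 0.624, 3, 4], *net))
```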
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with literature data was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental percentages of knockdown. The results make clear that the bulge size does have an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to underline that the network was trained with literature values that did not take the bulge size into consideration as a key factor; TargetScan does not evaluate this binding site feature in the scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated data and experimental results for the given binding site features. The highlighted values underline the discrepancy that occurs between the two sets of knockdown values when the bulge size is the only feature changing. When the bulge size is not 1, the predictions are very precise and within the standard deviation of the experimental values (between 10-25%).<br><br />
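<br />
The effect of the bulge size on prediction quality can be checked directly from Table 1. The sketch below copies the bulge size and the two knockdown columns from the table and compares the mean absolute error for bulge sizes of 3 to 4 nt with that for bulge sizes of 0 to 1 nt; the grouping and the error measure are choices made for this illustration.<br />

```python
# (bulge size, experimental KD, simulated KD), copied row by row from Table 1
TABLE_1 = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]


def mean_abs_error(rows):
    """Mean absolute difference between experimental and simulated knockdown."""
    return sum(abs(exp - sim) for _, exp, sim in rows) / len(rows)


large_bulge = [row for row in TABLE_1 if row[0] >= 3]
small_bulge = [row for row in TABLE_1 if row[0] <= 1]

if __name__ == "__main__":
    print("bulge 3-4 nt:", round(mean_abs_error(large_bulge), 3))
    print("bulge 0-1 nt:", round(mean_abs_error(small_bulge), 3))
```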
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression of the training section, line showing the correlation between the NN output and the respective target value.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train another neural network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why using a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic (FL) is a rule-based method of approximate reasoning developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and artificial intelligence, for example in fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system and from experimental data, (ii) inputs and outputs can take intermediate states, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol. 5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, FL constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. Each input parameter can have one or several MFs, each representing a different criterion applied to that input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one evaluating membership in the group of small people, the second in the group of medium-sized people and the third in the group of tall people. The shape of the MF determines whether it expresses a functional dependency, allowing intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model could then differentiate between young and grown-up persons.<br />
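<br />
The height and age example can be written down as a minimal sketch. Python is used here purely for illustration; the function names, the 160 cm and 190 cm breakpoints and the treatment of exactly 18 years as grown-up are assumptions made for the example, not values taken from our model.<br />
<br />
```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Graded MF: 0 below `low`, 1 above `high`, linear in between.

    The breakpoints are illustrative assumptions.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def adult_membership(age_years, threshold=18):
    """ON/OFF MF: 0 below the threshold age, 1 from the threshold on.

    Counting exactly 18 as grown-up is an assumption of this sketch.
    """
    return 1.0 if age_years >= threshold else 0.0


# A person of 175 cm is "half tall"; a 30-year-old is fully grown-up.
half_tall = tall_membership(175.0)
grown_up = adult_membership(30)
```
<br />
The graded function allows intermediate membership values, whereas the ON/OFF function can only return 0 or 1.<br />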
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. The degree to which an object (a set of input parameters) satisfies a rule is defined by its degrees of membership in the MFs the rule refers to. The more strongly the rule is satisfied, the higher the membership in the output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno fuzzy inference system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MFs. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
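<br />
To illustrate why constant output MFs are cheap to evaluate, the following sketch computes the output of a zero-order Sugeno system as the firing-strength-weighted average of the constant rule outputs. This is the textbook formulation, not a copy of our optimized model; the function name, the two rules and their numbers are hypothetical.<br />
<br />
```python
def sugeno_output(rule_strengths, rule_outputs):
    """Zero-order Sugeno defuzzification.

    rule_strengths: how strongly each rule is satisfied (0..1)
    rule_outputs:   the constant output attached to each rule
    Returns the weighted average of the rule outputs.
    """
    total = sum(rule_strengths)
    if total == 0:
        raise ValueError("no rule fired for this input")
    weighted = sum(w * z for w, z in zip(rule_strengths, rule_outputs))
    return weighted / total


# Hypothetical example: rule 1 (constant output 0.2) fires with strength 0.25,
# rule 2 (constant output 0.8) fires with strength 0.75.
knockdown = sugeno_output([0.25, 0.75], [0.2, 0.8])
```
<br />
Because each rule contributes only a single number, an optimizer can tune these constants directly, which is the property that makes the Sugeno type easy to adapt.<br />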
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration, that binding sites with a 3'-score under 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts of what kind of input parameters to integrate into the fuzzy inference model and how to evaluate them. Therefore we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various different BS characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU content, returns a score that seems well suited to distinguishing between binding sites, especially endogenous miRNA-like ones. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
- Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)<br />
<br />
- Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007).<br />
<br />
- Mamdani, E.H. and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1-13, (1975).<br />
<br />
- Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co.,(1985).<br />
<br />
- [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] <br />
Rodriguez J, Ge R, Walker K, and Bell G., Whitehead Institute for Biomedical Research. (2007,2008) <br />
<br />
- Kröse B & van der Smagt P, An introduction to Neural Networks, 8th Ed, (1996).<br />
<br />
- MacKay D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 4,(3):448-472(1992)<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:26:21Z<p>AlejandroHD: /* miBSdesigner */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective: protein levels (especially relevant for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can therefore always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing score and the AU-content score of the generated BSs are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
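<br />
The final comparison step can be pictured as a nearest-neighbor search in parameter space. The sketch below is only an illustration under stated assumptions: the Euclidean distance, the parameter names and the three candidate sites are hypothetical, and the actual miRockdown script may weight or scale the parameters differently.<br />
<br />
```python
import math


def closest_binding_site(target, candidates):
    """Return the candidate whose parameters lie nearest to `target`.

    Uses the unscaled Euclidean distance over the keys of `target`
    (an assumed metric chosen for this sketch).
    """
    keys = sorted(target)

    def distance(candidate):
        return math.sqrt(sum((candidate[k] - target[k]) ** 2 for k in keys))

    return min(candidates, key=distance)


# Hypothetical model BS parameters and three generated binding sites.
target = {"three_prime_score": 5.0, "au_score": 0.6}
candidates = [
    {"name": "bs_a", "three_prime_score": 2.5, "au_score": 0.31},
    {"name": "bs_b", "three_prime_score": 5.5, "au_score": 0.75},
    {"name": "bs_c", "three_prime_score": 7.5, "au_score": 0.62},
]
best = closest_binding_site(target, candidates)
```
<br />
Since the 3' score and the AU score live on different scales, a real implementation would probably normalize the parameters before measuring distances.<br />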
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
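<br />
The input handling and the two basic site types can be sketched as follows. This is not the miBSdesigner source code: the function names are hypothetical, the 22-nt example sequence is arbitrary, and creating the bulge by placing the identical (and therefore non-pairing) base opposite miRNA positions 9 to 12 is just one possible choice assumed for this sketch.<br />
<br />
```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def clean_mirna(raw):
    """Uppercase, convert DNA to RNA and drop any non-nucleotide characters."""
    seq = "".join(ch for ch in raw.upper().replace("T", "U") if ch in COMPLEMENT)
    if len(seq) != 22:
        raise ValueError("expected a 22-nt miRNA, got %d nt" % len(seq))
    return seq


def perfect_site(mirna):
    """Target site (5'->3') that pairs with all 22 miRNA nucleotides."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna))


def bulged_site(mirna, start=9, end=12):
    """Perfect site with miRNA positions start..end (1-based) left unpaired.

    The unpaired target bases are set to the miRNA base itself, because
    identical bases (A-A, C-C, G-G, U-U) cannot pair.
    """
    site = list(perfect_site(mirna))
    n = len(mirna)
    for pos in range(start, end + 1):
        # target index n - pos lies opposite miRNA position pos
        site[n - pos] = mirna[pos - 1]
    return "".join(site)


# DNA input with spaces is accepted and cleaned before use.
mirna = clean_mirna("tggagtgtga caatggtgtt tg")
perfect = perfect_site(mirna)
bulged = bulged_site(mirna)
```
<br />
The bulged site differs from the perfect site only at the four target positions opposite miRNA nucleotides 9 to 12.<br />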
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel (2009)}}.<br />
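<br />
The four seed definitions in the list above translate directly into a small classifier. The sketch assumes that the target site is given 5' to 3' with its last base lying opposite miRNA position 1, and it counts only Watson-Crick pairs (no G:U wobble); both conventions and the function name are assumptions made for illustration.<br />
<br />
```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def seed_type(mirna, site):
    """Classify the seed match between a miRNA and a target site.

    mirna: miRNA sequence, 5'->3'
    site:  target sequence, 5'->3', whose last base is opposite miRNA position 1
    Returns "8mer", "7merm8", "7merA1", "6mer" or None (no seed match).
    """
    def paired(pos):
        # miRNA position pos (1-based) faces the pos-th base from the site's 3' end
        return (mirna[pos - 1], site[-pos]) in PAIRS

    if not all(paired(pos) for pos in range(2, 8)):
        return None
    match_8 = paired(8)
    adenine_1 = site[-1] == "A"
    if match_8 and adenine_1:
        return "8mer"
    if match_8:
        return "7merm8"
    if adenine_1:
        return "7merA1"
    return "6mer"


# Arbitrary 22-nt example miRNA and the last eight bases of a target site.
example = seed_type("UGGAGUGUGACAAUGGUGUUUG", "ACACUCCA")
```
<br />
Changing the first base of the eight-base fragment breaks the pairing at position 8, and changing its last base removes the adenine opposite position 1, which moves the site down the list of seed types.<br />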
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge) {{HDref|Grimson A et al(2007)}}.<br />
If some other specific supplementary region is needed, the user can customize the sequence by entering the desired matching nucleotide positions (numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. Initially, the user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them, removing the extra nucleotides to create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, together with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool developed to generate binding sites for miRNAs that can be used for tissue targeting based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between the target and off-target tissues. The program searches a database of expression levels and returns a list of candidate miRNAs. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and antisense oligomers with overhangs that can be used to place binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from miRBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
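<br />
The normalization of clone counts and the threshold comparison can be sketched as follows. The sketch is written in Python rather than Perl and is not the mUTING script itself: the function names and clone counts are hypothetical, and which library plays the role of the highly expressing tissue depends on the chosen targeting strategy.<br />
<br />
```python
def relative_expression(clone_counts):
    """Normalize the clone counts of one library by the library total."""
    total = sum(clone_counts.values())
    return {name: count / total for name, count in clone_counts.items()}


def differential_mirnas(high_library, low_library, threshold=0.001):
    """List miRNAs whose relative expression in `high_library` exceeds
    that in `low_library` by at least `threshold`."""
    high = relative_expression(high_library)
    low = relative_expression(low_library)
    return sorted(
        name for name in high
        if high[name] - low.get(name, 0.0) >= threshold
    )


# Hypothetical clone counts for two small RNA libraries.
library_a = {"miR-122": 700, "let-7a": 200, "miR-21": 100}
library_b = {"miR-1": 500, "let-7a": 300, "miR-21": 200}
candidates = differential_mirnas(library_a, library_b)
```
<br />
With these made-up counts, only the miRNA that is abundant in the first library and absent from the second passes the default threshold of 0.001.<br />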
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept unites a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BSs with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to their 3'-pairing and AU-content scores. TargetScan aligns the miRNA with the mRNA sequence starting from a given seed position such that the highest possible 3'-score is reached. Each pairing within miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between the bound miRNA and the mRNA are also allowed, but there is a penalty of 0.5 points for an offset greater than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
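<br />
The 3'-scoring rules described above can be condensed into a few lines. This is a deliberately simplified sketch of the rules as stated in this section, not the TargetScan code: it takes an already determined list of paired miRNA positions outside the seed instead of searching for the best alignment, and the function name and arguments are hypothetical.<br />
<br />
```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score.

    paired_positions: 1-based miRNA positions outside the seed that are paired
    offset:           shift between miRNA and mRNA in the alignment
    Pairs at positions 13-16 count 1 point, all other pairs 0.5 points;
    an offset larger than 2 nucleotides costs 0.5 points.
    """
    core = range(13, 17)  # positions 13, 14, 15 and 16
    score = sum(1.0 if pos in core else 0.5 for pos in paired_positions)
    if abs(offset) > 2:
        score -= 0.5
    return score


# Pairing of nucleotides 13-16 only, without offset.
core_only = three_prime_score([13, 14, 15, 16])
# Pairing of nucleotides 13-18 with an offset of 3 nucleotides.
shifted = three_prime_score([13, 14, 15, 16, 17, 18], offset=3)
```
<br />
The real algorithm additionally tries different alignments and keeps the one with the highest score, which this sketch leaves out.<br />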
<br />
Since all major prior modeling approaches used mRNA levels as their training set [], our approach will give completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. Our model uses the sigmoid function, which limits the output to the range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
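<br />
The three components named above (weights, adder and activation function) fit into a few lines of code. The sketch below shows a single artificial neuron with a logistic sigmoid; the function names, the bias term and the example numbers are chosen for illustration only.<br />
<br />
```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias=0.0):
    """Single artificial neuron: weighted sum (the adder) plus activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# The weighted inputs cancel out here (0.5 * 1.0 - 0.25 * 2.0 = 0),
# so the neuron sits exactly in the middle of its output range.
activation = neuron([1.0, 2.0], [0.5, -0.25])
```
<br />
Replacing the logistic sigmoid with the hyperbolic tangent gives the variant whose output lies between -1 and 1.<br />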
<br />
During the learning process, the difference between the desired output (target) and the network output is minimized. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimize it. The following figure displays such a training loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strengths calculated.<br><br />
Each input was represented by a vector of up to four elements. Each element corresponded to a score value related to a specific feature of the binding site (as described in the previous section "Parameterization Concept"). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU content. The input/target pair represented the relationship between a particular binding site and the related percentage of knockdown.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points from the literature. The other one was trained with experimental data. The latter network comprised 4 inputs instead of 3; the fourth input represented the size of the bulge in base pairs. Afterwards both networks were used to predict percentages of knockdown for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (a multilayer feedforward network). The first layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum squared error) was minimized with Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and applying a Bayesian framework to the NN learning problem overcomes the difficulty of interpolating noisy data {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. “Hidden” represents the first layer, which comprises 15 artificial neurons, while “Output” is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
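<br />
The forward pass through such a two-layer network can be sketched in plain Python. This is not our trained MATLAB network: the weights are random placeholders, the hyperbolic tangent is assumed as the sigmoid of the hidden layer, and the order of the four inputs (seed type, 3' score, AU score, bulge size) is chosen only for this example.<br />
<br />
```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feedforward pass: sigmoid hidden layer, linear output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


random.seed(0)  # placeholder weights, reproducible but not trained
n_inputs, n_hidden = 4, 15
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

# Hypothetical input: seed type 3, 3' score 7.5, AU score 0.6, bulge size 4.
prediction = forward([3, 7.5, 0.6, 4], w_hidden, b_hidden, w_out, b_out)
```
<br />
Training would adjust the weights and biases until the predictions match the measured knockdown values; with the random placeholders used here the output carries no meaning.<br />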
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with literature data was used to predict the outcome of the first batch. Figure 4 shows the regression line for the correlation between the NN outputs and the targets used to train this network, while Table 1 lists the simulated and experimental knockdown percentages. The results make clear that the bulge size indeed affects the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is 3 or 4 nt, but not when it is 1 or 0. It is important to underline that the network was trained with literature values that did not consider the bulge size as a key factor; TargetScan does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated and experimental knockdown for the given binding site features. The highlighted values mark the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (10-25%).<br><br />
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression of the training section, line showing the correlation between the NN output and the respective target value.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train another Neural Network with our experimental data only, including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic is a rule-based method of approximate reasoning developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and artificial intelligence approaches such as fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system as well as from experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol. 5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, fuzzy logic (FL) constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or multiple MFs per input parameter, like different criteria applied to the same input. The height of a person, for example, can be evaluated with a single MF: how much the person satisfies being tall. Alternatively, there could be three MFs, one evaluating membership of the group of small people, the second of medium-sized people and the third of tall people. The shape of the MF determines whether it expresses a functional dependency, allowing intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Thus, different kinds of input parameters can be evaluated with a fuzzy inference system. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model could then differentiate between young and grown-up persons.<br />
<br />
Simple if-then rules can then be used to combine the input MF to an output MF. The satisfaction of a rule by an object (set of input parameters) is defined by the degree of membership of the object to the different MFs. The higher the satisfaction of the rule, the higher is the membership to the output MF.<br />
The output MF can be a function like the input MF. This is the case in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno fuzzy inference system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MF. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
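The inference step described above can be sketched in a few lines. This is a minimal illustration of a zero-order Sugeno system (constant output per rule, weighted-average defuzzification), not our optimized model: the membership function shapes, the rule constants, the input names `three_prime` and `au`, and the use of the product as the AND operator are all assumptions made for the example.

```python
def ramp(x, lo, hi):
    """Membership that rises linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def sugeno_output(rules, inputs):
    """Zero-order Sugeno inference: weighted average of constant rule outputs."""
    weighted_sum = 0.0
    total_weight = 0.0
    for firing_strength, constant in rules:
        w = firing_strength(inputs)  # degree to which the rule is satisfied
        weighted_sum += w * constant
        total_weight += w
    return weighted_sum / total_weight if total_weight > 0 else 0.0


# Hypothetical membership functions, chosen only for illustration.
def high_3p(inp):
    return ramp(inp["three_prime"], 3.0, 7.0)


def high_au(inp):
    return ramp(inp["au"], 0.2, 0.6)


# Each rule is (firing strength, constant output). AND is modeled as a product.
RULES = [
    (lambda inp: high_3p(inp) * high_au(inp), 0.8),  # IF 3' high AND AU high THEN strong knockdown
    (lambda inp: 1.0 - high_3p(inp), 0.2),           # IF 3' low THEN weak knockdown
]

knockdown = sugeno_output(RULES, {"three_prime": 5.0, "au": 0.6})
```

Because every rule output is a constant, optimizing such a system only means adjusting those constants and the MF parameters, which is what makes the Sugeno form convenient for computational optimization.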
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into account that binding sites with a 3'-score under 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various BS characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, returns a score that seems appropriate for distinguishing binding sites, especially endogenous miRNA-like ones. A more detailed description of the concept of binding site parameterization can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here if you are interested in more recent model optimization results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Andrew Grimson, Kyle Kai-How Farh, Wendy K Johnston, Philip Garrett-Engele, Lee P Lim, David P Bartel. Molecular Cell, 27:91-105 2007.<br />
<br />
An experiment in linguistic synthesis with a fuzzy logic controller. Mamdani, E.H. and S. Assilian, International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.<br />
<br />
Industrial applications of fuzzy control. Sugeno, M., Elsevier Science Pub. Co., 1985.<br />
<br />
[http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] Joe Rodriguez, Robin Ge, Kim Walker, and George Bell. Whitehead Institute for Biomedical Research. All Rights Reserved. Copyright(c) 2007,2008 <br />
<br />
Ben Kröse & Patrick van der Smagt, An introduction to Neural Networks, 8th Edition, 1996.<br />
<br />
David J. C. MacKay, A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, vol. 4, No. 3, Pages 448-472, 1992.<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Team/miThanksTeam:Heidelberg/Team/miThanks2010-10-28T00:17:37Z<p>AlejandroHD: </p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/Single_Pagetop|mithanks}}<br />
<br />
=Acknowledgements=<br />
<br><br />
First and foremost, we would like to thank '''Professor Dr. Roland Eils''' for bringing together such a diverse team of students and for sharing not only his vision but also his resources with us. It is far from usual to put this much trust (and money) into a student research group, and it cannot be overstated how much we appreciate it.<br />
<br />
Our greatest thanks go, without any doubt, to '''Dr. Dirk Grimm'''. His total devotion and dedication, his patience and his ability to take our project to levels that we couldn't even have dreamt of have helped to bring out the best in each of our experiments.<br />
<br />
Without '''Jens Keienburg''' none of us would have been part of iGEM or would have ever found our inner dancing abilities (that for some of us were deeply hidden). But he believed in us, even after the first disastrous dancing lessons, and made us into the first iGEM team that would also be able to win any cheerleading competition. To offset the effects of our dancing workout, he fed us with pizza whenever possible. <br />
Of course we are also thankful for all the support and the management of the lab.<br />
<br><br />
We would like to thank all of our advisors, who have been a pleasure to work with (although we are not sure they can say the same thing about us) and have been extremely patient with our questions and problems. They were always there for us, and even in desperate and laborious times they stayed with us, even if that meant seeing the sunrise from the BioQuant balcony. <br />
<br />
'''PhD Christina Raupp''', for helping with the planning and conducting of virus production and in vivo experiments.<br />
<br />
'''Clarissa Liesche''', who brought us the joy of ELISA, helped us with the measurements and supplied us with everything from enzymes to energy drinks. It was also a lot of fun exploring and manipulating the wiki with her. <br />
<br />
'''Subgroup leader Joel Beaudouin''', who always listened and helped us with measurements at the flow cytometer, SP2, SP5, epifluorescence... just every measurement.<br />
<br />
'''Kathleen Börner''', for her continuous help and scientific and non-scientific support, she was always there to help us with everything we would ask of her, all along the way. Simply a beautiful person!<br />
<br />
'''Marina Bechtle''' was always ready to help us, even on very short notice, without any complaints! She's amazing! Special thanks for her help with the flow cytometry measurements.<br />
<br />
'''Paula Gonzalez''', who spent hours and hours at the microscope with us and helped us analyze the fluorescence of thousands of cells.<br />
<br><br><br />
'''Without your help, sacrifice and dedication, we wouldn't be anywhere close to here.'''<br />
<br><br><br />
We would also like to thank '''Eike Kienle''' for his great help with the constructs, '''Nina Schuermann''' for the introduction to the luminometer and advice on the constructs, '''Stefan Mockenhaupt''' for his constructs and all groups of the BioQuant who helped us to collect billions of HEK cells in three days, '''Marlies Muernseer''', UniKlinikum Mannheim, for supplying us with Mouse Primary Hepatocytes every Monday and her incredible help (a truly nice person :) ). <br />
<br />
The modeling subteam would like to thank '''Dr. Charley Choe''' for his support, '''Dr. Rainer König''' for the nice debate and the suggestion to use a neural network, and Nao, Tim and Stephen, from the '''iGEM 2009 team''', whose efforts pointed us in the right direction and gave us a solid ground to build on. Thanks to Stephen also for his kind help with the RFCs and part submission.<br />
<br />
The website design team would especially like to thank '''[http://de.selfhtml.org/ SELFHTML]''' (in German). We wouldn't have been able to create such a wiki, in terms of both graphics and coding, without using this great tutorial glossary. Furthermore, it helped us avoid many annoying forum trolls who impede any effective search for appropriate problem solutions.<br />
<br />
Very much appreciated are, of course, the '''companies, foundations and the academic sponsors''', who believed in our project and sponsored our work. <br />
<br><br />
We have to be thankful to '''our families, our partners and our friends''', who had to put up with our strange work timetables and our unavailability during the last four months. Thank you all for understanding what science and iGEM meant to us during all this time and for helping us push through the harder times on our way.<br />
<br />
On a strictly personal level, and without ever revealing why, we also have to thank: <br />
Nespresso, Ritter Sport (and chocolate in general), Schokakola, the people from Burkina Faso, the dead balloon syndrome, big waterfalls, self-controlled trains, random bacteria who decide to grow up where they shouldn't and contaminate our cell culture, Polish mint-flavoured Vodka, green comfy chairs, table football, explicit hip-hop songs, our mystery coffee buddy, mojitos, tigers, easy-to-strip lab coats, emotional breakdowns in the lab, all of the Harry Potter characters, terrific Mensa food, whoever once forgot to lock the door to the great 7th floor balcony, Bellini... <br />
<br />
<br />
===all of these made the iGEM experience... truly unique and unforgettable.=== <br />
==Thank you!!!==<br />
<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:07:32Z<p>AlejandroHD: /* Supplementary Region */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach aims at the final objective: protein levels (especially for medical applications like gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BS on-the-fly by varying the seed-type, 3'pairing, AU content and bulge size. The 3'pairing and the AU content score of the generated BS are characterized by a modified version of the TargetScan Algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown, so that no files have to be generated.<br><br />
Now that the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
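The final comparison step can be sketched as a nearest-match search. This is only an illustration under stated assumptions: the parameter names (`three_prime`, `au_score`), the example values and the squared Euclidean distance are hypothetical, since the text above does not specify how "best fit" is measured in miRockdown.

```python
def closest_binding_site(model_bs, generated, keys=("three_prime", "au_score")):
    """Return the generated BS whose parameters are nearest to the model BS.

    Distance is the squared Euclidean distance over `keys` (an assumption).
    """
    def distance(candidate):
        return sum((candidate[k] - model_bs[k]) ** 2 for k in keys)

    return min(generated, key=distance)


# Made-up parameter sets for illustration only.
model_bs = {"three_prime": 5.0, "au_score": 0.60}
generated = [
    {"name": "bs_a", "three_prime": 2.5, "au_score": 0.31},
    {"name": "bs_b", "three_prime": 5.5, "au_score": 0.62},
    {"name": "bs_c", "three_prime": 7.5, "au_score": 0.60},
]

best = closest_binding_site(model_bs, generated)
```

In practice the parameters have very different ranges, so they would probably need to be scaled before a distance is computed.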
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to input a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters will be removed from the sequence). The user can also enter an inert spacer sequence if they need to place the binding site further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) or an almost perfect binding site (matching all of the nucleotides but leaving a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
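The two basic options can be sketched as follows. This is a minimal illustration, not the miBSdesigner source: the function names are hypothetical, the output is written as DNA, and the bulge is created by copying the miRNA's own base into the site at positions 9 to 12 (identical bases cannot pair), which is one simple choice among several.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def clean(seq):
    """Upper-case, convert RNA to DNA and drop any extra characters."""
    seq = seq.upper().replace("U", "T")
    return "".join(base for base in seq if base in COMPLEMENT)


def perfect_site(mirna):
    """Reverse complement of the 22-nt miRNA: pairs with all 22 nucleotides."""
    dna = clean(mirna)
    if len(dna) != 22:
        raise ValueError("the miRNA sequence must be 22 nucleotides long")
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def bulged_site(mirna, first=9, last=12):
    """Perfect site with a mismatch bulge opposite miRNA positions first..last."""
    dna = clean(mirna)
    site = list(perfect_site(mirna))
    for pos in range(first, last + 1):
        # The site base opposite miRNA position `pos` (1-based) sits at index
        # len(dna) - pos; copying the miRNA base there prevents pairing.
        site[len(dna) - pos] = dna[pos - 1]
    return "".join(site)


mir122 = "UGGAGUGUGACAAUGGUGUUUG"
perfect = perfect_site(mir122)
bulged = bulged_site(mir122)
```

The sketch does not check whether the substituted bases create new pairings in a shifted register; a real designer would need to guard against that.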
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- In addition to these options, the user can create a customized seed with one mismatch included by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)}}.<br />
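The four standard seed types listed above can be written down compactly. The following is a hedged sketch rather than the tool's actual code: the function names are hypothetical, the site is returned as DNA in the 5' to 3' direction of the mRNA, and the adenine "in position 1" is placed in the site opposite miRNA position 1, that is, at the 3' end of the seed match.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def seed_match(mirna, seed_type):
    """Build the seed-matching part of a binding site for one seed type."""
    dna = mirna.upper().replace("U", "T")
    if seed_type not in ("6mer", "7merA1", "7merm8", "8mer"):
        raise ValueError("unknown seed type: " + seed_type)
    # miRNA positions 2-7 for 6mer/7merA1, positions 2-8 for 7merm8/8mer.
    last = 8 if seed_type in ("7merm8", "8mer") else 7
    site = reverse_complement(dna[1:last])
    if seed_type in ("7merA1", "8mer"):
        site += "A"  # adenine opposite miRNA position 1
    return site


mir122 = "UGGAGUGUGACAAUGGUGUUUG"
sites = {t: seed_match(mir122, t) for t in ("6mer", "7merA1", "7merm8", "8mer")}
```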
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially until 8 (13-20), and then total matching (from 13-22, leaving a bulge){{HDref|Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007)}}.<br />
If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. Initially, the user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or some custom sequences input by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construction prior to ligation, or to process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool developed to generate binding sites for miRNAs that can be used for tissue targeting based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between target and off-target tissue. The program searches a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to put binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from miRBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
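The normalization and the threshold comparison can be sketched as below. The tool itself is written in Perl; this Python sketch only illustrates the arithmetic. The function names and the clone counts are made up, and which library plays the role of target or off-target depends on the chosen targeting strategy, so the sketch simply compares a library `a` against a library `b`.

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    if total == 0:
        return {mirna: 0.0 for mirna in clone_counts}
    return {mirna: count / total for mirna, count in clone_counts.items()}


def candidate_mirnas(library_a, library_b, threshold=0.001):
    """miRNAs whose relative level in library a exceeds that in b by >= threshold."""
    rel_a = relative_expression(library_a)
    rel_b = relative_expression(library_b)
    hits = []
    for mirna, level in rel_a.items():
        difference = level - rel_b.get(mirna, 0.0)
        if difference >= threshold:
            hits.append((mirna, difference))
    # Largest difference first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up clone counts for two libraries, for illustration only.
library_a = {"miR-122": 700, "let-7a": 300}
library_b = {"miR-122": 0, "let-7a": 500, "miR-1": 500}
hits = candidate_mirnas(library_a, library_b)
```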
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool gives the user a choice of different miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy to generate input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept unites a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to their 3'-pairing and AU-content scores. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Pairing at miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between bound miRNA and mRNA are also allowed, but there is a penalty of 0.5 points for an offset of more than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets from human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
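The 3'-pairing rule described above can be illustrated with a short sketch. This is a simplification and not the TargetScan implementation: it assumes the paired miRNA positions and the offset are already known, scores every listed position individually, and does not search over alignments or contiguous stretches the way the real algorithm does. The function name is hypothetical.

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score for a fixed alignment.

    paired_positions: 1-based miRNA positions (in the 3' region) that pair.
    offset: shift between miRNA and mRNA in nucleotides.
    """
    score = 0.0
    for position in paired_positions:
        # Pairs at positions 13-16 count 1 point, all other pairs 0.5 points.
        score += 1.0 if 13 <= position <= 16 else 0.5
    if offset > 2:
        score -= 0.5  # penalty for an offset of more than 2 nucleotides
    return score


core_only = three_prime_score(range(13, 17))
extended = three_prime_score(range(13, 23))
shifted = three_prime_score(range(13, 23), offset=3)
```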
<br />
Since all major prior modeling approaches used mRNA levels as their training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows through the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections, called weights, between the artificial neurons. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function has been used. It can limit the output to the range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
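The three components named above (weights, adder, activation function) fit into a few lines. This is a generic sketch of one artificial neuron with a logistic sigmoid, not code from our MATLAB model; the bias term and the example values are assumptions added for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real number to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs (the adder) passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# Made-up inputs and weights for illustration only.
activation = neuron([0.5, 0.6, 1.0], [0.8, -0.4, 0.3])
```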
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean-squared error, and there are several algorithms that can be used to minimise this function. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of knockdown due to the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strength calculated.<br><br />
Each input was represented by a vector in which each element is a score value for a specific feature of the binding site (as described in the previous section, "Parameterization Concept"). Three features were used to describe the binding site: the seed type, the 3’pairing contribution and the AU-content. Each input/target pair represented the relationship between a particular binding site and the related percentage of knockdown.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with experimental data and comprised 4 inputs instead of 3; the fourth input represented the size of the bulge in nucleotides. Afterwards both networks were used to predict percentages of knockdown for given inputs. The predictions were then validated experimentally and compared between the two networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (a multilayer feedforward network). The first layer, which comprised 15 artificial neurons, received the network input. The second layer was connected to the first one and produced the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum of squared errors) was minimized with Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and the problem of interpolating noisy data is overcome by applying a Bayesian framework to the NN learning problem {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
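The forward pass through such a two-layer network can be sketched in plain Python. This is not our trained MATLAB network: the weights below are random placeholders, the logistic function stands in for the sigmoid of the hidden layer, and training (Bayesian regularization within Levenberg-Marquardt) is not shown. Only the layer sizes, 4 inputs and 15 hidden neurons, follow the description above.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feedforward pass: sigmoid hidden layer, linear output layer."""
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    # Linear output neuron: weighted sum of the hidden activations plus bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_network(n_inputs=4, n_hidden=15, seed=0):
    """Placeholder parameters; a real network would be trained on data."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


params = random_network()
# Input vector: 3' score, AU score, seed type, bulge size (example values).
prediction = forward([7.5, 0.6, 3, 4], *params)
```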
<br />
===Results===<br />
Two batches of experiments were performed. The network trained only with literature data was used to predict the outcome of the first batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental percentages of knockdown. The results make clear that the bulge size does have an effect on the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is 3 or 4 nt, but not when it is 1 or 0. It is important to stress that this network was trained with literature values that did not take the bulge size into consideration as a key factor; TargetScan does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated and experimental knockdown for binding sites with the given features. The highlighted cells mark the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (10-25%).<br><br />
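<br />
One way to quantify the discrepancy in Table 1 is the mean absolute error between experimental and simulated knockdown, grouped by bulge size. The sketch below copies the three relevant columns from the table; the grouping and the function name <code>mae_by_bulge_size</code> are our own illustration and not part of the original analysis.<br />

```python
from collections import defaultdict

# (bulge size, experimental KD, simulated KD), copied row by row from Table 1
ROWS = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]

def mae_by_bulge_size(rows):
    """Mean absolute error |experimental - simulated| per bulge size."""
    errors = defaultdict(list)
    for size, experimental, simulated in rows:
        errors[size].append(abs(experimental - simulated))
    return {size: sum(v) / len(v) for size, v in errors.items()}

mae = mae_by_bulge_size(ROWS)
for size in sorted(mae):
    print(f"bulge size {size}: mean absolute error {mae[size]:.3f}")
```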
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression for the training set. The line shows the correlation between the NN outputs and the respective target values.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. We therefore trained another neural network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is thus a multiple input, single output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic (FL) is a rule-based method for approximate reasoning developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and artificial intelligence, for example in fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system and from experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol. 5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, FL constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, each applying a different criterion to the input. The height of a person, for example, can be evaluated with one MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs, one rating the membership of small people, the second of medium-sized people and the third of tall people. The shape of the MF determines whether it describes a functional dependency, which allows intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons, so that the model could differentiate between young and grown-up persons.<br />
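<br />
The height and age example can be written down as two membership functions. This is a minimal sketch: the breakpoints of 160 cm and 190 cm are invented for illustration, and treating exactly 18 years as grown-up is our assumption.<br />

```python
def ramp(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def tall(height_cm):
    """Graded MF: intermediate membership values are allowed."""
    return ramp(height_cm, 160.0, 190.0)  # hypothetical breakpoints

def grown_up(age_years):
    """ON/OFF MF: membership is either 0 or 1."""
    return 1.0 if age_years >= 18 else 0.0  # boundary choice is an assumption

print(tall(175), grown_up(30), grown_up(10))
```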
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership in the different MFs. The better the rule is satisfied, the higher the membership in the output MF.<br />
The output MF can be a function like the input MFs; this is the case in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that it is computationally more efficient and, because of its simpler output MFs, easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
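<br />
For a Sugeno system with constant output MFs, the overall output is commonly computed as the average of the rule outputs weighted by how strongly each rule fires. The sketch below shows that calculation only; the two rules, their thresholds and their output constants are invented for illustration and are not the rules of our model.<br />

```python
def sugeno_output(rules, inputs):
    """Zero-order Sugeno inference: firing-strength-weighted average.

    `rules` is a list of (firing_strength_function, constant_output) pairs.
    """
    strengths = [fire(inputs) for fire, _ in rules]
    total = sum(strengths)
    if total == 0:
        raise ValueError("no rule fires for these inputs")
    return sum(w * z for w, (_, z) in zip(strengths, rules)) / total

# Hypothetical rules on a 3' pairing score (thresholds and outputs are made up)
def high_three_prime(inputs):
    return min(1.0, max(0.0, (inputs["three_prime"] - 3.0) / 5.0))

def low_three_prime(inputs):
    return 1.0 - high_three_prime(inputs)

rules = [(high_three_prime, 0.8), (low_three_prime, 0.3)]
print(sugeno_output(rules, {"three_prime": 5.5}))
```

Optimizing such a system then amounts to tuning the MF shapes and the output constants against measured knockdown values.<br />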
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into account that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts of what kind of input parameters to integrate into the fuzzy inference model and how to evaluate them. Therefore we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various different BS characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, yields a score that seems particularly suited to distinguishing between endogenous-miRNA-like binding sites. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here if you are interested in more recent model optimization results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Andrew Grimson, Kyle Kai-How Farh, Wendy K Johnston, Philip Garrett-Engele, Lee P Lim, David P Bartel. Molecular Cell, 27:91-105 2007.<br />
<br />
An experiment in linguistic synthesis with a fuzzy logic controller. Mamdani, E.H. and S. Assilian, International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.<br />
<br />
Industrial applications of fuzzy control. Sugeno, M., Elsevier Science Pub. Co., 1985.<br />
<br />
[http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] Joe Rodriguez, Robin Ge, Kim Walker, and George Bell. Whitehead Institute for Biomedical Research. All Rights Reserved. Copyright(c) 2007,2008 <br />
<br />
Ben Kröse & Patrick van der Smagt, An introduction to Neural Networks, 8th Edition, 1996.<br />
<br />
David J. C. MacKay, A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, vol. 4, No. 3, Pages 448-472, 1992.<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:06:19Z<p>AlejandroHD: /* Supplementary Region */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface whose back-end is a collection of individual models and scripts that interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach aims at the final objective: protein levels (especially relevant for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BS on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3' pairing and AU content scores of the generated BS are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits these parameters is selected as the output BS of miRockdown.<br />
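<br />
The final comparison step can be pictured as a nearest-neighbour search in parameter space. The sketch below assumes a scaled Euclidean distance; the distance measure, the parameter names, the scaling factors and the example values are all hypothetical, since the text does not specify how miRockdown measures the fit.<br />

```python
def closest_binding_site(candidates, target, scales):
    """Return the candidate whose parameters lie closest to the target."""
    def distance(candidate):
        return sum(((candidate[key] - target[key]) / scales[key]) ** 2
                   for key in target)
    return min(candidates, key=distance)

# Hypothetical generated binding sites and model parameters
candidates = [
    {"name": "BS_a", "three_prime": 7.5, "au": 0.60, "bulge_size": 4},
    {"name": "BS_b", "three_prime": 4.0, "au": 0.31, "bulge_size": 0},
    {"name": "BS_c", "three_prime": 5.5, "au": 0.80, "bulge_size": 3},
]
target = {"three_prime": 5.0, "au": 0.75, "bulge_size": 3}
scales = {"three_prime": 7.5, "au": 1.0, "bulge_size": 4.0}

best = closest_binding_site(candidates, target, scales)
print(best["name"])
```

Scaling each parameter keeps the 3' score, which spans a larger numerical range, from dominating the comparison.<br />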
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides, but leaving a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
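<br />
The input handling described above might look as follows. This is a sketch of one plausible implementation, not the actual miBSdesigner code: it assumes that "extra characters" means anything other than A, C, G, T and U, and the function name <code>normalize_mirna</code> is hypothetical.<br />

```python
def normalize_mirna(raw):
    """Clean a user-entered miRNA sequence and return it as 22-nt RNA (5' to 3')."""
    # Keep only nucleotide letters, then convert DNA notation to RNA
    seq = "".join(ch for ch in raw.upper() if ch in "ACGTU").replace("T", "U")
    if len(seq) != 22:
        raise ValueError(f"expected 22 nucleotides, got {len(seq)}")
    return seq

# let-7a entered as DNA in lower case, and as RNA with end labels
print(normalize_mirna("tgaggtagtaggttgtatagtt"))
print(normalize_mirna("5'-UGAGGUAGUAGGUUGUAUAGUU-3'"))
```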
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a position (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)}}.<br />
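<br />
The four standard seed types listed above translate directly into the mRNA sequence of the seed match. The following sketch derives that sequence from a miRNA given as RNA in 5' to 3' direction; the function names are hypothetical, and the customized mismatch option is not covered.<br />

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_site(mirna, seed_type):
    """mRNA seed match (5' to 3') for a miRNA given 5' to 3'."""
    # Last matched miRNA position: 7 for 6mer/7merA1, 8 for 7merm8/8mer
    last_position = {"6mer": 7, "7merA1": 7, "7merm8": 8, "8mer": 8}[seed_type]
    site = reverse_complement(mirna[1:last_position])  # matching starts at position 2
    if seed_type in ("7merA1", "8mer"):
        site += "A"  # adenine opposite miRNA position 1
    return site

let7a = "UGAGGUAGUAGGUUGUAUAGUU"
for seed_type in ("6mer", "7merA1", "7merm8", "8mer"):
    print(seed_type, seed_site(let7a, seed_type))
```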
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge) [https://2010.igem.org/Team:Heidelberg/Modeling/descriptions#references {{HDref|Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007)}}].<br />
If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the introduction of the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them, removing the extra nucleotides to create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
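<br />
As an illustration of this output, the sketch below builds a sense and an antisense oligo for a binding site given as DNA. It assumes that the annealed oligos should carry the 5' overhang left by XmaI (CCGG) at one end and the one left by XhoI (TCGA) at the other; which enzyme sits at which end of the miTuner construct, the naming scheme and the function names are assumptions made for this example.<br />

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement_dna(seq):
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))

def binding_site_oligos(name, site_dna, sense_overhang="CCGG", antisense_overhang="TCGA"):
    """Return {oligo name: sequence}, both oligos written 5' to 3'."""
    sense = sense_overhang + site_dna
    antisense = antisense_overhang + reverse_complement_dna(site_dna)
    return {f"{name}_fw": sense, f"{name}_rv": antisense}  # hypothetical names

oligos = binding_site_oligos("let7a_8mer", "CTACCTCA")
for oligo_name, sequence in oligos.items():
    print(oligo_name, sequence)
```

When the two oligos are annealed, the binding site forms the double-stranded core and the two overhangs stay single-stranded at opposite ends.<br />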
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on- or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between target and off-target tissue. The program searches a database of expression levels and returns a list of candidate miRNAs. From these, the desired miRNA can be selected, and the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to place binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – Mature miRNA sequences were obtained from miRBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
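<br />
The normalization and the threshold comparison described above can be sketched as follows. The clone counts and miRNA names are invented, the function names are hypothetical, and returning the signed difference (target minus off-target) is our own choice for illustration.<br />

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}

def differential_mirnas(target_counts, off_target_counts, threshold=0.001):
    """miRNAs whose relative levels differ by at least `threshold`.

    Returns {miRNA: relative level in target - relative level in off-target}.
    """
    target = relative_expression(target_counts)
    off_target = relative_expression(off_target_counts)
    result = {}
    for mirna in set(target) | set(off_target):
        difference = target.get(mirna, 0.0) - off_target.get(mirna, 0.0)
        if abs(difference) >= threshold:
            result[mirna] = difference
    return result

# Hypothetical small RNA libraries (clone counts per mature miRNA)
target_library = {"miR-A": 90, "miR-C": 10}
off_target_library = {"miR-B": 90, "miR-C": 10}
print(differential_mirnas(target_library, off_target_library))
```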
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous-miRNA-like BS with the more advanced 3'-scoring and AU-content evaluation. The endogenous-miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Each pairing at miRNA nucleotides 13-16 adds 1 to the score; pairings outside this region add 0.5. Offsets between the paired miRNA and mRNA are allowed, but there is a penalty of 0.5 points for an offset greater than 2 nucleotides. The AU-content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
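<br />
The 3'-score rules stated above can be written as a small function. This sketch scores one given set of paired miRNA positions exactly as described in the text; it does not search over alignments as the TargetScan script does, and the function name is hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Score 3' pairing for a given set of paired miRNA positions.

    Pairs at positions 13-16 count 1 point, all other pairs 0.5 points;
    an offset greater than 2 nucleotides costs 0.5 points.
    """
    score = 0.0
    for position in paired_positions:
        score += 1.0 if 13 <= position <= 16 else 0.5
    if offset > 2:
        score -= 0.5
    return score

print(three_prime_score(range(13, 17)))            # pairing at positions 13-16 only
print(three_prime_score(range(13, 23), offset=3))  # positions 13-22 with a large offset
```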
<br />
Since all major prior modeling approaches used mRNA levels as their training set [], our approach will give completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, a single-layer network is described by three basic components: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function was used; it confines the output to a range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
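<br />
The three components (weights, adder, activation function) map onto a few lines of code. In this minimal sketch the logistic function gives outputs between 0 and 1 and <code>tanh</code> gives outputs between -1 and 1; the input values and weights are arbitrary.<br />

```python
import math

def logistic(v):
    """Sigmoid with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron(inputs, weights, bias, activation=logistic):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
print(neuron(x, w, bias=0.1))                        # between 0 and 1
print(neuron(x, w, bias=0.1, activation=math.tanh))  # between -1 and 1
```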
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean-squared error, and there are several algorithms that can be used to minimise it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
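<br />
The training loop can be illustrated with the simplest possible case: a single linear neuron trained by plain gradient descent on the mean-squared error. This is not the algorithm used for our model (see the model description); the toy data, the learning rate and the function names are chosen only for illustration.<br />

```python
def mse(samples, w, b):
    """Mean-squared error of the linear neuron y = w * x + b."""
    return sum((w * x + b - t) ** 2 for x, t in samples) / len(samples)

def train(samples, learning_rate=0.1, epochs=500):
    """Repeat: compute outputs, compare with targets, adjust weight and bias."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - t) * x for x, t in samples) / n
        grad_b = sum(2 * (w * x + b - t) for x, t in samples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy input/target pairs lying on the line t = 0.2 * x + 0.1
samples = [(0, 0.1), (1, 0.3), (2, 0.5), (3, 0.7)]
w, b = train(samples)
print(w, b, mse(samples, w, b))
```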
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strengths calculated.<br><br />
Each input was represented by a vector in which each element is a score for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU-content. Each input/target pair represents the relationship between a particular binding site and the corresponding knockdown percentage.<br />
Two neural networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with experimental data and comprised 4 inputs instead of 3; the fourth input represented the size of the bulge in nucleotides. Afterwards both networks were used to predict knockdown percentages for given inputs. The predictions were then validated experimentally and compared between the two networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network was a multilayer feedforward network with two layers. The first (hidden) layer is connected to the network input and comprises 15 artificial neurons. The second layer is connected to the first and produces the output. The first layer uses a sigmoid activation function and the second a linear one. The cost function (sum of squared errors) was minimized by Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and applying a Bayesian framework to the NN learning problem {{HDref|MacKay, 1992}} overcomes the difficulty of interpolating noisy data.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. “Hidden” is the first layer, which comprises 15 artificial neurons, while “Output” is the second and last layer, which produces the output. The symbol “w” denotes the weights and “b” the biases.<br />
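<br />
In Bayesian regularization the quantity being minimized is typically a weighted combination of the sum of squared errors and the sum of squared weights, so that large weights (which tend to fit noise) are penalized. The sketch below shows one common form of this objective with fixed coefficients <code>alpha</code> and <code>beta</code>; in the Bayesian framework these coefficients are re-estimated during training, which is not shown here, and the function name is hypothetical.<br />

```python
def regularized_cost(errors, weights, alpha, beta):
    """beta * (sum of squared errors) + alpha * (sum of squared weights)."""
    squared_errors = sum(e * e for e in errors)
    squared_weights = sum(w * w for w in weights)
    return beta * squared_errors + alpha * squared_weights

# Two residuals and one weight, with arbitrary coefficients
print(regularized_cost([1.0, 2.0], [3.0], alpha=0.5, beta=2.0))
```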
<br />
===Results===<br />
Two batches of experiments were performed. The network trained only with literature data was used to predict the outcome of the first batch. Figure 4 shows the regression line for the correlation between the NN outputs and the targets used to train this network, while Table 1 lists the simulated and experimental knockdown percentages. The results make clear that the bulge size does affect the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is 3 or 4 nt, but not when it is 1 or 0. It is important to underline that the network was trained with literature values that did not treat the bulge size as a key factor; TargetScan, in fact, does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size !! number BS !! KD% experimental !! KD% simulated <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated and experimental knockdown for the given binding site features. The highlighted values underline the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (10-25%).<br><br />
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression for the training data. The line shows the correlation between the NN outputs and the respective target values.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. We therefore decided to train another neural network using only our experimental data and to include the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our fuzzy inference system is therefore a multiple input, single output (MISO) fuzzy inference system.<br />
<br />
Fuzzy Logic is a rule-based approximate artificial reasoning method developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and Artificial Intelligence approaches such as Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of Fuzzy Logic-based approaches are that (i) models can be constructed from prior knowledge of the system and from experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol.5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, Fuzzy Logic constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. Each input parameter can have one or several MFs, corresponding to different criteria applied to the same input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one evaluating the membership to small people, the second to medium-sized people and the third to tall people. The shape of the MF determines whether it expresses a functional dependency, which allows intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons, so that the model could differentiate between young and grown-up persons.<br />
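<br />
The height and age example can be written down in a few lines. The following is only an illustrative sketch: the function names and the 160 cm and 190 cm break points are assumptions chosen for the example, and the age of exactly 18 is counted as grown-up, which the text leaves open.<br />

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Ramp-shaped MF: 0 below `low`, 1 above `high`, linear in between.

    The break points are arbitrary example values, not fitted parameters.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def grown_up_membership(age_years, threshold=18):
    """ON/OFF MF: 0 below the threshold age, 1 from the threshold on."""
    return 1.0 if age_years >= threshold else 0.0


# Memberships of a 175 cm tall, 30 year old person
person = {
    "tall": tall_membership(175),
    "grown_up": grown_up_membership(30),
}
```

The ramp allows intermediate membership values, while the step function only distinguishes the two ON/OFF states.<br />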
<br />
Simple if-then rules can then be used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership to the different MFs. The better the rule is satisfied, the higher the membership to the output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno fuzzy inference system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MF. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
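<br />
A Sugeno system with constant output MFs reduces to a weighted average of the rule outputs, which is why it is cheap to evaluate and to optimize. The sketch below assumes the minimum as the AND operator and uses two invented rules with invented membership values; none of the numbers are taken from our model.<br />

```python
def fuzzy_and(*memberships):
    """AND of several memberships, here taken as their minimum (an assumption)."""
    return min(memberships)


def fuzzy_not(membership):
    """Complement of a membership value."""
    return 1.0 - membership


def sugeno_output(rules):
    """Zero-order Sugeno output.

    `rules` is a list of (firing_strength, output_constant) pairs; the result
    is the average of the constants weighted by the firing strengths.
    """
    total = sum(strength for strength, _ in rules)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(strength * constant for strength, constant in rules) / total


# Hypothetical memberships of one binding site
strong_pairing = 0.75   # membership to "high 3'-pairing score"
high_au = 0.5           # membership to "high AU-content score"

rules = [
    # IF pairing is high AND AU content is high THEN knockdown = 0.9
    (fuzzy_and(strong_pairing, high_au), 0.9),
    # IF pairing is NOT high THEN knockdown = 0.2
    (fuzzy_not(strong_pairing), 0.2),
]
knockdown = sugeno_output(rules)
```

Optimizing such a system means adjusting the MF shapes and the output constants (here 0.9 and 0.2) until the predicted knockdown matches the measured one.<br />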
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration that binding sites with a 3'-score under 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. We therefore parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various BS characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, gives a score that seems appropriate for distinguishing binding sites, especially endogenous miRNA-like ones. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Andrew Grimson, Kyle Kai-How Farh, Wendy K Johnston, Philip Garrett-Engele, Lee P Lim, David P Bartel. Molecular Cell, 27:91-105 2007.<br />
<br />
An experiment in linguistic synthesis with a fuzzy logic controller. Mamdani, E.H. and S. Assilian, International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.<br />
<br />
Industrial applications of fuzzy control. Sugeno, M., Elsevier Science Pub. Co., 1985.<br />
<br />
[http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] Joe Rodriguez, Robin Ge, Kim Walker, and George Bell. Whitehead Institute for Biomedical Research. All Rights Reserved. Copyright(c) 2007,2008 <br />
<br />
Ben Kröse & Patrick van der Smagt, An introduction to Neural Networks, 8th Edition, 1996.<br />
<br />
David J. C. MacKay, A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, vol. 4, No. 3, Pages 448-472, 1992.<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:03:52Z<p>AlejandroHD: /* Seed Types */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach targets the final objective: protein levels (especially relevant for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3'-pairing and AU-content scores of the generated BSs are determined by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Now that the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that fits best the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
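<br />
The final comparison step amounts to a nearest-neighbour search in parameter space. The following sketch assumes a scaled Euclidean distance; the parameter names, scales and candidate values are hypothetical, and the distance measure actually used by miRockdown may differ.<br />

```python
import math


def closest_binding_site(target, candidates, scales):
    """Return the generated BS whose parameters lie nearest to `target`.

    Each parameter difference is divided by a scale so that parameters with
    large numeric ranges (e.g. the 3' score) do not dominate the distance.
    """
    def distance(params):
        return math.sqrt(sum(((params[key] - target[key]) / scales[key]) ** 2
                             for key in target))

    return min(candidates, key=lambda site: distance(site["params"]))


# Parameters of the model BS that matches the requested knockdown (invented)
model_bs = {"three_prime": 5.5, "au": 0.75}
scales = {"three_prime": 7.5, "au": 1.0}

# A few generated BSs with their characterized parameters (invented)
generated = [
    {"name": "BS_A", "params": {"three_prime": 7.5, "au": 0.60}},
    {"name": "BS_B", "params": {"three_prime": 5.0, "au": 0.80}},
    {"name": "BS_C", "params": {"three_prime": 2.0, "au": 0.30}},
]
best = closest_binding_site(model_bs, generated, scales)
```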
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to input a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge at positions 9 to 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
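<br />
The input handling and the two basic binding site types can be sketched as follows. This is not the miBSdesigner source: the function names are hypothetical, and creating the bulge by copying the miRNA base into the binding site (so that identical bases face each other and cannot pair) is one simple assumption among several possible ways to introduce mismatches.<br />

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def clean_mirna(raw, length=22):
    """Accept DNA or RNA input, drop extra characters, check the length."""
    sequence = raw.upper().replace("T", "U")
    sequence = "".join(base for base in sequence if base in COMPLEMENT)
    if len(sequence) != length:
        raise ValueError("expected %d nucleotides, got %d" % (length, len(sequence)))
    return sequence


def perfect_site(mirna):
    """Reverse complement of the miRNA, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna))


def bulged_site(mirna, first=9, last=12):
    """Near-perfect site: unpaired opposite miRNA positions first..last."""
    site = []
    for position in range(len(mirna), 0, -1):    # walk the miRNA 3' to 5'
        base = mirna[position - 1]
        if first <= position <= last:
            site.append(base)                    # same base: no pairing
        else:
            site.append(COMPLEMENT[base])        # Watson-Crick partner
    return "".join(site)


# Example 22-nt input given as DNA with extra characters
mirna = clean_mirna("5'-tggagtgtgacaatggtgtttg-3'")
perfect = perfect_site(mirna)
bulged = bulged_site(mirna)
```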
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA {{HDref|Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)}}.<br />
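<br />
The four canonical seed types listed above translate directly into the mRNA sequence of the seed match. The sketch below follows the usual convention that the adenine of the 7merA1 and 8mer sites is the mRNA nucleotide opposite miRNA position 1; the function names and the example sequence are only illustrative.<br />

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match(mirna, seed_type):
    """mRNA seed match (5' to 3') for one of the four canonical seed types."""
    match_2_to_7 = reverse_complement(mirna[1:7])   # miRNA nucleotides 2-7
    match_2_to_8 = reverse_complement(mirna[1:8])   # miRNA nucleotides 2-8
    sites = {
        "6mer": match_2_to_7,
        "7merA1": match_2_to_7 + "A",   # adenine opposite miRNA position 1
        "7merm8": match_2_to_8,
        "8mer": match_2_to_8 + "A",
    }
    return sites[seed_type]


mirna = "UGGAGUGUGACAAUGGUGUUUG"   # example 22-nt miRNA, 5' to 3'
seed_matches = {kind: seed_match(mirna, kind)
                for kind in ("6mer", "7merA1", "7merm8", "8mer")}
```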
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge) [https://2010.igem.org/Team:Heidelberg/Modeling/descriptions#references [Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007)]]. If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotide positions (numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. Initially, the user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or some custom sequences input by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construction prior to ligation, or to process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool developed to generate binding sites for miRNAs that can be used for tissue targeting based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between the target and off-target tissues. The program searches a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to put binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – Mature miRNA sequences were obtained from miRBase Sequence Database Release 16 [cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above are submitted, the tool gives the user a choice of different miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
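<br />
The normalization of clone counts and the threshold filter can be illustrated with a short sketch. It is written in Python rather than Perl and is not the mUTING script: the tissue names, miRNA names and clone counts are invented, and the reading that off-targeting selects miRNAs that are more abundant in every off-target tissue than in the target (and on-targeting the reverse) is an assumption. The additional zero-expression requirement described above is not implemented here.<br />

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}


def candidate_mirnas(libraries, target, off_targets, strategy="off", threshold=0.001):
    """List miRNAs whose relative expression differs by at least `threshold`.

    strategy "off": miRNA higher in every off-target tissue than in the target.
    strategy "on":  miRNA higher in the target than in every off-target tissue.
    """
    relative = {tissue: relative_expression(counts)
                for tissue, counts in libraries.items()}
    names = set()
    for tissue in [target] + list(off_targets):
        names.update(relative[tissue])
    hits = []
    for mirna in sorted(names):
        in_target = relative[target].get(mirna, 0.0)
        in_off = [relative[tissue].get(mirna, 0.0) for tissue in off_targets]
        if strategy == "off":
            selected = all(value - in_target >= threshold for value in in_off)
        else:
            selected = all(in_target - value >= threshold for value in in_off)
        if selected:
            hits.append(mirna)
    return hits


# Invented clone counts for two small RNA libraries
libraries = {
    "heart": {"miR-X": 10, "miR-Y": 990},
    "liver": {"miR-X": 700, "miR-Y": 300},
}
off_hits = candidate_mirnas(libraries, "heart", ["liver"], strategy="off")
on_hits = candidate_mirnas(libraries, "heart", ["liver"], strategy="on")
```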
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BSs with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_scores_50-algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Each pairing at miRNA nucleotides 13-16 adds 1 to the score; each pairing outside this region adds 0.5. Offsets between the bound miRNA and mRNA are also allowed, but there is a penalty of 0.5 points for an offset larger than 2 nucleotides. The AU-content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
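<br />
The 3'-pairing rules described above can be condensed into a few lines for a single, already chosen alignment. This is a simplified sketch of the rules as stated in the text, not the TargetScan code, which additionally searches over alignments to find the highest score; the function name and arguments are hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Score one alignment of the miRNA 3' region.

    paired_positions: miRNA nucleotide numbers (1-based) that are paired.
    offset: shift between miRNA and mRNA in nucleotides.
    """
    score = 0.0
    for position in paired_positions:
        # Pairings at nucleotides 13-16 count 1, all other pairings count 0.5
        score += 1.0 if 13 <= position <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5    # penalty for an offset larger than 2 nucleotides
    return score


core_only = three_prime_score(range(13, 17))             # nucleotides 13-16
extended = three_prime_score(range(13, 21))              # nucleotides 13-20
shifted = three_prime_score(range(13, 17), offset=3)     # same core, large offset
```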
<br />
Since all major prior modeling approaches used mRNA levels as training set [], our approach will give completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An Artificial Neural Network, usually called a Neural Network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections, called weights, between the artificial neurons. Neural Networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent the strength of the connection between an input and an artificial neuron; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. Our model uses the sigmoid function, which confines the output to the range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
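<br />
The three components of the artificial neuron (weights, adder and activation function) fit into a few lines of code. The sketch uses the logistic sigmoid, which maps to the range between 0 and 1; the input and weight values are arbitrary examples.<br />

```python
import math


def logistic(x):
    """Sigmoid activation with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs (the adder) followed by the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(weighted_sum)


# Two inputs whose weighted contributions cancel each other
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```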
<br />
During the learning process, the difference between the desired output (target) and the network output is minimized. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimize it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strength calculated.<br><br />
Each input was represented by a vector in which each element is a score value for a specific feature of the binding site (as described in the previous section, "Parameterization Concept"). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU-content. Each input/target pair represented the relationship between a particular binding site and the corresponding knockdown percentage.<br />
Two Neural Networks were trained. One was trained with a pool of 45 data points from the literature; the other was trained with experimental data. The latter network had 4 inputs instead of 3: the fourth input represented the size of the bulge in base pairs. Afterwards both networks were used to predict knockdown percentages for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network is a multilayer feedforward network with two layers. The first (hidden) layer is connected to the inputs and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum squared error) was minimized by Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and applying a Bayesian framework to the NN learning problem overcomes the difficulty of interpolating noisy data {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” denotes the weights and “b” the biases.<br />
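<br />
The architecture described in this section (15 sigmoid hidden neurons followed by one linear output neuron, with the sum squared error as cost) can be sketched without any toolbox. The code below only shows the forward pass and the cost: the weights are random placeholders, the tanh function stands in for a sigmoid with outputs between -1 and 1, and the Bayesian-regularized Levenberg-Marquardt training performed in MATLAB is not reproduced. The example input is an unscaled, illustrative feature vector.<br />

```python
import math
import random


def make_network(n_inputs, n_hidden=15, seed=0):
    """Random placeholder weights; a real model obtains them by training."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Sigmoid (tanh) hidden layer followed by a linear output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def sum_squared_error(predictions, targets):
    """Cost function minimized during training."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


# Four inputs: 3' score, AU score, seed type and bulge size
net = make_network(n_inputs=4)
prediction = forward([7.5, 0.624, 3, 4], *net)
cost = sum_squared_error([prediction], [0.85])
```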
<br />
===Results===<br />
Two experiment batches were performed. The network trained only with literature data was used to predict the outcome of the first experiment batch. Figure 4 shows the regression line of the correlation between the NN outputs and the targets used for training this network, while Table 1 shows the simulated and experimental knockdown percentages. The results make clear that the bulge size does affect the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to underline that the network was trained with literature values that did not treat the bulge size as a key factor; TargetScan, in fact, does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size (nt) !! number of BS !! KD experimental (fraction) !! KD simulated (fraction) <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated and experimental knockdown for the given binding site features. The highlighted (grey) values underline the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (between 10 and 25%).<br><br />
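<br />
The comparison in Table 1 can also be summarized numerically. The following Python sketch is an illustration only (the original analysis was done in MATLAB); it groups the table rows by bulge size and computes the mean absolute error between experimental and simulated knockdown. The names <code>TABLE_1</code> and <code>mean_abs_error_by_bulge</code> are assumptions introduced for this example.<br />

```python
from collections import defaultdict

# (bulge size in nt, experimental knockdown, simulated knockdown),
# copied row by row from Table 1.
TABLE_1 = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]


def mean_abs_error_by_bulge(rows):
    """Return {bulge size: mean |experimental - simulated|}."""
    errors = defaultdict(list)
    for bulge_size, experimental, simulated in rows:
        errors[bulge_size].append(abs(experimental - simulated))
    return {size: sum(errs) / len(errs) for size, errs in errors.items()}


if __name__ == "__main__":
    for size, error in sorted(mean_abs_error_by_bulge(TABLE_1).items()):
        print(f"bulge size {size} nt: mean absolute error {error:.3f}")
```

With the values of Table 1, the mean absolute error is about 0.04 for a bulge size of 4 nt and about 0.25 for a bulge size of 1 nt.<br />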
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression for the training set. The line shows the correlation between the NN outputs and the respective target values.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. We therefore decided to train another neural network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding site serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic (FL) is a rule-based method for approximate artificial reasoning, developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and artificial intelligence, for example in fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system and from experimental data, (ii) intermediate states can be encoded for inputs and outputs, which improves on logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling PLoS Comput Biol. 5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. FL is thus a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. Each input parameter can have one or several MFs, which act like different criteria applied to the same input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one rating membership of the group of small people, the second of medium-sized people and the third of tall people. Changing the shape of an MF makes it possible to model either functional dependencies, which allow intermediate membership values, or simple ON/OFF states, where the membership value can only be 0 or 1. A fuzzy inference system can therefore evaluate different kinds of input parameters. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons, so that the model could differentiate between young and grown-up persons.<br />
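<br />
The height and age example can be written down as two small membership functions. This is a minimal Python sketch for illustration only; the 160 cm and 190 cm bounds of the "tall" MF are arbitrary assumptions, and treating exactly 18 years as grown-up is also an assumption.<br />

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Graded MF: 0 at or below `low`, 1 at or above `high`, linear in between."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def adult_membership(age_years, threshold=18):
    """ON/OFF MF: the membership value can only be 0 or 1."""
    return 1.0 if age_years >= threshold else 0.0


if __name__ == "__main__":
    for height in (150, 175, 195):
        print(height, "cm ->", tall_membership(height))
    for age in (12, 40):
        print(age, "years ->", adult_membership(age))
```

The first function allows intermediate membership values, the second one only the two ON/OFF states described above.<br />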
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership of the different MFs. The better the rule is satisfied, the higher the membership of the output MF.<br />
The output MF can be a function like the input MFs, as in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MFs. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
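<br />
To make the Sugeno idea concrete, the following Python sketch evaluates a zero-order Sugeno system (constant output MFs) with two inputs. It is not our optimized model: the MF bounds, the four rules, the use of the minimum as AND operator and the output constants are all assumptions chosen for illustration.<br />

```python
def ramp(x, lo, hi):
    """Linear membership function rising from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def sugeno_knockdown(three_prime_score, au_score):
    """Zero-order Sugeno inference: weighted average of constant rule outputs."""
    high_3p = ramp(three_prime_score, 0.0, 8.0)  # assumed score range
    high_au = ramp(au_score, 0.0, 1.0)           # assumed score range
    low_3p, low_au = 1.0 - high_3p, 1.0 - high_au

    # (firing strength, constant output); AND is modeled as the minimum.
    rules = [
        (min(high_3p, high_au), 0.9),  # hypothetical output constants
        (min(high_3p, low_au), 0.6),
        (min(low_3p, high_au), 0.4),
        (min(low_3p, low_au), 0.1),
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * output for strength, output in rules) / total


if __name__ == "__main__":
    print(sugeno_knockdown(7.5, 0.6))
```

Because every output MF is a single constant, optimizing the system only requires adjusting a small number of parameters, which is the practical advantage mentioned above.<br />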
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into account that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is implemented by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We developed several concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, produces a score that seems appropriate for distinguishing, in particular, between endogenous miRNA-like binding sites. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here if you are interested in more recent model optimization results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
Andrew Grimson, Kyle Kai-How Farh, Wendy K. Johnston, Philip Garrett-Engele, Lee P. Lim, David P. Bartel. MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Molecular Cell, 27:91-105, 2007.<br />
<br />
E. H. Mamdani and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.<br />
<br />
M. Sugeno. Industrial applications of fuzzy control. Elsevier Science Pub. Co., 1985.<br />
<br />
Joe Rodriguez, Robin Ge, Kim Walker, and George Bell. [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl]. Whitehead Institute for Biomedical Research. All Rights Reserved. Copyright (c) 2007, 2008. <br />
<br />
Ben Kröse and Patrick van der Smagt. An introduction to Neural Networks, 8th Edition, 1996.<br />
<br />
David J. C. MacKay. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, Vol. 4, No. 3, pp. 448-472, 1992.<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-28T00:00:59Z<p>AlejandroHD: /* Parameterization Concept */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface whose back end is a collection of individual models and scripts that interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach aims at the final objective, protein levels, which matter especially for medical applications such as gene therapy.<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough to the desired knockdown.<br><br />
A script integrated into miRockdown correlates the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters, together with the knockdown percentage that the model calculates for each of these objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BSs on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3' pairing and the AU content score of the generated BSs are characterized by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown so that no files have to be generated.<br><br />
Once the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that best fits these parameters is selected as the output BS of miRockdown.<br />
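<br />
The two selection steps described above (model database lookup, then comparison with the generated binding sites) can be sketched in Python as follows. This is a hypothetical illustration, not the miRockdown script: the class <code>SiteParameters</code>, both function names, the squared-distance measure and its scaling constants (8 for the 3' score, 4 for the bulge size) are all assumptions.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteParameters:
    name: str
    three_prime_score: float
    au_score: float
    bulge_size: int


def pick_model_entry(desired_kd, database):
    """database: list of (SiteParameters, knockdown predicted by the model)."""
    return min(database, key=lambda entry: abs(entry[1] - desired_kd))[0]


def parameter_distance(a, b):
    """Squared distance between two parameter sets (assumed scaling)."""
    return (((a.three_prime_score - b.three_prime_score) / 8.0) ** 2
            + (a.au_score - b.au_score) ** 2
            + ((a.bulge_size - b.bulge_size) / 4.0) ** 2)


def closest_generated_site(model_site, generated_sites):
    """Return the generated site whose parameters best fit the model site."""
    return min(generated_sites, key=lambda s: parameter_distance(s, model_site))


if __name__ == "__main__":
    database = [
        (SiteParameters("model_low", 2.0, 0.3, 0), 0.30),
        (SiteParameters("model_high", 7.5, 0.6, 4), 0.80),
    ]
    generated = [
        SiteParameters("generated_a", 7.0, 0.55, 4),
        SiteParameters("generated_b", 3.0, 0.60, 0),
    ]
    model_site = pick_model_entry(0.75, database)
    print(closest_generated_site(model_site, generated).name)
```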
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially, the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
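<br />
As an illustration of these two basic options, the Python sketch below cleans a miRNA sequence, builds the perfect binding site as its reverse complement (written as DNA, 5’ to 3’) and introduces a bulge at miRNA positions 9 to 12. It is not the miBSdesigner code: the function names are hypothetical, and creating the bulge by placing the miRNA's own base opposite itself (so that no pair can form) is an assumption.<br />

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def clean_mirna(raw):
    """Accept DNA or RNA input, drop extra characters, require 22 nt."""
    seq = raw.upper().replace("U", "T")
    seq = "".join(base for base in seq if base in COMPLEMENT)
    if len(seq) != 22:
        raise ValueError(f"expected 22 nucleotides, got {len(seq)}")
    return seq


def perfect_site(mirna):
    """Binding site matching all nucleotides: the reverse complement."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna))


def bulged_site(mirna, first=9, last=12):
    """Perfect site with unpaired bases opposite miRNA positions first..last."""
    site = list(perfect_site(mirna))
    n = len(mirna)
    for pos in range(first, last + 1):
        # miRNA position `pos` (1-based) pairs with site index n - pos.
        site[n - pos] = mirna[pos - 1]  # identical bases cannot pair
    return "".join(site)


if __name__ == "__main__":
    mirna = clean_mirna("ugg agu gug aca aug gug uuu g")  # example input
    print(perfect_site(mirna))
    print(bulged_site(mirna))
```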
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBSdesigner, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA [https://2010.igem.org/Team:Heidelberg/Modeling/descriptions#references [Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)]].<br />
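<br />
The four seed types listed above can be told apart with a few comparisons. The following Python sketch assumes that the miRNA and the binding site are both given as DNA strings of equal length, aligned without offset, with the site written 5’ to 3’; the function name <code>seed_type</code> is hypothetical.<br />

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def seed_type(mirna, site):
    """Classify the seed as 6mer, 7merA1, 7merm8 or 8mer (None if no seed)."""
    n = len(mirna)

    def paired(pos):
        # miRNA position `pos` (1-based) faces site index n - pos.
        return site[n - pos] == COMPLEMENT[mirna[pos - 1]]

    if not all(paired(pos) for pos in range(2, 8)):
        return None  # nucleotides 2-7 must match for every seed type
    adenine_at_1 = site[n - 1] == "A"
    match_at_8 = paired(8)
    if adenine_at_1 and match_at_8:
        return "8mer"
    if match_at_8:
        return "7merm8"
    if adenine_at_1:
        return "7merA1"
    return "6mer"


if __name__ == "__main__":
    mirna = "TGGAGTGTGACAATGGTGTTTG"  # example sequence
    site = "CAAACACCATTGTCACACTCCA"   # its reverse complement
    print(seed_type(mirna, site))
```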
<br />
===Supplementary Region===<br />
In miBSdesigner, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge)[https://2010.igem.org/Team:Heidelberg/Modeling/descriptions#references [Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007)]]. If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
To allow the user to improve the efficiency of their binding sites, miBSdesigner offers options to increase the AU content by adding adenine or uracil at positions around the matches (specifically at -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. Initially, the user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or some custom sequences input by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBSdesigner generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool that generates binding sites for miRNAs that can be used for tissue targeting, based on either an on-targeting or an off-targeting strategy. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between the target and off-target tissues. The program searches a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to put binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – Multiple off-targets can be selected from a list. These are the tissues from which gene expression has to be excluded.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool uses were taken from preexisting data sources.<br />
<br />
'''Sequences''' – Mature miRNA sequences were obtained from miRBase Sequence Database Release 16 [cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in case of off-targeting) or in the off-targets (in case of on-targeting). The expression values in the off-targets and target in the respective cases are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
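<br />
The normalization and filtering steps can be sketched as follows. The actual tool is a Perl script; this Python example is only an illustration with made-up clone counts and hypothetical miRNA and function names. It covers the off-targeting strategy under the assumption that a candidate miRNA must be absent from the target tissue and exceed the threshold in every selected off-target tissue.<br />

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    return {mirna: count / total for mirna, count in clone_counts.items()}


def off_targeting_candidates(libraries, target, off_targets, threshold=0.001):
    """Return miRNAs absent in the target but expressed in all off-targets."""
    relative = {tissue: relative_expression(counts)
                for tissue, counts in libraries.items()}
    all_mirnas = sorted({m for tissue in off_targets for m in relative[tissue]})
    candidates = []
    for mirna in all_mirnas:
        target_level = relative[target].get(mirna, 0.0)
        if target_level != 0.0:
            continue  # assumption: expression in the target must be zero
        if all(relative[tissue].get(mirna, 0.0) - target_level >= threshold
               for tissue in off_targets):
            candidates.append(mirna)
    return candidates


if __name__ == "__main__":
    # Made-up clone counts, for illustration only.
    libraries = {
        "liver": {"miR-a": 700, "miR-b": 300},
        "heart": {"miR-b": 500, "miR-c": 500},
    }
    print(off_targeting_candidates(libraries, "heart", ["liver"]))
```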
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database and enable [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the more advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to 3'-pairing and AU-content score. TargetScan aligns the miRNA with the mRNA sequence, starting from a given seed position, such that the highest possible 3'-score is reached. Each pairing at miRNA nucleotides 13-16 adds 1 to the score; each pairing outside this region adds 0.5. Offsets between the bound miRNA and the mRNA are also allowed, but an offset of more than 2 nucleotides incurs a penalty of 0.5 points. The AU-content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
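<br />
The 3'-scoring rule described above can be illustrated with a strongly simplified Python sketch. It is not the TargetScan implementation: it assumes that the alignment has already been made, so that the paired miRNA positions outside the seed and the offset are simply given as inputs, and the function name is hypothetical.<br />

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3'-pairing score for an already aligned miRNA/mRNA pair.

    paired_positions: miRNA positions (1-based, outside the seed) that pair.
    offset: shift in nucleotides between the bound miRNA and the mRNA.
    """
    score = 0.0
    for position in paired_positions:
        # Pairings at nucleotides 13-16 count 1, all others 0.5.
        score += 1.0 if 13 <= position <= 16 else 0.5
    if abs(offset) > 2:
        score -= 0.5  # penalty for an offset of more than 2 nucleotides
    return score


if __name__ == "__main__":
    print(three_prime_score(range(13, 17)))            # pairing at 13-16 only
    print(three_prime_score(range(13, 21), offset=3))  # longer pairing, large offset
```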
<br />
Since all major prior modeling approaches used mRNA levels as training set [], our approach will give a completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
These surface fits show the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An artificial neural network, usually called a neural network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows through the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally, there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function was used; it limits the output to a range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
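<br />
The three components (weights, adder and activation function) fit into a few lines of Python. This is a generic sketch of a single artificial neuron with a logistic sigmoid, not code taken from our MATLAB model; the function names are chosen for this example.<br />

```python
import math


def sigmoid(activation):
    """Logistic sigmoid: limits the output to the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-activation))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs (adder) followed by the activation function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(activation)


if __name__ == "__main__":
    print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.1, -0.4], bias=0.2))
```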
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimise it. The following figure displays such a loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
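<br />
A minimal version of such a training loop is sketched below in Python for a single linear neuron. Plain gradient descent on the mean squared error is used here purely for illustration; it is not the algorithm used for our model (Bayesian regularization within Levenberg-Marquardt), and the learning rate, the number of epochs and the toy data are assumptions.<br />

```python
def train_linear_neuron(samples, learning_rate=0.1, epochs=2000):
    """Fit output = weight * x + bias by gradient descent on the MSE."""
    weight, bias = 0.0, 0.0
    costs = []
    n = len(samples)
    for _ in range(epochs):
        # Compare network output and target for every sample.
        errors = [(weight * x + bias) - target for x, target in samples]
        costs.append(sum(e * e for e in errors) / n)  # mean squared error
        # Gradients of the cost with respect to weight and bias.
        grad_weight = 2.0 * sum(e * x for e, (x, _) in zip(errors, samples)) / n
        grad_bias = 2.0 * sum(errors) / n
        # Adjust the parameters against the gradient.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias, costs


if __name__ == "__main__":
    toy_data = [(x, 2.0 * x + 1.0) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
    weight, bias, costs = train_linear_neuron(toy_data)
    print(weight, bias, costs[0], costs[-1])
```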
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site situated in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strength calculated.<br><br />
Each input was represented by a vector with one element per binding site feature (three or four elements, depending on the network; see below). Each element corresponded to a score value for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU-content. Each input/target pair represented the relationship between a particular binding site and the related percentage of knockdown.<br />
Two neural networks were trained. One was trained with a pool of 45 data points from the literature. The other was trained with experimental data and had 4 inputs instead of 3; the fourth input represented the size of the bulge in nucleotides. Afterwards, both networks were used to predict knockdown percentages for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network comprised two layers (a multilayer feedforward network). The first layer was connected to the network inputs and comprised 15 artificial neurons. The second layer was connected to the first one and produced the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum of squared errors) was minimized by Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and applying a Bayesian framework to the NN learning problem overcomes the difficulty of interpolating noisy data {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. “Hidden” represents the first layer, which comprised 15 artificial neurons, while “Output” is the second and last layer, which produces the output. The symbol “w” represents the weights and “b” the biases.<br />
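<br />
The structure of this network (a hidden layer of 15 sigmoid neurons followed by one linear output neuron) corresponds to the forward pass sketched below in Python. The sketch only shows how an input vector is propagated; the weights are random placeholders rather than trained values, tanh is assumed as the sigmoid-shaped activation, and the example input is an arbitrary four-element feature vector.<br />

```python
import math
import random


def init_layer(n_neurons, n_inputs, rng):
    """Random placeholder weights "w" and biases "b" for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_neurons)]
    return weights, biases


def forward(x, hidden_layer, output_layer):
    """Hidden layer with tanh activation, then a single linear output neuron."""
    hidden_w, hidden_b = hidden_layer
    output_w, output_b = output_layer
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(output_w[0], hidden)) + output_b[0]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_layer = init_layer(15, 4, rng)   # 15 hidden neurons, 4 inputs
    output_layer = init_layer(1, 15, rng)   # 1 linear output neuron
    # Arbitrary example input: 3' score, AU score, seed type, bulge size.
    print(forward([7.5, 0.624, 3, 4], hidden_layer, output_layer))
```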
<br />
===Results===<br />
Two batches of experiments were performed. The network trained only with literature data was used to predict the outcome of the first batch. Figure 4 shows the regression line for the correlation between the NN outputs and the targets used to train this network, while Table 1 lists the simulated and experimental knockdown percentages. The results make clear that the bulge size does affect the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to emphasize that the network was trained with literature values that did not consider the bulge size as a key factor; TargetScan does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU score !! bulge (0/1) !! seed type !! bulge size (nt) !! number of BS !! KD experimental (fraction) !! KD simulated (fraction) <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated data and experimental results for the given binding site features. The highlighted values underline the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (between 10 and 25%).<br><br />
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression on the training data. The line shows the correlation between the NN output and the respective target value.<br><br />
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train a second neural network using only our experimental data, with the bulge size included in the input vector.<br />
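The discrepancy can be made explicit by grouping the rows of Table 1 by bulge size and averaging the absolute difference between experimental and simulated knockdown. The following Python sketch does this with the values copied from Table 1; the variable and function names are only illustrative.

```python
from collections import defaultdict

# (bulge size in nt, experimental KD, simulated KD), copied from Table 1.
TABLE_1 = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]

def mean_abs_error_by_bulge(rows):
    """Average |experimental - simulated| for each bulge size."""
    errors = defaultdict(list)
    for bulge_size, experimental, simulated in rows:
        errors[bulge_size].append(abs(experimental - simulated))
    return {size: sum(e) / len(e) for size, e in errors.items()}

mae = mean_abs_error_by_bulge(TABLE_1)
for size in sorted(mae):
    print(f"bulge size {size} nt: mean absolute error = {mae[size]:.3f}")
```

On these rows the mean error is smallest for a bulge size of 4 nt and largest for a bulge size of 1 nt.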
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic is a rule-based method of approximate artificial reasoning developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate in a vague way and yet can make precise decisions [Nelles O. Nonlinear System Identification. Springer Verlag GmbH & Co., Berlin, 2000]. It has been widely used in engineering and artificial intelligence, for example in fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007)] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system and from experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling. PLoS Comput Biol. 5:e1000340 (2009)], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, fuzzy logic constitutes a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, corresponding to different criteria applied to the same input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one evaluating the membership to small people, a second to medium-sized people and a third to tall people. The shape of the MF determines whether it expresses a functional dependency, which allows intermediate membership values, or a simple ON/OFF state, where the membership value can only be 0 or 1. Different kinds of input parameters can thus be evaluated with a fuzzy inference system. In the height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons, so that the model could differentiate between young and grown-up persons.<br />
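The height and age example can be written down directly. The following Python sketch shows one graded and one ON/OFF membership function; the breakpoints of 160 cm and 190 cm, the choice to count an 18-year-old as grown-up, and the function names are assumptions made only for illustration.

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Graded MF: 0 below `low`, 1 above `high`, linear in between."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)

def adult_membership(age_years):
    """ON/OFF MF: 0 below the age of 18, 1 from 18 onward (assumed boundary)."""
    return 1.0 if age_years >= 18 else 0.0

halfway = tall_membership(175.0)   # intermediate membership value
minor = adult_membership(17)       # strict ON/OFF value
```

The graded function allows intermediate states, while the step function can only return 0 or 1.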
<br />
Simple if-then rules can then be used to combine the input MF to an output MF. The satisfaction of a rule by an object (set of input parameters) is defined by the degree of membership of the object to the different MFs. The higher the satisfaction of the rule, the higher is the membership to the output MF.<br />
The output MF can be a function like the input MFs, as in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno system is that its simpler output MFs make it computationally more efficient and easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
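A zero-order Sugeno system, in which every rule has a constant output, can be sketched in a few lines of Python. The sketch below assumes that AND is evaluated as the minimum of the memberships and that the final output is the firing-strength-weighted average of the rule constants; the membership values and rule outputs are invented for illustration and are not the values of our model.

```python
def fire(*memberships):
    """Firing strength of a rule whose conditions are joined by AND (minimum)."""
    return min(memberships)

def sugeno_output(rules):
    """Weighted average of constant rule outputs.

    `rules` is a list of (firing_strength, constant_output) pairs.
    """
    total = sum(strength for strength, _ in rules)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(strength * output for strength, output in rules) / total

# Two hypothetical rules with constant outputs 0.9 and 0.3.
rules = [
    (fire(0.8, 1.0), 0.9),
    (fire(0.2, 1.0), 0.3),
]
knockdown = sugeno_output(rules)
```

Because the rule outputs are plain constants, an optimizer only has to adjust a handful of numbers, which is what makes this type of system easy to adapt.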
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into consideration that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is realized by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with three different kinds of shRNA binding sites: perfect, bulged and endogenous-like. They are treated separately because of the differences in their biological mechanisms, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We came up with different concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU content, outputs a score that seems appropriate for distinguishing between binding sites, especially endogenous miRNA-like ones. A more detailed description of the binding site parameterization concept can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here if you are interested in more recent model optimization results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Andrew Grimson, Kyle Kai-How Farh, Wendy K Johnston, Philip Garrett-Engele, Lee P Lim, David P Bartel. Molecular Cell, 27:91-105 2007.<br />
<br />
An experiment in linguistic synthesis with a fuzzy logic controller. Mamdani, E.H. and S. Assilian, International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.<br />
<br />
Industrial applications of fuzzy control. Sugeno, M., Elsevier Science Pub. Co., 1985.<br />
<br />
[http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl] Joe Rodriguez, Robin Ge, Kim Walker, and George Bell. Whitehead Institute for Biomedical Research. All Rights Reserved. Copyright(c) 2007,2008 <br />
<br />
Ben Kröse & Patrick van der Smagt, An introduction to Neural Networks, 8th Edition, 1996.<br />
<br />
David J. C. MacKay, A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, vol. 4, No. 3, Pages 448-472, 1992.<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHDhttp://2010.igem.org/Team:Heidelberg/Modeling/descriptionsTeam:Heidelberg/Modeling/descriptions2010-10-27T23:59:08Z<p>AlejandroHD: /* Parameterization Concept */</p>
<hr />
<div>{{:Team:Heidelberg/Single}}<br />
{{:Team:Heidelberg/tables|normal=FFF|highlight=ddd}}<br />
<br />
{{:Team:Heidelberg/Single_Pagetop|modelset}}<br />
{{:Team:Heidelberg/Side_Top}}<br />
<br />
__TOC__<br />
<br />
{{:Team:Heidelberg/Side_Bottom}}<br />
<br />
=miBEAT:=<br />
<br />
miBEAT ('''mi'''RNA '''B'''inding site '''E'''ngineering and '''A'''ssembly '''T'''ool) is a graphical user interface that has as its back-end a compilation of multiple individual models and scripts which interact with each other to generate constructs. <br />
<br />
==miRockdown==<br />
<br />
There is an urgent need for an easy-to-use tool that generates a binding site the user can use to modify protein levels. <br />
Several tools can predict mRNA knockdown, but our approach aims at the final objective: protein levels (which matter especially for medical applications such as gene therapy).<br />
<br />
===How to use miRockdown===<br />
Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We made it in the most user-friendly way we could think of: The user only needs to input the desired knockdown percentage (kd%) and choose an sh/miRNA sequence, to get a binding site that satisfies their needs.<br><br />
<br><br />
<center>[[Image:Modscheme.png|400px]]<br><br><br />
<div style="font-size:0.95em;" width="400"><b>Overview of the miRockdown script flow.</b><br><br />
The knockdown percentage (kd%) input invokes the selection of the appropriate experimental BS or theoretical binding site parameters. The miRNA sequence starts the generation of BS sequences. Subsequently, these BS sequences are characterized by a modified TargetScan algorithm and finally the parameters of the theoretical BS are compared with the parameters of the generated BSs and the closest of the generated BSs is given as output.</div></center><br />
<br><br><br />
The results of both of our models and the experimentally verified binding sites are integrated in [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] on [https://2010.igem.org/Team:Heidelberg/Modeling/miGUI miBEAT]. <br />
For every binding site request, the user receives the results of the three different approaches and can thus always choose which of the three differently generated binding sites to use. <br />
The binding site with the closest experimentally observed knockdown percentage is displayed, together with its properties and oligos ready to clone into the [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct.<br><br />
The binding sites generated using the models are useful when the user wants to use their own sh/miRNA or when no experimentally verified binding site is close enough.<br><br />
A script integrated into miRockdown will correlate the desired kd% with a database for every model. This database consists of a set of binding site parameter objects spanning the complete range of parameters. Additionally, the database contains the models' knockdown percentage calculated for the whole set of objects.<br><br />
With the user-chosen sh/miRNA sequence as input, a binding site generator script is invoked, which creates more than 2000 different BS on the fly by varying the seed type, 3' pairing, AU content and bulge size. The 3' pairing and the AU-content score of the generated BS are characterized by a modified version of the TargetScan algorithm {{HDref|Rodriguez et al., 2007}}. The input and output functions of the algorithm were adapted for miRockdown, so that no files have to be generated.<br><br />
Now that the generated binding sites are completely characterized, they are compared with the parameters of the suitable model BS. The generated BS that fits best the parameters of the suitable model BS is selected as the output BS of miRockdown.<br />
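The final selection step can be pictured as a nearest-neighbour search in parameter space. The Python sketch below is only an assumption about how such a comparison could look: it uses an unscaled squared Euclidean distance, and the parameter names, the candidate sites and the function name are hypothetical.

```python
def closest_binding_site(model_params, generated_sites):
    """Return the generated site whose parameters are closest to the model BS."""
    def distance(site):
        return sum((site["params"][name] - value) ** 2
                   for name, value in model_params.items())
    return min(generated_sites, key=distance)

# Hypothetical parameters of the model BS that matches the desired kd%.
model_bs = {"three_prime": 5.5, "au": 0.75, "bulge_size": 3}

# Hypothetical generated binding sites, already characterized.
generated = [
    {"name": "BS-1", "params": {"three_prime": 7.5, "au": 0.60, "bulge_size": 4}},
    {"name": "BS-2", "params": {"three_prime": 5.0, "au": 0.70, "bulge_size": 3}},
    {"name": "BS-3", "params": {"three_prime": 2.5, "au": 0.30, "bulge_size": 0}},
]

best = closest_binding_site(model_bs, generated)
```

In practice the parameters live on different scales, so a real comparison would probably weight or normalize them before computing the distance.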
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
<br />
==miBSdesigner==<br />
Having a binding site designer was crucial to complete the computational approach to our project: miBSdesigner is an easy-to-use application to create in silico binding sites for any given miRNA. Using our device, the user will be able to generate binding sites with several different properties.<br />
<br />
===Input===<br />
The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in the 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if they need to place the binding site further along in the 3’UTR (it is recommended that the binding site is at least 15 nucleotides away from the stop codon).<br />
Initially, the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all of the nucleotides except for a 4-nucleotide bulge between positions 9 and 12). <br />
Apart from these two options, the user can further modify the binding site to meet their individual requirements.<br />
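The input handling and the two basic site types can be sketched as follows. This Python example is an illustration rather than the miBSdesigner source: the function names are hypothetical, the example sequence is arbitrary, and the bulge is created by placing the same base as the miRNA opposite positions 9 to 12, which is one simple way to guarantee that no pairing (not even G:U wobble) occurs there.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def clean_mirna(seq):
    """Uppercase, convert RNA to DNA letters and drop any other characters."""
    cleaned = "".join(c for c in seq.upper().replace("U", "T") if c in COMPLEMENT)
    if len(cleaned) != 22:
        raise ValueError("expected 22 nucleotides, got %d" % len(cleaned))
    return cleaned

def perfect_site(mirna):
    """Reverse complement of the miRNA, written 5' to 3' in DNA letters."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna))

def bulged_site(mirna, start=9, end=12):
    """Perfect site except opposite miRNA positions start..end (1-based),
    where the site carries the miRNA's own base and therefore cannot pair."""
    site = []
    for pos in range(len(mirna), 0, -1):   # walk the miRNA from 3' to 5'
        base = mirna[pos - 1]
        site.append(base if start <= pos <= end else COMPLEMENT[base])
    return "".join(site)

mirna = clean_mirna("acgu acgu-acgu acgu acgu ac")   # arbitrary 22-nt example
perfect = perfect_site(mirna)
bulged = bulged_site(mirna)
```

The two outputs differ in exactly four positions, those facing miRNA nucleotides 9 to 12.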
<br />
===Seed Types===<br />
<center>[[Image:Final_sequences_miRNAseeds.png|800px]]</center><br />
<br />
Figure 1: Interactions between two miRNAs and their binding sites with different types of seeds.<br />
<br />
<br />
In miBS designer, the user can choose between several types of seed for their binding site (list ordered by increasing efficacy):<br />
<br />
- 6mer (abundance 21.5%): only the nucleotides 2-7 of the miRNA match with the mRNA.<br />
<br />
- 7merA1 (abundance 15.1%): the nucleotides 2-7 match with the mRNA, and there is an adenine in position 1.<br />
<br />
- 7merm8 (abundance 25%): the nucleotides 2-8 match with the mRNA.<br />
<br />
- 8mer (abundance 19.8%): the nucleotides 2-8 match with the mRNA and there is an adenine in position 1.<br />
<br />
- Apart from these options, the user can create a customized seed that includes one mismatch by entering a number (between 2 and 7) in the Customized mismatch position textbox.<br />
<br />
The percentages of abundance are calculated among conserved mammalian sites for a highly conserved miRNA [https://2010.igem.org/Team:Heidelberg/Modeling/descriptions#references [Bartel D.P., MicroRNAs: Target Recognition and Regulatory Functions, Cell(136):215-233(2009)]].<br />
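The seed definitions listed above translate into a short classification routine. The following Python sketch assumes that the site is given 5' to 3' and is aligned so that its last nucleotide faces miRNA position 1; only Watson-Crick pairs are counted, and the function name and example sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def seed_type(mirna, site):
    """Classify the seed of an aligned site as 6mer, 7merA1, 7merm8 or 8mer.

    Both sequences are written 5' to 3'; site[-pos] faces miRNA position pos.
    Returns None if nucleotides 2-7 are not fully paired.
    """
    m = mirna.upper().replace("T", "U")
    s = site.upper().replace("T", "U")

    def paired(pos):
        return (m[pos - 1], s[-pos]) in WATSON_CRICK

    if not all(paired(pos) for pos in range(2, 8)):
        return None
    has_m8 = paired(8)          # additional match at position 8
    has_a1 = s[-1] == "A"       # adenine opposite position 1
    if has_m8 and has_a1:
        return "8mer"
    if has_m8:
        return "7merm8"
    if has_a1:
        return "7merA1"
    return "6mer"

mirna = "ACGUACGUACGUACGUACGUAC"                       # arbitrary example
result = seed_type(mirna, "GUACGUACGUACGUACGUACGA")    # match 2-8 plus A1
```

Changing the nucleotide opposite position 8 or position 1 in the example site moves it into one of the weaker seed classes.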
<br />
===Supplementary Region===<br />
In miBS designer, the user can choose among several types of supplementary regions, starting with 3 matching nucleotides (14-16), increasing sequentially until 8 (13-20), and then total matching (from 13-22, leaving a bulge) [https://2010.igem.org/Team:Heidelberg/Modeling/descriptions#references [Grimson A, Farh KHF, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell(27):91-105(2007)]]. If the user needs some other specific supplementary region, they can customize the sequence by entering the desired matching nucleotides (as numbers from 9 to 22, separated by commas).<br />
<br />
===AU Content===<br />
In order to allow the user to improve the efficiency of their binding sites, miBS designer offers options to increase the AU content by adding adenine or uracil to positions around the matches (specifically in -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.<br />
<br />
===Sticky Ends===<br />
To facilitate the task of introducing the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the [http://openwetware.org/wiki/The_BioBricks_Foundation:RFC#BBF_RFC_12:_Draft_BioBrick.E2.84.A2_BB-2_standard_for_biological_parts RFC-12 standard for biobricks BB2], the XmaI/XhoI restriction enzymes used in our [https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]-construct, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation, or process the primers before ordering them to remove the extra nucleotides and create the overhangs.<br />
<br />
===Output===<br />
miBS designer generates the primer needed to integrate the desired binding site into a plasmid, along with the primer for the complementary strand. It also produces specific names for the two primers.<br />
<br />
==mUTING==<br />
mUTING is a tool developed to generate binding sites for miRNAs that can be used for tissue targeting based on both on-targeting and off-targeting strategies. It takes as input the target and off-target tissues as well as the desired targeting strategy. The user can also specify a threshold for the difference in the level of relative expression (within a tissue) of miRNAs between the target and off-target tissues. The program searches a database of expression levels and returns a list of miRNAs that could be used. From these, the user selects the desired miRNA, for which the final output is generated in the form of sense and anti-sense oligomers with overhangs that can be used to place binding sites in tandem or into a vector. <br />
<br />
===Input=== <br />
<br />
The input for the tool is rather simple and consists of five fields.<br />
<br />
'''Organism''' – The tool lets you choose between Human, Rat and Mouse as the source organism.<br />
<br />
'''Target''' – From a list of tissues, the target (tissue where gene has to be expressed) can be selected.<br />
<br />
'''Off-target''' – A list from which multiple off-targets can be selected is available. Here, the tissues from which gene expression has to be excluded can be included.<br />
<br />
'''Targeting''' – This option lets you select the targeting strategy you want to employ.<br />
<br />
'''Threshold''' – The threshold for difference in the level of relative expression of miRNA in the target and off-target tissue can be set here. The default value is 0.001.<br />
<br />
===Data=== <br />
The expression data and sequence data that the tool makes use of were taken from preexisting data sources.<br />
<br />
'''Sequences''' – mature miRNA sequences were obtained from mirBase Sequence Database Release 16[cite]. <br />
<br />
'''Expression profiles''' - miRNA expression profiles were collected from a previously published resource of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types [cite (Landgraf et al., Cell, 129, (2007), 1401-1414)]. The expression values in the data represent the number of cloned mature microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized by the total number of microRNAs that were cloned in each library. These values are then used to calculate the difference in relative miRNA levels for differential expression of the construct.<br />
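The normalization and the threshold comparison can be illustrated with a small Python sketch. The clone counts and miRNA names below are invented, the function names are hypothetical, and the absolute difference is used because the direction of the comparison depends on the targeting strategy, which the sketch does not model.

```python
def relative_expression(clone_counts):
    """Normalize clone counts by the total number of clones in the library."""
    total = sum(clone_counts.values())
    if total == 0:
        raise ValueError("library contains no clones")
    return {mirna: count / total for mirna, count in clone_counts.items()}

def differential_mirnas(target_counts, off_target_counts, threshold=0.001):
    """miRNAs whose relative expression differs by at least `threshold`."""
    target = relative_expression(target_counts)
    off_target = relative_expression(off_target_counts)
    hits = {}
    for mirna in set(target) | set(off_target):
        difference = abs(target.get(mirna, 0.0) - off_target.get(mirna, 0.0))
        if difference >= threshold:
            hits[mirna] = difference
    return hits

# Invented clone counts for two small RNA libraries.
liver = {"miR-a": 700, "miR-b": 300}
heart = {"miR-b": 300, "miR-c": 700}
hits = differential_mirnas(liver, heart)
```

Here miR-b is dropped because its relative level is the same in both libraries, while miR-a and miR-c remain as candidates.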
<br />
===Processing=== <br />
The data are processed by a script written in Perl. After the primary inputs mentioned above have been submitted, the tool offers the user a choice of miRNAs that fulfill the criteria set in the input. These are displayed along with the miRNA expression values in the target (in the case of off-targeting) or in the off-targets (in the case of on-targeting). The expression values in the off-targets and in the target, in the respective cases, are required to be zero. Based on these values, the user can select the most suitable miRNA for their construct.<br />
<br />
===Output=== <br />
The final output is the binding site for the miRNA selected by the user. It consists of the sense strand and the anti-sense strand that would code the binding site. These are flanked by a spacer sequence that could be used for putting binding sites in tandem and for introducing cloning sites.<br />
<br />
=Modeling=<br />
<br />
The Neural Network and the Fuzzy Logic Model explained here are the basis of the [https://2010.igem.org/Team:Heidelberg/Modeling/miRockdown miRockdown] tool. The results of the optimized models are integrated as a database, which enables [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset#miRockdown miRockdown] to output binding sites with a confidently predicted protein knockdown efficiency.<br />
<br />
==Parameterization Concept==<br />
<br />
One of the hardest tasks in the development of our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three [https://2010.igem.org/Team:Heidelberg/Modeling#miRNA_binding_site_features seed-types].<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}} was used to characterize binding sites with respect to their 3'-pairing and AU-content scores. Starting from a given seed position, TargetScan aligns the miRNA with the mRNA sequence in such a way that the highest possible 3'-score is reached. Each pairing within miRNA nucleotides 13-16 adds 1 to the score, and each pairing outside this region adds 0.5. Offsets between the bound miRNA and the mRNA are also allowed, but there is a penalty of 0.5 points for an offset higher than 2 nucleotides. The AU content of the 30 nucleotides upstream and downstream of the mRNA seed sequence is rated depending on the seed type, and the impact of a nucleotide decreases with its distance from the seed. The scoring system is based on regressions applied to datasets of human, mouse, rat and dog mRNA knockdown {{HDref|Grimson et al., 2007}}.<br />
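The scoring rule for the 3' pairing can be illustrated with a deliberately simplified Python sketch. It takes the paired miRNA positions and the offset as given, whereas the TargetScan script itself searches for the best alignment; the flat penalty follows a literal reading of the description above, and the function name is hypothetical.

```python
def three_prime_score(paired_positions, offset=0):
    """Simplified 3' pairing score.

    paired_positions: miRNA positions (1-based) that pair with the mRNA.
    offset: shift between miRNA and mRNA in the alignment, in nucleotides.
    """
    score = 0.0
    for position in paired_positions:
        # Pairs at positions 13-16 count 1 point, all other pairs 0.5.
        score += 1.0 if 13 <= position <= 16 else 0.5
    if offset > 2:
        score -= 0.5   # flat penalty, assumed from the description above
    return score

core_only = three_prime_score([13, 14, 15, 16])
extended = three_prime_score([13, 14, 15, 16, 17, 18])
shifted = three_prime_score([13, 14, 15, 16], offset=3)
```

Pairing in the core region 13-16 therefore dominates the score, and extending the pairing beyond it adds comparatively little.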
<br />
Since all major prior modeling approaches used mRNA levels as training set [], our approach will give completely new insight into miRNA binding site functionality.<br />
{| class="wikitable"<br />
| [[Image:3primevsAU.png|thumb]]<br />
| [[Image:ThreePrimevsbulgeSize.png|thumb]]<br />
| [[Image:SeedTvsthreePScore.png|thumb]]<br />
| [[Image:SeedTvsAUScore.png|thumb]]<br />
|}<br />
<center>3'-pairing-Score vs AU-content-Score vs knockdown percentage: <br><br />
This surface fit shows the correlation of increasing 3' Binding Score and AU content Score with increasing knockdown-efficiency of the binding sites.</center><br />
<br><br />
<br />
==Neural Network Model==<br />
<br />
===Neural Network theory===<br />
An artificial neural network (NN) is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons, which are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.<br />
Mathematically, three basic components describe a single-layer network: the synapses of the artificial neurons, which are modeled as weights and represent how strong the connection between an input and an artificial neuron is; an adder, which sums up all the weighted inputs; and an activation function, which controls the amplitude of the output of the layer. Generally there are three types of activation function: threshold, sigmoid and piecewise linear. For our model the sigmoid function was used; it restricts the output to the range between 0 and 1 or between -1 and 1 {{HDref|Ben Kröse & Patrick van der Smagt, 1996}}.<br><br />
[[Image:NeuralNetwork_HD2010_image2.png|400px|center]]<br><br />
<br><br />
Figure 2: representation of the mathematical model of a biological neuron.<br><br />
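The three components (weights, adder and activation function) map directly onto code. The Python sketch below shows a single artificial neuron with the two common sigmoid variants, the logistic function with outputs between 0 and 1 and tanh with outputs between -1 and 1; the function names and input values are chosen only for illustration.

```python
import math

def logistic(z):
    """Sigmoid with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=logistic):
    """Weighted sum of the inputs (adder) passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Same weighted sum, two different output ranges.
out_logistic = neuron([0.5, -1.0], [2.0, 1.0], 0.0)
out_tanh = neuron([0.5, -1.0], [2.0, 1.0], 0.0, activation=math.tanh)
```

In this example the weighted sum is exactly zero, so both sigmoids return the midpoint of their range.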
<br />
During the learning process, the difference between the desired output (target) and the network output is minimised. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimise it. The following figure displays such a training loop.<br />
<br />
<center>[[Image:Neural_Network.png]]</center><br />
<br />
Figure 3: Training of a Neural Network.<br />
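To make the training loop concrete, the following Python sketch minimises the mean squared error of a single linear neuron with plain gradient descent. Gradient descent is used here only because it is the simplest minimisation algorithm to write down; it is not the algorithm used for our networks. The data, learning rate and function names are illustrative assumptions.

```python
def mse(outputs, targets):
    """Mean squared error between network outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def train_linear_neuron(samples, targets, learning_rate=0.05, epochs=2000):
    """Fit output = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(samples, targets):
            error = (w * x + b) - t          # network output minus target
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= learning_rate * grad_w          # adjust the weight
        b -= learning_rate * grad_b          # adjust the bias
    return w, b

samples = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]               # generated from t = 2x + 1
w, b = train_linear_neuron(samples, targets)
final_cost = mse([w * x + b for x in samples], targets)
```

Each pass compares the outputs with the targets and adjusts the weight and bias, which is the loop shown in the figure above.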
<br />
===Model description===<br />
<br />
====Input/target pairs====<br />
The NN model was created with the MATLAB NN-toolbox. The input/target pairs used to train the network comprise experimental and literature data {{HDref|Bartel et al., 2007}}. The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene ([https://2010.igem.org/Team:Heidelberg/Project/miRNA_Kit miTuner]). Nearly 30 different rationally designed binding sites were tested and the respective knockdown strengths calculated.<br><br />
Each input was represented by a vector of three or four elements. Each element corresponded to a score value for a specific feature of the binding site (as described in the section "Parameterization Concept" above). The three features used to describe the binding site were the seed type, the 3’ pairing contribution and the AU content. Each input/target pair represented the relationship between a particular binding site and the corresponding knockdown percentage.<br />
Two neural networks were trained. One was trained with a pool of 45 data points from the literature; the other was trained with experimental data. The latter network had 4 inputs instead of 3, the fourth input representing the size of the bulge in nucleotides. Afterwards, both networks were used to predict knockdown percentages for given inputs. The predictions were then validated experimentally and compared between the networks.<br />
<br />
====Characteristic of the Network====<br />
<br />
The neural network is a multilayer feedforward network with two layers. The first (hidden) layer is connected to the network inputs and comprises 15 artificial neurons. The second layer is connected to the first one and produces the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum of squared errors) was minimized with Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm: the weight and bias values are updated according to Levenberg-Marquardt optimization, and the Bayesian framework applied to the NN learning problem overcomes the difficulty of interpolating noisy data {{HDref|MacKay, 1992}}.<br><br />
<br><br />
[[Image:viewnet.png|center]]<br><br />
<br><br />
Figure 3: Schematic illustration of the network components. "Hidden" represents the first layer, which comprises 15 artificial neurons, while "Output" is the second and last layer, which produces the output. The symbol “w” denotes the weights and “b” the biases.<br />
<br />
===Results===<br />
Two batches of experiments were performed. The network trained only with literature data was used to predict the outcome of the first batch. Figure 4 shows the regression line for the correlation between the NN outputs and the targets used to train this network, while Table 1 lists the simulated and experimental knockdown percentages. The results show that the bulge size does affect the knockdown percentage: the network simulates the knockdown with high precision when the bulge size is in the range of 3 to 4 nt, but not when it is 1 or 0. It is important to underline that the network was trained with literature values that did not consider the bulge size as a key factor; TargetScan, in fact, does not evaluate this binding site feature in its scoring process.<br> <br />
<center><br />
{| border="1" class="wikitable sortable" cellpadding="6" style="border:solid 1px #AAAAAA; border-collapse:collapse; background-color:#F9F9F9; empty-cells:show; font-size:0.9em;"<br />
!align="right"| 3' score !! AU-score !! bulge !! seed type !! bulge size (nt) !! number of BS !! KD experimental (fraction) !! KD simulated (fraction) <br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.85 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.595 || 1 || 3 || 4 || 1 || 0.81 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.576 || 1 || 3 || 4 || 2 || 0.92 || 0.8<br />
|-<br />
|align="right"| 4 || 0.314 || 0 || 3 || 0 || 1 || 0.69 || 0.56<br />
|-<br />
|align="right"| 2.5 || 0.314 || 0 || 3 || 0 || 1 || 0.08 || 0.49<br />
|-<br />
|align="right"| 5 || 0.336 || 0 || 2 || 0 || 1 || 0.72 || 0.42<br />
|-<br />
|align="right"| 1.5 || 0.327 || 0 || 3 || 0 || 1 || 0.28 || 0.44<br />
|-<br />
|align="right"| 2 || 0.327 || 0 || 3 || 0 || 1 || 0.58 || 0.46<br />
|-<br />
|align="right"| 2.5 || 0.221 || 0 || 2 || 0 || 1 || 0.34 || 0.28<br />
|-<br />
|align="right"| 7.5 || 0.597 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.83 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.624 || 1 || 3 || 4 || 1 || 0.77 || 0.82<br />
|-<br />
|align="right"| 7.5 || 0.6 || 1 || 3 || 4 || 1 || 0.76 || 0.81<br />
|-<br />
|align="right"| 7.5 || 0.603 || 1 || 3 || 4 || 1 || 0.82 || 0.81<br />
|-<br />
|align="right"| 5.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 3 || 1 || style="background-color:#cacaca;" | 0.59 || style="background-color:#cacaca;" | 0.63<br />
|-<br />
|align="right"| 5.5 || 0.749 || 1 || 2 || 3 || 1 || 0.345 || 0.61<br />
|-<br />
|align="right"| 6.5 || 0.799 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.9 || style="background-color:#cacaca;" | 0.67<br />
|-<br />
|align="right"| 6.5 || 0.773 || 1 || 2 || 1 || 1 || 0.775 || 0.67<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 1 || 1 || style="background-color:#cacaca;" | 0.68 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|align="right"| 1.5 || 0.38 || 1 || 2 || style="background-color:#cacaca;" | 4 || 1 || style="background-color:#cacaca;" | 0.21 || style="background-color:#cacaca;" | 0.27<br />
|-<br />
|}<br />
</center><br />
<br><br />
Table 1: Simulated data and experimental results for the given binding site features. The highlighted values mark the discrepancy between the two sets of knockdown values when the bulge size is the only feature that changes. When the bulge size is not 1, the predictions are very precise and lie within the standard deviation of the experimental values (10-25%).<br><br />
<br><br />
[[Image:regression.png|300px|center]] <br><br />
<br><br />
Figure 4: Regression on the training data. The line shows the correlation between the NN outputs and the respective target values.<br><br />
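<br />
To make the comparison in Table 1 easier to inspect, the short Python sketch below groups the rows by bulge size and reports the mean absolute difference between experimental and simulated knockdown. The numbers are copied from Table 1. The choice of the mean absolute difference as a summary is our own illustrative assumption and not part of the original analysis.<br />

```python
from collections import defaultdict

# (bulge size in nt, experimental KD, simulated KD), copied from Table 1.
ROWS = [
    (4, 0.85, 0.82), (4, 0.81, 0.81), (4, 0.92, 0.80), (0, 0.69, 0.56),
    (0, 0.08, 0.49), (0, 0.72, 0.42), (0, 0.28, 0.44), (0, 0.58, 0.46),
    (0, 0.34, 0.28), (4, 0.82, 0.81), (4, 0.83, 0.81), (4, 0.77, 0.82),
    (4, 0.76, 0.81), (4, 0.82, 0.81), (3, 0.59, 0.63), (3, 0.345, 0.61),
    (1, 0.90, 0.67), (1, 0.775, 0.67), (1, 0.68, 0.27), (4, 0.21, 0.27),
]


def mean_abs_error_by_bulge(rows):
    """Mean |experimental - simulated| for each bulge size."""
    errors = defaultdict(list)
    for bulge_size, experimental, simulated in rows:
        errors[bulge_size].append(abs(experimental - simulated))
    return {size: sum(v) / len(v) for size, v in sorted(errors.items())}


for size, mae in mean_abs_error_by_bulge(ROWS).items():
    print(f"bulge size {size} nt: mean |experimental - simulated| = {mae:.3f}")
```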
<br><br />
====Brief conclusion====<br />
The bulge size was identified as a very important parameter for knockdown efficiency. This led us to train another neural network using only our experimental data and including the bulge size in the input vector.<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
====Simulation and experimental verification====<br />
<br />
==Fuzzy Logic Model==<br />
===Why use a fuzzy inference system to model binding site efficiency?===<br />
<br />
To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as inputs and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.<br />
<br />
Fuzzy logic is a rule-based method of approximate reasoning developed by Lotfi Zadeh in 1965. It is motivated by the observation that humans often think and communicate vaguely, yet can make precise decisions [Nelles O. Nonlinear System Identification. Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and artificial intelligence, for example in fuzzy controllers and fuzzy expert systems. Fuzzy logic has also been used to model biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of fuzzy logic-based approaches are that (i) models can be constructed from prior knowledge of the system together with experimental data, (ii) intermediate states can be encoded for inputs and outputs, an improvement over logic approaches such as Boolean models that can only handle ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling. PLoS Comput Biol. 5:e1000340 (2009).], and (iii) simulations can be derived from both qualitative and quantitative data, both of which can be cast into IF-THEN rules. Fuzzy logic is thus a powerful approach for understanding heterogeneous datasets.<br />
<br />
Fuzzy inference systems are based on membership functions (MFs). An MF rates, on a scale from 0 to 1, how well an input parameter satisfies a criterion. There can be one or several MFs per input parameter, each representing a different criterion applied to that input. The height of a person, for example, can be evaluated with a single MF that rates how well the person satisfies being tall. Alternatively, there could be three MFs: one rating membership in the group of short people, a second for medium-sized people and a third for tall people. Changing the shape of an MF makes it possible to model either functional dependencies, which allow intermediate membership values, or simple ON/OFF states, where the membership value can only be 0 or 1. Different kinds of input parameters can therefore be evaluated with a fuzzy inference system. In the height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model could then differentiate between young and grown-up persons.<br />
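<br />
The height and age example can be written down as a small Python sketch. The breakpoints (150, 170 and 190 cm) and the function names are hypothetical choices made only for illustration. The age MF is taken to switch to 1 at exactly 18, which is one possible reading of the description above.<br />

```python
def ramp_up(x, low, high):
    """Piecewise-linear MF: 0 below `low`, 1 above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def triangle(x, left, peak, right):
    """Triangular MF: 0 outside (left, right), 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def height_memberships(height_cm):
    """Three MFs for one input; breakpoints are illustrative assumptions."""
    return {
        "short": 1.0 - ramp_up(height_cm, 150.0, 170.0),
        "medium": triangle(height_cm, 150.0, 170.0, 190.0),
        "tall": ramp_up(height_cm, 170.0, 190.0),
    }


def grown_up_membership(age_years, threshold=18):
    """ON/OFF MF: 0 below the threshold age, 1 from the threshold onwards."""
    return 1.0 if age_years >= threshold else 0.0


print(height_memberships(180.0))
print(grown_up_membership(17), grown_up_membership(30))
```

A person of 180 cm belongs partly to "medium" and partly to "tall", which is the kind of intermediate state a Boolean model cannot express. The age MF only ever returns 0 or 1.<br />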
<br />
Simple if-then rules are then used to combine the input MFs into an output MF. How well an object (a set of input parameters) satisfies a rule is defined by its degree of membership in the MFs that the rule refers to. The better a rule is satisfied, the higher the membership in its output MF.<br />
The output MF can be a function like the input MFs. This is the case in Mamdani-type fuzzy inference systems [Mamdani et al, 1975]. We use a Sugeno-type fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], in which the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno-type system is that it is computationally more efficient and, because of its simpler output MFs, easier to optimize or adapt. Because the 3'-pairing score and the AU-content score combine in a non-intuitive way, our fuzzy inference system needs to be optimized computationally.<br />
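<br />
As a minimal sketch of how a Sugeno-type system produces its output, the Python example below computes the weighted average of constant rule outputs, where each rule is weighted by how strongly it is satisfied. The two rules, their constants and the use of the minimum as the AND operator are hypothetical assumptions for illustration. They are not the rules of our model.<br />

```python
def rule_strength(*memberships):
    """Firing strength of a rule whose conditions are joined by AND.

    The minimum is used as the AND operator here (an assumption; the
    product is another common choice).
    """
    return min(memberships)


def sugeno_output(firing_strengths, rule_outputs):
    """Zero-order Sugeno output: weighted average of constant rule outputs."""
    total = sum(firing_strengths)
    if total == 0:
        raise ValueError("no rule fired for this input")
    weighted = sum(w * z for w, z in zip(firing_strengths, rule_outputs))
    return weighted / total


# Hypothetical membership values of one object in two input MFs.
mf_a_high, mf_b_high = 0.75, 0.9
mf_a_low = 1.0 - mf_a_high

# Rule 1: IF a is high AND b is high THEN output = 0.8
# Rule 2: IF a is low                THEN output = 0.2
strengths = [rule_strength(mf_a_high, mf_b_high), rule_strength(mf_a_low)]
constants = [0.8, 0.2]

result = sugeno_output(strengths, constants)
print(result)
```

For a first-order Sugeno system the entries of <code>constants</code> would be replaced by linear functions of the inputs, evaluated for the current object before averaging.<br />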
<br />
<br />
How is our fuzzy inference system optimized?<br />
MISO Sugeno Fuzzy Network Model<br />
<br />
Optimizable<br />
<br />
Extendable<br />
<br />
===Fuzzy Model Concepts===<br />
<br />
<br />
[[Image:Nearperfect.png|thumb|Bulged binding sites concept: This model concept evaluates bulged- or "near-perfect" binding sites separately from conventional seed + 3'-pairing binding sites. Rule number 2 considers the bulge-size of the bulged binding site.]]<br />
<br />
[[Image:BulgeAU.png|thumb|Bulged binding sites (including AU-content-score) concept: This concept extends the bulged-BS concept with the addition of AU-content score evaluation. Therefore rule number 2 was modified accordingly.]]<br />
<br />
[[Image:LowthreePrime.png|thumb|Consider low 3' score concept: This model concept takes into account that binding sites with a 3'-score below 3 did not show a significant change in knockdown efficiency compared to a control with only seed pairing {{HDref|Grimson et al., 2007}}. This is implemented by rule number 6.]]<br />
<br />
Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] <br />
<br />
based on previous knowledge [Bartel]<br />
<br />
Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately, due to the differences in their biological mechanism, as discussed earlier [link to binding site properties].<br />
A perfect binding site is evaluated by a simple ON/OFF input MF evaluating the boolean input of <br />
<br />
We developed several concepts for which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset properties of a large set of binding sites] according to various binding site (BS) characteristics.<br />
The targetscan_50_context_scores algorithm {{HDref|Rodriguez et al., 2007}}, which evaluates binding sites with respect to 3'-pairing and AU-content, returns a score that seems appropriate for distinguishing between binding sites, especially those that resemble endogenous miRNA binding sites. A more detailed description of the concept of binding site parameterization can be found under [https://2010.igem.org/Team:Heidelberg/Modeling/trainingset Model Training Set].<br />
<br />
Input parameters<br />
<br />
Input membership functions<br />
<br />
Output membership functions<br />
<br />
Rules<br />
<br />
<br />
Optimization<br />
<br />
Parameters and their functionality<br />
<br />
Output Membership function values<br />
<br />
7merA1<br />
<br />
7merM8<br />
<br />
8mer<br />
<br />
(Nearperfect)<br />
<br />
(Perfect)<br />
<br />
<br />
<html><br />
<div class="backtop"><br />
<a href="#top">&uarr;</a><br />
</div><br />
</html><br />
===Fuzzy Model Optimization===<br />
Connection of Fuzzy Logic Toolbox and Global Optimization Toolbox via script.<br />
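<br />
The idea of optimizing a fuzzy model against data can be illustrated with a deliberately small Python sketch. It does not use the Fuzzy Logic Toolbox or the Global Optimization Toolbox and is not our optimization script: a simple random hill climb stands in for the optimizer, the model has a single input with two complementary MFs, and <code>TOY_DATA</code> is invented purely to make the example runnable.<br />

```python
import random


def predict(x, z_low, z_high):
    """Zero-order Sugeno output with two complementary MFs on x in [0, 1]."""
    w_low, w_high = 1.0 - x, x
    return (w_low * z_low + w_high * z_high) / (w_low + w_high)


def sum_squared_error(params, data):
    """Cost of the output constants (z_low, z_high) on (x, target) pairs."""
    z_low, z_high = params
    return sum((predict(x, z_low, z_high) - y) ** 2 for x, y in data)


def hill_climb(data, start, steps=2000, step_size=0.05, seed=0):
    """Random hill climb: keep a perturbed candidate only if the cost drops."""
    rng = random.Random(seed)
    best = list(start)
    best_error = sum_squared_error(best, data)
    for _ in range(steps):
        candidate = [p + rng.uniform(-step_size, step_size) for p in best]
        error = sum_squared_error(candidate, data)
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error


# Invented (x, target) pairs, for illustration only; not experimental data.
TOY_DATA = [(0.0, 0.1), (0.25, 0.3), (0.5, 0.45), (0.75, 0.7), (1.0, 0.9)]

start = [0.5, 0.5]
fitted, fitted_error = hill_climb(TOY_DATA, start)
print(fitted, fitted_error)
```

Because a candidate is only accepted when it lowers the cost, the error can never increase during the search. A global optimizer additionally tries to avoid getting stuck in local minima, which this sketch does not attempt.<br />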
<br />
===Result===<br />
<br />
[http://igem.bioquant.uni-heidelberg.de/igem_2010/FuzzyModelResults.html Click here, if you are interested in more recent model optimizations results!]<br />
<br />
=Data Overview=<br />
<br />
[https://2010.igem.org/Team:Heidelberg/Modeling/Data_Overview Data Overview]<br />
<br />
=References=<br />
<br />
Grimson A., Farh K. K.-H., Johnston W. K., Garrett-Engele P., Lim L. P., Bartel D. P. MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Molecular Cell 27:91-105 (2007).<br />
<br />
Mamdani E. H., Assilian S. An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies 7(1):1-13 (1975).<br />
<br />
Sugeno M. Industrial applications of fuzzy control. Elsevier Science Pub. Co. (1985).<br />
<br />
Rodriguez J., Ge R., Walker K., Bell G. [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_50 targetscan_50_context_scores.pl]. Whitehead Institute for Biomedical Research. All Rights Reserved. Copyright (c) 2007, 2008.<br />
<br />
Kröse B., van der Smagt P. An introduction to Neural Networks, 8th Edition (1996).<br />
<br />
MacKay D. J. C. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation 4(3):448-472 (1992).<br />
<br />
{{:Team:Heidelberg/Single_Bottom}}</div>AlejandroHD