Team:Heidelberg/Modeling/descriptions
From 2010.igem.org
Revision as of 22:52, 26 October 2010
miBEAT

miRockdown

Right from the beginning of our modeling project, we knew we would have to integrate our trained models into an online GUI. We implemented it in the most user-friendly way we could think of: the user only needs to enter the desired knockdown percentage (kd%) and choose an sh/miRNA sequence to obtain a binding site that satisfies their needs.

Overview of the miRockdown script flow. The knockdown percentage (kd%) input selects the matching experimental and model binding site, or the corresponding binding site parameters. The binding site (BS) sequence input triggers the on-the-fly generation of BS sequences, which are characterized by a modified targetscan_scores algorithm. The parameters of the selected model BS are correlated with the parameters of the generated BSs, and the most similar generated BS is returned as output.
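The selection flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the miRockdown script itself: the function names, the dictionary layout (`kd`, `params`) and the use of Euclidean distance as the similarity measure are all hypothetical stand-ins for the correlation step.

```python
import math


def pick_model_site(model_sites, kd_target):
    """Return the model binding site whose knockdown (kd%) is closest to the request."""
    return min(model_sites, key=lambda site: abs(site["kd"] - kd_target))


def most_similar_site(candidates, target_params):
    """Return the generated site whose parameter vector is closest to the target.

    Euclidean distance is an assumption; the original text only says the
    parameters are "correlated" and the most similar site is chosen.
    """
    return min(candidates, key=lambda site: math.dist(site["params"], target_params))


def mirockdown(model_sites, candidates, kd_target):
    """Chain both steps: kd% -> model site -> most similar generated site."""
    model = pick_model_site(model_sites, kd_target)
    return most_similar_site(candidates, model["params"])
```

Each candidate is expected to carry a `params` tuple of the same length as the model sites' `params`; what those parameters are (for example a 3' pairing score and an AU-content score) is left open here.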
miBSdesigner

Very early on we realized that a binding site designer was crucial to completing the computational approach of our project. miBSdesigner is an easy-to-use application that creates in silico binding sites for any given miRNA. With it, the user can generate binding sites with several different properties.

Input

The user has to enter a name for the miRNA, which is used to name the primers. The miRNA sequence must be 22 nucleotides long and has to be entered in 5’ to 3’ direction (both DNA and RNA sequences are accepted, and any extra characters are removed from the sequence). The user can also enter an inert spacer sequence if the binding site needs to be placed further along the 3’UTR (it is recommended that the binding site be at least 15 nucleotides away from the stop codon).

Initially the user can choose between a perfect binding site (matching all 22 nucleotides) and an almost perfect binding site (matching all nucleotides except for a 4-nucleotide bulge between positions 9 and 12). Apart from these two options, the user can customize the binding site to meet individual requirements.

Seed types

Figure 1: Interactions between two miRNAs and their binding sites. Examples to show different types of seeds.

6mer (abundance 21.5%): only nucleotides 2-7 of the miRNA match the mRNA.
7mer-A1 (abundance 15.1%): nucleotides 2-7 match the mRNA, and there is an adenine at position 1.
7mer-m8 (abundance 25%): nucleotides 2-8 match the mRNA.
8mer (abundance 19.8%): nucleotides 2-8 match the mRNA, and there is an adenine at position 1.

Apart from these options, the user can create a customized seed that includes a mismatch. The abundance percentages are calculated among conserved mammalian sites for a highly conserved miRNA (Friedman et al. 2008).

Supplementary region

In miBSdesigner, the user can choose among several types of supplementary sequences, starting with 3 matching nucleotides (14-16), increasing sequentially up to 8 (13-20), and finally total matching (13-22, leaving a bulge). If some other specific supplementary region is needed, the user can customize the sequence by entering the desired matching nucleotides.

AU content

To allow the user to improve the efficacy of a binding site, miBSdesigner offers options to increase the AU content by adding adenine or uracil at positions around the matches (specifically at -1, 0, 1, 8, 9 and 10). The function is designed so that it varies the AU content without introducing new pairings.

Sticky ends

To make it easier to introduce the binding site into a plasmid, the user can add sequences to both ends of the binding site. The user can choose among the RFC12 BioBrick standard (BB-2), the XmaI/XhoI restriction enzymes, or custom sequences entered by the user. In the last case, the output sequences will not be directly ready for cloning: the user has to either digest the construct prior to ligation or process the primers before ordering them to remove the extra nucleotides.
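As an illustration of the input handling and the four seed types listed above, the following minimal sketch derives the target-site fragment from a miRNA. It is not the miBSdesigner code: all function names are hypothetical, and it assumes that the site is written 5' to 3' on the mRNA as the reverse complement of the paired miRNA positions, with the position-1 adenine placed in the target at the 3' end of the seed match.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# seed type -> (first paired miRNA position, last paired position, A opposite position 1)
SEED_TYPES = {
    "6mer": (2, 7, False),
    "7mer-A1": (2, 7, True),
    "7mer-m8": (2, 8, False),
    "8mer": (2, 8, True),
}


def clean_mirna(seq):
    """Upper-case, convert DNA to RNA and drop any non-nucleotide characters."""
    seq = seq.upper().replace("T", "U")
    return "".join(ch for ch in seq if ch in COMPLEMENT)


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[ch] for ch in reversed(rna))


def perfect_site(mirna):
    """Binding site pairing with all 22 nucleotides of the miRNA."""
    mirna = clean_mirna(mirna)
    if len(mirna) != 22:
        raise ValueError("the miRNA must be 22 nucleotides long")
    return reverse_complement(mirna)


def seed_match(mirna, seed_type):
    """Target-site fragment (5' to 3') for one of the four seed types."""
    first, last, a_at_position_1 = SEED_TYPES[seed_type]
    mirna = clean_mirna(mirna)
    if len(mirna) != 22:
        raise ValueError("the miRNA must be 22 nucleotides long")
    site = reverse_complement(mirna[first - 1:last])
    return site + "A" if a_at_position_1 else site
```

For example, `seed_match("UGAGGUAGUAGGUUGUAUAGUU", "8mer")` pairs positions 2-8 and appends the adenine. The supplementary region, AU-content options and sticky ends are deliberately left out of this sketch.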
Output

miBSdesigner generates the primer needed to integrate the desired binding site into a plasmid, together with the primer for the complementary strand. It also produces specific names for the two primers.

miCrappallo

Modeling

Parameterization Concept

One of the hardest tasks in developing our models was to come up with a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect) and endogenous miRNA-like BS with the advanced 3' scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three types of seed binding sites.

Neural Network Model

Neural Network theory

An Artificial Neural Network, usually called a neural network (NN), is a computational model inspired by the biological nervous system. The network is composed of simple elements called artificial neurons that are interconnected and operate in parallel. In most cases the NN is an adaptive system that can change its structure depending on the internal or external information that flows into the network during the learning process. The NN can be trained to perform a particular function by adjusting the values of the connections between the artificial neurons, called weights. Neural networks have been employed to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems. During the learning process, the difference between the desired output (target) and the network output is minimized. This difference is usually called the cost; the cost function measures how far the network output is from the desired value. A common cost function is the mean squared error, and several algorithms can be used to minimize it. The following figure displays such a loop.
Figure 2: Training of a Neural Network.

Model description

Input/target pairs

The NN model was created with the MATLAB Neural Network Toolbox. The input/target pairs used to train the network comprise experimental and literature data (Bartel et al. 2007). The experimental data were obtained by measuring, via luciferase assay, the strength of the knockdown caused by the interaction between the shRNA and the binding site located in the 3’UTR of the luciferase gene. Nearly 30 different rationally designed binding sites were tested, and the respective knockdown strength was calculated with the following formula: [formula]

Characteristics of the Network

The neural network comprised two layers (multilayer feedforward network). The first layer is connected to the network input and comprised 15 artificial neurons. The second layer is connected to the first one and produced the output. A sigmoid activation function was used for the first layer and a linear activation function for the second. The cost function (sum squared error) was minimized by Bayesian regularization, which takes place within the Levenberg-Marquardt algorithm. The algorithm updates the weight and bias values according to Levenberg-Marquardt optimization and overcomes the problem of interpolating noisy data by applying a Bayesian framework to the NN learning problem (MacKay 1992).
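To make the architecture above concrete, here is a minimal forward-pass sketch in plain Python. It is not the MATLAB model: the number of inputs, the weight initialization and the choice of tanh as the sigmoid-shaped activation are assumptions, all names are hypothetical, and the Bayesian-regularized Levenberg-Marquardt training is not reproduced. Only the layer structure (15 hidden neurons, one linear output) and the sum-squared-error cost follow the description.

```python
import math
import random


def init_network(n_inputs, n_hidden=15, seed=0):
    """Create small random weights for a two-layer feedforward network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Hidden layer with a sigmoid-shaped (tanh) activation, then a linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    return sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]


def sum_squared_error(net, inputs, targets):
    """Cost: summed squared difference between network outputs and targets."""
    return sum((forward(net, x) - t) ** 2 for x, t in zip(inputs, targets))
```

A training algorithm would repeatedly adjust `w1`, `b1`, `w2` and `b2` to reduce `sum_squared_error` over the input/target pairs; Bayesian regularization additionally penalizes large weights, which this sketch omits.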
Results

Training the Neural Network

The network was trained with 46 samples. The regression between the NN outputs and the targets gave a correlation coefficient of R = 0.9864.
Simulation and experimental verification

Fuzzy Inference Model

Why use a fuzzy inference system to model binding site efficiency?

To evaluate the complex features of an shRNA or miRNA binding site and predict the resulting knockdown percentage of the protein, we developed a fuzzy inference system (FIS). The parameterized properties of the binding sites serve as input and are processed into the knockdown percentage as the single output. Our system is therefore a multiple-input, single-output (MISO) fuzzy inference system.

Fuzzy Logic (FL) is a rule-based approximate artificial reasoning method developed by Lotfi Zadeh in 1965. Its motivation is the observation that humans often think and communicate in a vague way, and yet can make precise decisions [Nelles O. Nonlinear System Identification. Springer Verlag GmbH & Co., Berlin, 2000.]. It has been widely used in engineering and Artificial Intelligence approaches such as Fuzzy Controllers and Fuzzy Expert Systems. Fuzzy Logic has also been used for the modeling of biological pathways [Bosl W. J. Systems biology by the rules: hybrid intelligent systems for pathway modeling and discovery. BMC Systems Biology 1:13 (2007).] and to analyze gene regulatory networks [Laschov D., Margaliot M. Mathematical modeling of the lambda switch: a fuzzy logic approach. J Theor Biol. 21:475-89 (2009)]. Key advantages of Fuzzy Logic-based approaches are (i) the ability to construct models based on prior knowledge of the system and on experimental data, (ii) the ability to encode intermediate states for inputs and outputs, which improves on logic approaches such as Boolean models that can only deal with ON/OFF states [Aldridge B. B., Saez-Rodriguez J., Muhlich J. L., Sorger P. K., Lauffenburger D. A. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling. PLoS Comput Biol. 5:e1000340 (2009).], and (iii) the fact that simulations can be derived from both qualitative and quantitative data, both of which can be cast into the form of IF-THEN rules. Thus, FL constitutes a powerful approach for the understanding of heterogeneous datasets.

Fuzzy inference systems are based on membership functions (MF). An MF rates, on a scale from 0 to 1, how much an input parameter satisfies a criterion. There can be one or several criteria, each represented by a membership function, for one input parameter. The height of a person, for example, can be evaluated with one MF: how much the person satisfies being tall. Alternatively, there could be 3 MFs, one evaluating the membership to small people, the second to medium-sized people and the third to big people (Figure MembershipFunction1.png). For a person's height of 1.8 meters, the MF “big” would be satisfied to about 0.6 (Figure MembershipFunctionBig.png). In this way, all input is converted to membership values from 0 to 1. Changing the shape of the MF makes it possible to have either functional dependencies, allowing intermediate states of the membership values, or simple ON/OFF states, where the membership value can only be 0 or 1 (Figure MembershipONOFF.png). Thus different kinds of input parameters can be evaluated with a fuzzy inference system. In the simple height example, the age of the person could be taken as a second input and evaluated by an MF that is 0 until the age of 18 and 1 for older persons. The model would then differentiate between young and grown-up persons.

Simple if-then rules can then be used to combine the input MFs into an output MF. The satisfaction of a rule by an object (set of input parameters) is defined by the degree of membership of the object to the different MFs. The higher the satisfaction of the rule, the higher the membership to the output MF. The output MF can be a function like the input MF. This is the case in Mamdani method fuzzy inference systems [Mamdani, E.H. and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller," International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.]. We are using a Sugeno method fuzzy inference system [Sugeno, M., Industrial applications of fuzzy control, Elsevier Science Pub. Co., 1985.], where the output MF is either a constant or a linear function of the input parameters. The advantage of a Sugeno fuzzy inference system is that it is computationally more efficient and easier to optimize or adapt because of its simpler output MF. Because of the non-intuitive combination of the 3' pairing and AU-content scores, our fuzzy inference system needs to be optimized computationally.
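The height and age example above can be written down as a small zero-order Sugeno system. This is a sketch under assumptions rather than our model: the breakpoints 1.5 m and 2.0 m are chosen only so that 1.8 m yields a membership of about 0.6, the boundary handling at exactly 18 years, the use of the minimum as fuzzy AND and the constant rule outputs 1.0 and 0.0 are all illustrative, and every name is hypothetical.

```python
def ramp(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def step(x, threshold):
    """ON/OFF membership: 0 below the threshold, 1 from the threshold onwards."""
    return 1.0 if x >= threshold else 0.0


def sugeno(rules):
    """Zero-order Sugeno output: average of constant rule outputs weighted by firing strength.

    `rules` is a list of (firing_strength, constant_output) pairs.
    """
    total = sum(strength for strength, _ in rules)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(strength * output for strength, output in rules) / total


def big_grown_up_score(height_m, age_years):
    """IF big AND grown-up THEN 1.0, otherwise 0.0 (illustrative constants)."""
    big = ramp(height_m, 1.5, 2.0)       # assumed breakpoints
    grown_up = step(age_years, 18)
    strength = min(big, grown_up)        # fuzzy AND taken as the minimum
    return sugeno([(strength, 1.0), (1.0 - strength, 0.0)])
```

Because the rule outputs are constants, optimizing such a system amounts to tuning a handful of numbers (the MF breakpoints and the output constants), which is one reason the Sugeno form is easier to adapt than a Mamdani system with full output membership functions.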
Optimizable Extendable
Fuzzy Model Concepts

Strength: general prediction, no dependency on conditions. Assured by [normalization strategy] based on previous knowledge [Bartel].

Our fuzzy inference system can deal with 3 different kinds of shRNA binding sites. Perfect, bulged and endogenous-like binding sites are treated separately because of the differences in their biological mechanisms, as discussed earlier [link to binding site properties]. A perfect binding site is evaluated by a simple ON/OFF input MF that evaluates a boolean input.

We came up with different concepts of which input parameters to integrate into the fuzzy inference model and how to evaluate them. To this end, we parameterized the properties of a large set of binding sites according to various BS characteristics. The targetscan_50_context_scores algorithm (Rodriguez et al., 2007), which evaluates binding sites with respect to 3' pairing and AU content, returns a score that seems appropriate for distinguishing binding sites, especially endogenous miRNA-like ones. A more detailed description of the concept of binding site parameterization can be found under Model Training Set.

Input parameters
Input membership functions
Output membership functions
Rules
Parameters and their functionality

Output membership function values: 7mer-A1, 7mer-m8, 8mer, (near-perfect), (perfect)
Fuzzy Model Optimization

Result

Click here if you are interested in more recent model optimization results!

Data Overview

References

targetscan_50_context_scores.pl. Copyright (c) 2007, 2008 Whitehead Institute for Biomedical Research. All Rights Reserved. Joe Rodriguez, Robin Ge, Kim Walker, and George Bell.

