Team:USTC Software/project

From 2010.igem.org

(Difference between revisions)
(MoDeL-Modeling Database Language)
(MoDeL:Modeling Database Language)
Line 2: Line 2:
=Features=
=Features=
==Fun and Function==
==Fun and Function==
-
==MoDeL:Modeling Database Language==
+
== MoDeL:Modeling Database Language ==
 +
=== Bring complex modeling to the next level ===
 +
Chain-Node Model(see figure) is a brand new modeling concept that includes both detailed structure description and universal applicability. Instead of treating complex as a whole, Chain-Node Model view complex as a construction of it basic Part. One basic assumption of our model is that complex always carries its parts’ properties. (i.e. it is )Though imprudent sometimes, this assumption greatly extend the usability of automatic modeling, with just a few parts, one can construct a bunch of complex with functions.
 +
Binding in complex is characterized by connecting Nodes to form trees. It’s as simple as constructing a new node with bond nodes as its children. For example the binding structure of TetR dimer is considered as a TetR2 node with two TetR nodes as its children. Available nodes can be any part on a chain or even nodes of binding sites, which allows you to create a huge binding tree or even a forest of trees. Node makes the description of complicated binding in complex possible.
 +
Together with Chains and Nodes, you can almost model any complex you have in mind. So enjoy the freedom of Chain-Node Modeling! You are suggested to read this One-minute Introduction to have an intuitive idea of our modeling system.
 +
 
 +
=== Modeling with templates ===
 +
Similar reactions happen for similar reasons. For example, the repression of pTetR promoter by TetR dimer can happen on many different DNA molecules with pTetR, regardless of what other biobricks is constructed on the DNA. With this thought, we add the part of Substituent, which can replace inactive parts of Species in Reactions, making it more general, and actually turning it into a reaction template. Having substituent, a pTetR TetR binding reaction is re-interpreted as the binding reaction of ANY dna with pTetR(see figure) with ANY protein with TetR-dimer binding site. Modeling with templates allows us to describe reactions of new complex even without rewrite the reactions and species in database.
 +
 
 +
=== Automatic modeling database language ===
 +
We use a database to store all the information we need in modeling. In order to realize automatic modeling, we construct the database in unified format and make it machine-readable. Every component of database has its specified attributes and values, which makes the format of the database a unique yet standard database language. We call it MoDeL: Modeling Database Language. MoDeL is based on XML language, which makes it flexible and extensible. For more specifications of MoDeL, click here.
==Auto-Modeling Algorithm==
==Auto-Modeling Algorithm==

Revision as of 18:44, 11 October 2010

Contents

Features

Fun and Function

MoDeL:Modeling Database Language

Bring complex modeling to the next level

Chain-Node Model(see figure) is a brand new modeling concept that includes both detailed structure description and universal applicability. Instead of treating complex as a whole, Chain-Node Model view complex as a construction of it basic Part. One basic assumption of our model is that complex always carries its parts’ properties. (i.e. it is )Though imprudent sometimes, this assumption greatly extend the usability of automatic modeling, with just a few parts, one can construct a bunch of complex with functions. Binding in complex is characterized by connecting Nodes to form trees. It’s as simple as constructing a new node with bond nodes as its children. For example the binding structure of TetR dimer is considered as a TetR2 node with two TetR nodes as its children. Available nodes can be any part on a chain or even nodes of binding sites, which allows you to create a huge binding tree or even a forest of trees. Node makes the description of complicated binding in complex possible. Together with Chains and Nodes, you can almost model any complex you have in mind. So enjoy the freedom of Chain-Node Modeling! You are suggested to read this One-minute Introduction to have an intuitive idea of our modeling system.

Modeling with templates

Similar reactions happen for similar reasons. For example, the repression of pTetR promoter by TetR dimer can happen on many different DNA molecules with pTetR, regardless of what other biobricks is constructed on the DNA. With this thought, we add the part of Substituent, which can replace inactive parts of Species in Reactions, making it more general, and actually turning it into a reaction template. Having substituent, a pTetR TetR binding reaction is re-interpreted as the binding reaction of ANY dna with pTetR(see figure) with ANY protein with TetR-dimer binding site. Modeling with templates allows us to describe reactions of new complex even without rewrite the reactions and species in database.

Automatic modeling database language

We use a database to store all the information we need in modeling. In order to realize automatic modeling, we construct the database in unified format and make it machine-readable. Every component of database has its specified attributes and values, which makes the format of the database a unique yet standard database language. We call it MoDeL: Modeling Database Language. MoDeL is based on XML language, which makes it flexible and extensible. For more specifications of MoDeL, click here.

Auto-Modeling Algorithm

User Interface

User-Friendly Design

Input

Output

MoDeL

One-minute introduction

Features

Chain-Node Model

Template Reaction

Auto-Modeling Database Language

Language specifications

History(optional)

Simulation

Auto-Modeling Algorithm

Chain Sorting and Weighting Scheme

liaochen@mail.ustc.edu.cn

Chains in our Chain-Node Model should not have any order since there are no natural ways to sort them. One chain is not preferred to another, since in a complex of multiple chains, which are connected via binding interaction of binding bites on their own, all chains are equivalent. However, a chain-sorting program is a prerequisite for implementations of many other functions, such as comparison of two Chain-Node structures, or pattern matching of template species to those with definite structure.


Sorting is always performed under some rules. Since there are no natural options, we design one to generate the unique order for a list of chains. The rule we developed is something like that for comparing two strings: comparing first the partReference and partType defined in database of the first part of two chains in alphabet order and if equal, then comparing the next until differences or the end of either chain. For convenience, we generate a unicode_c (a string) for each chain by splicing their partReference and partType attribute values in order, for example:

[placI185:ForwardDNA][RBS147:ForwardDNA][lacI153:ForwardDNA][TE145:ForwardDNA]

If one part is binded with another part (a binding site), we will mark it with *, the example above becoms (if placI185 is binded with some part, say lacI153 protein):

[placI185:ForwardDNA:*][RBS147:ForwardDNA][lacI153:ForwardDNA][TE145:ForwardDNA]

If unicode_c of each chain is different with others, then we could sort the chain list by comparing their unicode_c (just the same with comparison of two strings). However, it is not always so: chains with same unicode_c may be equivalent or not. It is due to the mark asterisk *. Unicode_c knows where are the binding sites, but it does not know the binding details of that part. In other words, unicode_c could not soly determines the order of the chains since it lacks of information of the trees defined in our Chain-Node model. The forest (tree set) in Chain-Node Model describes the binding details of parts in chains. We need to find something like unicode_c to represent the essence of the associated trees. Our weighting scheme will be first introduced below.


Assume the chains are sorted already. Each part will be assigned a weight based on its position in the chain list. The weight of the jth part on the ith chain is (i,j). For example, the first part on the first chain posses a weight (0,0) (in C style, counting number from zero). After assignment of parts' weight, we turn to weights of the nodes in trees. Since leaf nodes in trees are actually the representation of parts in binded state on chains, the weight of each leaf node could be inherited from the weight of its corresponding part. For parent node, its weight equals to the maximum of weights of all its children nodes (we define a weight (i, j) is greater than another (i',j') if i < i' or i = i' while j < j'). The recursive definition of tree nodes' weight make it possible to sort the forest. For convenience, we add a virtual node ROOT, whose weight value could also be obtained under the general rule, as the root of all trees in the forest and make a conversion from forest to tree. The tree structure could be uniquely determined by recursively sorting all children of each nodes by their weights.


Back to the question above, if two chains have the same unicode_c, which should be arranged with preference of the other? The answer lies in the exchange of them. The exchange will change the parts' weight on both chains and thus the weight of some nodes of the trees. The tree structure (with virtual root) is also possible to change. If the tree structure is not changed, the two chains are equivalent meaning that it makes no difference to arrange either chain in front of the other. If the tree structure changes, we must find some ways to determine which tree structure is preferred and based on this to arrange the two chains with equal unicode_c. We know each tree could be uniquely converted to a binary tree by taking the first child of each node as the left child and its siblings as its right child in the binary tree. The Huffman algorithm could be used to generate the Huffman code of each node in the binary tree. Since the Huffman code is actually a representation of the path from current node to the root, we can restore the tree structure from the set of Huffman codes of all leaf nodes. Hence, the Huffman codes set of leaf nodes could uniquely represent a tree structure. We could generate a string, unicode_t for tree structure by splicing the Huffman codes and their weight of all leaf nodes sorted by their weights and adding a asterisk between two adjacent codes, for example:

[000:0:1][001:1:3][010:2:3][011:2:4]

Based on this idea, we could obtain the tree unicode_t for any possible order by permuting equivalent chains and choose the order which make the minimum unicode_t value. By checking the unicode_t value of permutation, chains are grouped in a number of equivalent classes, which contains one or more chains. Chains within each equivalent classes are equivalent: exchanging order of any two chains will not change the tree structure.


It is easy to compare two Chain-Model structures with the help of chain sorting. We only need to compare unicode_c value of chains in the chain list and unicode_t value generated under that order. If two structures are equal, their chain list and tree structure must be equal since the chains order is unique in our chain sorting method. If two structures have same unicode_c value of each chain and unicode_t value, they must be same structures since we could unique generate a Chain-Node structure from the strings.

ODE Solver