Team:USTC Software/Simulation

From 2010.igem.org

(Difference between revisions)
(How to find species with specific structural pattern of a template one?)
(How to find species with specific structural pattern of a template one?)
Line 52: Line 52:
[[Image:USTCs_ChainMatching2.png‎|400px|thumb| Combinations of ''NST'' block matchings form a region restriction for ''ST'' part blocks]]
[[Image:USTCs_ChainMatching2.png‎|400px|thumb| Combinations of ''NST'' block matchings form a region restriction for ''ST'' part blocks]]
[[Image:USTCs_ChainMatching3.png‎|400px|thumb| The template ''ST'' block contains 3 ''ANY'' parts. If the first ''ANY'' part is matched A and B, the task is reduced to match the next 2 ''ANY'' parts to C, D, E, F, and G. All matchings could be found by recursively performing this algorithm.]]
[[Image:USTCs_ChainMatching3.png‎|400px|thumb| The template ''ST'' block contains 3 ''ANY'' parts. If the first ''ANY'' part is matched A and B, the task is reduced to match the next 2 ''ANY'' parts to C, D, E, F, and G. All matchings could be found by recursively performing this algorithm.]]
 +
[[Image:USTCs_ChainMatching4.png‎|400px|thumb| Removing unmatched chains and associated trees of the species. It should have same structure with template species after replacing ''ST'' part with matchings if it has the same structural pattern with the template one.]]
 +
The basic idea is simple. We first consider the structural pattern matching between one chain and a chain template. There are two kind of parts on a chain template: ''substituent''-type part (''ST'') and non-''substituent''-type part (''NST''). The whole template chain could be separated into blocks of ''ST'' part and ''NST'' part. Then we could break down the task to matchings of each small block. For each ''NST'' type block, we find all matching regions on the chain to be matched. Each of the combinations of all ''NST'' blocks forms a ''NST'' frame and restricts the matching regions of ''ST'' blocks. Therefore, the minimal task is to find matchings of each ''ST'' part block within a restricted region on the chain to be matched. At the present version of ''MoDeL'', there are 6 ''ST'' part: ''ANY'', ''ANYUB'', ''NZ'', ''NZUB'', ''ONE'' and ''ONEUB''. A general algorithm for all 6 type part is designed and we will illustrate this via an example. Consider a ''ST'' block with all ''ANY'' type parts (we will ''X'' to indicate ''ANY'' type part in the figures). We find matchings of the first part of the block. There may be many possible matchings because part of type ''ANY'' implies arbituary matching without any restriction in '''MoDeL''' definitions. For each matching found, we then deal with the next and do the same thing recursively until the end of the template or matching fails. An easy example is shown in the figure to illustrate this simple idea.
The basic idea is simple. We first consider the structural pattern matching between one chain and a chain template. There are two kind of parts on a chain template: ''substituent''-type part (''ST'') and non-''substituent''-type part (''NST''). The whole template chain could be separated into blocks of ''ST'' part and ''NST'' part. Then we could break down the task to matchings of each small block. For each ''NST'' type block, we find all matching regions on the chain to be matched. Each of the combinations of all ''NST'' blocks forms a ''NST'' frame and restricts the matching regions of ''ST'' blocks. Therefore, the minimal task is to find matchings of each ''ST'' part block within a restricted region on the chain to be matched. At the present version of ''MoDeL'', there are 6 ''ST'' part: ''ANY'', ''ANYUB'', ''NZ'', ''NZUB'', ''ONE'' and ''ONEUB''. A general algorithm for all 6 type part is designed and we will illustrate this via an example. Consider a ''ST'' block with all ''ANY'' type parts (we will ''X'' to indicate ''ANY'' type part in the figures). We find matchings of the first part of the block. There may be many possible matchings because part of type ''ANY'' implies arbituary matching without any restriction in '''MoDeL''' definitions. For each matching found, we then deal with the next and do the same thing recursively until the end of the template or matching fails. An easy example is shown in the figure to illustrate this simple idea.

Revision as of 04:57, 16 October 2010

Contents

Introduction of MoDeL based Automatic Modeling Algorithm

Development of Modeling Database Language (MoDeL) aims to provide a platform that enables separation of work between modeling and data constructing. Users could obtain dynamic behaviors of various quantities and other properties of a biological system as the output of iGame while they know nothing about details of modeling and the reaction network. Previous modeling and simulation tools require users to construct the network themselves: it is very hard for even professional researchers. iGame has no such requirements and users are only required to input data about the initial environmental condition of system. Hence, modeling is a task that is performed to find all reactions and species in the system.

To complete the final network from initial environment conditions, we first break up the task to several parts:

(1) Setup initial conditions and environmental parameters read from the input file.

(2) Species produced are inserted into a list in the order of time. To each species, we search the database to find template species which could be matched to this species in structural pattern. If not found, then go for the next species.

(3) For each template structure we found, we continue to search template reactions containing this template species as a reactant or modifier or both. Only forward or reverse reaction is handled, never for both.

(4) For each template reaction found, we search in the species list for possible species which could be instances of templates of other reactants and modifiers. If not found, then go for next reaction.

(5) For each possible combination of reactants and modifiers, if the reaction parameter conditions are satisfied, we generate the structure of products based on the transfer table and structural templates of products. And then add this new reaction to the reaction list.

(6) Go to step 2 until the end of the species list.

In the following, we will focus on the step (2), (3) and (5). Our algorithms are totally different with that of other software tools since we design them for the purpose of proving the feasibility of MoDeL for synthetic biology. They are of some kind complex because the complexity of biological systems require so.

What are special reactions and How to handle with them?

Chain Sorting and Weighting Scheme

liaochen@mail.ustc.edu.cn

Chains in our Chain-Node Model should not have any order since there are no natural ways to sort them. One chain is not preferred to another, since in a complex of multiple chains, which are connected via binding interaction of binding bites on their own, all chains are equivalent. However, a chain-sorting program is a prerequisite for implementations of many other functions, such as comparison of two Chain-Node structures, or pattern matching of template species to those with definite structure.


Sorting is always performed under some rules. Since there are no natural options, we design one to generate the unique order for a list of chains. The rule we developed is something like that for comparing two strings: comparing first the partReference and partType defined in database of the first part of two chains in alphabet order and if equal, then comparing the next until differences or the end of either chain. Meanwhile, a part in bound and unbound state

If unicode_c of each chain is different with others, then we could sort the chain list by comparing their unicode_c (just the same with comparison of two strings). However, it is not always so: chains with same unicode_c may be equivalent or not. It is due to the mark asterisk *. Unicode_c knows where are the binding sites, but it does not know the binding details of that part. In other words, unicode_c could not soly determines the order of the chains since it lacks of information of the trees defined in our Chain-Node model. The forest (tree set) in Chain-Node Model describes the binding details of parts in chains. We need to find something like unicode_c to represent the essence of the associated trees. Our weighting scheme will be first introduced below.

Assume the chains are sorted already. Each part will be assigned a weight based on its position in the chain list. The weight of the jth part on the ith chain is (i,j). For example, the first part on the first chain posses a weight (0,0) (in C style, counting number from zero). After assignment of parts' weight, we turn to weights of the nodes in trees. Since leaf nodes in trees are actually the representation of parts in binded state on chains, the weight of each leaf node could be inherited from the weight of its corresponding part. For parent node, its weight equals to the maximum of weights of all its children nodes (we define a weight (i, j) is greater than another (i',j') if i < i' or i = i' while j < j'). The recursive definition of tree nodes' weight make it possible to sort the forest. For convenience, we add a virtual node ROOT, whose weight value could also be obtained under the general rule, as the root of all trees in the forest and make a conversion from forest to tree. The tree structure could be uniquely determined by recursively sorting all children of each nodes by their weights.


Back to the question above, if two chains have the same unicode_c, which should be arranged with preference of the other? The answer lies in the exchange of them. The exchange will change the parts' weight on both chains and thus the weight of some nodes of the trees. The tree structure (with virtual root) is also possible to change. If the tree structure is not changed, the two chains are equivalent meaning that it makes no difference to arrange either chain in front of the other. If the tree structure changes, we must find some ways to determine which tree structure is preferred and based on this to arrange the two chains with equal unicode_c. We know each tree could be uniquely converted to a binary tree by taking the first child of each node as the left child and its siblings as its right child in the binary tree. The Huffman algorithm could be used to generate the Huffman code of each node in the binary tree. Since the Huffman code is actually a representation of the path from current node to the root, we can restore the tree structure from the set of Huffman codes of all leaf nodes. Hence, the Huffman codes set of leaf nodes could uniquely represent a tree structure. We could generate a string, unicode_t for tree structure by splicing the Huffman codes and their weight of all leaf nodes sorted by their weights and adding a asterisk between two adjacent codes, for example:

[000:0:1][001:1:3][010:2:3][011:2:4]

Based on this idea, we could obtain the tree unicode_t for any possible order by permuting equivalent chains and choose the order which make the minimum unicode_t value. By checking the unicode_t value of permutation, chains are grouped in a number of equivalent classes, which contains one or more chains. Chains within each equivalent classes are equivalent: exchanging order of any two chains will not change the tree structure.


It is easy to compare two Chain-Model structures with the help of chain sorting. We only need to compare unicode_c value of chains in the chain list and unicode_t value generated under that order. If two structures are equal, their chain list and tree structure must be equal since the chains order is unique in our chain sorting method. If two structures have same unicode_c value of each chain and unicode_t value, they must be same structures since we could unique generate a Chain-Node structure from the strings.


How to find species with specific structural pattern of a template one?

We assume that if a combination of species in the species list could be structurally matched to reactants and modifiers templates defined within a reaction, they are possible to react with each other in the way as the reaction template describes. Hence, it is important to find such matchings between species in the species list and template ones defined in the database. We will first give the basic idea and then some modifications.

Blocks of ST and NST part on a template chain
Combinations of NST block matchings form a region restriction for ST part blocks
The template ST block contains 3 ANY parts. If the first ANY part is matched A and B, the task is reduced to match the next 2 ANY parts to C, D, E, F, and G. All matchings could be found by recursively performing this algorithm.
Removing unmatched chains and associated trees of the species. It should have same structure with template species after replacing ST part with matchings if it has the same structural pattern with the template one.

The basic idea is simple. We first consider the structural pattern matching between one chain and a chain template. There are two kind of parts on a chain template: substituent-type part (ST) and non-substituent-type part (NST). The whole template chain could be separated into blocks of ST part and NST part. Then we could break down the task to matchings of each small block. For each NST type block, we find all matching regions on the chain to be matched. Each of the combinations of all NST blocks forms a NST frame and restricts the matching regions of ST blocks. Therefore, the minimal task is to find matchings of each ST part block within a restricted region on the chain to be matched. At the present version of MoDeL, there are 6 ST part: ANY, ANYUB, NZ, NZUB, ONE and ONEUB. A general algorithm for all 6 type part is designed and we will illustrate this via an example. Consider a ST block with all ANY type parts (we will X to indicate ANY type part in the figures). We find matchings of the first part of the block. There may be many possible matchings because part of type ANY implies arbituary matching without any restriction in MoDeL definitions. For each matching found, we then deal with the next and do the same thing recursively until the end of the template or matching fails. An easy example is shown in the figure to illustrate this simple idea.

After all possible matchings for each chain in the template have been found, we could choose any one from each to form a combination. Matching alternatives of two different chains in the template should not be the same. For example, there are two chains in the template, T1 and T2, and three chains, C1, C2 and C3 in the species. If T1 has one matching to C1 and one to C2 and T2 has one to C2 and one to C3, the possible matchings are (T1,T2)->(C1, C2), (T1,T2)->(C1,C3) and (T1,T2)->(T2,T3). (T1,T2)->(C1,C2) is not allowed. The general matching relations could be written as (n1, n2, n3...)->(m1,m2,m3...), where (n1,n2,n3...) represents chains in the template and (m1,m2,m3...) represents the chains matched in the species for (n1,n2,n3...) respectively. However, it is only one part of the matching relation because matching details, such as the matching positions of parts in the template, are ignored. The next question is: how to validate each matching?

As defined in Chain-Node model, details of distribution of binding are stored in the forest. Matching of chains could not solely determine the overall match -- whether the template and the species have the same structural pattern. However, it is not easy to compare tree structure directly. We designed an algorithm to do this based on the principle: If the template and species have the same structural pattern, it must be possible to isolate the template from the species after replacing the ST part with its matchings. We first replace the ST part of the template with its matchings in the species. And then, we remove the chains which are not matched and the associated trees -- containing leaf nodes whose corresponding parts are on those chains. If the species after removing has the same Chain-Node structure with the template after replacing, the template and the species have the same structural pattern and in brief, the template could be matched to the species. We also illustrate this idea in figures.

How to generate products and cope with intra-molecular reactions?

ODE Solver