Team:USTC Software/modelFeatures

From 2010.igem.org

Revision as of 03:26, 17 October 2010 by Soimort

Features

Chain-Node Model

In most manual modeling, a complex is treated as a single entity in reactions, and many automatic modeling tools for synthetic biology treat complexes the same way. There is, however, an important difference between manual and automatic modeling: a human modeler knows which details of a complex can be ignored and which must be preserved, but software does not. In most software, therefore, the detailed structure must be specified by the modeler or it is ignored. This usually requires many manual operations, which makes the software less automatic. For this reason, we choose to preserve the detailed structure of complexes and present the Chain-Node Model, which views a complex as a construction of its parts throughout the modeling process. With chains and nodes, it becomes possible to describe large biological complexes with many basic units and binding sites. The model also offers great flexibility in modeling. The detailed concepts are explained below.

Abstract concept of Chain

In biobrick assembly, the DNA of a plasmid is often regarded as an abstract straight line on which biobricks are placed and connected. It acts as the basic frame of biobrick assembly. Inspired by this, we present the abstract concept of a chain, which likewise acts as the basic frame of part assembly. A chain is an abstract straight line on which different parts can be placed and connected. The part is the basic unit of a chain, and a chain can hold as many parts as needed.

To go beyond DNA and to unify the data structure of Species, we extend the concept of chain to include every kind of part we can describe: RNA, protein, and even compartment, compound, and substituent. DNA, RNA, and protein parts all originate from DNA and are considered “Sequence”. A sequence part can be connected to other sequence parts on a chain. In contrast, compartment and compound parts are considered “Non-sequence”. A non-sequence part cannot be connected to any other part on a chain, which means it can only exist on a One-Part-Chain. A substituent stands in for any other part, so it can be either sequence or non-sequence.

It is important to mention that the Sequence/Non-sequence division is only a rule to guide modeling, not a constraint of the Chain-Node Model. We choose to preserve as much flexibility as possible and leave users free to construct ANY species they can with the Chain-Node Model. For example, one can create a strange species with compartment, compound, and DNA parts on the same chain, which makes little sense in biology. It is therefore up to users to decide how to use the Chain-Node Model in modeling.
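As a minimal Python sketch of the chain and part concepts above, a chain can be modeled as an ordered list of parts, with the Sequence/Non-sequence rule written as an advisory check rather than an enforced constraint. The class names, the `kind` strings, and the `follows_guideline` helper are illustrative assumptions and are not taken from MoDeL or iGAME.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed category labels; the text only distinguishes
# "Sequence" parts from "Non-sequence" parts.
SEQUENCE_KINDS = {"DNA", "RNA", "protein"}
NON_SEQUENCE_KINDS = {"compartment", "compound"}


@dataclass
class Part:
    name: str
    kind: str  # e.g. "DNA", "compound", or "substituent"


@dataclass
class Chain:
    parts: List[Part] = field(default_factory=list)

    def follows_guideline(self) -> bool:
        """Advisory check: a non-sequence part should sit alone on its chain."""
        has_non_sequence = any(p.kind in NON_SEQUENCE_KINDS for p in self.parts)
        return (not has_non_sequence) or len(self.parts) == 1


plasmid = Chain([Part("pLacI", "DNA"), Part("lacI", "DNA")])
iptg = Chain([Part("IPTG", "compound")])
# Allowed by the model, but flagged by the guideline.
odd = Chain([Part("E_coli", "compartment"), Part("IPTG", "compound"), Part("pLacI", "DNA")])
```

Nothing in the sketch prevents the `odd` chain from being built, which mirrors the choice to leave the decision to the user; the check only reports whether the guideline is respected.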

Abstract concept of Node

A node characterizes the binding structure in a biological complex. It is an abstraction of either a binding site or a site that is bound. A link always exists between a binding site and the sites it binds together, indicating the relationship between them. We use trees to represent the binding structure of a complex: a parent node represents a binding site, and its child nodes represent the bound sites. For example, the binding structure of the TetR dimer is a TetR2 node with two TetR nodes as its children. There is one exception, the ROOT node, which is the top node of a tree. It has only one child and no parent. The function of ROOT is to indicate that its child is the last binding site in the tree and binds to nothing further.
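The TetR dimer tree described above can be sketched in Python as follows. The `Node` class and the helper names `with_root` and `leaf_labels` are assumptions made for illustration; the source does not specify how iGAME stores its trees.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def with_root(top: Node) -> Node:
    """Wrap the last binding site in a ROOT node, which has exactly one child."""
    return Node("ROOT", [top])


def leaf_labels(node: Node) -> List[str]:
    """Collect the labels of the leaves, i.e. the sites at the bottom of the tree."""
    if not node.children:
        return [node.label]
    labels: List[str] = []
    for child in node.children:
        labels.extend(leaf_labels(child))
    return labels


# TetR2 is the binding site; the two TetR nodes are the sites it binds together.
tetr_dimer = with_root(Node("TetR2", [Node("TetR"), Node("TetR")]))
```

Walking the tree from ROOT downwards recovers which sites are held together by which binding site, which is the information a flat "whole complex" representation loses.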

In promoter repression and activation reactions, the promoter changes its properties after binding a repressor or inducer; it is no longer the original promoter. This is an example of binding changing the properties of a node. Since we do not store all possible property alterations within one part, the solution is to replace the old part with a new part that stores the correct properties.
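The replacement strategy can be sketched in a few lines of Python. The function `replace_part` and the part name `pTetR_repressed` are hypothetical and used only for illustration; the source does not say how the replacement parts are named.

```python
from typing import Tuple


def replace_part(chain: Tuple[str, ...], old: str, new: str) -> Tuple[str, ...]:
    """Return a copy of the chain with the first occurrence of `old` swapped for `new`."""
    i = chain.index(old)  # raises ValueError if the part is not on the chain
    return chain[:i] + (new,) + chain[i + 1:]


free_chain = ("pTetR", "lacI")
# Hypothetical name for the part that stores the post-binding properties.
bound_chain = replace_part(free_chain, "pTetR", "pTetR_repressed")
```

The original chain is left unchanged, so the unbound species and the bound species can coexist in the same model.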

A node can be either a part on a chain or a binding-site node, which leaves many possibilities for constructing complicated binding structures. Again, there are no restrictions, but compartment parts are not recommended in a binding structure.

The basic assumption

The basic assumption of the Chain-Node Model is that a part always carries its properties wherever it is placed on a chain. This means that no matter how complicated a complex is, you can always keep track of the location and properties of every single part. Because a complex inherits the properties of its parts, the assumption saves you from rewriting the structure and function of each new complex. However, the assumption does not suit all circumstances. For example, under this assumption a fusion protein carries the properties of the fused proteins, which is not true in many cases: a fusion protein may lose its old properties and may even gain new properties that none of the original parts have. In this case, our suggestion is to define the fusion protein as a new part. We therefore suggest that users choose different strategies for describing the same thing in the Chain-Node Model based on the situation, instead of modeling everything in the same way.
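A small Python sketch may make the assumption and its workaround concrete. The part names, property strings, and the `complex_properties` helper are all placeholders invented for this illustration, not entries from the team's database.

```python
from typing import Dict, List, Set

# Placeholder parts and properties, for illustration only.
PART_PROPERTIES: Dict[str, Set[str]] = {
    "partA": {"property_a"},
    "partB": {"property_b"},
    # A fusion protein defined as a new part with its own properties.
    "fusionAB": {"property_new"},
}


def complex_properties(parts: List[str]) -> Set[str]:
    """Under the basic assumption, a complex inherits the properties of all its parts."""
    inherited: Set[str] = set()
    for name in parts:
        inherited |= PART_PROPERTIES[name]
    return inherited


inherited = complex_properties(["partA", "partB"])  # union of the parts' properties
fusion = complex_properties(["fusionAB"])           # only what the new part declares
```

Modeling the fusion as `["partA", "partB"]` would silently give it both original properties; defining `fusionAB` as a separate part is what lets its properties differ.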

Examples

With great power there must come great responsibility. To use the powerful Chain-Node Model correctly, please have a look at the few examples we present.

Example of species: E_coli_cell

Example of species: pLacI:lacI4 and its tree structure.

Template Modeling

Substituent and Template

When dealing with a biological system that contains an enormous number of species, automatic modeling is almost impossible if the reactions of each species are treated individually. We try to describe only the active parts of a species in a reaction, so we add the substituent part to replace the inactive parts. Species and reactions with substituents are called template species and template reactions. A template reaction describes the core structure of the reactants that causes the reaction to happen, so it can represent a group of reactions that happen for the same reason. For example, a template reaction between DNA containing pTetR and protein containing the TetR dimer can represent all such binding reactions that happen in complexes.

In this version of MoDeL, the only substituent is ANY, which replaces parts on ONE chain. The number of parts it represents can range from zero to infinity. ANY fits simple reactions such as pTetR:TetR2 binding perfectly. A substituent is also a part, so use it like one: place it on a chain and choose its partType. For example, a substituent of type ForwardDNA can replace zero or more parts of the same type. Leaving the type empty lets the substituent match parts of any type. For a visual understanding, have a look at the example below.
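The behaviour of ANY on a single chain resembles wildcard matching, and can be sketched in Python as below. This is not the team's matching algorithm, which the text does not describe; it is a simple memoized recursive matcher that ignores binding trees. The names `AnyParts` and `matches`, and the representation of a part as a `(name, partType)` tuple, are assumptions.

```python
from functools import lru_cache
from typing import List, Optional, Tuple, Union

Part = Tuple[str, str]  # (name, partType)


class AnyParts:
    """Stand-in for the ANY substituent: zero or more parts on one chain."""

    def __init__(self, part_type: Optional[str] = None):
        self.part_type = part_type  # None means "match parts of any type"

    def accepts(self, part: Part) -> bool:
        return self.part_type is None or part[1] == self.part_type


Token = Union[Part, AnyParts]


def matches(template: List[Token], chain: List[Part]) -> bool:
    """Return True if the whole chain fits the template."""

    @lru_cache(maxsize=None)
    def rec(i: int, j: int) -> bool:
        if i == len(template):
            return j == len(chain)
        token = template[i]
        if isinstance(token, AnyParts):
            # Either ANY stops here (possibly covering zero parts) ...
            if rec(i + 1, j):
                return True
            # ... or it absorbs one more compatible part.
            return j < len(chain) and token.accepts(chain[j]) and rec(i, j + 1)
        # A concrete part must match exactly.
        return j < len(chain) and chain[j] == token and rec(i + 1, j + 1)

    return rec(0, 0)


template = [AnyParts("ForwardDNA"), ("pTetR", "ForwardDNA"), AnyParts("ForwardDNA")]
chain = [("pLacI", "ForwardDNA"), ("pTetR", "ForwardDNA"), ("lacI", "ForwardDNA")]
```

With this template, any DNA chain that contains pTetR somewhere along its length matches, including a chain consisting of pTetR alone, because each ANY may cover zero parts.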

Template species and template reactions are far more than a brainstorming idea. We have developed a set of algorithms that match actual species to template species and have successfully implemented them in our software. iGAME can therefore recognize species and reactions using templates, which makes it very “clever” at automatic modeling.

We are still looking for more substituents. In the next version, we plan to add more detailed substituents that can replace a specified number of parts on a specified number of chains. Even the arrangement of parts and chains could be specified. We believe this will make template modeling more powerful.

Examples

Example of template reaction: binding :pTetR:TetR2

Automatic Modeling Database Language

Introduction to MoDeL

Because a large amount of information must be stored for automatic modeling, we designed our own database and developed a standard language for describing it. We call the language MoDeL.

MoDeL (Modeling Database Language) is a database representation format for synthetic biology automatic modeling. It is oriented towards providing data support for synthetic biology models on a number of topics, including cell signaling pathways, metabolic pathways, biochemical reactions, gene regulations, and so on.

MoDeL is developed as a machine-readable format in XML fashion, so programming languages with an XML library API (such as libxml2 for C/C++) are strongly recommended for reading a MoDeL database. Examples of MoDeL databases can be downloaded from https://2010.igem.org/Team:USTC_Software/resources

MoDeL is an attempt to define a universal database markup language for synthetic biology automatic modeling. It supports descriptions of a wide variety of parts, including biobrick, compound, compartment, substituent, and so on. It can also describe the species constructed from these parts and the reactions that happen among those species, such as binding of species, dilution of species, degradation of proteins, amplification of cells, multi-compartment reactions, and so on. Many traditional methods fail to consider the structure of species because they mark up reactions with only reactants, modifiers, products, and kinetic laws. The best part of MoDeL is that it preserves the detailed structure of species and reactions by using the Chain-Node Model. Based on this model, MoDeL is able to mark up almost all kinds of reactions in a detailed and clear way.
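Reading an XML database with a standard XML library can be sketched in Python with `xml.etree.ElementTree`. The fragment below is NOT the real MoDeL schema: the element names (`modelDatabase`, `part`), the attribute names, and the `Compound` type value are invented for illustration, and the downloadable examples should be consulted for the actual structure.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical fragment; tag and attribute names are assumptions,
# not the actual MoDeL format.
SAMPLE = """
<modelDatabase>
  <part id="pTetR" partType="ForwardDNA"/>
  <part id="IPTG" partType="Compound"/>
</modelDatabase>
"""


def part_types(xml_text: str) -> Dict[str, str]:
    """Map each part id to its partType attribute."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("id"): p.get("partType") for p in root.iter("part")}


types = part_types(SAMPLE)
```

The same pattern (parse once, then iterate over the elements of interest) carries over to libxml2 or any other XML library.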

Overview of MoDeL

To define all the needed components in a systematic way, MoDeL organizes them in five component containers:

System: This container contains definitions of units, functions, rules, and global parameters. These are basic elements used by the other components of the database, and many are mathematical concepts without a direct synthetic biology meaning. The container is named System to indicate this relationship with the other components.

Compartment: This container contains definitions of compartments. A compartment here is considered a holder of substance with a certain volume, so it only stores related data such as units and volume. It also stores a list of the compartments that are permitted to be placed inside it. For compartments that can multiply, such as E. coli, a corresponding compartment with the same id is created in the compartment sub-container of Part.

Part: This container contains definitions of parts. Parts are the basic units that can be inserted in an abstract chain. The Part container has five sub-containers: compartment, biobrick, compound, plasmidBackbone, and substituent. Biobrick contains biobricks from PartsRegistry together with similar components created by users. Compartment contains cells and other microscopic organisms that can multiply. Compound contains small molecules such as IPTG and other compounds. PlasmidBackbone contains plasmid backbones that come from PartsRegistry or are defined by users. Substituent contains the substitutes used in template modeling.

Species: This container contains definitions of various species defined by Chain-Node Model. Each component stores the CNmodel structure of chains and trees. Related reactions are also stored.

Reactions: This container contains definitions of various reactions. Each component has its own compartments, reactants, modifiers, products, substituent transfers, and kinetic law. This forms a detailed description of reactions.

It is worth noting that MoDeL does not attempt to include all information about each component. PartsRegistry provides a long list of data in XML format, including part author, status, sequence, and even group access information. Such data are redundant for our database because they are not needed to construct a reaction network. Only the useful data are retained.

Relationship with SBML

SBML (the Systems Biology Markup Language) is a machine-readable language in XML fashion. It is oriented towards describing systems in which biological entities are involved in, and modified by, processes that occur over time. As a widely used model exchange standard, SBML is chosen as the output format of our auto-modeling module.

Generally speaking, MoDeL need not conform to any constraints set by SBML. However, since SBML is a widely used standard, it is better for MoDeL to share as many concepts and components with SBML as possible. For example, MoDeL uses the rules, function definitions, and unit definitions of SBML. This makes it easier for software tools to convert between MoDeL and SBML.

Examples

Here are a few examples of what different components look like in our database.

Example of a part

Example of a species

Example of a reaction