Team:Newcastle/E-Science

From 2010.igem.org

Revision as of 14:34, 20 October 2010 by Jannetta

An e-Science Approach to Synthetic Biology

Motivation:

Synthetic Biology is the engineering of biological entities to perform novel and desirable functions. Synthetic biologists already rely heavily on computational resources, as shown by software tools such as CLONEQC and Biskit and by Internet-based repositories such as the MIT BioBrick library at http://partsregistry.org/Main_Page. In both Systems and Synthetic Biology, analysis consists of researchers taking the outputs of one piece of software and feeding them into the inputs of another. This pipeline, or workflow, can be computerised using software such as Taverna. Our research into the application of computerised workflows to Synthetic Biology revealed a paucity of published material in this area. We therefore identified it as an opportunity for research and for the development of tools that could greatly enhance methodologies and workflows in Synthetic Biology. We show how an e-Science approach, i.e. the utilisation of advanced computing resources and technologies to support scientists, benefits the Synthetic Biology engineering process. We further propose a development life-cycle and show how this approach can improve the design of synthetic biological entities.

Introduction:

Aim: The aim of this project was to investigate the application of e-Science approaches in Synthetic Biology, particularly workflows.

Objectives:

  1. To produce a proof-of-concept web service that can be used by a workflow to automate the process of designing a BioBrick.
  2. To produce one or more Taverna workflows, that use this web service, to show how this e-Science approach will benefit the design process (Oinn et al., 2006; Hull et al., 2006) and reduce development time.
  3. To produce simulatable model parts in CellML, that could eventually be compiled into a complex model.
  4. To identify design patterns or motifs, which are recognisable patterns in models that can be re-used and simulated independently of the rest of the model. Identifying these motifs should aid the design process and produce re-usable parts to simplify future designs. In principle, the motifs are similar to, and serve the same purpose as, design patterns in computer software.

Synthetic Biology:

Chopra and Kamma define Synthetic Biology as a field that involves the synthesis of novel biological systems which are not generally found in nature (Chopra and Kamma, 2006).

Engineering Principles:

Two design approaches to Synthetic Biology are emerging, namely top-down and bottom-up.

The top-down approach starts with an overview of a system. The system is studied and iteratively broken into modules until the complete system can be described in terms of minimal elements. In the bottom-up approach, design starts with minimal elements that, according to certain rules, can be fitted together like Lego bricks (http://lego.com) until a complete system is built (Heinemann and Panke, 2006).

Synthetic biologists are adopting principles from established fields of engineering to increase tractability and speed of design (Andrianantoandro et al., 2006).

We define Synthetic Biology as the engineering of biological systems to perform desirable and predictable functions.

Modelling languages:

Systems Biology Markup Language (SBML) was developed to represent biochemical reaction networks in a way that can be used by different software systems to exchange models.

CellML is a mark-up language used to store and exchange computer based mathematical models. Originally it was intended for the description of biological models, but it has been adopted by several other fields of study.

Both CellML and SBML are based on XML (Hucka, 2003; Cuellar et al., 2003). The main difference between SBML and CellML is that in CellML the underlying mathematics of cellular models is described in a very general way.

Virtual Parts:

Cooling et al. (2010) introduce the concept of Standard Virtual Biological Parts (further on referred to as virtual parts). Virtual parts are mathematical models used to represent biological parts that can be combined to inform system design.

The use of virtual parts, however, goes beyond just the representation of biological parts. The virtual parts can also be used to represent bioenvironmental elements which are intracellular events occurring in cells or chassis. These bioenvironmental elements become interfacing parts that act as "glue" for the aggregation of the standard biological parts.

We refer to the standard biological parts as physical parts and the interfacing parts as non-physical parts.

Development Life Cycle:

Another concept that can be borrowed from software engineering is the software development life cycle (SDLC). As a basis we used what is known as the classic life-cycle paradigm, or waterfall model (see Figure 2).

Figure 2: Waterfall Model

e-Science:

"The term 'e-Science' denotes the systematic development of research methods that exploit advanced computational thinking" Professor Malcolm Atkinson, eScience Envoy

e-Science is an enabling concept. It aims to allow the sharing of resources required in science across administrative borders.

Sharing is accomplished by tapping into techniques such as cloud and grid computing.  
 

Web Services:

Web services are set apart from other types of services by the communications protocol they use. To be part of what is known as the Web, using HyperText Transfer Protocol (HTTP) is a prerequisite.

Web services, regardless of the architecture chosen, enable programmatic access to on-line resources and therefore more and more services are developed and become available on-line.

SOAP web services: The Simple Object Access Protocol, or SOAP as it is popularly known, was originally designed to be an RPC (Remote Procedure Call) protocol using HTTP as the transport protocol. It was later extended to allow for other transport protocols such as the Simple Mail Transfer Protocol (SMTP) (Kennard and Stiver, 2000). SOAP is a packaging protocol for sharing messages between applications (Snell, 2001).

RESTful web services: Roy Fielding described Representational State Transfer (REST) as a software architectural style (Fielding and Fielding, 2000). REST uses the WWW as a starting point. Fielding refers to the WWW as the Null Style. As such it is an ideal architecture for web services.

Workflows:

A workflow can be described as a sequence of connected steps or work activities that are followed to produce a required outcome. The sequence and the ways the steps impact on each other are regulated by a set of rules (DiCaterino et al., 1997).

The processing of the information drawn from all the distributed sources can be accomplished using workflows. To manage the complex and heterogeneous nature of such a distributed environment, scientists have turned to computerised workflows.
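The idea of a workflow as a sequence of connected steps can be sketched in a few lines of Python. This is only an illustration: the three step functions are toy stand-ins for separate tools or services, and none of the names come from Taverna or from our own workflows.

```python
from functools import reduce


def run_workflow(steps, data):
    """Feed the output of each step into the next and return the final result."""
    return reduce(lambda value, step: step(value), steps, data)


# Three toy steps standing in for separate tools or services.
def clean(seq):
    """Remove whitespace and convert to upper case."""
    return "".join(seq.split()).upper()


def transcribe(seq):
    """Replace T with U (DNA to RNA alphabet)."""
    return seq.replace("T", "U")


def length_report(seq):
    """Append the sequence length."""
    return f"{seq} ({len(seq)} nt)"


result = run_workflow([clean, transcribe, length_report], "atg aaa\ttaa")
print(result)
```

Each step only needs to accept the previous step's output, which is what allows independent tools to be chained together.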

Methods:

The Synthetic Biology Development Lifecycle

Unlike software developed under the waterfall model, synthetic biology currently has no need for a maintenance phase.

We propose the model in Figure 1 to fit Synthetic Biology. It is important to note the iterative nature of the software development life cycle which was also inherited by our proposed Synthetic Biology life cycle.

Newcastle workflows Sbdlc.png

Figure 1: A proposed development life cycle for Synthetic Biology.

Using CellML

The modelling language of choice for virtual parts is CellML. One of the main reasons for choosing CellML over SBML (or any of the other modelling languages available) is its modular nature and its proven track record in representing intracellular processes in systems biology (Cooling et al., 2010).

Identifying motifs

To identify motifs and the parts they were composed of, the CellML model of the subtilin receiver was converted to a graphical representation (Figure 2).

Newcastle workflows SubtilinReceiver.png

Figure 2: A graphic representation of the Subtilin Receiver model.

The virtual parts corresponding to these parts were extracted from the CellML Subtilin receiver model and reassembled in a new file.

The model in the file was checked using COR because it was quick to reload changed files and to return informative errors.

Database

A MySQL relational database was designed and created to hold both physical and non-physical virtual parts, required to build a simulatable model of a BioBrick. Both types of parts are saved in the same table but are distinguished by a boolean flag.

The database was populated to hold at least one generic part for each type of part that we used. A naming convention was adopted that allows parts to be retrieved by part name.
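The single-table design with a boolean flag can be sketched as follows. This is a minimal illustration, not the BioBrickIt schema: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs without a server, and the table name, column names, part names and CellML fragments are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE part (
        name        TEXT PRIMARY KEY,  -- hypothetical naming convention: <type>_<label>
        is_physical INTEGER NOT NULL,  -- boolean flag: 1 = physical, 0 = non-physical
        cellml      TEXT NOT NULL      -- CellML fragment for the virtual part
    )
    """
)

# Hypothetical example rows; the real part names and models are not given here.
conn.executemany(
    "INSERT INTO part (name, is_physical, cellml) VALUES (?, ?, ?)",
    [
        ("promoter_generic", 1, "<component name='promoter_generic'/>"),
        ("rbs_generic", 1, "<component name='rbs_generic'/>"),
        ("transcription_generic", 0, "<component name='transcription_generic'/>"),
    ],
)


def get_part(name):
    """Return (name, is_physical, cellml) for a part name, or None if absent."""
    row = conn.execute(
        "SELECT name, is_physical, cellml FROM part WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else (row[0], bool(row[1]), row[2])


def list_parts(physical):
    """List part names filtered by the boolean flag, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM part WHERE is_physical = ? ORDER BY name",
        (int(physical),),
    ).fetchall()
    return [r[0] for r in rows]
```

Keeping both kinds of part in one table means a single lookup by name works for either kind, with the flag used only when the two need to be told apart.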

The structure of the database is shown in the E-R diagram in Figure 3.

Newcastle workflows Erdiagram.png

Figure 3: An E-R diagram of the database created for the BioBrickIt service, to hold virtual parts and motifs. (Click on image for larger format)

Web Service

A web service, which we called BioBrickIt, was developed in Java and served with the Apache Tomcat web server. The web service provides the functionality to populate the database with virtual parts and other information required to create simulatable CellML models. It further provides the functionality to simulate models retrieved.

A choice had to be made between offering a SOAP or a RESTful web service. Time constraints and simplicity of implementation favoured a RESTful service. Resources are exposed as URLs and results are returned as plain text rather than in XML, as would be the case if it was a SOAP service (Pautasso and Leymann, 2008).
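The RESTful style described here, with resources exposed as URLs and results returned as plain text, can be sketched as below. The actual service was written in Java; this Python sketch only illustrates the idea. GetMotifModel and motif_name are taken from the service URL quoted in the results, while ListMotifs, the in-memory data and the error messages are assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory data; the real service reads from its database.
MOTIFS = {
    "motif_constitutive_promoter": "<model name='motif_constitutive_promoter'/>",
}


def handle_get(url):
    """Map a resource URL to (status, content_type, body) with a plain-text body."""
    parsed = urlparse(url)
    resource = parsed.path.rstrip("/").split("/")[-1]
    params = parse_qs(parsed.query)

    if resource == "ListMotifs":  # hypothetical resource name
        return 200, "text/plain", "\n".join(sorted(MOTIFS))
    if resource == "GetMotifModel":
        name = params.get("motif_name", [""])[0]
        if name in MOTIFS:
            return 200, "text/plain", MOTIFS[name]
        return 404, "text/plain", "unknown motif: " + name
    return 404, "text/plain", "unknown resource: " + resource
```

Because every resource is just a URL with query parameters, a client such as a workflow engine needs nothing more than an HTTP GET to use it.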

The web service accesses the relational database described in the Database section above using Java Database Connectivity (JDBC).

The web service can retrieve a simulatable model of a motif by first retrieving its rules.

Simulation

Simulation of models was provided by connecting the web service to an instance of JSim. JSim (http://www.physiome.org/jsim/) is a software application, written in Java, for simulating quantitative numeric models. It can be run as a service, a standalone application or an applet.

The JSim server was installed on the same server as the Apache Tomcat web server.

A simulation is requested from the BioBrickIt service by passing the parameter simulate=yes in the URL when a model of a motif is requested.

Taverna workflows

To create the workflows that utilised the web service, the decision was made to use Taverna Workbench. Taverna is an open source application for managing workflows. It can be used to integrate molecular biology tools and databases available on the Web. It is especially useful for web services (Hull et al., 2006). Taverna was chosen because of familiarity with the product, but also because time restrictions did not allow for the evaluation and comparison of other tools.

Results:

Motifs

We identified two motifs, a constitutive promoter motif and a coding sequence motif (see Figures 1 and 2).

Newcastle workflows Motif1.png

Figure 1: The constitutive promoter motif.

Newcastle workflows Motif2.png

Figure 2: The coding sequence motif.

The coding sequence motif illustrates how a modular approach can be beneficial to the design process. The constitutive promoter motif that is embedded in the coding sequence motif is shown with the red dotted line.

The motifs were identified using the Subtilin receiver model created by the Newcastle University iGEM team of 2008.

The BioBrickIt Web Service

We developed a RESTful web service which we called BioBrickIt. It is available online at http://msc.jannetta.com:8080/BioBrickIt. BioBrickIt was developed using J2EE web technologies, JAXP for XML handling, the Apache Tomcat web server and a MySQL database. The service and its source code can also be downloaded as a .war file from http://msc.jannetta.com:8080/BioBrickIt/BioBrickIt.war.

The web service exposes the following resources:

  • Adding a component (or virtual part) to the database from a file containing the part in CellML format
  • Listing all the components in the database
  • Listing all the component types in the database
  • Listing all the variables for a specified component
  • Retrieving a specified component
  • Listing the sequence for a physical component
  • Listing all the motifs in the database
  • Listing the rules for a specified motif
  • Retrieving the model for a specified motif
  • Simulating the retrieved model

Database

A MySQL relational database was designed and implemented to hold both physical and non-physical parts. The database was populated manually and using the file upload feature of the BioBrickIt web service. A set of generic parts was all that was required for this research, but the database contains several other parts which are available from the 2008 iGEM Subtilin receiver. The database is online and can be accessed via the BioBrickIt Service at http://msc.jannetta.com:8080/BioBrickIt/.

JSim server

When accessing the BioBrickIt service at http://msc.jannetta.com:8080/BioBrickIt/, a simulation of the generated motif can be requested by adding simulate=yes as a parameter to the URL. For example, a simulation of the generic constitutive promoter can be requested with: http://msc.jannetta.com:8080/BioBrickIt/GetMotifModel?motif_name=motif_constitutive_promoter&simulate=yes.
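A client can build such a request URL programmatically. The following Python sketch assumes only what is stated above (the base address, the GetMotifModel resource and the motif_name and simulate parameters); the function names are our own for this example, and the service may no longer be reachable at this address, so the fetch helper is defined but not called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://msc.jannetta.com:8080/BioBrickIt"  # address given in the text


def motif_model_url(motif_name, simulate=False):
    """Build the GetMotifModel URL, optionally asking for a simulation."""
    params = {"motif_name": motif_name}
    if simulate:
        params["simulate"] = "yes"
    return BASE_URL + "/GetMotifModel?" + urlencode(params)


def fetch_text(url, timeout=10):
    """Fetch a URL and return the response body as text (not called here)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(motif_model_url("motif_constitutive_promoter", simulate=True))
```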

Taverna workflows

We used Taverna Workbench to create three workflows. The first workflow (Figure 3) makes use of the BioBrickIt web service to determine the presence of restriction sites in a provided sequence.
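The kind of check performed by the first workflow can be sketched in Python. We do not list here which restriction sites the workflow looks for, so this sketch assumes the four enzymes of the BioBrick assembly standard (EcoRI, XbaI, SpeI and PstI); the function name is likewise an assumption. All four sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition sites of the enzymes used in BioBrick standard assembly.
# Assumption: the text does not say which sites Workflow 1 checks for.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_restriction_sites(sequence):
    """Return {enzyme: [0-based start positions]} for every site found."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further on so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits
```

An empty result means that none of the listed sites occur in the sequence.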

The second workflow (Figure 4) retrieves the sequences of provided physical parts using the BioBrickIt service and then concatenates them. The intention is that this workflow be extended, or embedded in a larger workflow, so that a CellML model could be converted to a sequence.
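The retrieve-and-concatenate step can be sketched as follows. In the workflow the sequence lookup is a call to the BioBrickIt service; here it is passed in as a function so that the sketch runs on its own. The part names and sequences are made up for illustration.

```python
def concatenate_parts(part_names, get_sequence):
    """Join the sequences of the named parts in the order given."""
    return "".join(get_sequence(name).strip().upper() for name in part_names)


# Hypothetical part names and made-up sequences, for illustration only.
EXAMPLE_SEQUENCES = {
    "promoter_generic": "ttgaca",
    "rbs_generic": "aggagg",
    "cds_generic": "atgaaataa",
}

construct = concatenate_parts(
    ["promoter_generic", "rbs_generic", "cds_generic"],
    EXAMPLE_SEQUENCES.__getitem__,
)
print(construct)
```

Passing the lookup as a parameter keeps the concatenation logic independent of where the sequences come from.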

Newcastle workflows Workflow1.png

Figure 3: Workflow 1, Restriction site finder.

Newcastle workflows Workflow2.png

Figure 4: Workflow 2, Retrieve sequences of virtual parts.

Newcastle workflows Workflow3.png

Figure 5: Workflow 3, Retrieve simulatable CellML model of a motif.

