<p>Without a question there is an urgent need to build a distinct and
Revision as of 22:54, 27 October 2010
Team
METU Turkey Software is an interdisciplinary team of 8 students and
3 advisors from various backgrounds such as Molecular Biology, Bioinformatics,
Computer Engineering and Computer Education and Instructional Technology.
We have put our knowledge and experience in our fields together to bring
a much-needed solution to a daily problem in the field of synthetic biology
for iGEM 2010.
Tolga Can
Tolga Can received his PhD in Computer Science at the University
of California at Santa Barbara in 2004. He is currently an Assistant
Professor in the Department of Computer Engineering, Middle
East Technical University, Ankara, Turkey. His main research
interests are in bioinformatics, especially prediction and analysis
of protein-protein interaction networks, and statistical methods
such as graphical models and kernel methods.
Yeşim Aydın-Son
Yeşim received her M.D. in 1999 from HÜTF, Ankara, and
completed her Ph.D. in Genome Science and Technology at the University
of Tennessee, Knoxville, in 2006. After working as a research
fellow at City of Hope National Medical Center, Duarte, CA,
she recently accepted her current position at the METU Informatics
Institute as an Assistant Professor of Medical Informatics.
She is also the acting coordinator of the Bioinformatics Graduate
Program at METU. The main focus of her research is genomic biomarker
discovery and applications of biomarker research in personalized
medicine. Her research group is building a new integrated
application for genome-wide association of SNP biomarkers and
discovery of genes and pathways related to diseases, in which SNP
genotyping data from both microarray and next-generation sequencing
experiments can be analyzed in a single all-in-one step.
Ömer Nebil Yaveroğlu
Ömer Nebil Yaveroğlu is currently a PhD student at Imperial
College London. He worked as a teaching assistant from
2008 to 2010 in the Computer Engineering Department of Middle
East Technical University. Throughout his MSc studies, he tried to understand
the orthological similarities between the protein interaction
networks of different species using graph theory. He helped
the group as an advisor in computing-related discussions.
Burak Yılmaz
I am a recent graduate of the METU Molecular Biology and Genetics
department and am now studying towards my master's degree in Molecular
Bioengineering at METU. My interest in synthetic biology
started during my undergraduate years, and after graduation I started
up Sentegen, the first biotechnology company
focused on synthetic biology in Turkey. I continue my
research and training in synthetic biology while also contributing
to the development of the field in my country. We need new scientific
revolutions to solve the huge problems of life, and the emerging field
of synthetic biology is the best candidate for a biotechnological
revolution. I am interested in synthetic biology applications,
along with Lab-on-a-Chip devices for molecular biology techniques,
and we are designing gene synthesis chips to produce biobricks,
the raw materials of synthetic garage biology, faster and cheaper.
I enjoy snowboarding, cycling and writing poems.
Muhammad Akif Ağca
Cihan Taştan
2010 is my final year as a B.S. student in the Molecular Biology
and Genetics department. I am also studying Computer
Engineering as my minor. As a scientist, I plan to do research
on the relationship between cancer and viruses (let's say viral
oncology) by integrating novel techniques from bioinformatics and
synthetic biology. This is my second year at iGEM.
Hassan Salehe Matar
I ascended the hills of Kilimanjaro, crossed the savannah
of Serengeti and finally landed in the country of Istanbul. My
name is Hassan Salehe. I'm a final-year undergraduate student
at the Department of Computer Engineering, Middle East Technical
University. In METU Turkey Software I'm a database administrator
and a core software developer. I'm interested in software development,
database management, computer networks and bioinformatics. I run
marathons, I like swimming and I'm fond of action movies. Oh,
I was about to forget to tell you that I also like traveling.
Thanks!
Ayub Rokhman Wakhid
From the country of a thousand islands, across the ocean, he
came to Ankara. Now he is finishing his undergraduate studies
at the Department of Computer Education and Instructional Technology,
Middle East Technical University. This is the first time he
has joined iGEM. He is on the design team of METU Turkey Software. He
is interested in animation, web development, and instructional
technologies.
Muhammad Fakhry Syauqy
A senior undergraduate student of Computer Education and
Instructional Technology at Middle East Technical University.
He came all the way from Indonesia to Ankara, Turkey, to make a
great leap in his life. His role in this team is designer. Together
with Ayub, he designs the team's wiki and poster. He is interested
in 2D and 3D design, web development and animation. He loves
playing football and working with computers. His motto is "Possibilities
are limitless".
Saygın Karaaslan
Our multimedia support and the core of our design and animation
team. Saygın is a senior in the Biology department at METU and is
about to launch his own scientific animation company. After
graduation he will continue his academic studies in medical
informatics, scientific data visualization and 3D molecular
animations. He has recently completed the production of OCW
video notes for molecular biology laboratory lectures. He never
says no to a good soccer game or Mafia II. We look forward to
the premiere of his documentary on the "History of Science".
Yener Tuncel
I graduated from the METU Molecular Biology and Genetics
department and just started the Bioinformatics Graduate Program
this fall. My main research interest is in systems biology
and its applications. Currently I am focused on genome-wide
association of SNP biomarkers, where we will utilize systems
biology approaches to discover disease gene and pathway
associations after high-throughput genotyping studies. During
the course of our research on the iGEM project this summer, as
a molecular biologist I worked on the standardization of the
parts information for our application database. Also, as a bioinformatician
in training, I contributed to the development of the algorithms
for the BioGuide software. Besides research, I develop educational
tools for biology and bioinformatics education and am getting used
to couples dancing.
Motivation
Since 2008, we have been participating in iGEM as the METU (Middle East
Technical University) wet-lab team, and each year we have noticed the
increasing number of participating teams, along with an increase in
biobrick entries at partregistry.org.
While having more biobricks to choose from is incredible,
searching for and choosing the appropriate parts is becoming a challenge.
This year, during the construction of iGEM biobrick parts for
our new project, we felt the need for an application that finds interacting
parts based on an input/output model to design genetic constructs.
Using specialized software to search the parts registry for
possible biobricks to include in our construct would be much easier,
faster and more accurate than a manual search. We shared our need with a group
of friends who are software engineers, and initiated the METU_Turkey_SOFTWARE
team, where we worked together over this summer to build the BioGuide
software.
Scope and Future Aspects
The partregistry.org is
a continuously growing collection of standard genetic parts that can
be mixed and matched to build synthetic biology devices and systems.
The Registry is based on the principle of "get some, give some". Registry
users benefit from using the parts and information available in the
Registry for designing their own genetically engineered biological systems.
In exchange, the expectation is that Registry users will contribute
back to the information and the data on existing parts and will submit
new parts they have designed in order to improve this community resource.
As an expanding database, partregistry.org
needs to be better organized, and the standardization template needs to
be improved. Additionally, the many possible ways of using each
part in different construct combinations create the need for
an application to search through the database. BioGuide is the first
software designed to organize over 1000 parts in
partregistry.org as possible
atomic parts for building new biological devices and systems for specific
inputs and outputs, based on graph theory. The need for similar
applications and software tools is now inevitable in the emerging field
of synthetic biology. The innovative approach that makes
partregistry.org easy to use
for synthetic biology applications is the collection of standardized
parts that can be used in any combination with minimal effort under
one database. But while working on our algorithm to search for possible
combinations of parts depending on the given input and output, we
realized that the present standards are inadequate and that the parts
registry form must be improved.
In the very near future, a new format for the parts registry form is needed,
and a few additional features should be implemented to give more control
over the database. We are planning to suggest a new format and features
for the parts registry based on the survey results we have received,
and to build the next version of BioGuide based on the revised
parts registry form. Along with using the new parts registry standards, we
will improve the algorithm so that the software can search through
more complex relations and return all possible functional constructs.
Project Introduction
As the field of Synthetic Biology is on the rise, iGEM is growing
very fast, and the number of parts in the parts registry is increasing
with the addition of more complex parts each day. After we faced some
difficulty running our algorithms on the parts registry, the need
for more effective standardization of parts entry became apparent. We
investigated the information on parts in iGEM’s 2010 distribution and
reorganized the information on the parts registry forms according to
the needs of our algorithm. We then used graph-theoretic modeling
to visualize the relations between iGEM parts and to standardize the
representation of the parts as much as possible.
This helped us to find input/output relations between the parts.
Furthermore, our program BioGuide is now able to provide alternative
pathways for constructing the most reliable and functional Biobrick devices
with respect to given inputs and expected outputs, as a guide to the Biobricks
parts registry.
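The input/output idea can be sketched in a few lines of Python. This is only a minimal illustration, not the BioGuide algorithm itself: the part names, the signals, and the assumption that each part has exactly one input and one output are all hypothetical. Each part is treated as an edge from its input signal to its output signal, and a depth-first walk lists every chain of parts that turns a given input into an expected output.

```python
from collections import defaultdict

# Hypothetical parts: each consumes one input signal and produces one output.
# The names and signals below are illustrative only, not Registry entries.
PARTS = {
    "part_A": ("arabinose", "LuxR"),
    "part_B": ("LuxR", "GFP"),
    "part_C": ("arabinose", "TetR"),
    "part_D": ("TetR", "GFP"),
    "part_E": ("IPTG", "TetR"),
}


def build_graph(parts):
    """Index parts by the signal they accept as input."""
    by_input = defaultdict(list)
    for name, (inp, out) in parts.items():
        by_input[inp].append((name, out))
    return by_input


def find_pathways(parts, start, goal):
    """Return every chain of parts that leads from `start` to `goal`."""
    by_input = build_graph(parts)
    pathways = []

    def walk(signal, chain, seen):
        if signal == goal and chain:
            pathways.append(list(chain))
            return
        for name, out in by_input.get(signal, []):
            if out in seen:  # do not revisit a signal already on this chain
                continue
            chain.append(name)
            walk(out, chain, seen | {out})
            chain.pop()

    walk(start, [], {start})
    return sorted(pathways)


if __name__ == "__main__":
    for path in find_pathways(PARTS, "arabinose", "GFP"):
        print(" -> ".join(path))
```

With this toy data there are two alternative pathways from arabinose to GFP, which is the kind of alternative the text describes BioGuide offering.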
Notebook
January
Brainstorming about iGEM.
What is iGEM?
Previous wet-lab projects developed at METU.
What kind of projects can be developed as a software
team?
NOTE: The first software team in Turkey...
February
Reading articles about iGEM wet-lab and software team
projects.
Looking for team members.
Looking for instructors who can advise the team.
March
Recruiting a team member interested in Synthetic Biology.
Reading articles about Synthetic Biology, Bioinformatics
and Bioengineering.
Founding the team [ An instructor, and student members
]
April
This month we started regular workshops on Synthetic
Biology, Bioengineering, and Bioinformatics.
This month the biologists in the team are teaching the necessary
basics to the software group.
Week 1
Workshop -1
[Biology basics, What is Synthetic Biology?, and the
works in this field ]
Week 2
Workshop -2
[What is Synthetic Biology?, and the works in this field
]
Week 3
Workshop -3
[iGEM, Parts, Biobricks, and Devices ]
Week 4
Workshop – 4
[iGEM, Parts, Biobricks, and Devices ]
May
This month we completed our workshops and, as the masters
of this field, started meeting with the instructors.
Meanwhile, we are looking for sponsors [We have prepared a
document describing iGEM, previous projects and our project
in general, and started to send it to private companies that can
fund us.]
This month it is the software group's turn; they are teaching
the basics of software concepts to the biologists in the team.
Furthermore, we are discussing how we can apply
the basics of computer engineering to synthetic biology
and iGEM parts.
Week 1
Meeting -1
[First, discussion of articles that have been selected
by the advisors. Then, brainstorming about the selected iGEM
projects from previous years and our project ]
Week 2
Meeting -2 [with advisors ]
[Presenting the previous projects to the advisors and presenting
our initial idea for the project. Then, brainstorming about
our project. ]
Week 3
Meeting -3
[Basic database concepts and iGEM parts. ]
Week 4
Meeting -4
[What is an ER diagram, and how can we develop a database
model for iGEM parts with the ER model? ]
June
This month the software group continues to teach the basics
of software development, programming and computer engineering, and
discussions about computer engineering approaches are continuing.
Furthermore, we have formed the design group for the
web page, poster, presentation, and an attractive animation
introducing us.
Week 1
Meeting – 5
[Graph theory, Graph theoretic modeling, and graphical
modeling of iGEM parts. Using Input Output loops on iGEM
parts. ]
Week 2
Meeting – 6 [ With Advisors ]
Project Description.
Our tasks on holiday.
Explaining our project and basic concepts to the design
team.
Discussion on storyboard for animation.
Week 3 and Week 4
HAVE a NICE HOLIDAY
SEE YOU ON JULY 1 as a POWERFUL TEAM;
“METU TURKEY SOFTWARE”.
July
This month we started to develop the application and
divided the team into 3 groups [ Software - Gene – Design ].
The Gene group is providing raw data to the software group by
extracting it from the parts registry and other resources.
The Software group is developing the application.
The designers are learning new design tools and applying
them to our project [Not all members of the group are working
actively for the team].
NOTE: Members are not strictly assigned to a group; this
is just for the organization of tasks.
Week 1
Meeting-7
Taking stock of the current situation.
Discussion on web, poster, animation design.
Discussion on storyboard for animation.
Task analyses for each group [ Software, Gene, Design
].
Week 2
Meeting – 8
Checking the tasks of each group.
· Software Group
Database Design
Interface for DB.
Designing a basic SRS and SDD to be able to state
the requirements of the application exactly.
· Gene Group
Extracting I/O information for each part in the parts
registry according to the standards specified by the Gene group.
Discussion about expectations from the software.
· Design Group
Team Logo
Web site
Poster
Animation
Presentation
Week 3 and Week 4
DOING THE TASKS.
August
This month we started to apply graph theory to iGEM
parts. We have specified nodes, edges and graph types. Furthermore,
we have started to develop a new “Parts Registry Form” to
standardize part entry further, so that
algorithms can be applied to the parts more efficiently.
Week 1
Meeting -8
Node data extraction algorithm.
Node description.
Visualization of nodes.
Pathway finding according to specified I/O properties.
Representing the nodes with original images.
Week 2
Meeting -9
Which one is the node: the part, the subparts, or are both
nodes in different graphs?
Part Combination rules.
Web site, Poster Content
Animation storyboard.
Survey for new “Parts Registry Standards”.
New Parts Registry Form.
Week 3
Meeting -10
Extraction of Part Combination Rules
Web, poster, presentation contents generally.
Week 4
Meeting -11
USTC and Berkeley projects [https://2009.igem.org/Team:USTC_Software
and https://2009.igem.org/Team:Berkeley_Software]
Graphical representation of node relations.
Part Combination Rules
Subpart Combinations Rules
Expectations from the software (SRS: Functional
Requirements)
September
By this time all the groundwork for the application was nearly finished;
the software group was waiting for raw data from the gene group.
Meanwhile, they were working on the code base.
Week 1
DOING THE TASKS.
Week 2
Meeting -12
Final Database
Final Graphs
GUI
Expectations from the software (SRS: Functional
Requirements) (Suggestions)
Survey details
Week 3
Meeting -13
Animation (Storyboard, timeline)
Web site (Suggestions to web site)
Poster (How we can use 3D stereoscopic image, How
we can tell the development progress and our concepts
by 3D effect etc...)
Presentation ( Suggestions about presentation )
Week 4
Meeting -14
Final Graphs
GUI ( about 70 % is over )
How to send the software to other teams for collaboration
(just general ideas; details will be discussed later).
With a survey or not? Could it overshadow the software?
What to ask the teams when sending?
October
Now everything is nearly over; it is time to put everything
together.
The Gene group is explaining the methods that we have used
during the project.
The Software group is finalizing the software, testing it, adding
new functions according to the results of collaboration with the METU
TURKEY wet lab team, and trying to solve infinite bugs…
The Design group is putting it all together…
Meanwhile, the whole team is writing the content for the web,
poster, and presentation.
Week 1
Meeting -15
BioGuide Application, missing points.
Content writing
Web, poster, animation
Week 2, Week 3, & Week 4
GOOD NEWS Infinite meetings started :)
Writing content [shared it, tasks assigned to members
according to their fields.]
(including responses between 10th -22nd of Oct 2010)
General Profile of Participants
The following teams are designated as collaborators, with more
than 60% team participation:
INSA-Lyon
Lethbridge
WashU
Of the 244 participants who responded between 10 and 22 October 2010,
57% had scientific degrees ranging from B.Sc. to Professor, and 18%
had graduate degrees. 18% of participants are enrolled in their
teams as either Instructors or Advisors.
95 teams have responded to the survey, while we are still waiting
to hear from 33 teams. 74% of teams participated in the survey with
one or more members.
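As a quick check of the team participation figure, the calculation can be written out as below. This is a sketch under the assumption that the 95 responding teams and the 33 teams still awaited together make up all the teams contacted; the function name is ours.

```python
def response_rate(responded, pending):
    """Share of teams that answered, out of all teams contacted."""
    return responded / (responded + pending)


if __name__ == "__main__":
    rate = response_rate(95, 33)  # 95 of 128 teams
    print(f"{rate:.1%}")
```

Under that assumption the rate comes to roughly 74%, in line with the figure reported above.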
75% of participants were interested in the synthetic biology field
for academic purposes.
Browsing the Registry of Standard Parts
56% of participants think that it is not easy to search
for parts in the Registry of Standard Parts. Many comments
indicate a need for a better search engine and more flexible keyword
search options, especially accepting aliases. Many also long
for recognizable part names, which would ease keyword searching.
A partnership with Google and enforcing standardized part names
are suggested.
As a global organization, iGEM could offer the Parts Registry in
different languages, with more illustrations describing how the system
works.
Content of Registry of Standard Parts
57% of participants agree that the number of parts registered
in the Registry of Standard Parts is not enough for their projects.
55% of participants think that there are enough
useful parts distributed in the iGEM plates that we can use in our projects.
Even though most agree that the number of parts in the registry is impressive,
they still find it limited when it comes to designing different devices
for diverse applications, especially in species other than
E. coli. Participants believe that if there were more functional standardized
parts, especially protein coding sequences and promoter-RBS, they could
design devices according to the needs of the community instead of designing
what can simply be assembled into a device.
Encouraging the development of vectors and standards for new species
and of new standardized parts in different research areas is suggested.
Enforcing the submission of correct DNA sequences and working conditions
for each part is suggested.
A few recommend expanding iGEM into a collaborative effort rather
than an undergraduate tournament, which would increase the number and
the diversity of the parts designed and submitted throughout the
year.
Submission to the Parts Registry
52% of participants said that they have not encountered
difficulties when submitting parts. Even though participants
are satisfied with the web interface of the registry, most complain
about pSB1C3 as the new standard plasmid for submitting DNA.
71% of participants share our team's
opinion that the nomenclature of part IDs, such as construct,
device, composite part and protein generator, is confusing, as there
is no consensus on how to use these terms correctly.
The terminology and categorization used on iGEM’s Parts Registry should
be redefined, and correct use of the terminology should be enforced during
the submission process.
75% of participants agree that different, specific
submission interfaces should be designed for constructs, promoters,
RBSs, CDSs and terminators in the Registry of Standard Parts.
However, there are very strong and valid arguments against this, such as
that losing the flexibility of the registry would prevent future submission
of unclassified parts.
We suggest keeping the parts submission interface as is until
these concerns are addressed.
75% of participants agree that outdated, unavailable
and uncharacterized parts in the Registry of Standard Parts should
be moved to an archive after obtaining the consent of the designer.
“It would be great to see some sort of organization like this! I
agree that unavailable parts should be followed up on and removed if
necessary. I also think that parts which are not sufficiently documented
should be highlighted in some way. Once these parts are identified,
teams can actively characterize them as part of their projects or as
side projects.”
“Think about these things: (i) who decides when a part is out-dated,
and how can that person know that an old part cannot have a novel use
in the future? (ii) likewise, an uncharacterized part may be both characterized
and used in the future”
We suggest building a backup system, such as an archive, to set
aside the rarely used, unavailable and uncategorized parts until they
are in line with the enforced standards.
91% of participants share our opinion
that standardization of the nomenclature used for each different
composition of parts is necessary.
Standards that should be enforced and Additional New Standards
According to our survey, the standards used when assigning a name to a part
were rated as follows, from highest to lowest:
33% Type of part
17% Input
17% Output
14% Version
10% Year
9% Group
Along with the above, having short, recognizable part names together with
function and performance, a GenBank/EMBL link and organism information
is important.
93% of participants said that, for the parts that
are marked as “WORKS”, distinguishing parts with quantitative
experimental validation from parts without this information is important.
Most participants have encountered similar problems with parts
that don’t work under their lab conditions, or that work but not as they
were claimed to.
89% of participants share our opinion
that sub-categorizing the “WORKS” comment into 1) “Quantitative”
for parts that are characterized with experiments and 2) “Qualitative”
for parts that are not characterized would be an appropriate measure
for standardization of the Biobrick database.
To overcome these problems, we suggest enforcing the working
conditions title at registry entry in order to collect quantitative
experimental details on submitted parts, which might slow down the registration
process but will definitely increase the quality of the database.
61% of participants agree that POPS (Polymerases Per
Second) should be assigned to every part or biobrick with a promoter,
where appropriate. 57% of participants agree that RIPS
(Ribosomes Per Second) should be assigned to every part or biobrick
with an RBS brick.
Though most participants agree on the need for POPS and RIPS information,
they are concerned about the workload it would bring to individual
labs.
“To do this, the Registry need to define a reliable and easy method
of determining the PoPS for teams to use. However, I would say that
there are better systems for quantifying promoter output than PoPS,
and they should be used instead, if possible”.
67% of participants thought that entering POPS
information should not be mandatory when submitting new parts.
Similarly, 65% of participants disagree that entering RIPS information
should be mandatory when submitting new parts.
Even though researchers feel the need for this information,
they shy away from requesting it as a mandatory title for the parts
registry, as it would be difficult for underfunded and inexperienced
groups to perform these measurements.
We strongly suggest starting a forum on how to quantify the performance
of promoters and genes, to arrive at an easy-to-measure standard for the
efficiency of the parts. Additionally, iGEM should take the responsibility
and provide the measurements for each promoter and gene included
in the distributions. The second choice would be even better in terms
of standardization, as all the measurements would be performed by one center
under similar conditions and by experienced researchers, which would
allow users to compare and contrast the efficiencies of the parts more
accurately.
82% of participants thought that information on
the working conditions of the parts should be mandatory when submitting
new parts. Most find that submitting detailed experimental
information and working conditions is crucial, and even easier than
submitting measurements of POPS or RIPS.
Definitions you would like to see at the Registry of Standard Parts
Transcriptional efficiency 13%
Protein lifetime 10%
Ribosome binding efficiency 10%
mRNA lifetime 9%
Translation initiation and efficiency 9%
Protein concentration 9%
Cooperative effects with other molecules 9%
Protein-DNA binding rates and efficiencies 8%
RNA polymerase effects 8%
System copy count 8%
Protein multimerization 6%
Additional titles include: catalytic rates and affinities for substrates,
leakiness of the promoter in the absence of stimulus, and POPS at various
inducer/repressor concentrations.
Efficiency of the Database Entries
86% of participants would like to see a ranking/rating
system in which other iGEM users rate the parts, which would be one indication
of whether a part works, and how well, in different laboratories.
A few had concerns about how well the rating system would work for
rarely used parts, while widely used parts would become even more popular
due to the rating system. Still, many believe this would be one
step further towards a peer-reviewed quality control system for the parts.
61% of participants agreed that parts should be updated
regularly by the designers, and most agreed this should happen at least
when there is new information on the parts. It has also been suggested
that all users of a part be given permission to update its information.
73% of participants agree with us that excluding
the low-ranking parts or the parts with negative feedback from
future plates will increase the efficiency of the system. The
major concern about excluding any part is losing the variety of
parts in the database. A few recommend excluding only the parts that
are not working.
“Efficiency shouldn't be top priority in a database. First and foremost,
data is the top priority. Excluding those parts would make the system
more efficient”
“Some parts may be rare or new and have low efficiency, but can be
very important! Getting rid of them would eliminate any chance of improvement
to these parts, which not only a qualifier for an iGEM gold medal, but
also one of the focuses of biobricks.”
We suggest excluding parts that are not working, are low rated or have
negative feedback from the annual distribution plates, while still archiving
them and making their data available through the parts registry. That way,
while individual labs receive plates with higher-rated, fully
working parts for their projects, anyone who wants to work on a more
exotic part can search through the archives and revitalize the parts
stored there. The challenge of revitalizing parts can be encouraged
as a collaborative effort.
New Options for the Parts Registry Database
96% of participants agree with us that it
would be useful to have a link out to the gene/protein information
of the parts, and 97% of participants agree that they
would like to know if a part is also involved in known biological
pathways.
For receiving pathway information, more participants
voted for NCBI COG (59%) than KEGG pathways (38%) when the responses
for both were distributed among the choices according to response
rates. Adding a BLAST option to the parts registry has also
been suggested for locating parts of interest. We are sure all of us would
like to see gene-protein and pathway information if this information
were integrated into the database and offered automatically for each
entry in the database.
We are planning to provide this information about the parts to
all parts registry users as a built-in option in the next version of
BioGuide in iGEM 2011.
New Parts Registry Form Suggested for The New Standards
Description
Warning Boxes:
If out-dated, unavailable and uncharacterized parts exist
in the Registry of Standard Parts, move them to an archive with the
consent of the designer. Divide the archive into three titles: out-dated,
unavailable and uncharacterized parts.
Besides being shown as “works”, the works box should explain
whether the part is characterized or non-characterized.
Parts should be updated regularly by the designers.
Exclude the low-ranking parts or the parts with negative feedback
from future plates.
Characterization Boxes:
transcriptional efficiency
mRNA lifetime
ribosome binding efficiency
translation initiation and efficiency
protein lifetime
protein concentration
protein multimerization
protein-DNA binding rates and efficiencies
cooperative effects with other molecules
RNA polymerase effects
system copy count
Description
Search box
with click options
options: searched parts are:
Available
Length OK
Building
Planning
Missing
Unavailable
the search is refined according to which of the above options are clicked
Description
Assume the following on the part image:
if the part DNA sequence is not confirmed, then it is tagged with "non-confirmed
DNA sequence"
if non-characterized parts in the Parts Registry are not characterized
further, then they are tagged as "deprecated"
also:
a comment box is opened in which any team can comment on its experiences
with the part
boxes that have not been filled with data are highlighted:
transcriptional efficiency
mRNA lifetime
ribosome binding efficiency
translation initiation and efficiency
protein lifetime
protein concentration
protein multimerization
protein-DNA binding rates and efficiencies
cooperative effects with other molecules
RNA polymerase effects
system copy count
if the part is not characterized but "works", then a "Qualitative
part" tag is added
besides "works", a "Characterized" or "non-characterized" box
is added
ranking/rating stars for the parts, voted by the other iGEM users,
which indicate how well the parts perform in different laboratories,
are added. For example, 4.5 stars voted by 27 teams (number of stars
and number of votes)
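The star display suggested in the last item could be computed along these lines. This is a hedged sketch only: the function name, the half-star rounding and the "not yet rated" wording are our assumptions, not part of the proposed form.

```python
def rating_summary(votes):
    """Summarize per-team star votes as 'N stars voted by M teams'.

    `votes` holds one star value per team. Rounding the mean to the
    nearest half star is an assumption made for display purposes.
    """
    votes = list(votes)
    if not votes:
        return "not yet rated"
    mean = sum(votes) / len(votes)
    half_stars = round(mean * 2) / 2
    return f"{half_stars:g} stars voted by {len(votes)} teams"


if __name__ == "__main__":
    # 14 teams gave 5 stars and 13 teams gave 4 stars (made-up votes)
    print(rating_summary([5] * 14 + [4] * 13))
```

Showing the number of votes next to the stars lets a user tell a widely tested part from one rated by only a handful of teams.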
Design
Code
Human Practices
Material
User Guide
Safety
Methods
Part Extraction Standards
All information about the parts that is essential to the experimental
setup of iGEM projects has been utilized. The information for the parts
provided on all three 384-well plates in the Spring 2010 distribution
has been standardized. Our standardization criteria are discussed in
detail under Database Standardization. An ER diagram has been generated
that describes the organization of the data. Around 70% of the parts
information was fetched by custom parsing code from the XML and Excel
files provided by iGEM. The rest of the data had to be collected and
organized manually, because its organization was too irregular to be
handled by an algorithm. This was one of the most time-consuming steps
in our project. For each construct and Biobrick, the information
collected was: Activity, Inducer, Activator, Repressor and Inhibitor
for promoters; and Inducer, Activator, Repressor and Inhibitor for
synthesized molecules (mostly proteins, RNA fragments, etc.).
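As an illustration of the parsing step, the sketch below reads part records from XML and separates the complete ones from those that need manual curation. It is not our original parsing code: the element names (part, part_name, part_type, part_short_desc) and the rule for what counts as complete are assumptions made for this example, and the sample XML is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real iGEM XML layout may differ.
SAMPLE_XML = """
<parts>
  <part>
    <part_name>BBa_J04451</part_name>
    <part_type>Reporter</part_type>
    <part_short_desc>RFP Coding Device with an LVA tag</part_short_desc>
  </part>
  <part>
    <part_name>BBa_S04055</part_name>
    <part_type>Composite</part_type>
  </part>
</parts>
"""

# Assumed minimum set of fields for an automatically usable record.
REQUIRED = ("part_name", "part_type", "part_short_desc")


def parse_parts(xml_text):
    """Split part records into parsed ones and ones needing manual work."""
    root = ET.fromstring(xml_text.strip())
    parsed, needs_manual = [], []
    for node in root.iter("part"):
        record = {child.tag: (child.text or "").strip() for child in node}
        if all(record.get(key) for key in REQUIRED):
            parsed.append(record)
        else:
            needs_manual.append(record.get("part_name", "<unknown>"))
    return parsed, needs_manual


parsed, needs_manual = parse_parts(SAMPLE_XML)
print(len(parsed), needs_manual)
```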
Combination Rules (Image Combinations)
To build our input/output relation graphs, we first ran our algorithm
on the real combination dataset, which contains all of the few thousand
possible combinations of the Biobricks. However, after performing all
combinations for the first few hundred Biobricks, the application slowed
down tremendously, and displaying the Biobrick graphs also became very
time consuming. To overcome this bottleneck we developed a new strategy,
in which we used only the construct combinations of the Biobricks
distributed within the plates. Moreover, from the information gathered
on the subparts of the distributed constructs, we also collected the
subpart assembly order, such as 1st: promoter, 2nd: rbs, 3rd: coding
sequence, then any internal parts, and last: terminator. Each specific
Biobrick type has been assigned a number from 1 to 19 as a unique image
ID. Gathering the information on subparts was not a straightforward
process. The ImageID assembly order of each construct was used to
extract the type information for each subpart within that construct.
This approach helped us to reveal 400 possible brick combinations
present within the 3x384 well plates distributed by iGEM in Spring 2010.
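The use of ImageID assembly orders can be sketched as follows. The mapping from image IDs to Biobrick types is hypothetical (the text only states that IDs run from 1 to 19), as are the brick names and the function names.

```python
# Hypothetical excerpt of the image-ID table; the real table has 19 entries.
IMAGE_ID_TO_TYPE = {1: "promoter", 2: "rbs", 3: "coding", 4: "terminator"}


def subpart_types(brick_names, image_ids):
    """Pair each subpart of a construct with the type given by its image ID."""
    if len(brick_names) != len(image_ids):
        raise ValueError("each subpart needs exactly one image ID")
    return [(name, IMAGE_ID_TO_TYPE.get(image_id, "unknown"))
            for name, image_id in zip(brick_names, image_ids)]


def follows_assembly_order(image_ids):
    """Check the order described above: promoter first, terminator last."""
    types = [IMAGE_ID_TO_TYPE.get(i, "unknown") for i in image_ids]
    return len(types) >= 2 and types[0] == "promoter" and types[-1] == "terminator"


# Hypothetical construct with four subparts.
bricks = ["brick_a", "brick_b", "brick_c", "brick_d"]
ids = [1, 2, 3, 4]
print(subpart_types(bricks, ids))
print(follows_assembly_order(ids))
```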
Database Standardization
The two main focuses of our project were the organization of the
available information about Biobricks on iGEM’s website and the
development of a software application that helps synthetic biologists
at the experimental set-up level by providing all available construct
combinations for any given input and output relations, which they can
utilize for their own projects. Normalization and re-organization of
the part information on iGEM’s web site was needed in order to develop
our application, which automatically searches the possible construct
combinations. For the organization and analysis of the Biobricks, we
used the part information for the Spring 2010 distribution. The
information on all three 384-well plates distributed by iGEM was
scrutinized and checked individually to specify the standards available
and needed.
iGEM provides a large number of parts in a hierarchical way, but there
is no order in the information flow and there are no common standards.
Furthermore, the bulk of the information is used ineffectively. Some of
the distributed parts are known to be nonfunctional. The web pages for
parts contain a lot of information, but most of it is, again, not
ordered. Moreover, some additional information had to be removed or
replaced so that the information for parts could be used effectively.
We also recommend that the redundant bulk information related to parts
on iGEM’s web site be removed in the future.
The final standardization that we suggest is not intended for general
public use; it was urgently needed to satisfy the needs of our
algorithm. Still, it will be a valuable resource, since it summarizes
the basic information about the parts.
As the first step in building the proposed standardization template,
the headings selected for parts are listed in Table 1. Submission of
part IDs for individual parts is an accepted and quite valuable way of
tracking information. Although every part has a unique partID, each
part also needs a unique part name assigned as its official iGEM name.
Part names play an important role because they provide a short
description of the part that synthetic biologists can immediately
recognize and use during the construction of unique Biobricks.
Additionally, unique part names help to identify devices with more than
one Biobrick in their constructs. Assigning unique and distinct names
that describe the nature and content of parts will help researchers to
recognize and search for the parts.
Headings Selected From Previous Entry Forms for Indication of Standardized
Information
=========================================
PartID:
PartName:
Bricks:
BrickIDs:
ImageIDs:
RFC10:
RFC21:
RFC23:
RFC25:
=========================================
Table 1: The table above lists the qualities of parts that identify
their composition and shows the status of previously assigned
standards. PartID refers to the unique ID number of a part, including
atomic parts and assemblies. PartName refers to the unique name given
to a part. Bricks refers to the shortcut names that specify atomic
parts. ImageIDs refers to the individual numbers, or combinations of
numbers, assigned by us. RFCs refers to the status of parts with
respect to the RFC standards.
iGEM provides both individual, atomic parts and pre-combined constructs
such as devices and systems. The availability of combined constructs is
important to researchers, as combining individual Biobricks one at a
time would be very time consuming. These previously merged constructs
serve as a repository of puzzle pieces and can be used for different
purposes. To date, the largest and most trustworthy source for
synthetic biology and its components is iGEM’s parts registry. In 2010,
iGEM provided over 1000 parts, which have initiated many projects.
Having more atomic parts available in iGEM’s repository will lead to
the design of more complex and robust constructs and give us a better
chance to design different constructs for unique purposes. Also, for
the parts that are already available, extra steps need to be taken for
the quality control and surveillance of these products. Quality control
of the information for the parts is essential for the future of iGEM
and synthetic biology. Even though we found the pre-determined RFC
standards useful and included them in our standardized template, some
individual parts still require re-organization of their information, as
RFC standards alone do not satisfy the needs of wet lab biologists
regarding the functionality of parts.
Without question, there is an urgent need to build a distinct and
specific database for synthetic biology, well organized with its own
standards; however, developing such a database is not an easy task.
Contact Information of Part Owners and Qualitative Group Comments
about Parts
=========================================
Designers: Mail:
GroupFavorite:
StarRating:
Parameters:
=========================================
Table 2: The table above lists the owners of parts, their contact
information and the popularity of the parts among groups. The
Parameters heading refers to distinctive experimental details unique
to the usage of a part, which should be decided by the groups.
The second step in building the standardized template was to get the
background (“phylogenic”) information about the development process of
the parts, which includes the name of the group, the designer and
contact information, along with the group’s comments on the parts they
have submitted. Contact information is especially important for iGEM,
so that other groups who need extra information about an available part
can obtain it. Even though iGEM strongly encourages contacting the
designers of the available parts, the unavailability of contact
information shows that iGEM’s parts registry needs substantial
re-organization in order to serve the synthetic biology community
properly.
Additionally, the “GroupFavorite” and “StarRating” fields are important
for the individual evaluation of parts, but they do not get the
attention they deserve from the iGEM groups. “GroupFavorite” expresses
the designer group’s confidence in the part. “StarRating” describes the
part in terms of popularity and usage efficiency among the groups.
According to our observations, most groups are either unaware of these
fields or use them incorrectly or ineffectively. For example, a part
with a full reporter that is known to be functional and gives precise,
expected results should have a StarRating of at least 2 stars, but in
the 2010 distribution it is very difficult to find a part whose
“StarRating” is above one. These two evaluations are important for
quickly determining the functionality of parts, so they have been
included in the proposed standardization template. However, as they
have not been used properly so far, we had to include all parts in our
queries during the development of our software application, regardless
of their “GroupFavorite” and “StarRating” evaluations.
Input and Output Characteristics of Parts
=========================================
Parameters:
-Input:
  • Promoter:
    • Activity:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
  • Promoter2:
    • Activity:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
-Output:
  • Reporter:
  • Reporter2:
  • Regulator:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
  • Regulator2:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
-Working Condition:
=========================================
Table 3: The table above describes in detail the input relations based
on promoters and the output products based on the functional genes and
RNAs included within the parts. Working condition describes any
influencing factor or circumstance that is directly related to the
functional properties of parts.
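One way to hold the fields of Table 3 in code is a small set of nested records. The sketch below uses Python dataclasses; the class names and the example values (promoter_x, molecule_a and so on) are hypothetical and serve only to show the nesting of promoters and regulators under input and output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Promoter:
    """One input entry of Table 3 (Promoter or Promoter2)."""
    name: str
    activity: Optional[str] = None
    inducer: Optional[str] = None
    activator: Optional[str] = None
    repressor: Optional[str] = None
    inhibitor: Optional[str] = None


@dataclass
class Regulator:
    """One regulator output entry of Table 3 (Regulator or Regulator2)."""
    name: str
    inducer: Optional[str] = None
    activator: Optional[str] = None
    repressor: Optional[str] = None
    inhibitor: Optional[str] = None


@dataclass
class PartParameters:
    """Input, output and working condition of a single part."""
    promoters: List[Promoter] = field(default_factory=list)
    reporters: List[str] = field(default_factory=list)
    regulators: List[Regulator] = field(default_factory=list)
    working_condition: Optional[str] = None


# Hypothetical part: one inducible promoter, one regulator output.
example = PartParameters(
    promoters=[Promoter(name="promoter_x", activity="inducible",
                        inducer="molecule_a")],
    regulators=[Regulator(name="regulator_y", repressor="molecule_b")],
)
print(example.promoters[0].inducer, example.regulators[0].repressor)
```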
The third part of our standardization template includes the parameters
of contingent input and output elements. These parameters are
classified into two groups for simplicity, as presented in Table 3.
This final part of the standardization template includes the most
important information about the Biobricks, which is required for the
BioGuide software to run its searching algorithm.
Briefly, the BioGuide application is designed to capture the input and
output relations of individual parts in order to examine possible
Biobrick pathways for specific input and output queries. In other
words, at the pre-experimental stage, it helps wet lab biologists to
design their unique constructs by revealing possible alternative
options for pre-determined purposes, along with the primary paths. Our
ultimate goal is to improve the algorithm designed for iGEM 2010 and to
present a new version of BioGuide in iGEM 2011, which will provide
optimum design of constructs for predetermined parameters.
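The idea of listing a primary path together with alternatives can be illustrated with a breadth-first search over a directed device graph. This is a minimal sketch and not the BioGuide search itself: it assumes the query has already been mapped to a start device that responds to the input and a goal device that produces the output, and the graph, device names and max_devices limit are invented for the example.

```python
from collections import deque


def find_paths(graph, start, goal, max_devices=4):
    """Return all simple paths from start to goal, shortest first."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) >= max_devices:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated device)
                queue.append(path + [nxt])
    return paths


# Hypothetical device graph: an edge A -> B means the output of A
# is a compatible input of B.
graph = {
    "device_A": ["device_B", "device_C"],
    "device_B": ["device_D"],
    "device_C": ["device_D"],
    "device_D": [],
}
for path in find_paths(graph, "device_A", "device_D"):
    print(" -> ".join(path))
```

Because the search is breadth-first, the shortest paths are returned first, which matches the notion of a primary path followed by alternatives.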
Most of the parts are composed of functional and nonfunctional
constructs formed by atomic parts, and every part should carry the
information for all of its atomic parts. The “input” heading actually
stands for promoters. Parts with one or more promoters can be found in
iGEM’s Parts Registry. Along with which and how many promoters a part
has, the activity level of the promoters is also important, in order to
distinguish a constitutively active promoter from a promoter activated
by specific physiological processes or states. This information was
crucial for us to extract in order to run our algorithm, as it directly
affects which inputs can activate the devices or systems.
Throughout our investigation of the Parts Registry, we found that much
of the terminology was used ambiguously. Although this might not be
vital for synthetic biologists, it takes considerable effort to
understand the function of certain regulatory elements, which becomes a
time-consuming task for the researcher. Thus, we recommend that the
definitions of certain regulatory elements be revised and fixed
specifically for synthetic biology, for easy communication, sharing and
searching of information.
Common misuses of the terminology can guide us in constructing a
standard nomenclature for synthetic biology. We claim that a standard
nomenclature is urgently needed for synthetic biology for the following
reasons. First, synthetic biology is an emerging and highly promising
research discipline and industrial application area. Second, the
terminology needs to be redefined to build a standard nomenclature,
because some terms tend to be used in place of one another, which
causes problems in global communication about synthetic biology.
Lastly, the nomenclature is of major importance for the construction of
a persistent and trustworthy database for synthetic biology that serves
global information exhibition and exchange. For instance, there are
obvious misunderstandings about the words predominantly used for the
regulation process. We noticed that the terms “inhibitor” and
“repressor” are used interchangeably in the part information pages. An
example is the lactose inhibitor protein, a widely used DNA-binding
transcriptional repressor, which has been labeled both as “inhibitor”
and as “repressor” in iGEM’s Parts Registry. Similar problems resulting
from ambiguous use of terminology were also observed with other
regulatory elements. To sum up, we investigated all input elements for
promoters and classified them in terms of their function, effect and
required input element. We therefore suggest that the terminology used
for the regulation of transcription be defined clearly on iGEM’s
website and that its correct use be enforced.
The second group of parameters was collected under the title “Output”,
which refers to the products of functional genes. Inconsistently, the
term “reporter” is also listed in the same group. Reporters are also
genes, whose products can be used for screening as an output. In our
group’s view, using the term “reporter” for genes is unnecessary; it
adds complexity to information distribution and gives rise to
discrepancies. Instead of the term “reporter”, the predefined “gene”
description should be used for genes that can function as reporters.
The special information related to the characteristics of that gene
should also be presented on the part information web page.
Furthermore, the same term, “reporter”, was used for both atomic parts
and composite Biobricks, and the overall image descriptions for these
were also defined as “reporters”. We want to point out that using the
same nomenclature for both atomic genes and whole functional constructs
adds complexity and makes specific searches through the Parts Registry
difficult. So, assigning “reporter” to both atomic parts and whole
constructs is not a good practice. Instead, we suggest using other
available terminology into which most of the constructs now known as
reporters can be grouped, such as “protein generators”, “composite
parts” or “inverters”.
Devices are whole constructs that are functional and have specific,
distinct functions. Unfortunately, as we have observed, the term
“device” is also used for parts that are not functional and do not have
a specific function at all. Moreover, within the classification of
devices, we argue that some terms are also used unnecessarily and
ambiguously. Devices are classified into five types: protein
generators, reporters, inverters, receivers and senders, and
measurement devices. For example, iGEM defines protein generators as:
Protein generator = promoter + rbs + gene + terminator
Though we accept this definition of protein generators, we observed
that many of the numerous parts defined as protein generators do not
actually fit it. Although some parts are not functional and do not
generate proteins at all, they are classified as protein generators,
which makes searching for parts in the registry difficult. Furthermore,
there are also numerous parts that are defined as “composite parts” but
actually fit the definition of protein generators. To overcome the
problem of misused device types, we extracted the related image ID
information for the composite parts. Image ID information helped us to
correctly categorize composite parts according to their individual
atomic parts and to identify those with more than one function, such as
being both inhibitor and activator. In other words, we used image and
part IDs to link an input to its outputs.
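The definition "promoter + rbs + gene + terminator" can be turned into a simple check on the ordered list of atomic part types of a construct. The sketch below is illustrative only. Allowing several consecutive rbs + gene units (as in an operon) is our assumption for this example and is not stated in the iGEM definition; the function names are hypothetical.

```python
def is_protein_generator(types):
    """True if types reads promoter, one or more (rbs, gene) units, terminator.

    Accepting more than one (rbs, gene) unit is an assumption of this sketch.
    """
    if len(types) < 4 or types[0] != "promoter" or types[-1] != "terminator":
        return False
    middle = types[1:-1]
    if len(middle) % 2 != 0:
        return False
    return all(middle[i] == "rbs" and middle[i + 1] == "gene"
               for i in range(0, len(middle), 2))


def classify(types):
    """Classify a construct by the types of its atomic parts."""
    if len(types) == 1:
        return "atomic part"
    if is_protein_generator(types):
        return "protein generator"
    return "composite part"


print(classify(["promoter", "rbs", "gene", "terminator"]))
print(classify(["rbs", "gene", "terminator"]))  # no promoter
```

Under this rule, a construct that lacks a promoter is reported as a composite part rather than a protein generator, in line with the suggestion made in the examples that follow.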
The subtitle “Working Condition” includes all the detailed information
about the experimental properties of parts and the details of the
working process of individual parts and complete devices. We marked
“Working Condition” in our standardization template as potentially the
most important title for helping synthetic biologists to better
understand the functions of parts in iGEM’s parts registry database.
The main problem we encountered with “Working Condition” is that, for
most parts, the details about the working process are insufficient and
not provided consistently.
Examples of Misuse of Terminology:
For Composite Parts:
PartID: BBa_S04055
PartName: Synthetic lacYZ operon
This part is functional and responsible for the production of the LacY
and LacZ proteins. It partially fits the definition of “composite
part”, but it should be classified as a protein generator, since it
fully fits the definition of “protein generators”.
For Protein Generators:
PartID: BBa_J45299
PartName: PchA & PchB enzyme generator
This part actually fits the definition of “composite part”, but in the
parts registry it is classified as a protein generator. The part could
be functional, but it needs a promoter. Even though, as it stands, it
is not functional and not capable of producing protein, the parts
registry labels it a protein generator. We suggest that all parts in
the registry that are composed of more than one atomic part and are not
functional on their own, but can become functional, be classified as
“composite parts”.
For Reporters:
PartID: BBa_J04451
PartName: RFP Coding Device with an LVA tag
This functional part is classified as “Reporter” in the parts registry
database. It clearly fits the same description as a protein generator
in the Biobrick parts registry standards. Although this part has a
specific and known functional role, characterizing it as a reporter is
unnecessary and adds to the complexity of the information provided.
Instead, we suggest that this part be classified as “protein generator”
and that detailed information about its specific function be provided
on the part information page.
In conclusion, as mentioned above, we tried to reorganize and normalize
the information about parts provided in the parts registry for 2010 in
order to develop our algorithm for the BioGuide application. During
this process, we encountered inconsistencies and misuses of the
terminology, as well as inadequacies in the information provided about
parts. First of all, we claim that a standard nomenclature should be
established for future use in the field of synthetic biology. Based on
the information gathered according to the new nomenclature, a
professional database should be constructed to address the needs of
synthetic biology. This will enable easy information exchange and
exhibition globally. Secondly, although there is enough information
about parts in the parts registry database, this information urgently
needs to be ordered. Furthermore, new experimental standards for the
subtitle “Working Condition” should be introduced to groups in the part
submission process. These experimental standards are important because
the experimental details currently given for parts do not satisfy the
needs of wet-lab biologists for the design and construction of new
Biobricks.
Algorithm
In this section, the step-by-step functioning of our application, along
with the algorithmic concepts behind the ‘standardization’ of
functional iGEM devices, is depicted in flowcharts. Rectangular boxes
on the flowcharts represent the implementations of the computer
programs that perform the particular tasks stated in each box. These
boxes are sometimes called subprograms, objects or packages in the
context of object-oriented software engineering. The diamonds represent
decision branching and are found between two rectangular boxes. The
arrows show the direction in which subprograms work and communicate:
the subprogram at the head of an arrow starts executing after the
subprogram at the tail of the arrow terminates. The following
flowcharts are high-level representations of the algorithms developed
for the BioGuide software.
Diagram 1. Flowchart of the algorithm for collecting, formatting and
storing device data
Information about the iGEM parts had to be collected in a standardized
format for our application to function properly. Following data
collection, custom subprograms are needed to parse the data and forward
it to the application’s database. To achieve this, we designed and
implemented the algorithm shown in Diagram 1. The first stage of this
algorithm was to find the list of part IDs of the devices supplied by
iGEM in the Spring 2010 distribution. This information was collected
from two sources, both provided by iGEM: 1) plate files in Excel
format, which were available online, and 2) device data in XML format.
The last step of the algorithm was to send the collected partID data to
the application’s database.
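The first and last steps of this algorithm can be sketched as follows. The sketch assumes that the part IDs have already been read from the Excel plate files and the XML device data into two Python lists, that the two lists are combined by taking their union, and that the database is SQLite with a single table named devices. All of these are assumptions made for illustration, not a description of the actual BioGuide database.

```python
import sqlite3


def collect_part_ids(plate_ids, xml_ids):
    """Merge the IDs from both sources into one sorted, duplicate-free list."""
    merged = {pid.strip() for pid in list(plate_ids) + list(xml_ids)}
    merged.discard("")  # ignore empty cells
    return sorted(merged)


def store_part_ids(conn, part_ids):
    """Write the collected part IDs to the (hypothetical) devices table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS devices (part_id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO devices (part_id) VALUES (?)",
        [(pid,) for pid in part_ids])
    conn.commit()


# Example input; the two lists stand in for the parsed Excel and XML data.
plate_ids = ["BBa_J04451 ", "BBa_S04055"]
xml_ids = ["BBa_S04055", ""]

conn = sqlite3.connect(":memory:")
store_part_ids(conn, collect_part_ids(plate_ids, xml_ids))
print(conn.execute("SELECT part_id FROM devices ORDER BY part_id").fetchall())
```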
Diagram 2. Flowchart for BioGuide execution before and during user
interaction
Diagram 2 presents the main algorithm, which shows how the BioGuide
application works. The major components of BioGuide are the device and
Biobrick graphs. While the device graph represents the input-output
(promoter-regulator) compatibility combinations of iGEM devices, the
Biobrick graph represents combinations of atomic parts assembled in a
device or system. The flowchart shows how these graphs are created and
embedded into the program, which displays both graphs to the user when
launched. When started, the application presents a few interactive
options to the user, which are shown on the flowchart below the
horizontal bold line. As shown in Diagram 2, BioGuide can perform four
interactive tasks that make use of the device and Biobrick graphs. Upon
clicking a node on the device or Biobrick graph, that node changes in
size and color, and the various functions shown on the flowchart can
then be performed.
Modeling
Graphical Modeling for Bio-Guide
Introduction
Graphical modeling theory has been applied to construct four different
graphs for iGEM’s parts registry database, which present the relations
of atomic parts, devices and systems and the functional combinations
that can build new constructs. Three graphs are composed of iGEM
devices and one graph is based on Biobricks. Each graph comprises a set
of vertices (nodes) and a set of edges. In the set of nodes each node
represents a device, while in the set of edges each edge represents an
input-output combination of two nodes. These graphs are directed
graphs, as the edges are created according to input-output
combinations. An edge is created for every compatibility between a
regulator and a promoter: the source of the edge is the device with the
corresponding regulator and the target of the edge is the device with
the promoter concerned.
Fig. 1: A node representing a device
Fig. 2: Arrow representing an edge between two nodes
The atomic structures used in our graphical model are shown in Figures
1 and 2. A node is drawn as a solid circle on which the label of the
device, the part/device ID according to iGEM standards, is marked in
the foreground. The blue arrows between nodes connect related devices
and represent the input-output connectivity. The end style of the arrow
shows the direction of the edge, as in Figure 2, where the node labeled
BBa_S03520 is the source and BBa_JO9250 is the target.
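The edge rule described above (regulator of the source device, promoter of the target device) can be sketched in Python. The device records below are hypothetical: each device lists the regulator molecules it produces and the molecules its promoter responds to, and an edge is drawn wherever the two sets overlap. This notion of compatibility is an assumption made for the example.

```python
def build_device_graph(devices):
    """Return the set of directed edges (source_id, target_id)."""
    edges = set()
    for source_id, source in devices.items():
        for target_id, target in devices.items():
            # Edge if a regulator of the source matches an input of the
            # target's promoter. Self-loops are kept on purpose.
            if source["regulators"] & target["promoter_inputs"]:
                edges.add((source_id, target_id))
    return edges


# Hypothetical devices and molecules.
devices = {
    "device_1": {"regulators": {"molecule_a"}, "promoter_inputs": set()},
    "device_2": {"regulators": {"molecule_b"}, "promoter_inputs": {"molecule_a"}},
    "device_3": {"regulators": {"molecule_c"}, "promoter_inputs": {"molecule_c"}},
}
print(sorted(build_device_graph(devices)))
```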
Directivity
All four graphs constructed for BioGuide are directed graphs, so every
edge must have a single source and a single target. No edge is
bidirectional. In mathematical form this can be represented as follows.
If an edge e has node v as source and node w as target, then the edge
can be expressed as
e = (v, w)
For a directed graph the combination (v, w) is totally different from
(w, v). Therefore,
(v, w) ≠ (w, v)
The direction of the edges is represented by the arrows, as explained
for Figure 2.
Connectivity
We identified nodes that form their own sub-graphs, disconnected from
the rest of the nodes, which revealed incompatibilities between a few
regulators and promoters of the devices. We observed this disconnection
in all four of our graphs. It is illustrated in Figure 3, where two
sub-graphs without any edge connecting them to the main graph are shown
on the right-hand side of the diagram. These features classify our
graphs as disconnected graphs [1].
Fig. 3: A zoomed in screenshot showing two sub-graphs within
the disconnected graph.
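Disconnected sub-graphs of this kind can be found by computing the weakly connected components of the directed graph, that is, the components obtained when edge direction is ignored. The following is a generic sketch of that standard technique with invented node names; it is not taken from the BioGuide code.

```python
def weak_components(nodes, edges):
    """Return the weakly connected components as a list of node sets."""
    neighbours = {node: set() for node in nodes}
    for source, target in edges:
        # Ignore direction: connectivity is what matters here.
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    seen, components = set(), []
    for node in neighbours:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical graph: one main sub-graph, one small sub-graph, one isolated node.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = weak_components(nodes, edges)
print(len(components) > 1)  # more than one component: the graph is disconnected
print(sorted(sorted(c) for c in components))
```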
"Semi-Simplicity"
A simple graph is a graph in which no two edges connect the same pair
of nodes. So, in a simple graph it is not possible to find more than
one edge with the same source and the same target. Additionally, an
edge whose source and target are the same node, forming a loop, is not
allowed. In synthetic biology, however, it is possible to construct a
device consisting of devices or Biobricks of the same species or type.
Accordingly, our graphs are simple graphs with one exception: they may
contain self-loops, where an edge starts from and ends on the same
node. Because of this permitted flexibility, we call our graphs
"semi-simple".
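The "semi-simple" property can be checked directly on an edge list: no ordered (source, target) pair may appear more than once, while self-loops are permitted. The sketch below is a generic illustration with hypothetical function and node names.

```python
from collections import Counter


def is_semi_simple(edges):
    """True if no (source, target) pair is repeated; self-loops are allowed."""
    return all(count == 1 for count in Counter(edges).values())


def self_loops(edges):
    """Return the edges that start and end on the same node."""
    return [edge for edge in edges if edge[0] == edge[1]]


# Hypothetical edge lists.
allowed = [("v", "w"), ("w", "v"), ("v", "v")]   # distinct pairs plus a loop
not_allowed = [("v", "w"), ("v", "w")]           # the same edge twice

print(is_semi_simple(allowed), self_loops(allowed))
print(is_semi_simple(not_allowed))
```

Note that ("v", "w") and ("w", "v") count as different edges here, consistent with the directed nature of the graphs.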
Without a question there is an urgent need to build a distinct and
specific database well organized with its own standards for synthetic
biology; however, development of such a database is not an easy task.
Contact Information of Part Owners and Qualitative Group Comments
about Parts
=========================================
Designers: Mail:
GroupFavorite:
StarRating:
Parameters:
=========================================
Table 2: The above table simply depicts information about possessors
of parts and their contact information and the popularity of the parts
for groups. Parameters heading, refers distinctive experimental details
unique to the usage of parts which should be decided by groups.
Second step for building the standardized template was to get the
phylogenic information about the parts development process which includes
the name of the group, designer and contact information, along with
the comments from the group on the parts they have submitted. Contact
information is especially important for iGEM as other groups who need
extra information about the available part can reach to the required
information. Even though contacting with the designers of the individual
parts which are available is highly encouraged by iGEM, unavailability
of contact information points at out the fact that iGEM’s parts registry
needs strong re-organization in order to serve to the synthetic biology
community properly.
Additionally, the “group favorite” and “starRating” fields are also
important for individual evaluation of the parts, which doesn’t get
the deserved attention from the iGEM groups. “Group Favorite” defines
the confidence on the part by the designer group. “StarRating” defines
the related part in terms of popularity and usage efficiency among the
groups. According to our observations, most groups are not aware of
either of the fields or they are used incorrectly or ineffectively.
For example for a part with a full reporter which is known to be functional
and gives precise and expected results the StarRating should be at least
2 stars, but for most of the parts in 2010 distribution, it is very
difficult to observe a part whose “StarRating” is above one. For quick
determination of functionality of the parts these two evaluations are
important so they have been included in the proposed standardization
template. But, as they were not properly used up to now for the re-organization
of the parts information during the development of our software application
we had to include all parts to our queries regardless of their evaluations
based on “Group Favorites” and “ StarRatings”
The third part of our standardization template includes the parameters
of contingent input and output elements. For simplicity, these parameters
are classified into two groups, as presented in Table 3. This final part
of the standardization template contains the most important information
about the Biobricks, which the BioGuide software requires to run
its search algorithm.
Briefly, the BioGuide application is designed to capture the input and
output relations of individual parts in order to examine possible Biobrick
pathways for specific input and output queries. In other words, at the
pre-experimental stage it helps wet-lab biologists design their unique
constructs by revealing possible alternative options for predetermined
purposes, along with the primary paths. Our ultimate goal is to improve
the algorithm designed for iGEM 2010 and present a new version of BioGuide
at iGEM 2011, which will provide an optimum design of constructs for
predetermined parameters.
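The search described above can be pictured with a minimal sketch. It assumes a hypothetical adjacency mapping in which an edge from one device to another means that an output of the first can act on a promoter of the second; the device names, the `find_paths` helper and the breadth-first strategy are illustrative assumptions, not BioGuide's actual implementation.

```python
from collections import deque

def find_paths(graph, start, goal, max_len=4):
    """Return all loop-free paths from start to goal with at most max_len nodes."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) >= max_len:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # do not visit the same device twice in one path
                queue.append(path + [nxt])
    return paths

# Hypothetical device graph (names are placeholders, not registry IDs).
graph = {
    "dev_A": ["dev_B", "dev_C"],
    "dev_B": ["dev_D"],
    "dev_C": ["dev_D"],
    "dev_D": [],
}
paths = find_paths(graph, "dev_A", "dev_D")
```

Here the search finds one path through `dev_B` and an alternative through `dev_C`, which corresponds to the idea of offering alternative options alongside a primary path.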
Most parts are composed of functional and nonfunctional constructs
formed from atomic parts, and every part should carry the information
for all of its atomic parts. The “input” heading actually
stands for promoters. Parts with one or more promoters can be found
in iGEM’s Parts Registry. Along with which and how many promoters a part
has, the activity level of each promoter is also important, in order to
distinguish a constitutively active promoter from a promoter activated
by specific physiological processes or states. It was crucial for us to
extract this information in order to run our algorithm, as it directly
determines which inputs can activate the devices or systems.
Throughout our investigation of the Parts Registry, we found
that much of the terminology was being used ambiguously. Although this
might not be critical for synthetic biologists, it still takes considerable
effort to understand the function of certain regulatory elements, which
becomes a time-consuming task for the researcher. We therefore recommend
that the definitions of certain regulatory elements be revised
and fixed specifically for synthetic biology, to ease communication,
sharing and searching of information.
Common misuses of the terminology can guide us in working out how
to construct a standard nomenclature for synthetic biology. We claim
that a standard nomenclature is urgently needed for synthetic biology
for the following reasons. First, synthetic biology is an emerging
research discipline and a highly promising industrial application area.
Secondly, the terminology needs to be redefined into a standard
nomenclature because some terms are prone to being used in place of one
another, which causes problems in global communication
about synthetic biology. Lastly, the nomenclature is of major importance
for the construction of a persistent and trustworthy database for synthetic
biology that serves global information exhibition and exchange.
For instance, there are obvious misunderstandings about the words
predominantly used for regulation processes. We have noticed that
the terms “inhibitor” and “repressor” are used interchangeably
in the part information pages. For example, the lactose inhibitor protein,
a widely used DNA-binding transcriptional repressor, has been labeled
both as “inhibitor” and as “repressor” in iGEM’s Parts Registry. Similar
problems resulting from ambiguous use of terminology were also observed
with other regulatory elements. To sum up, we investigated all input
elements for promoters and classified them in terms of their function,
effect and required input element. We therefore suggest that the terminology
used for regulation of transcription should be defined clearly on iGEM’s
website and that its correct use should be enforced.
The second group of parameters was collected under the title “Output”,
which refers to the products of functional genes. Inconsistently, the
term “reporter” also appears within the same list, although reporters
are also genes, whose products can be used for screening as an output.
In our group's view, using the term “reporter” for genes is
unnecessary, adds extra complexity to information distribution
and gives rise to discrepancies. Instead of the term “reporter”, the
predefined “gene” description should be used for genes that can function
as reporters. Special information related to the characteristics
of such a gene should also be presented on the part info web page.
Furthermore, the same term “reporter” was used for both atomic
parts and composite Biobricks, and the overall image descriptions
for these were also defined as “reporters”. We want to point out that using
the same nomenclature for both atomic genes and whole functional constructs
adds complexity and makes specific searches through the Parts Registry
difficult. Assigning “reporter” to both atomic parts and whole constructs
is therefore not good practice. Instead, we suggest using other available
terms into which most of the constructs now listed as reporters
can be grouped, such as “protein generators”, “composite parts”
or “inverters”.
Devices are whole constructs which are functional and have specific
and distinct functions. Unfortunately, as we have observed, the
term “device” is also being used for parts which are not functional
and do not have a specific function at all. Moreover, within the
classification of devices, we argue that some terms are being used
unnecessarily and ambiguously. Devices are classified into five types:
protein generators, reporters, inverters, receivers and senders, and
measurement devices. For example, iGEM defines protein generators as:
Protein generator = promoter + rbs + gene + terminator
Though we accept this definition of protein generators, we observed
that numerous parts are labeled as protein generators although most of
them do not fit the definition provided above.
Some parts that are not functional and do not generate proteins
at all are still classified as protein generators, which makes searching
for parts in the registry difficult. Furthermore, there are also
numerous parts which are labeled as “composite parts” but actually
fit the definition of protein generators. In order to overcome
the problem of misused device types, we extracted the related image
ID information for the composite parts. The image ID information helped
us to correctly categorize composite parts according to their individual
atomic parts and to identify those with more than one function, such
as being both inhibitor and activator. In other words, we used image
and part IDs in order to link each input to its outputs.
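The definition quoted above (promoter + rbs + gene + terminator) can be turned into a simple check. The following is only a sketch under stated assumptions: the function name `classify`, the lowercase type labels and the example type lists are hypothetical, and the rule used here (starts with a promoter, ends with a terminator, contains all four element types) is one possible reading of the definition, not the registry's or BioGuide's actual procedure.

```python
REQUIRED_TYPES = ("promoter", "rbs", "gene", "terminator")

def classify(atomic_types):
    """Classify a construct from the ordered types of its atomic parts."""
    types = list(atomic_types)
    if len(types) < 2:
        return "atomic part"
    has_all = all(t in types for t in REQUIRED_TYPES)
    # Assumed reading of the definition: a promoter leads and a terminator closes.
    if has_all and types[0] == "promoter" and types[-1] == "terminator":
        return "protein generator"
    return "composite part"

# Hypothetical type lists, for illustration only.
with_promoter = ["promoter", "rbs", "gene", "rbs", "gene", "terminator"]
without_promoter = ["rbs", "gene", "rbs", "gene", "terminator"]

label_a = classify(with_promoter)
label_b = classify(without_promoter)
```

Under this rule a construct that lacks a promoter is labeled a composite part, which matches the suggestion made for parts that are not functional on their own.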
The subtitle “Working Condition” includes all the detailed information
about the experimental properties of parts, and the details of the
working process of individual parts and complete devices. We marked
this subtitle in our standardization template as potentially the most
important one for helping synthetic biologists better understand the
functions of parts in iGEM’s Parts Registry database.
The main problem we have encountered with “Working Condition”
is that, for most parts, the details about the working process are
insufficient and not provided consistently.
Examples of Misuse of Terminology:
For Composite Parts:
PartID: BBa_S04055
PartName: Synthetic lacYZ operon
<img src="" />
This part is functional and responsible for the production of the LacY
and LacZ proteins. It only partially fits the definition of “composite
part” and should actually be a protein generator, as it fully fits
the definition of “protein generators”.
For Protein Generators:
PartID: BBa_J45299
PartName: PchA & PchB enzyme generator
<img src="" />
The part illustrated above actually fits the definition
of “composite part”, but in the Parts Registry it is classified as a
protein generator. This part can be functional, but it needs a promoter.
Even though the part is not functional on its own and cannot produce
protein, the Parts Registry lists it as a protein generator. We suggest
that all parts in the registry that are composed of more than one
atomic part and are not functional on their own, but can become functional,
should be classified as “composite parts”.
For Reporters:
PartID: BBa_J04451
PartName: RFP Coding Device with an LVA tag
<img src="" />
This functional part is classified as “Reporter” in the Parts Registry
database. It clearly fits the same description
as a protein generator in the Biobrick part registry standards. Although
this part has a specific and known functional role, characterizing it
as a reporter is unnecessary and adds to the complexity
of the information provided. Instead, we suggest that this part be
classified as “protein generator” and that detailed information about
its specific function be provided on the part information
page.
In conclusion, as mentioned above, we tried to reorganize and normalize
the information about parts provided in the Parts Registry for 2010
in order to develop our algorithm for the BioGuide application. During
this process, we encountered inconsistencies and misuses of the
terminology, as well as inadequacies in the information provided
about parts. First of all, we claim that a standard nomenclature should
be established for future use in the field of synthetic biology. Based
on the information gathered according to the new nomenclature, a professional
database should be constructed to address the needs of synthetic biology.
This will enable easy information exchange and exhibition globally.
Secondly, although there is enough information about parts in the
Parts Registry database, that information urgently needs to be put
in order. Furthermore, new experimental standards for the subtitle
“Working Condition” should be introduced to groups in the part submission
process. These experimental standards will be important because the
experimental details currently provided about parts do not
satisfy the needs of wet-lab biologists for the design and construction
of new Biobricks.
</div>
</div>
</div>
Contact
Algorithm
In this section, the step-by-step functioning of our application,
along with the algorithmic concepts behind the ‘standardization’
of functional iGEM devices, is depicted in flowcharts.
Rectangular boxes represent the implementations of
the computer programs that perform the particular tasks stated in that
box. These boxes are sometimes called subprograms,
objects or packages in the object-oriented software engineering context.
Diamonds represent decision branching and are found between
two rectangular boxes. Arrows show the direction in which subprograms
work and communicate: the subprogram at the head of an arrow starts
executing after the subprogram at the tail of the arrow terminates.
The following flowcharts are high-level representations of the
algorithms developed for the BioGuide software.
1.
<img src="" />
Diagram 1. Flowchart of the algorithm for collecting, formatting
and storing device data
Information about the iGEM parts had to be collected in a standardized
format for our application to function properly. Following data collection,
custom subprograms are needed to parse the data and forward it to the
application’s database. To achieve this, we designed and implemented
the algorithm shown in Diagram 1. In this algorithm, the first stage
was to find the list of part IDs of the devices supplied by iGEM
in the Spring 2010 distribution. This information was collected from
two sources, both provided by iGEM: 1) plate files in Excel format, which
were available online, and 2) device data in XML format. The last
step in the algorithm was to send the collected part ID data to the
application’s database.
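As a rough illustration of the first stage, the sketch below merges part IDs from two sources into one de-duplicated list. It makes several assumptions: the Excel plate files are taken to have been exported to CSV with a hypothetical `part_id` column, the XML is taken to hold IDs in a hypothetical `part_name` element, and the sample data is invented for the example. The final step of sending the IDs to the application's database is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

def ids_from_plate_csv(text, column="part_id"):
    """Collect part IDs from CSV text (assumed export of a plate file)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[column].strip() for row in reader if row.get(column)}

def ids_from_xml(text, tag="part_name"):
    """Collect part IDs from XML text; the element name is an assumption."""
    root = ET.fromstring(text)
    return {el.text.strip() for el in root.iter(tag) if el.text}

# Invented sample data using part IDs that appear on this page.
plate_csv = "well,part_id\nA1,BBa_J04451\nA2,BBa_S04055\n"
device_xml = (
    "<parts>"
    "<part><part_name>BBa_J45299</part_name></part>"
    "<part><part_name>BBa_J04451</part_name></part>"
    "</parts>"
)

# The union removes IDs that occur in both sources.
part_ids = sorted(ids_from_plate_csv(plate_csv) | ids_from_xml(device_xml))
```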
2.
<img src="" />
Diagram 2. Flowchart for BioGuide execution before and during
user interaction
Diagram 2 presents the main algorithm, which shows how the BioGuide
application works. The major components of BioGuide are the device and
Biobrick graphs. While the device graph represents the input-output
(promoter-regulator) compatibility combinations of iGEM devices, the
Biobrick graph represents combinations of atomic parts assembled in a
device or system. The flowchart shows how these graphs are created and
embedded into the program, which displays both graphs to the user when
launched. On start-up the application presents a few interactive options
to the user, which are shown on the flowchart below the bold horizontal
line. As shown in Diagram 2, BioGuide can perform four interactive tasks
in which the device and Biobrick graphs are used. When a node on the
device or Biobrick graph is clicked, that node changes in size and color,
and the various functions shown on the flowchart can then be performed.
Modeling
Graphical Modeling for Bio-Guide
Introduction
Graphical modeling theory has been applied to construct four different
graphs that present, for iGEM’s Parts Registry database, the relations
of atomic parts, devices and systems and the functional combinations
that can build new constructs. Three graphs are composed of
iGEM devices and one graph is based on Biobricks. Each graph comprises
a set of vertices, or nodes, and a set of edges. In the set of nodes each
node represents a device, while in the set of edges each edge represents
an input-output combination of two nodes. These graphs are directed
graphs, as the edges are created according to input-output combinations.
An edge is created for every compatibility between a regulator and a
promoter, where the source of the edge is the device with the corresponding
regulator and the target of the edge is the device with the promoter
concerned.
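The edge rule described above can be sketched as follows. This is an illustrative assumption about the data layout, not BioGuide's code: each device is given a hypothetical set of regulators it produces and a set of regulators its promoters respond to, and an edge is added whenever the two sets overlap. The device names and regulator labels are placeholders.

```python
def build_device_graph(devices):
    """Return the set of directed edges (source, target) between devices.

    devices maps a device ID to a dict with two sets:
    'regulators' (what the device produces) and
    'promoter_inputs' (what its promoters respond to).
    """
    edges = set()
    for src, src_info in devices.items():
        for dst, dst_info in devices.items():
            # Source holds the regulator, target holds the matching promoter.
            if src_info["regulators"] & dst_info["promoter_inputs"]:
                edges.add((src, dst))
    return edges

# Hypothetical devices for illustration.
devices = {
    "dev_A": {"regulators": {"LacI"}, "promoter_inputs": set()},
    "dev_B": {"regulators": {"TetR"}, "promoter_inputs": {"LacI"}},
    "dev_C": {"regulators": {"TetR"}, "promoter_inputs": {"TetR"}},
}
edges = build_device_graph(devices)
```

Note that `dev_C` receives an edge to itself because it produces a regulator that its own promoter responds to, so this construction naturally admits loops.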
<img src="" />
Fig. 1: A node representing a device
<img src="" />
Fig. 2: Arrow representing an edge between two nodes
The atomic structures used in our graphical model are shown
in Figures 1 and 2. A node is drawn as a solid circle, with
the label of the device (its part/device ID according to iGEM standards)
marked in the foreground. The blue arrows between nodes connect the
related devices and represent their input-output connectivity. The end
style of the arrow shows the direction of the edge, as in
Figure 2, where the node labeled BBa_S03520 is the source and BBa_JO9250
is the target.
Directivity
All four graphs built for BioGuide are directed graphs,
so every edge must have a single source and a single target.
No edge is bidirectional. In mathematical form
this can be represented as follows:
If an edge e has node v as source and node w as target, then the edge
can be expressed as
$e = (v, w)$
For a directed graph the ordered pair (v, w) is different
from (w, v). Therefore,
$(v, w) \neq (w, v)$
The direction of the edges is represented by arrows,
as explained in Figure 2.
Connectivity
Nodes that form their own sub-graphs, disconnected from the rest
of the nodes, were identified; they reveal incompatibility
between a few regulators and promoters of the devices. We observed
this disconnection in all four of our graphs. It is illustrated
in Figure 3, where two sub-graphs without any edge
connecting them to the main graph are shown on the right-hand
side of the diagram. These features classify our graphs as disconnected
graphs [1].
<img src="" />
Fig. 3: A zoomed in screenshot showing two sub-graphs within
the disconnected graph.
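One way to detect such disconnected sub-graphs is to ignore edge direction and group nodes that can reach each other. The sketch below does this with a plain depth-first traversal; the function name `weak_components` and the sample nodes are assumptions made for illustration.

```python
def weak_components(nodes, edges):
    """Group nodes into components, treating directed edges as undirected."""
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, components = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(neighbours[cur] - comp)
        seen |= comp
        components.append(comp)
    return components

# Hypothetical graph with one isolated node and two separate pairs.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "d")]
components = weak_components(nodes, edges)
```

A graph is disconnected in the sense used above whenever this function returns more than one component.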
"Semi-Simplicity"
A simple graph is a graph in which no more than one edge joins
the same pair of nodes, so in a simple graph it is not possible to find
more than one edge with the same source and the same target. Additionally,
an edge whose source and target are the same node, forming a loop, is not
allowed. In synthetic biology, however, it is possible to construct a
device consisting of devices or Biobricks of the same species or type.
Accordingly, our graphs are simple graphs except for possible self-loops,
where an edge starts from and ends on the same node. Because of this
permitted flexibility, we call our graphs "semi-simple".
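The "semi-simple" property can be checked mechanically. The following sketch, with hypothetical function names and sample edges, treats an edge list as semi-simple when no ordered (source, target) pair occurs more than once, while leaving self-loops permitted.

```python
from collections import Counter

def is_semi_simple(edges):
    """True if no (source, target) pair is repeated; self-loops are allowed."""
    counts = Counter(edges)
    return all(count == 1 for count in counts.values())

def self_loops(edges):
    """Return the edges whose source and target are the same node."""
    return [(src, dst) for src, dst in edges if src == dst]

# Hypothetical edge lists for illustration.
ok_edges = [("a", "b"), ("b", "a"), ("c", "c")]
bad_edges = [("a", "b"), ("a", "b")]

ok = is_semi_simple(ok_edges)
bad = is_semi_simple(bad_edges)
loops = self_loops(ok_edges)
```

Because the graphs are directed, the pairs ("a", "b") and ("b", "a") count as different edges and do not violate the property.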
Without question, there is an urgent need to build a distinct and
specific database, well organized with its own standards, for synthetic
biology; however, developing such a database is not an easy task.
Contact Information of Part Owners and Qualitative Group Comments
about Parts
=========================================
Designers: Mail:
GroupFavorite:
StarRating:
Parameters:
=========================================
Table 2: The table above shows information about the owners
of parts, their contact information and the popularity of the parts
among groups. The Parameters heading refers to distinctive experimental
details unique to the usage of a part, which should be decided by the groups.
The second step in building the standardized template was to gather the
phylogenic information about each part's development process, which includes
the name of the group, the designer and contact information, along with
the group's comments on the parts they have submitted. Contact
information is especially important for iGEM, because other groups who need
extra information about an available part can use it to reach the
designers. Even though iGEM strongly encourages contacting the designers
of available parts, the unavailability of contact information shows that
iGEM’s Parts Registry needs substantial re-organization in order to serve
the synthetic biology community properly.
Additionally, the “Group Favorite” and “StarRating” fields are
important for the individual evaluation of parts, but they do not get
the attention they deserve from the iGEM groups. “Group Favorite” expresses
the designer group's confidence in the part. “StarRating” describes
the part in terms of its popularity and usage efficiency among the
groups. According to our observations, most groups are not aware of
either field, or use them incorrectly or ineffectively.
For example, a part with a full reporter that is known to be functional
and gives precise, expected results should have a StarRating of at least
2 stars, yet in the 2010 distribution it is very difficult to find a
part whose “StarRating” is above one. Because these two evaluations allow
a quick assessment of a part's functionality, they have been included in
the proposed standardization template. However, as they have not been used
properly so far, when re-organizing the parts information during the
development of our software application we had to include all parts in
our queries regardless of their “Group Favorite” and “StarRating”
evaluations.
Input and Output Characteristics of Parts
=========================================
Parameters:
-Input:
• Promoter:
• Activity:
• Inducer:
• Activator:
• Repressor:
• Inhibitor:
• Promoter2:
• Activity:
• Inducer:
• Activator:
• Repressor:
• Inhibitor:
-Output:
• Reporter:
• Reporter2:
• Regulator:
• Inducer:
• Activator:
• Repressor:
• Inhibitor:
• Regulator2:
• Inducer:
• Activator:
• Repressor:
• Inhibitor:
-Working Condition:
=========================================
Table 3: The table above describes in detail the input relations
based on promoters and the output products based on the functional genes
and RNAs included within the parts. Working condition
describes any influencing factor or circumstance that is directly related
to the functional properties of parts.
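To show how the fields of Table 3 might be held in software, the sketch below mirrors the template with Python dataclasses. All class and attribute names are hypothetical, chosen only to follow the headings of the table; the example record uses an invented part ID and invented values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Regulation:
    """The four regulation fields listed under each promoter and regulator."""
    inducer: Optional[str] = None
    activator: Optional[str] = None
    repressor: Optional[str] = None
    inhibitor: Optional[str] = None

@dataclass
class PromoterInput:
    name: str
    activity: Optional[str] = None
    regulation: Regulation = field(default_factory=Regulation)

@dataclass
class RegulatorOutput:
    name: str
    regulation: Regulation = field(default_factory=Regulation)

@dataclass
class PartRecord:
    part_id: str
    inputs: List[PromoterInput] = field(default_factory=list)
    reporters: List[str] = field(default_factory=list)
    regulators: List[RegulatorOutput] = field(default_factory=list)
    working_condition: str = ""

# Invented example record; none of these values come from the registry.
record = PartRecord(
    part_id="BBa_EXAMPLE",
    inputs=[PromoterInput("promoter_1", activity="inducible",
                          regulation=Regulation(repressor="LacI"))],
    reporters=["reporter_1"],
    regulators=[RegulatorOutput("regulator_1")],
    working_condition="not specified",
)
```

Using lists for inputs, reporters and regulators covers the numbered entries of the template (Promoter, Promoter2 and so on) without fixing their count.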
Third part of our standardization template includes parameters of
contingent input and output elements. These parameters are classified
into two groups for simplicity as presented on Table 3. This final part
of the standardization template includes the upmost important information
about the Biobricks that are required for the BioGuide Software to run
its searching algorithm.
Briefly, BioGuide application is designed to catch the input and
output relations of individual parts to examine possible Biobricks pathways
for specific input and output queries. In other words, at pre-experimental
stage, it helps wet lab biologists to design their unique constructs
by revealing possible alternative options for pre-determined purposes,
along with the primary paths. Our ultimate goal is to improve the algorithm
designed for iGEM 2010 and present a new version of the BioGuide in
iGEM 2011, which will provide optimum design of constructs for predetermined
parameters.
Most of the parts are composed of functional and nonfunctional constructs
which are formed by atomic parts. And every part should carry the information
for all of its atomic parts within itself. The “input” heading actually
stands for promoters. Parts with one or more promoters can be found
at iGEM’s Parts Registry. Along with the information on which and how
many promoters a part might have, the activity level of promoters are
also important to distinguish between a constitutively active promoter
or a promoter activated by specific physiological processes or states
etc. This information was crucial for us to dissect in order to run
our algorithm as it directly affects which inputs can activate the devices
or the systems.
Throughout our investigations on the Parts Registry, we found out
that much of the terminology was being used ambiguously. Although this
might not be vital for synthetic biologists, it is still endeavoring
to understand the function of certain regulatory elements which also
becomes a time consuming task for the researcher. Thus, we recommend
that the explanations of certain regulatory elements should be redefined
and fixed especially for synthetic biology for easy communication, sharing
and searching of information.
Common misuses of the terminology can guide us to figure out how
to construct a standard nomenclature for synthetic biology. We claim
that a standard nomenclature is urgently needed for synthetic biology
for the following reasons. First of all, synthetic biology is an emerging
research discipline and an industrial application area which is highly
promising. Secondly, redefinition of the terminology to build a standard
nomenclature is needed as some of the terms are prone to be used instead
of another causing problems related to misuse for the global communication
about synthetic biology. Lastly, the nomenclature has major importance
for the construction of a persistent and trustworthy database for synthetic
biology which serves for the information exhibition and exchange globally.
For instance, there are obvious misunderstandings about the words which
are predominantly used for regulation process. We have noticed that,
the terms “inhibitor” and “repressor” are being used as equivocally
in the part information pages. Like the lactose inhibitor protein, a
widely used DNA-binding transcriptional repressor, that have been labeled
both as “inhibitor” and “repressor” at iGEM’s Parts Registry. Similar
problems resulting from ambiguous use of terminology also observed with
regulatory elements. To sum up, we investigated all input elements for
promoters and classify these elements in terms of their function, affect
and required input element for them. So, we suggest that terminology
used for regulation of transcription should be defined clearly on iGEM’s
website and correct use of terminology should be enforced.
The second group of parameters was collected under the title “Output”,
which refers to products of functional genes. In contradiction, the
term “reporter” has also been described within the same list. Reporters
are also genes whose products, can be used for screening as an output.
According to our group, the usage of the term “reporter” for genes is
unnecessary and cause extra complexity for information distribution
and gives rise to discrepancies. Instead of using the term “reporter”,
predefined “gene” description should be used for genes, which can function
as reporters. The special information which is related with the characteristic
of that gene should also be presented on part info web page.
Furthermore, the same terminology “reporter” was used for both atomic
parts and composite bio-bricks. Also the overall image descriptions
for these were defined as “reporters”. We want to point out that using
same nomenclature for both atomic genes and for whole functional constructs
contributes to the complexity and makes specific explorations difficult
through the Parts Registry. So, assigning “reporter” for both atomic
parts and for whole constructs is not a good practice. Instead, we are
suggesting the usage of other available terminology for the parts listed
as reporters, which most of the constructs, now known as reporters,
can be grouped into, such as “protein generators”, “composite parts”
or “inverters”.
Devices are whole constructs which are functional and have specific
and distinct functions. But, as we have observed, unfortunately, the
term “device” is also being used for parts which are not functional
and do not have specific functional at all. Moreover, within the classification
of devices, we argue that some terms are also being used unnecessarily
and ambiguously. Devices are classified into five types which are protein
generators, reporters, inverters, receivers and senders, measurement
devices. For example iGEM defines protein generators as:
Protein generator = promoter + rbs +gene + terminator
Though we accept the definition for protein generators, we observed
that there exist numerous parts which are defined as protein generators
but actually most of them do not fit to the definition provided above.
Although some parts are not functional and do not generate proteins
at all, they are classified as protein generators, which makes searching
for the parts difficult in the registry. Furthermore, there are also
numerous parts which are defined as “composite parts” but actually they
fit to the same definition with protein generators. In order to overcome
the problem of misuse of device type we have extracted related image
ID information for the composite parts. Image ID information helped
us to correctly categorize composite parts depending on its individual
atomic parts and identify the ones with more than one function, such
as being both inhibitor and activator. In other words, we used image
and part IDs in order to merge an input for its outputs.
Subtitle working conditions, includes all the detailed information
about the experimental properties of parts, and the details about the
working process of individual parts and complete devices. Additionally,
we marked the subtitle “Working Condition” in our standardization template
as potentially the most important title that helps synthetic biologist
to better understand the parts functions at iGEM’s part registry database.
The main problem we have encounter with the subtitle “working condition”
is within most of the parts the details about working process is not
enough and not provided regularly.
Examples of Misuse of Terminology:
For Composite Parts:
PartID: BBa_S04055
PartName: Synthetic lacYZ operon
<img src="" />
This part is functional and responsible for the production of LacY
and LacZ proteins. This part partially fits the definition for “composite
part” but actually should be a protein generator as it fits fully to
the definition of “protein generators”.
For Protein Generators:
PartID: BBa_J45299
PartName: PchA & PchB enzyme generator
<img src="" />
The part which is illustrated above actually fits the definition
for “composite part” but in part registry it is classified as protein
generator. This part can be functional but it needs a promoter. Even
though this part is not functional and is not capable of producing protein,
part registry assigns this product as protein generator. We suggest
that all parts in the registry, which are composed of more than one
atomic part and which are not functional on their own but can be functional,
should be classified as “composite parts”.
For Reporters:
PartID: BBa_J04451
PartName: RFP Coding Device with an LVA tag
<img src="" />
This functional part is classified as a “Reporter” in the Parts Registry
database. It clearly fits the same description as a protein generator in
the Biobrick Parts Registry standards. Although this part has a specific
and known functional role, characterizing it as a reporter is unnecessary
and adds to the complexity of the information provided. Instead, we
suggest that this part should be classified as a “protein generator” and
that detailed information about its specific function should be provided
on the part information page.
In conclusion, as mentioned above, we tried to reorganize and normalize
the information about parts provided in the Parts Registry for 2010 in
order to develop our algorithm for the BioGuide application. During this
process, we encountered inconsistencies and misuses of terminology, as
well as inadequacies in the information provided about parts. First of
all, we claim that a standard nomenclature should be established for
future use in the field of synthetic biology. Based on the information
gathered according to the new nomenclature, a professional database should
be constructed to address the needs of synthetic biology. This will enable
easy exchange and presentation of information globally. Secondly, although
enough information about parts exists in the Parts Registry database, this
information urgently needs to be put in order. Furthermore, new
experimental standards for the subtitle “Working Condition” should be
introduced to groups in the part submission process. These experimental
standards will be important because the experimental details currently
given about parts do not satisfy the needs of wet-lab biologists for the
design and construction of new Biobricks.
</div>
</div>
</div>
Contact
Algorithm
In this section, the step-by-step functioning of our application, along
with the algorithmic concepts behind the ‘standardization’ of functional
iGEM devices, is depicted in pictorial form as flowcharts. Rectangular
boxes represent the implementations of the computer programs that perform
the particular tasks stated in each box. These boxes are sometimes called
subprograms, objects or packages in the context of object-oriented
software engineering. Diamonds represent decision branches and are found
between two rectangular boxes. Arrows show the direction in which
subprograms work and communicate: the subprogram at the head of an arrow
starts executing after the subprogram at the tail of the arrow terminates.
The following flowcharts are high-level representations of the algorithms
we developed for the BioGuide software.
1.
<img src="" />
Diagram 1. Flowchart of the algorithm for collecting, formatting
and storing device data
Information about the iGEM parts had to be collected in a standardized
format for our application to function properly. Following data collection,
custom subprograms were needed to parse the data and forward it to the
application’s database. To achieve this, we designed and implemented the
algorithm shown in Diagram 1. The first stage of this algorithm was to find
the list of part IDs of the devices supplied by iGEM in the Spring 2010
distribution. This information was collected from two sources, both
provided by iGEM: 1) plate files in Excel format, which were available
online, and 2) device data in XML format. The last step of the algorithm
was to send the collected partID data to the application’s database.
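The collection step can be sketched as follows. This is only an illustration in Python, not the BioGuide implementation: the XML layout (`<part>` elements with a `<part_name>` child), the table name `devices`, and the assumption that the plate-file IDs have already been read into a list are all hypothetical. The part IDs are sample values taken from the examples on this page.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real iGEM device export may differ.
SAMPLE_XML = """
<parts>
  <part><part_name>BBa_S04055</part_name></part>
  <part><part_name>BBa_J45299</part_name></part>
</parts>
"""


def ids_from_xml(xml_text):
    """Return the part IDs found in <part_name> elements."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("part_name") if el.text]


def merge_part_ids(plate_ids, xml_ids):
    """Combine IDs from both sources, dropping duplicates, in sorted order."""
    return sorted(set(plate_ids) | set(xml_ids))


def store_part_ids(conn, part_ids):
    """Insert the part IDs into a (hypothetical) 'devices' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS devices (part_id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO devices (part_id) VALUES (?)",
        [(pid,) for pid in part_ids],
    )
    conn.commit()


# IDs assumed to have been read from the Excel plate files beforehand.
plate_ids = ["BBa_J04451", "BBa_S04055"]
all_ids = merge_part_ids(plate_ids, ids_from_xml(SAMPLE_XML))

conn = sqlite3.connect(":memory:")
store_part_ids(conn, all_ids)
stored = [row[0] for row in conn.execute("SELECT part_id FROM devices ORDER BY part_id")]
```

Merging through a set means a part that appears in both the plate files and the XML data is stored only once.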
2.
<img src="" />
Diagram 2. Flowchart for BioGuide execution before and during
user interaction
Diagram 2 presents the main algorithm, which shows how the BioGuide
application works. The major components of BioGuide are the device and
Biobrick graphs. While the device graph represents the input-output
(promoter-regulator) compatibility of iGEM devices, the Biobrick graph
represents the combinations of atomic parts assembled in a device or
system. The flowchart shows how these graphs are created and embedded into
the program, which displays both graphs to the user when launched. When
started, the application presents a few interactive options to the user,
which are shown on the flowchart under the bold horizontal line. As shown
in Diagram 2, BioGuide can perform four interactive tasks that make use of
the device and Biobrick graphs. Upon clicking a node on the device or
Biobrick graph, that node changes in size and color, and the various
functions shown on the flowchart can then be performed.
Modeling
Graphical Modeling for Bio-Guide
Introduction
Graphical modeling theory has been applied to construct four different
graphs, which present the relations of atomic parts, devices and systems,
and the functional combinations that can build new constructs, for the
iGEM Parts Registry database. Three graphs are composed of iGEM devices
and one graph is based on Biobricks. Each graph comprises a set of
vertices (nodes) and a set of edges. In the device graphs, each node
represents a device and each edge represents an input-output combination
of two nodes. These graphs are directed because the edges are created
according to input-output combinations. An edge is created for every
compatibility between a regulator and a promoter: the source of the edge
is the device with the regulator, and the target of the edge is the device
with the corresponding promoter.
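A minimal Python sketch of this edge rule follows. The device records, the regulator and promoter names, and the `compatible` table are made up for illustration; only the rule itself (source = device with the regulator, target = device with the compatible promoter) comes from the text.

```python
from collections import defaultdict

# Hypothetical device records: which regulators a device outputs and
# which promoters it carries as inputs.
devices = {
    "dev_A": {"regulators": {"reg1"}, "promoters": {"pX"}},
    "dev_B": {"regulators": {"reg2"}, "promoters": {"p1"}},
    "dev_C": {"regulators": set(), "promoters": {"p2"}},
}

# Hypothetical compatibility table: regulator -> promoters it acts on.
compatible = {"reg1": {"p1"}, "reg2": {"p2"}}


def build_device_graph(devices, compatible):
    """Return adjacency sets: source device -> set of target devices."""
    graph = defaultdict(set)
    for src, src_info in devices.items():
        for reg in src_info["regulators"]:
            targets = compatible.get(reg, set())
            for dst, dst_info in devices.items():
                # An edge exists when the regulator acts on a promoter of dst.
                if targets & dst_info["promoters"]:
                    graph[src].add(dst)
    return dict(graph)


device_graph = build_device_graph(devices, compatible)
```

Storing targets in a set keeps at most one edge per (source, target) pair, while a device whose regulator acts on its own promoter would still receive an edge to itself.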
<img src="" />
Fig. 1: A node representing a device
<img src="" />
Fig. 2: Arrow representing an edge between two nodes
The basic elements used in our graphical model are shown in Figures 1
and 2. A node is drawn as a solid circle, with the label of the device
(its part/device ID according to iGEM standards) marked in the foreground.
The blue arrows between nodes connect related devices and represent
input-output connectivity. The end style of an arrow indicates the
direction of the edge, as in Figure 2, where the node labeled BBa_S03520
is the source and BBa_JO9250 is the target.
Directivity
All four graphs built for BioGuide are directed graphs, so every edge
must have a single source and a single target. No edge is bidirectional.
In mathematical form, if an edge e has node v as its source and node w as
its target, then the edge can be expressed as
$$ e = (v, w) $$
For a directed graph, the ordered pair (v, w) is different from (w, v).
Therefore, for v ≠ w,
$$ (v, w) \neq (w, v) $$
The direction of the edges is represented by the arrows, as explained
in Figure 2.
Connectivity
We identified nodes that form their own sub-graphs, disconnected from the
rest of the nodes, which revealed incompatibilities between a few
regulators and promoters of the devices. We observed this disconnection in
all four of our graphs. It is illustrated in Figure 3, where two sub-graphs
without any edge connecting them to the main graph are shown on the
right-hand side of the diagram. These features classify our graphs as
disconnected graphs [1].
<img src="" />
Fig. 3: A zoomed in screenshot showing two sub-graphs within
the disconnected graph.
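One way to detect such isolated sub-graphs is to compute the weakly connected components of the graph, that is, to group nodes while ignoring edge direction. The sketch below uses breadth-first search on made-up node names; it illustrates the concept and is not taken from the BioGuide code.

```python
from collections import deque


def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction."""
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical graph: a-b-c form the main graph, d-e a separate sub-graph.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = weakly_connected_components(nodes, edges)
is_disconnected = len(components) > 1
```

A graph with more than one component is disconnected in the sense used above.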
"Semi-Simplicity"
A simple graph is a graph in which no more than one edge joins the same
pair of nodes. In a simple directed graph, it is therefore not possible to
find more than one edge with the same source and the same target.
Additionally, an edge whose source and target are the same node, forming a
loop, is not allowed. In synthetic biology, however, it is possible to
construct a device consisting of devices or Biobricks of the same species
or type. Accordingly, our graphs are simple graphs with one exception:
self-loops, where an edge starts from and ends on the same node, are
permitted. Because of this permitted flexibility, we call our graphs
"semi-simple".
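The two conditions can be checked directly on an edge list. The following Python sketch, with hypothetical node names, treats edges as (source, target) pairs: a graph is semi-simple when no pair is repeated, and simple when it also has no self-loops.

```python
def is_semi_simple(edges):
    """True if no (source, target) pair repeats; self-loops are allowed."""
    return len(edges) == len(set(edges))


def is_simple(edges):
    """True if the edge list is semi-simple and also has no self-loops."""
    return is_semi_simple(edges) and all(src != dst for src, dst in edges)


with_loop = [("a", "b"), ("b", "b")]   # one self-loop on node b
parallel = [("a", "b"), ("a", "b")]    # the same edge listed twice
```

Here `with_loop` is semi-simple but not simple, while `parallel` is neither.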
</html>
they will be providing
the short description about the part, which synthetic biologists can
immediately recognize and utilize during the construction of unique
Biobricks. Additionally, unique part names will help to identify
devices with more than one Biobrick in their constructs. Assigning
unique and distinct names to parts, describing their nature and content,
will help researchers to recognize and search for
the parts.</p>
Headings Selected From Previous Entry Forms for Indication of Standardized
Information
=========================================
PartID:
PartName:
Bricks:
BrickIDs:
ImageIDs:
RFC10:
RFC21:
RFC23:
RFC25:
=========================================
Table 1: The table above describes the qualities of parts that identify
their composition and shows the status of previously assigned standards.
PartID refers to the unique ID number of a part, including atomic parts
and assemblies. PartName refers to the unique name given to a part. Bricks
refers to the shortcut names that specify atomic parts. ImageIDs refers to
individual numbers, or combinations of numbers, that are assigned by us.
The RFC fields refer to the status of a part with respect to the RFC
standards.
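A record in this "Heading: value" layout is easy to read programmatically. The Python sketch below parses such a block into a dictionary; the parser and the sample values are illustrative assumptions, since the page does not describe how the template is stored or read by BioGuide.

```python
def parse_template(text):
    """Parse 'Key: value' lines into a dict, skipping separator lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '=====' separator lines.
        if not line or set(line) == {"="}:
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


# Sample block; the ID and name are taken from an example on this page,
# the empty fields are left unfilled on purpose.
sample = """
=========================================
PartID: BBa_J04451
PartName: RFP Coding Device with an LVA tag
Bricks:
RFC10:
=========================================
"""
record = parse_template(sample)
```

Headings that are present but unfilled come back as empty strings, which makes missing information easy to spot.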
iGEM provides both individual atomic parts and pre-combined constructs
such as devices and systems. The availability of combined constructs is
important to researchers, as combining individual Biobricks one at a time
is very time consuming. These previously merged constructs serve as a
repository of puzzle pieces and can be used for different purposes. To
date, the largest and most trustworthy source for synthetic biology and
its components is iGEM’s Parts Registry. In 2010, iGEM provided over 1000
parts, which have initiated many projects. Having more atomic parts
available in iGEM’s repository will lead to the design of more complex and
robust constructs, and will give us a better chance to design different
constructs for unique purposes. For the parts that are already available,
extra steps need to be taken for quality control and surveillance. Quality
control of the information about parts is essential for the future of iGEM
and synthetic biology. Even though we found the pre-determined RFC
standards useful and included them in our standardized template, the
information for some individual parts still requires re-organization, as
RFC standards alone do not describe the functionality of parts well enough
to satisfy the needs of wet-lab biologists.
Without question, there is an urgent need to build a distinct and
specific database, well organized with its own standards, for synthetic
biology; however, developing such a database is not an easy task.
Contact Information of Part Owners and Qualitative Group Comments
about Parts
=========================================
Designers: Mail:
GroupFavorite:
StarRating:
Parameters:
=========================================
Table 2: The table above gives information about the owners of parts,
their contact information, and the popularity of the parts among groups.
The Parameters heading refers to distinctive experimental details unique
to the usage of a part, which should be decided by the groups.
The second step in building the standardized template was to gather
information about the development history of each part, which includes the
name of the group, the designer and contact information, along with the
comments of the group on the parts they have submitted. Contact
information is especially important for iGEM, as it lets other groups who
need extra information about an available part obtain it. Even though
iGEM strongly encourages contacting the designers of individual parts, the
unavailability of contact information points to the fact that iGEM’s Parts
Registry needs substantial re-organization in order to serve the synthetic
biology community properly.
Additionally, the “GroupFavorite” and “StarRating” fields are important
for the individual evaluation of parts, but they do not get the attention
they deserve from iGEM groups. “GroupFavorite” expresses the confidence of
the designer group in the part. “StarRating” describes the part in terms
of its popularity and usage efficiency among groups. According to our
observations, most groups are either unaware of these fields or use them
incorrectly or ineffectively. For example, for a part containing a full
reporter which is known to be functional and gives precise and expected
results, the StarRating should be at least 2 stars, yet in the 2010
distribution it is very difficult to find a part whose StarRating is above
one. These two evaluations are important for quickly determining the
functionality of parts, so they have been included in the proposed
standardization template. However, as they have not been used properly so
far, we had to include all parts in our queries, regardless of their
“GroupFavorite” and “StarRating” evaluations, when re-organizing the parts
information during the development of our software application.
Input and Output Characteristics of Parts
=========================================
Parameters:
-Input:
• Promoter:
    • Activity:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
• Promoter2:
    • Activity:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
-Output:
• Reporter:
• Reporter2:
• Regulator:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
• Regulator2:
    • Inducer:
    • Activator:
    • Repressor:
    • Inhibitor:
-Working Condition:
=========================================
Table 3: The table above describes in detail the input relations based
on promoters and the output products based on the functional genes and
RNAs included within the parts. Working Condition describes any
influencing factor or circumstance that is directly related to the
functional properties of a part.
The third part of our standardization template includes the parameters
of contingent input and output elements. For simplicity, these parameters
are classified into two groups, as presented in Table 3. This final part
of the standardization template includes the most important information
about the Biobricks, which is required for the BioGuide software to run
its search algorithm.
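The headings of Table 3 could be modeled as a small data structure, for instance with Python dataclasses as sketched below. The class and field names are our own, the grouping of Activity, Inducer, Activator, Repressor and Inhibitor under each promoter or regulator is an assumption about the intended nesting, and the example values are placeholders rather than registry data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Promoter:
    """One promoter entry under '-Input:' (sub-fields assumed to nest here)."""
    name: str
    activity: Optional[str] = None
    inducer: Optional[str] = None
    activator: Optional[str] = None
    repressor: Optional[str] = None
    inhibitor: Optional[str] = None


@dataclass
class Regulator:
    """One regulator entry under '-Output:'."""
    name: str
    inducer: Optional[str] = None
    activator: Optional[str] = None
    repressor: Optional[str] = None
    inhibitor: Optional[str] = None


@dataclass
class PartParameters:
    """Input, output and working-condition fields of one part."""
    promoters: List[Promoter] = field(default_factory=list)
    reporters: List[str] = field(default_factory=list)
    regulators: List[Regulator] = field(default_factory=list)
    working_condition: str = ""


# Placeholder values, for illustration only.
params = PartParameters(
    promoters=[Promoter(name="pExample", activity="constitutive")],
    reporters=["exampleReporter"],
    regulators=[Regulator(name="exampleRegulator", repressor="pOther")],
)
```

Using lists for promoters, reporters and regulators covers the Promoter2, Reporter2 and Regulator2 headings without hard-coding a maximum of two entries.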
Briefly, the BioGuide application is designed to capture the input and
output relations of individual parts in order to examine possible Biobrick
pathways for specific input and output queries. In other words, at the
pre-experimental stage, it helps wet-lab biologists design their unique
constructs by revealing possible alternative options for pre-determined
purposes, along with the primary paths. Our ultimate goal is to improve
the algorithm designed for iGEM 2010 and to present a new version of
BioGuide at iGEM 2011, which will provide an optimal design of constructs
for predetermined parameters.
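To make the idea of a pathway query concrete, the sketch below finds a shortest chain of devices between a start and a goal device with breadth-first search. The page does not state which search algorithm BioGuide uses, so breadth-first search is our own choice for illustration, and the graph and device names are hypothetical.

```python
from collections import deque


def find_path(graph, start, goal):
    """Breadth-first search for a shortest device path from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the result does not depend on set ordering.
        for nxt in sorted(graph.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical device graph: source device -> set of target devices.
graph = {"dev_A": {"dev_B", "dev_C"}, "dev_B": {"dev_D"}, "dev_C": {"dev_D"}}
path = find_path(graph, "dev_A", "dev_D")
```

In this example both `dev_B` and `dev_C` lead to `dev_D`; the function returns one shortest path, and the other route is the kind of alternative option mentioned above.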
Most parts are composed of functional and nonfunctional constructs formed
from atomic parts, and every part should carry the information for all of
its atomic parts. The “Input” heading actually stands for promoters. Parts
with one or more promoters can be found in iGEM’s Parts Registry. Along
with the information on which and how many promoters a part has, the
activity level of each promoter is also important, in order to distinguish
between a constitutively active promoter and a promoter activated by
specific physiological processes or states. It was crucial for us to
extract this information in order to run our algorithm, as it directly
determines which inputs can activate the devices or the systems.
Throughout our investigation of the Parts Registry, we found that much
of the terminology was being used ambiguously. Although this might not be
vital for synthetic biologists, it still takes considerable effort to
understand the function of certain regulatory elements, which becomes a
time-consuming task for the researcher. Thus, we recommend that the
definitions of certain regulatory elements be revised and fixed
specifically for synthetic biology, for easy communication, sharing and
searching of information.
Common misuses of terminology can guide us in constructing a standard
nomenclature for synthetic biology. We claim that a standard nomenclature
is urgently needed for synthetic biology for the following reasons. First,
synthetic biology is an emerging research discipline and a highly
promising area of industrial application. Secondly, the terminology needs
to be redefined into a standard nomenclature because some terms tend to be
used in place of others, and this misuse causes problems for global
communication about synthetic biology. Lastly, the nomenclature is of
major importance for the construction of a persistent and trustworthy
database for synthetic biology, which would serve the global presentation
and exchange of information. For instance, there are obvious
misunderstandings about the words predominantly used for regulatory
processes. We have noticed that the terms “inhibitor” and “repressor” are
used interchangeably in the part information pages. One example is the
lactose inhibitor protein, a widely used DNA-binding transcriptional
repressor, which has been labeled both as “inhibitor” and as “repressor”
in iGEM’s Parts Registry. Similar problems resulting from the ambiguous
use of terminology were also observed with other regulatory elements. To
sum up, we investigated all input elements for promoters and classified
them in terms of their function, their effect and the input element they
require. We therefore suggest that the terminology used for the regulation
of transcription should be defined clearly on iGEM’s website and that its
correct use should be enforced.
The second group of parameters was collected under the title “Output”,
which refers to the products of functional genes. Inconsistently, the term
“reporter” also appears within the same list. Reporters are also genes,
whose products can be used for screening as an output. In our view, using
the term “reporter” for genes is unnecessary, adds complexity to the
distribution of information and gives rise to discrepancies. Instead of
the term “reporter”, the predefined “gene” description should be used for
genes that can function as reporters. The special information related to
the characteristics of such a gene should be presented on the part
information page.
Furthermore, the same term “reporter” is used both for atomic parts and
for composite Biobricks, and the overall image descriptions for these are
also defined as “reporters”. We want to point out that using the same
nomenclature for atomic genes and for whole functional constructs adds
complexity and makes specific searches through the Parts Registry
difficult. Assigning “reporter” to both atomic parts and whole constructs
is therefore not good practice. Instead, we suggest using other available
terms into which most of the constructs now listed as reporters can be
grouped, such as “protein generators”, “composite parts” or “inverters”.
Devices are whole constructs which are functional and have specific and
distinct functions. Unfortunately, as we have observed, the term “device”
is also used for parts which are not functional and do not have any
specific function. Moreover, we argue that some terms within the
classification of devices are used unnecessarily and ambiguously. Devices
are classified into five types: protein generators, reporters, inverters,
receivers and senders, and measurement devices. For example, iGEM defines
protein generators as:
Protein generator = promoter + rbs + gene + terminator
Though we accept this definition of protein generators, we observed that
numerous parts are labeled as protein generators although most of them do
not fit the definition provided above. Some parts that are not functional
and do not generate proteins at all are classified as protein generators,
which makes searching for parts in the registry difficult. Furthermore,
there are also numerous parts which are labeled as “composite parts” but
actually fit the definition of protein generators. To overcome the problem
of misused device types, we extracted the related image ID information for
the composite parts. Image ID information helped us to categorize
composite parts correctly according to their individual atomic parts and
to identify the ones with more than one function, such as being both an
inhibitor and an activator. In other words, we used image and part IDs to
match each input with its outputs.
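The definition above can be turned into a simple check on the ordered types of a part's atomic parts. The Python sketch below is a deliberately strict reading of the formula (exactly one promoter, rbs, gene and terminator, in that order); the function name, the type labels and the "atomic part" fallback are our own assumptions, and constructs with several rbs + gene pairs are not handled.

```python
# Ordered atomic part types required by the definition quoted above.
PROTEIN_GENERATOR = ["promoter", "rbs", "gene", "terminator"]


def classify(part_types):
    """Classify a part from the ordered types of its atomic parts."""
    types = [t.lower() for t in part_types]
    if types == PROTEIN_GENERATOR:
        return "protein generator"
    if len(types) > 1:
        # More than one atomic part, but not a complete protein generator.
        return "composite part"
    return "atomic part"


complete = classify(["promoter", "RBS", "gene", "terminator"])
missing_promoter = classify(["rbs", "gene", "terminator"])
```

Under this rule a construct that lacks a promoter is reported as a composite part rather than a protein generator, which matches the classification suggested in the examples on this page.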
The subtitle “Working Condition” includes all the detailed information
about the experimental properties of parts, and the details about the
working process of individual parts and complete devices. We marked the
subtitle “Working Condition” in our standardization template as
potentially the most important title, since it helps synthetic biologists
better understand the functions of parts in iGEM’s Parts Registry
database. The main problem we have encountered with this subtitle is that,
for most parts, the details about the working process are insufficient and
are not provided consistently.
</div>
</div>
</div>