Database Standardization
The two main focuses of our project were organizing the available
information about Biobricks on iGEM’s website and developing a software
application that helps synthetic biologists at the experimental set-up
level by providing all available construct combinations for any given
input and output relations, which they can use for their own projects.
Normalization and re-organization of the part information on iGEM’s
website was needed in order to develop our application, which
automatically searches the possible construct combinations. For the organization
and analysis of the Biobricks, we used the part information for the Spring 2010
distribution. The information on all three 384-well plates distributed by iGEM
was scrutinized and checked individually to identify which standards are
available and which are needed.
iGEM provides a large number of parts in a hierarchical way, but there
is no order in the information flow and there are no common standards. As a
result, the bulk of the information is used ineffectively. Some of
the parts distributed are known to be nonfunctional. Web pages for parts
contain a lot of information, but most of it is, again, not ordered.
Moreover, some of the additional information had to be removed or replaced
so that the information for parts could be used effectively.
We also recommend removing the redundant bulk information about parts from
iGEM’s website in the future.
The final standardization we suggest is not intended
for general public use; it was urgently needed in order to satisfy
the needs of our algorithm. Still, it will be a valuable resource,
since it summarizes the basic information about the parts.
As the first step in building the proposed standardization template,
we selected the part-related headings listed in Table 1. Assigning
part IDs to individual parts is an accepted and quite valuable way
of tracking information. Although every part has a unique PartID,
every part also needs a unique part name assigned as its official iGEM
name. Part names play an important role because they provide
a short description of the part, which synthetic biologists can
immediately recognize and use during the construction of unique
Biobricks. Additionally, unique part names help to identify
devices with more than one Biobrick in their constructs. Unique and
distinct names that describe the nature and content of the parts
will help researchers recognize and search for
the parts.
Headings Selected From Previous Entry Forms for Indication of Standardized
Information
=========================================
PartID:
PartName:
Bricks:
BrickIDs:
ImageIDs:
RFC10:
RFC21:
RFC23:
RFC25:
=========================================
Table 1: The table above lists the qualities of parts that identify
their composition and show
the status of previously assigned standards. PartID refers to the unique
ID number for parts, including atomic parts and assemblies. PartName
refers to the unique name given to a part. Bricks refers to the shortcut
names that specify atomic parts. ImageIDs refers to individual numbers or
combinations of numbers that were assigned by us. The RFC fields refer to the
status of parts with respect to the RFC standards.
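The headings of Table 1 can be pictured as a simple record type. The following is only a minimal Python sketch under stated assumptions: the class name `PartRecord`, the method `missing_fields`, and the choice of lists and booleans as value types are hypothetical and are not taken from the template itself, which only names the headings.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# RFC headings that appear in Table 1.
RFC_STANDARDS = ("RFC10", "RFC21", "RFC23", "RFC25")


@dataclass
class PartRecord:
    """One entry of the first block of the template (value types are assumed)."""
    part_id: str                                         # PartID
    part_name: str = ""                                  # PartName
    bricks: List[str] = field(default_factory=list)      # Bricks
    brick_ids: List[str] = field(default_factory=list)   # BrickIDs
    image_ids: List[int] = field(default_factory=list)   # ImageIDs
    rfc: Dict[str, bool] = field(default_factory=dict)   # RFC10 ... RFC25

    def missing_fields(self) -> List[str]:
        """Return the Table 1 headings that have not been filled in yet."""
        missing = []
        if not self.part_name:
            missing.append("PartName")
        if not self.bricks:
            missing.append("Bricks")
        if not self.brick_ids:
            missing.append("BrickIDs")
        if not self.image_ids:
            missing.append("ImageIDs")
        # An RFC heading counts as missing when no status was recorded for it.
        missing.extend(std for std in RFC_STANDARDS if std not in self.rfc)
        return missing


record = PartRecord(part_id="BBa_J04451",
                    part_name="RFP Coding Device with an LVA tag")
print(record.missing_fields())
```

A check of this kind makes it easy to see which entries of a distribution still lack standardized information before they are passed to a search.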
iGEM provides both individual, atomic parts and pre-combined constructs
such as devices and systems. The availability of combined constructs is
important to researchers, as combining individual Biobricks one
at a time is very time-consuming. These previously merged constructs
serve as a repository of puzzle pieces, and they can be used for different
purposes. To date, the largest and most trustworthy source for synthetic
biology and its components is iGEM’s parts registry. In 2010, iGEM
provided over 1000 parts that have initiated many projects. Having more
atomic parts available in iGEM’s repository will lead to the design
of more complex and robust constructs, and will give us a better chance
to design different constructs for unique purposes. Also, for the parts
that are already available, extra steps need to be taken for the quality
control and surveillance of these products. Quality control of the
information for the parts is essential for the future of iGEM and synthetic
biology. Even though we found the pre-determined RFC standards useful
and included them in our standardized template, the information for some
individual parts still requires re-organization, as RFC standards alone
say too little about the functionality of parts to satisfy the needs of wet lab
biologists.
Without question, there is an urgent need to build a distinct,
well-organized database with its own standards for synthetic
biology; however, developing such a database is not an easy task.
Contact Information of Part Owners and Qualitative Group Comments
about Parts
=========================================
Designers: Mail:
GroupFavorite:
StarRating:
Parameters:
=========================================
Table 2: The table above holds information about the owners
of parts, their contact information, and the popularity of the parts
among groups. The Parameters heading refers to distinctive experimental details
unique to the usage of parts, which should be decided by groups.
The second step in building the standardized template was to collect the
provenance information about the part development process, which includes
the name of the group, the designer and contact information, along with
the comments from the group on the parts they have submitted. Contact
information is especially important for iGEM, as it lets other groups who need
extra information about an available part reach the required
information. Even though iGEM strongly encourages contacting the designers of
the individual parts, the unavailability
of contact information points to the fact that iGEM’s parts registry
needs strong re-organization in order to serve the synthetic biology
community properly.
Additionally, the “GroupFavorite” and “StarRating” fields are
important for individual evaluation of the parts, but they do not get
the attention they deserve from the iGEM groups. “GroupFavorite” expresses
the confidence of the designer group in the part. “StarRating” rates
the part in terms of popularity and usage efficiency among the
groups. According to our observations, most groups are either not aware of
these fields or use them incorrectly or ineffectively.
For example, a part with a full reporter that is known to be functional
and gives precise and expected results should have a StarRating of at least
2 stars, but in the 2010 distribution it is very
difficult to find a part whose “StarRating” is above one. These two
evaluations are important for quick determination of the functionality of
parts, so they have been included in the proposed standardization
template. However, as they have not been used properly so far, during the
re-organization of the parts information for our software application
we had to include all parts in our queries regardless of their
“GroupFavorite” and “StarRating” values.
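The role these two fields could play in a query can be sketched as an optional filter. This is a minimal Python illustration, not part of our application: the function name `select_parts`, its parameters, and the dictionary layout of a part are assumptions. With the default arguments every part is kept, which mirrors the choice described above.

```python
from typing import Any, Dict, Iterable, List, Optional


def select_parts(parts: Iterable[Dict[str, Any]],
                 min_stars: Optional[int] = None,
                 favorites_only: bool = False) -> List[Dict[str, Any]]:
    """Keep the parts that pass the requested rating filters.

    With the defaults (no filters) all parts are returned.
    """
    selected = []
    for part in parts:
        stars = part.get("StarRating")
        # A part without a rating cannot satisfy a minimum-star filter.
        if min_stars is not None and (stars is None or stars < min_stars):
            continue
        if favorites_only and not part.get("GroupFavorite"):
            continue
        selected.append(part)
    return selected


# Hypothetical entries, used only to show the behaviour of the filter.
example_parts = [
    {"PartID": "part_1", "StarRating": 1, "GroupFavorite": False},
    {"PartID": "part_2", "StarRating": 2, "GroupFavorite": True},
    {"PartID": "part_3", "StarRating": None, "GroupFavorite": False},
]
print(len(select_parts(example_parts)))               # no filter: all parts
print(len(select_parts(example_parts, min_stars=2)))  # only well-rated parts
```

Once the ratings are filled in reliably, a stricter default could be chosen without changing the rest of the query.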
Input and Output Characteristics of Parts
=========================================
Parameters:
-Input:
  • Promoter:
      • Activity:
      • Inducer:
      • Activator:
      • Repressor:
      • Inhibitor:
  • Promoter2:
      • Activity:
      • Inducer:
      • Activator:
      • Repressor:
      • Inhibitor:
-Output:
  • Reporter:
  • Reporter2:
  • Regulator:
      • Inducer:
      • Activator:
      • Repressor:
      • Inhibitor:
  • Regulator2:
      • Inducer:
      • Activator:
      • Repressor:
      • Inhibitor:
-Working Condition:
=========================================
Table 3: The table above describes in detail the input relations
based on promoters and the output products based on the functional genes
and RNAs included within the parts. Working condition
describes any influencing factor or circumstance that is directly related
to the functional properties of parts.
The third part of our standardization template includes the parameters of
contingent input and output elements. For simplicity, these parameters are
classified into two groups, as presented in Table 3. This final part
of the standardization template includes the most important information
about the Biobricks, which the BioGuide software requires to run
its search algorithm.
Briefly, the BioGuide application is designed to capture the input and
output relations of individual parts and to examine possible Biobrick pathways
for specific input and output queries. In other words, at the pre-experimental
stage, it helps wet lab biologists design their unique constructs
by revealing possible alternative options for pre-determined purposes,
along with the primary paths. Our ultimate goal is to improve the algorithm
designed for iGEM 2010 and present a new version of BioGuide at
iGEM 2011, which will provide optimum design of constructs for predetermined
parameters.
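The idea of following input and output relations from part to part can be illustrated with a small graph search. The sketch below is not the BioGuide algorithm itself; it is a minimal Python example that assumes each part is reduced to a set of inputs that activate it and a set of outputs it produces. The function name `find_pathways`, the `max_parts` limit, and all part and signal names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Assumed layout: part name -> (inputs that activate it, outputs it produces).
PartTable = Dict[str, Tuple[Set[str], Set[str]]]


def find_pathways(parts: PartTable, query_input: str, query_output: str,
                  max_parts: int = 4) -> List[List[str]]:
    """Enumerate chains of parts leading from query_input to query_output.

    Breadth-first search, so shorter chains are reported before longer ones.
    """
    pathways = []
    # Start from every part that responds to the requested input.
    queue = deque([name] for name, (inputs, _) in parts.items()
                  if query_input in inputs)
    while queue:
        chain = queue.popleft()
        _, outputs = parts[chain[-1]]
        if query_output in outputs:
            pathways.append(chain)
            continue
        if len(chain) >= max_parts:
            continue
        # Extend the chain with any unused part activated by the last outputs.
        for name, (inputs, _) in parts.items():
            if name not in chain and inputs & outputs:
                queue.append(chain + [name])
    return pathways


# Hypothetical parts, used only to illustrate the search.
example = {
    "sensor":     ({"signal_A"},    {"regulator_X"}),
    "relay":      ({"regulator_X"}, {"regulator_Y"}),
    "reporter_1": ({"regulator_X"}, {"GFP"}),
    "reporter_2": ({"regulator_Y"}, {"GFP"}),
}
for chain in find_pathways(example, "signal_A", "GFP"):
    print(" -> ".join(chain))
```

Because the search is breadth-first, the shortest chain plays the role of a primary path and the longer chains correspond to alternative options.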
Most parts are composed of functional and nonfunctional constructs
formed from atomic parts, and every part should carry the information
for all of its atomic parts. The “input” heading actually
stands for promoters. Parts with one or more promoters can be found
in iGEM’s Parts Registry. Along with which and how
many promoters a part has, the activity level of the promoters is
also important, to distinguish between a constitutively active promoter
and a promoter activated by specific physiological processes or states.
It was crucial for us to dissect this information in order to run
our algorithm, as it directly determines which inputs can activate the devices
or the systems.
Throughout our investigation of the Parts Registry, we found
that much of the terminology is used ambiguously. Although this
might not be vital for synthetic biologists, it still takes effort
to understand the function of certain regulatory elements, which
becomes a time-consuming task for the researcher. Thus, we recommend
that the explanations of certain regulatory elements be redefined
and fixed specifically for synthetic biology, for easy communication, sharing
and searching of information.
Common misuses of the terminology can guide us in working out how
to construct a standard nomenclature for synthetic biology. We claim
that a standard nomenclature is urgently needed for synthetic biology
for the following reasons. First, synthetic biology is an emerging
research discipline and a highly promising industrial application area.
Second, some terms tend to be used in place of one another, which
causes problems in global communication
about synthetic biology, so the terminology needs to be redefined. Lastly,
the nomenclature is of major importance
for the construction of a persistent and trustworthy database for synthetic
biology that serves information exhibition and exchange globally.
For instance, there are obvious misunderstandings about the words that
are predominantly used for regulation processes. We noticed that
the terms “inhibitor” and “repressor” are used interchangeably
in the part information pages. For example, the lactose inhibitor protein, a
widely used DNA-binding transcriptional repressor, has been labeled
both as “inhibitor” and as “repressor” in iGEM’s Parts Registry. Similar
problems resulting from ambiguous use of terminology were also observed with
other regulatory elements. To sum up, we investigated all input elements for
promoters and classified them in terms of their function, effect
and required input element. We therefore suggest that the terminology
used for regulation of transcription should be defined clearly on iGEM’s
website and that correct use of the terminology should be enforced.
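Inconsistent labels of this kind can be detected mechanically once the part pages are reduced to pairs of element name and label. The following is a minimal Python sketch of such a check, not a tool we provide; the function name `find_conflicting_labels` and the example annotations (apart from the double labeling of the lactose inhibitor protein described above) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_conflicting_labels(
        annotations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return the elements that carry more than one regulatory label.

    `annotations` holds (element name, label) pairs collected from part pages.
    Names and labels are compared case-insensitively.
    """
    labels_by_element = defaultdict(set)
    for element, label in annotations:
        labels_by_element[element.strip().lower()].add(label.strip().lower())
    return {element: sorted(labels)
            for element, labels in labels_by_element.items()
            if len(labels) > 1}


# Illustrative annotations; "regulator_B" is a made-up element.
annotations = [
    ("lactose inhibitor protein", "repressor"),
    ("Lactose inhibitor protein", "Inhibitor"),
    ("regulator_B", "activator"),
]
print(find_conflicting_labels(annotations))
```

A report of this kind would show where a fixed nomenclature has to be enforced first.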
The second group of parameters was collected under the title “Output”,
which refers to the products of functional genes. Inconsistently, the
term “reporter” is also described within the same list. Reporters
are also genes, whose products can be used for screening as an output.
In our group’s view, using the term “reporter” for genes is
unnecessary; it adds complexity to information distribution
and gives rise to discrepancies. Instead of the term “reporter”, the
predefined “gene” description should be used for genes that can function
as reporters. The special information related to the characteristics
of that gene should also be presented on the part info web page.
Furthermore, the same term, “reporter”, is used for both atomic
parts and composite Biobricks. The overall image descriptions
for these are also defined as “reporters”. We want to point out that using
the same nomenclature for both atomic genes and whole functional constructs
adds to the complexity and makes specific searches of the Parts Registry
difficult. Assigning “reporter” to both atomic
parts and whole constructs is therefore not good practice. Instead, we
suggest using other available terms for the constructs currently listed
as reporters. Most of them
can be grouped under “protein generators”, “composite parts”
or “inverters”.
Devices are whole constructs which are functional and have specific
and distinct functions. Unfortunately, as we have observed, the
term “device” is also used for parts which are not functional
and do not have a specific function at all. Moreover, within the classification
of devices, we argue that some terms are also used unnecessarily
and ambiguously. Devices are classified into five types: protein
generators, reporters, inverters, receivers and senders, and measurement
devices. For example, iGEM defines protein generators as:
Protein generator = promoter + rbs + gene + terminator
Though we accept the definition for protein generators, we observed
that numerous parts are defined as protein generators
although most of them do not fit the definition provided above.
Even parts that are not functional and do not generate proteins
at all are classified as protein generators, which makes searching
for parts in the registry difficult. Furthermore, there are also
numerous parts which are defined as “composite parts” but actually
fit the same definition as protein generators. To overcome
the problem of misused device types, we extracted the related image
ID information for the composite parts. Image ID information helped
us to correctly categorize composite parts based on their individual
atomic parts and to identify the ones with more than one function, such
as being both inhibitor and activator. In other words, we used image
and part IDs in order to link an input to its outputs.
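The definition of a protein generator quoted above can be turned into a simple mechanical check on the ordered atomic parts of a construct. The following is a minimal Python sketch, not the registry’s or BioGuide’s actual logic. The type labels, the function name `classify_construct`, and the allowance for repeated rbs + gene pairs and repeated terminators are assumptions made for illustration.

```python
import re
from typing import List

# Assumed pattern: one promoter, one or more rbs + gene pairs,
# then one or more terminators.
_PROTEIN_GENERATOR = re.compile(r"promoter( rbs gene)+( terminator)+")


def classify_construct(atomic_types: List[str]) -> str:
    """Classify a construct from the ordered types of its atomic parts."""
    if len(atomic_types) < 2:
        return "atomic part"
    layout = " ".join(t.strip().lower() for t in atomic_types)
    if _PROTEIN_GENERATOR.fullmatch(layout):
        return "protein generator"
    # Anything else made of several atomic parts is treated as a composite part.
    return "composite part"


print(classify_construct(["promoter", "rbs", "gene", "terminator"]))
print(classify_construct(["rbs", "gene", "rbs", "gene", "terminator"]))
```

Under these assumptions, a construct that lacks a promoter is reported as a composite part even if its registry page calls it a protein generator.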
The subtitle “Working Condition” includes all the detailed information
about the experimental properties of parts, and the details about the
working process of individual parts and complete devices. We
marked the subtitle “Working Condition” in our standardization template
as potentially the most important heading for helping synthetic biologists
better understand the functions of the parts in iGEM’s parts registry database.
The main problem we encountered with the subtitle “Working Condition”
is that, for most parts, the details about the working process are
insufficient and not provided consistently.
Examples of Misuse of Terminology:
For Composite Parts:
PartID: BBa_S04055
PartName: Synthetic lacYZ operon
This part is functional and responsible for the production of the LacY
and LacZ proteins. It only partially fits the definition of “composite
part” and should instead be classified as a protein generator, as it fully
fits the definition of “protein generators”.
For Protein Generators:
PartID: BBa_J45299
PartName: PchA & PchB enzyme generator
The part above actually fits the definition
of “composite part”, but in the parts registry it is classified as a protein
generator. This part could be functional, but it needs a promoter. Even
though it is not functional on its own and cannot produce protein,
the parts registry labels it a protein generator. We suggest
that all parts in the registry which are composed of more than one
atomic part and which are not functional on their own, but could be made
functional, should be classified as “composite parts”.
For Reporters:
PartID: BBa_J04451
PartName: RFP Coding Device with an LVA tag
This functional part is classified as “Reporter” in the parts registry
database. It clearly fits the same description
as a protein generator in the Biobrick parts registry standards. Although
this part has a specific and known functional role, characterizing it
as a reporter is unnecessary and adds to the complexity
of the information provided. Instead, we suggest that this part be
classified as a “protein generator” and that detailed information about
its specific function be provided on the part information
page.
In conclusion, as described above, we tried to reorganize and normalize
the information about parts provided in the parts registry for 2010
in order to develop our algorithm for the BioGuide application. During
this process, we encountered inconsistencies and misuses of the
terminology, as well as inadequacies in the information provided
about parts. First of all, we claim that a standard nomenclature should
be established for future use in the field of synthetic biology. Based
on the information gathered under the new nomenclature, a professional
database should be constructed to address the needs of synthetic biology.
This will enable easy information exchange and exhibition globally.
Secondly, although the parts registry database contains plenty of
information about parts, that information
urgently needs to be put in order. Furthermore, new experimental
standards for the subtitle “Working Condition” should be introduced to
groups in the part submission
process. These experimental standards
will be important because the experimental details currently given for parts
do not satisfy the needs of wet-lab biologists for the design and construction
of new Biobricks.