Team:VT-ENSIMAG/Introduction

From 2010.igem.org


Latest revision as of 04:07, 19 October 2010



Sequence screening: Why and How?

Introduction

Gene synthesis technology gives scientists an unparalleled capability to manipulate genomes. Over the past several decades, an entire commercial industry has developed to inexpensively produce genes on a large scale. It is this industry which provides the manufactured genes and standardized parts to make synthetic biology, and iGEM, possible.

Synthetic genomics, like synthetic biology, has the potential to be both a great benefit and a great detriment to public health and national security. A precedent for the dual use of synthetic genomics is the 2005 reconstruction, by researchers at the CDC, of the virus responsible for the 1918 Spanish flu pandemic ([http://en.wikipedia.org/wiki/1918_flu_pandemic See more]). This highly infectious strain is estimated to have killed as many as 50 million people worldwide. Although the 1918 Spanish flu genes were synthesized for legitimate research purposes, they could just as easily have been used to reconstruct a biological weapon. It should be noted that the reconstructed strain was partially attenuated ([https://static.igem.org/mediawiki/2010/f/f2/918_flu_paper.pdf 1]). This, however, does not preclude the possibility of more virulent forms being engineered in the future. Although such engineering is difficult at the moment, advances in this technology over the next decade could make it easier for bioterrorists to harm the public. According to the 2004 "Mapping the Global Future" report published by the U.S. National Intelligence Council, the Council's greatest security concern over the coming years is that terrorists will acquire biological agents for use as weapons of mass destruction.

Many nucleotide sequences encoding or derived from dangerous toxins or pathogens can be freely accessed in GenBank, the sequence database of the U.S. National Center for Biotechnology Information (NCBI-GenBank). The ease with which dangerous sequences can be located and synthesized presents novel threats to both public health and national security. To prevent illicit activities by end users of de novo synthesized genes, it is crucial to stop their manufacture at the source: gene synthesis companies. Therefore, effective and efficient screening measures must be developed to identify sequences of concern within a synthesis order.

The United States Government recognizes its responsibility to protect the public and, in November 2009, published a draft guidance for sequence screening. Our 2010 iGEM project has focused on the development of GenoTHREAT, a sequence screening software. To the best of our knowledge, this software is the first implementation of the suggestions put forward in the draft Government guidance for sequence screening.

Sequence Alignment: BLAST

Any effective sequence screening software must be able to assess the potential danger posed by a given gene or gene fragment. GenoTHREAT approaches this task by aligning the query gene against all sequences contained in NCBI-GenBank. The tool we used to align sequences is, as suggested in the federal guidance, BLAST: the Basic Local Alignment Search Tool.

BLAST is software available on the NCBI website ([http://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast website]). BLAST performs local alignments of a query sequence against the GenBank database. Given an input sequence, BLAST returns a list of the most similar known sequences, with statistical scores that measure how close each match is to the query.

Further information related to BLAST can be found in the following sources:
-http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
-http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html
-http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs
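
To make the idea of a scored hit list concrete, the following Python sketch parses BLAST results in the tab-separated layout produced by the BLAST+ command-line tools with `-outfmt 6` and ranks the hits by bit score. It is only an illustration and is not taken from GenoTHREAT: the `Hit` class, the function names, and the example rows (identifiers and scores) are all made up for this example.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    """One alignment between the query and a database sequence (hypothetical type)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bit_score: float


def parse_tabular(text):
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        hits.append(Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"]),
                        float(row["bitscore"])))
    return hits


def rank_hits(hits):
    """Order hits from most to least similar: highest bit score first."""
    return sorted(hits, key=lambda h: h.bit_score, reverse=True)


# Made-up rows for illustration only (not real BLAST output).
EXAMPLE = (
    "query1\tsubjB\t85.00\t180\t27\t0\t11\t190\t40\t219\t2e-40\t160.0\n"
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350.0\n"
)

if __name__ == "__main__":
    for hit in rank_hits(parse_tabular(EXAMPLE)):
        print(hit.subject_id, hit.percent_identity, hit.evalue, hit.bit_score)
```

A higher bit score and a lower E-value both indicate a closer match; sorting by bit score is one reasonable choice here, not the only one.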


Current screening state


A standardized screening protocol applied universally across the industry has yet to be developed. So far, most of the decisions concerning how sequences should be screened are left up to individual companies. That said, two consortia, the IASB ([http://www.ia-sb.eu/go/synthetic-biology/ International Association Synthetic Biology]) and the IGSC ([http://www.genesynthesisconsortium.org/Gene_Synthesis_Consortium/Home.html International Gene Synthesis Consortium]), have developed their own screening standards to be followed by member organizations ([http://www.ia-sb.eu/tasks/sites/synthetic-biology/assets/File/pdf/iasb_code_of_conduct_final.pdf 1] and [http://www.genesynthesisconsortium.org/Harmonized_Screening_Protocol_files/IGSC%20Harmonized%20Screening%20Protocol.pdf 2]). But these guidelines leave too many questions unanswered. Between the two protocols, these questions include, but are not limited to: Should both amino acid and nucleotide sequences be screened? What bioinformatics tool should be used to screen sequences? If the results of the screening process are automatically filtered to identify dangerous sequences that align well with the query sequence, how should the automation be designed? If a query is compared to a regulated database of dangerous sequences, which sequences should be included in that database? What metrics should be used to assess whether a query sequence is closely enough related to a gene derived from, or encoding, a regulated biological pathogen to constitute a potential danger?

In order to harmonize screening across the industry, the United States Government has published a draft version of a "Screening Framework Guidance for Synthetic Double-Stranded DNA Providers" ([http://www.gpo.gov/fdsys/pkg/FR-2009-11-27/pdf/E9-28328.pdf 3]). This guidance first details the steps to be followed in order to verify customer identity. It then advises companies to implement an automated sequence screening method in order to save time and reduce the risk of human error. The guidance gives a general algorithm to be followed. Its main points are as follows:
-Screen both nucleotide sequences and the amino acid sequences obtained by translating them
-Segment sequences into fragments of 200 bp (nucleotides) or 66 aa (amino acids)
-Align sequences against the entire GenBank database using BLAST
-Use the Best Match method on the alignment results to identify whether the sequence uniquely encodes, or is uniquely derived from, a Select Agent or Toxin (selectagents.gov)
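
The first two points can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GenoTHREAT's implementation: it assumes the standard genetic code, a six-frame translation (three reading frames on each strand), and consecutive non-overlapping fragments, none of which is spelled out in the list above. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; '*' marks a stop codon, 'X' an unknown
    codon. A trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate the three reading frames of each strand (forward first)."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:])
            for strand in strands for offset in range(3)]


def segment(sequence, size):
    """Split into consecutive, non-overlapping fragments (an assumption);
    the last fragment may be shorter than `size`."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


if __name__ == "__main__":
    order = "ATGGCCTAA" * 50                      # made-up 450 bp sequence
    print([len(f) for f in segment(order, 200)])  # nucleotide fragments
    for protein in six_frame_translations(order):
        print([len(f) for f in segment(protein, 66)])  # amino acid fragments
```

Each fragment produced this way would then be submitted to BLAST, nucleotide fragments against nucleotide records and amino acid fragments against protein records.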


However, other points remain unclear, such as the definition of the best-match sequences and the use of BLAST for global alignment.
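
Because the guidance leaves the definition of a best match open, any implementation has to pick one. The Python sketch below shows one possible reading, purely as an assumption for illustration: the best match of a fragment is the hit (or tied hits) with the highest bit score, and a fragment is flagged when the description of such a hit contains a keyword from a list of regulated agents. The `ScoredHit` type, the function names, and the placeholder keyword are hypothetical and do not describe GenoTHREAT's actual rules.

```python
from typing import NamedTuple


class ScoredHit(NamedTuple):
    """A BLAST hit reduced to what this sketch needs (hypothetical type)."""
    description: str
    bit_score: float


# Placeholder keyword; a real list would be derived from the regulated agents.
KEYWORDS = ("example agent",)


def is_regulated(hit, keywords=KEYWORDS):
    """True if the hit description mentions any keyword (case-insensitive)."""
    text = hit.description.lower()
    return any(keyword.lower() in text for keyword in keywords)


def best_match_is_regulated(hits, keywords=KEYWORDS):
    """Assumed rule: flag the fragment if any top-scoring hit is regulated.
    Ties for the top score are treated conservatively."""
    if not hits:
        return False
    top_score = max(hit.bit_score for hit in hits)
    return any(is_regulated(hit, keywords)
               for hit in hits if hit.bit_score == top_score)


def screen(fragment_hits, keywords=KEYWORDS):
    """Return the indices of the fragments whose best match is regulated."""
    return [index for index, hits in enumerate(fragment_hits)
            if best_match_is_regulated(hits, keywords)]


if __name__ == "__main__":
    flagged = [ScoredHit("Example agent toxin gene", 350.0),
               ScoredHit("Harmless cloning vector", 120.0)]
    cleared = [ScoredHit("Harmless cloning vector", 350.0),
               ScoredHit("Example agent toxin gene", 340.0)]
    print(screen([flagged, cleared, []]))
```

Under this reading, a fragment that resembles a regulated sequence is still cleared when a non-regulated sequence matches it better, which is why the choice of score and the content of the keyword list both affect the outcome.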


GenoTHREAT, our sequence screening software

The software we have implemented is called GenoTHREAT. Given a DNA sequence, GenoTHREAT indicates whether or not it may be a sequence of concern. The algorithm and the implementation of this software are detailed on the GenoTHREAT page.


Tests and Results

In order to characterize the Government guidance and the software we created, we have implemented and run a series of tests. See Tests and Results.


Conclusion

We have succeeded in developing a prototype sequence screening software that implements the suggestions put forward in the Government guidance. The main issue raised by our characterization is the influence of software parameters such as the keyword list and the BLAST parameters.

The guidance we followed was not precise enough and left several of the more subtle aspects of the algorithm open to interpretation. A newer version of the software will need to be developed in the coming years as the gene synthesis industry and its regulation evolve. What we have shown is that the guidance provides a framework for the development of an efficient sequence screening software that could be a viable tool for both the iGEM registry and the gene synthesis industry at large.
