Team:VT-ENSIMAG/Introduction
From 2010.igem.org
{{Template:Team:VT_Ensimag_2010-Biosecurity/Templates/main|Sequence screening| content= | {{Template:Team:VT_Ensimag_2010-Biosecurity/Templates/main|Sequence screening| content= | ||
__TOC__

=Sequence screening: Why and How?=
==Introduction==
Gene synthesis technology gives scientists an unparalleled capability to manipulate genomes. Over the past several decades, an entire commercial industry has developed to produce genes inexpensively and on a large scale. It is this industry that provides the manufactured genes and standardized parts that make synthetic biology, and iGEM, possible.
Synthetic genomics, like synthetic biology, has the potential to be both a great benefit and a great detriment to public health and national security. A precedent for the dual use of synthetic genomics is the 2005 reconstruction, by researchers at the CDC, of the virus responsible for the 1918 Spanish flu pandemic ([http://en.wikipedia.org/wiki/1918_flu_pandemic See more]). This highly infectious strain is estimated to have killed as many as 50 million people worldwide. Although the 1918 Spanish flu genes were synthesized for legitimate research purposes, they could just as easily have been used to reconstruct a biological weapon. It should be noted that the reconstructed strain was partially attenuated ([https://static.igem.org/mediawiki/2010/f/f2/918_flu_paper.pdf 1]). This, however, does not preclude the possibility of more virulent forms being engineered in the future. Although such engineering is difficult at the moment, advances in this technology over the next decade could make it easier for bioterrorists to harm the public. According to the 2004 "Mapping the Global Future" report published by the U.S. National Intelligence Council, the Council's greatest security concern for the coming years is that terrorists will acquire biological agents for use as weapons of mass destruction.
Many nucleotide sequences encoding, or derived from, dangerous toxins or pathogens are freely accessible in the U.S. National Center for Biotechnology Information's GenBank database (NCBI-GenBank). The ease with which dangerous sequences can be located and synthesized presents novel threats to both public health and national security. To prevent the illicit use of de novo synthesized genes by end users, it is crucial to stop their manufacture at the source: gene synthesis companies. Effective and efficient screening measures must therefore be developed to identify sequences of concern within a synthesis order.
The United States government recognizes its responsibility to protect the public and, in November 2009, published a draft guidance for sequence screening. Our 2010 iGEM project has focused on the development of GenoTHREAT, a sequence screening software. To the best of our knowledge, GenoTHREAT is the first implementation of the recommendations put forward in this draft government guidance.
[[#top|top]]
==Sequence Alignment: BLAST==
Any sequence screening software must be able to assess the potential danger posed by a given gene or gene fragment. GenoTHREAT addresses this task by aligning the query sequence against all sequences contained in NCBI-GenBank. As suggested in the federal guidance, the alignment tool we used is BLAST, the Basic Local Alignment Search Tool.

BLAST is available on the NCBI website ([http://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI BLAST website]). It performs local alignments of a query sequence against the GenBank database. Given an input sequence, BLAST returns a list of the most similar known sequences, together with statistical scores (bit scores and E-values) that measure how closely each one matches the query.
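For reference, the statistical scores mentioned above follow standard Karlin-Altschul statistics. This is general BLAST background, not anything specific to GenoTHREAT. Let S be the raw alignment score between a query of length m and a database of total length n (BLAST actually uses effective lengths corrected for edge effects). Let K and λ be constants determined by the scoring system. The expected number of chance hits E, the bit score S', and their relation are then:

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}
```

Substituting λS = S' ln 2 + ln K into the first formula gives the third. Because bit scores are normalized by K and λ, they can be compared across scoring systems. A smaller E-value means the match is less likely to have arisen by chance.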
Further information about BLAST can be found in the following sources:
-http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
-http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html
-http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs
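BLAST does not search exhaustively. It finds its local alignments with a fast heuristic: it seeds on short exact words and then extends them. The exact local alignment that BLAST approximates is the Smith-Waterman dynamic program, sketched below to illustrate what a "local alignment score" measures. This is a textbook technique swapped in for illustration, not GenoTHREAT's code or BLAST's implementation. The scoring values (match +2, mismatch -3, linear gap -5) are assumptions chosen for the example; BLAST itself uses affine gap costs.

```python
def local_alignment_score(query: str, subject: str,
                          match: int = 2, mismatch: int = -3, gap: int = -5) -> int:
    """Return the best Smith-Waterman local alignment score of two DNA strings.

    Scoring values are illustrative assumptions (linear gap penalty), not
    GenoTHREAT or BLAST settings.
    """
    # prev[j] / curr[j]: best score of a local alignment ending at
    # query[i-1] and subject[j-1] (rows i-1 and i of the DP matrix).
    prev = [0] * (len(subject) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align query[i-1] with subject[j-1]
            up = prev[j] + gap          # query[i-1] against a gap
            left = curr[j - 1] + gap    # subject[j-1] against a gap
            # Flooring at zero lets an alignment start anywhere: this makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core "ACGT" scores 4 matches x 2 = 8; the mismatched flanks are trimmed.
    print(local_alignment_score("GGACGTGG", "TTACGTTT"))  # prints 8
```

The zero floor is the only difference from a global (Needleman-Wunsch) alignment. It allows a short, highly similar region inside a long order to score well even when the surrounding sequence is unrelated.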

[[#top|top]]
==Current screening state==
[[Image:IASB.png|left|150px]]
[[Image:IGSC.png|right|100px]]
A standardized screening protocol applied universally across the industry has yet to be developed. So far, most decisions about how sequences should be screened are left to individual companies. That said, two consortia, the IASB ([http://www.ia-sb.eu/go/synthetic-biology/ International Association Synthetic Biology]) and the IGSC ([http://www.genesynthesisconsortium.org/Gene_Synthesis_Consortium/Home.html International Gene Synthesis Consortium]), have developed their own screening standards to be followed by member organizations ([http://www.ia-sb.eu/tasks/sites/synthetic-biology/assets/File/pdf/iasb_code_of_conduct_final.pdf 1] and [http://www.genesynthesisconsortium.org/Harmonized_Screening_Protocol_files/IGSC%20Harmonized%20Screening%20Protocol.pdf 2]). However, these protocols leave many questions unanswered. Between the two of them, open questions include, but are not limited to, the following. Should both amino acid and nucleotide sequences be screened? Which bioinformatics tool should be used to screen sequences? If screening results are processed automatically to identify dangerous sequences that align well with the query, how should this automation be designed? If a query is compared to a regulated database of dangerous sequences, which sequences should that database include? What metrics should be used to decide whether a query sequence is closely enough related to a gene derived from, or encoding, a regulated pathogen to constitute a potential danger?

In order to harmonize screening across the industry, the United States government has published a draft "Screening Framework Guidance for Synthetic Double-Stranded DNA Providers" ([http://www.gpo.gov/fdsys/pkg/FR-2009-11-27/pdf/E9-28328.pdf 3]). The guidance first details the steps to follow to verify customer identity. It then advises companies to implement an automated sequence screening method to save time and reduce the risk of human error, and it outlines a general algorithm for doing so. Its main points are as follows:
-Screen both nucleotide sequences and their translated amino acid sequences
-Segment sequences into 200 bp / 66 aa fragments
-Align sequences against the entire GenBank database using BLAST
-Apply the Best Match method to the alignment results to determine whether the sequence uniquely encodes, or is derived from, a Select Agent or Toxin (selectagents.gov)
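The segmentation and translation steps above can be sketched as follows. This is one possible reading, not GenoTHREAT's implementation. The helper names are hypothetical, and windows are assumed to be non-overlapping because this page does not say whether the guidance intends overlap. Each nucleotide fragment is translated in all six reading frames with the standard genetic code (NCBI translation table 1). A 200 bp window contains 66 complete codons in every frame, which matches the 200 bp / 66 aa pairing in the guidance.

```python
from typing import List

# Standard genetic code (NCBI translation table 1); codons are indexed with
# bases in TCAG order, so "TTT" -> position 0 and "GGG" -> position 63.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fragments(seq: str, size: int = 200) -> List[str]:
    """Cut seq into consecutive, non-overlapping windows (the last may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped and any
    codon with a non-ACGT character becomes 'X'. Stop codons appear as '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> List[str]:
    """Protein sequences of the three forward and three reverse-complement frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(strand[offset:]) for strand in (seq, rc) for offset in range(3)]


if __name__ == "__main__":
    order = "ATG" + "GCC" * 100  # hypothetical 303 bp synthesis order
    for fragment in fragments(order):
        # Each nucleotide fragment and its six translations would then go to BLAST.
        print(len(fragment), [len(p) for p in six_frame_translations(fragment)])
```

Non-overlapping windows are the simplest choice. With them, however, a sequence of concern that straddles a window boundary is seen only in part. Overlapping windows would give better coverage at the cost of extra BLAST queries.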

However, other points remain unclear, such as how a best match should be defined and how BLAST, a local alignment tool, should be used for global alignment.
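Because the definition of a best match is left open, the sketch below shows just one interpretation, applied to BLAST+ tabular output (`-outfmt 6`, whose default columns are listed in the code). For each fragment, the subjects sharing the highest bit score are taken as its best matches. The fragment is flagged only when all of those best matches are on a list of sequences of concern, which is a strict reading of "uniquely encodes, or is derived from". The function names, the `concern_ids` set and the sample IDs are hypothetical. GenoTHREAT's own rule may differ; its conclusion mentions a keyword list, which this sketch does not reproduce.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_matches(tabular: str) -> dict:
    """Map each query fragment to the set of subject IDs sharing its top bit score."""
    hits = defaultdict(list)
    for line in tabular.splitlines():
        row = line.split("\t")
        # Skip blank lines, comment lines and malformed rows.
        if line.startswith("#") or len(row) < len(OUTFMT6_FIELDS):
            continue
        rec = dict(zip(OUTFMT6_FIELDS, row))
        hits[rec["qseqid"]].append((float(rec["bitscore"]), rec["sseqid"]))
    best = {}
    for query, scored in hits.items():
        top = max(score for score, _ in scored)
        best[query] = {subject for score, subject in scored if score == top}
    return best


def flag_fragments(tabular: str, concern_ids: set) -> dict:
    """Flag a fragment when every one of its best matches is a sequence of concern."""
    return {query: subjects <= concern_ids
            for query, subjects in best_matches(tabular).items()}


if __name__ == "__main__":
    # Hypothetical BLAST rows: frag1's single best hit is of concern; frag2 ties
    # between a sequence of concern and a harmless one, so it is not flagged.
    sample = "\n".join([
        "frag1\tAGENT_1\t99.5\t200\t1\t0\t1\t200\t1\t200\t1e-100\t365",
        "frag1\tHARMLESS_9\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-40\t150",
        "frag2\tAGENT_1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-60\t250",
        "frag2\tHARMLESS_9\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-60\t250",
    ])
    print(flag_fragments(sample, {"AGENT_1"}))  # prints {'frag1': True, 'frag2': False}
```

A more cautious variant would flag a fragment as soon as any of its best matches is of concern (`bool(subjects & concern_ids)`), at the cost of more false positives.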

[[#top|top]]
==GenoTHREAT, our sequence screening software==
The software we have implemented is called GenoTHREAT. Given a DNA sequence, GenoTHREAT indicates whether it may be a sequence of concern. The algorithm and its implementation are detailed on the [[Team:VT-ENSIMAG/Genothreat|Genothreat]] page.

[[#top|top]]
==Tests and Results==
To characterize both the government guidance and the software we created, we implemented and ran a series of tests. See [[Team:VT-ENSIMAG/Result|Tests and Results]].
[[#top|top]]
=Conclusion=
We have developed a prototype sequence screening software that implements the recommendations of the government guidance. The main issue raised by its characterization is the influence of the software's parameters, such as the keyword list and the BLAST parameters, on the screening results.
The guidance we followed was not precise enough and left several subtle aspects of the algorithm open to interpretation. A newer version of the software will need to be developed in the coming years as the gene synthesis industry and its regulation evolve. Nevertheless, we have shown that the guidance provides a framework for developing efficient sequence screening software that could become a viable tool for both the iGEM Registry and the gene synthesis industry at large.
[[#top|top]]
}} | }} |
Latest revision as of 04:07, 19 October 2010