Team:VT-ENSIMAG/Introduction
From 2010.igem.org
Sequence screening
Sequence screening: Why and How?

Introduction

Gene synthesis technology gives scientists an unparalleled capability to manipulate genomes. Over the past several decades, an entire commercial industry has developed to produce genes inexpensively and on a large scale. This industry provides the manufactured genes and standardized parts that make synthetic biology, and iGEM, possible. Synthetic genomics, like synthetic biology, can be both a great benefit and a great detriment to public health and national security. A precedent for the dual use of synthetic genomics is the reconstruction in 2005, by researchers at the CDC, of the virus responsible for the 1918 Spanish flu pandemic ([http://en.wikipedia.org/wiki/1918_flu_pandemic See more]). This highly infectious strain is estimated to have killed as many as 50,000,000 people worldwide. Although the 1918 Spanish flu genes were synthesized for legitimate research purposes, they could just as easily have been used to reconstruct a biological weapon. It should be noted that the reconstructed strain was partially attenuated (1). This, however, does not preclude the possibility of more virulent forms being engineered in the future. Although such engineering is difficult at the moment, advances in this technology over the next decade could make it easier for bioterrorists to harm the public.

According to the 2004 "Mapping the Global Future" report published by the U.S. National Intelligence Council, the Council's greatest security concern over the coming years is that terrorists will acquire biological agents for use as weapons of mass destruction. Many nucleotide sequences encoding, or derived from, dangerous toxins or pathogens can be freely accessed in the U.S. National Center for Biotechnology Information GenBank database (NCBI-GenBank). The ease with which dangerous sequences can be located and synthesized presents novel threats to both public health and national security.
To prevent illicit activities by end users of de novo synthesized genes, it is crucial to stop their manufacture at the source: gene synthesis companies. Effective and efficient screening measures must therefore be developed to identify sequences of concern within a synthesis order. The United States government recognizes its responsibility to protect the public and, in November 2009, published draft guidance for sequence screening. Our 2010 iGEM project has focused on the development of GenoTHREAT, a sequence screening software package. To the best of our knowledge, this software is the first implementation of the suggestions put forward in the draft government guidance for sequence screening.

Sequence Alignment: BLAST

Any effective sequence screening software must be able to assess the potential danger posed by a given gene or gene fragment. GenoTHREAT approaches this task by aligning the query gene against all sequences contained in NCBI-GenBank. The tool we use to align sequences is, as suggested in the federal guidance, BLAST: the Basic Local Alignment Search Tool. BLAST is available on the NCBI website ([http://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast website]). BLAST performs local alignments of a query sequence against the GenBank database. Given an input sequence, BLAST returns a list of the most similar known sequences, with statistical scores that measure how close each match is.

Current screening state

Sequence screening protocols are currently far from harmonized among gene synthesis companies. It has been left to each company to decide what should be done to secure its customers' orders.
To bring order to this situation, two consortia, the IASB ([http://www.ia-sb.eu/go/synthetic-biology/ International Association Synthetic Biology]) and the IGSC ([http://www.genesynthesisconsortium.org/Gene_Synthesis_Consortium/Home.html International Gene Synthesis Consortium]), have each issued their own standards to be followed ([http://www.ia-sb.eu/tasks/sites/synthetic-biology/assets/File/pdf/iasb_code_of_conduct_final.pdf 1] and [http://www.genesynthesisconsortium.org/Harmonized_Screening_Protocol_files/IGSC%20Harmonized%20Screening%20Protocol.pdf 2]). But these guidelines leave too many questions unanswered. To harmonize practice, the American government has published a draft version of a "Screening Framework Guidance for Synthetic Double-Stranded DNA Providers" ([http://www.gpo.gov/fdsys/pkg/FR-2009-11-27/pdf/E9-28328.pdf 3]). This guidance first details the steps for verifying the customer's identity. It then advises companies to implement an automated version of sequence screening in order to save time and reduce the risk of human error. The guidance gives a general algorithm to be followed. Its main points are: to divide every sequence into 200 bp subsequences and look for any dangerous sequence of length greater than or equal to 200 bp; to screen both the nucleotide sequence and the amino-acid sequences obtained from the six-frame translation; to use BLAST to compare sequences; and finally to use a Best Match method to determine whether a sequence is unique to a select agent. However, other points remain unclear, such as the definition of the best-matching sequences and the use of BLAST for global alignment.

GenoTHREAT, our sequence screening software

The software we have implemented is called GenoTHREAT. Given a DNA sequence, GenoTHREAT indicates whether or not it may be a sequence of concern. The algorithm and the implementation of this software are detailed in Genothreat.
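
Two of the steps named in the draft guidance, splitting an order into 200 bp subsequences and translating it in all six reading frames, can be illustrated with a short Python sketch. This is not GenoTHREAT's code. The function names are hypothetical, the standard genetic code is assumed, and the choice of non-overlapping windows with a shorter final window is only one possible reading of the guidance. The BLAST search and the Best Match step are left out.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_into_windows(seq, size=200, step=200):
    """Cut seq into subsequences of `size` bases, starting every `step` bases.

    Assumption: the final window may be shorter than `size`.
    Use step < size to obtain overlapping windows.
    """
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(0, len(seq), step)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return the three forward-strand frames, then the three reverse-strand frames."""
    strands = (seq, reverse_complement(seq))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


if __name__ == "__main__":
    order = "ATGGCCTAA" * 50  # hypothetical 450 bp synthesis order
    for window in split_into_windows(order):
        print(len(window), six_frame_translations(window)[0][:10])
```

In a full screening pipeline, each nucleotide window and each of its six translations would then be passed to BLAST. How the resulting hits are ranked is exactly the Best Match question that the guidance leaves open.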
Tests and Results

To characterize the government guidance and the software we created, we implemented and ran a series of tests. See Tests and Results.

Conclusion

We have succeeded in implementing functioning sequence screening software. The main issue raised is the cost of building software that is both fast and effective. We have also shown the influence of software parameters such as the keyword list and the BLAST parameters. The guidance we followed was not precise enough and left considerable room for interpretation. A revised version should be issued in the coming years. Nevertheless, we have shown that following it yields sequence screening software that is effective but expensive, and that is bound to raise too many false hits.