Team:VT-ENSIMAG/Registry

From 2010.igem.org

[Figure: Number of sequences in the iGEM registry, sorted by length]
 

Latest revision as of 04:18, 19 October 2010



Screening of the iGEM registry







Screening the iGEM registry:

This summer we also screened the entire iGEM registry. The screen had a two-fold purpose:
- to test our program on a registry that simulates the DNA orders received by a gene synthesis company;
- to contribute to iGEM by verifying that the sequences in the registry are not dangerous.


1) Real-world gene order simulation:

In a real-world setting, such as a gene synthesis company processing thousands of sequences a day, the aim is not only to detect dangerous sequences correctly but also to avoid too many false hits. Each hit requires a person to manually review why it was raised, which is costly.

To simulate the orders a gene synthesis company might receive, we screened the iGEM registry, the collection of sequences that iGEM teams add to each year. It contained about 10,000 sequences, and we screened them all. After the first 1,724 sequences, the hit rate was 6.5%, far too high for an industrial environment; by altering certain program parameters, we reduced it to 2.9%. This result highlights the program's customizable nature and potential for optimization.
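To make the cost argument concrete, the snippet below turns the quoted hit rates into approximate numbers of hits needing manual review. It is back-of-the-envelope arithmetic only: it assumes every hit is reviewed by hand and treats the registry size as exactly 10,000 (the text says "about 10,000"). `expected_hits` is a hypothetical helper, not part of GenoTHREAT.

```python
# Rough reviewer-workload estimate from the hit rates quoted in the text.
# Assumptions: every hit is reviewed manually; registry size taken as 10,000.

def expected_hits(n_sequences: int, hit_rate: float) -> float:
    """Expected number of flagged sequences for a hit rate given as a fraction."""
    return n_sequences * hit_rate

first_batch = expected_hits(1724, 0.065)          # first 1,724 sequences at 6.5%
full_registry = expected_hits(10_000, 0.029)      # whole registry at 2.9%
full_at_old_rate = expected_hits(10_000, 0.065)   # whole registry if still at 6.5%

print(f"First batch at 6.5%:   ~{first_batch:.0f} hits to review")
print(f"Full registry at 2.9%: ~{full_registry:.0f} hits to review")
print(f"Full registry at 6.5%: ~{full_at_old_rate:.0f} hits to review")
```

Under these assumptions, lowering the rate from 6.5% to 2.9% cuts the manual review load on a registry of this size from roughly 650 sequences to roughly 290.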

We looked into the causes of all those hits and, as mentioned above, made a few minor changes to our program:
- First, there were a few issues with the keyword list, which we corrected right away.
- Often, some BLAST results should in theory have been included in the Best Matches but were not, because accepting only BLAST results with 100% query coverage as Best Matches was too restrictive. We therefore included in the Best Matches all BLAST results with a query coverage of 95% or greater.
- Sometimes the BLAST results contained matches with 100% query coverage to Select Agents, which led to hits. However, the percent identity of these matches was so low (often around 45% for protein sequences) that the similarity was too weak to be considered serious. We therefore added a second condition for a BLAST result to be a Best Match: its percent identity must be at least 60%.

After making these changes and screening the entire iGEM registry, the hit rate went down to 2.9%. We also reran some of the previously modified sequences and obtained results similar to before, so the detection capabilities of our program were not affected.
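The two Best Match conditions described above (query coverage of at least 95%, percent identity of at least 60%) can be sketched as a simple filter. This is a minimal illustration, not GenoTHREAT's code: the `BlastResult` record, its field names, and the rule that a hit is raised when any Best Match is a Select Agent are assumptions made for the example (the real program also relies on a keyword list). In BLAST+ tabular output, comparable values are available as the `qcovs` and `pident` columns.

```python
from dataclasses import dataclass

# Hypothetical record for one BLAST alignment; field names are illustrative only.
@dataclass
class BlastResult:
    subject: str
    query_coverage: float    # percent of the query covered (compare BLAST+ `qcovs`)
    percent_identity: float  # percent identical positions (compare BLAST+ `pident`)
    is_select_agent: bool    # assumption: set by an upstream Select Agent lookup

MIN_QUERY_COVERAGE = 95.0    # relaxed from the original 100% requirement
MIN_PERCENT_IDENTITY = 60.0  # new condition to drop weak, low-identity matches

def is_best_match(r: BlastResult) -> bool:
    """A result counts as a Best Match only if it passes both thresholds."""
    return (r.query_coverage >= MIN_QUERY_COVERAGE
            and r.percent_identity >= MIN_PERCENT_IDENTITY)

def raises_hit(results: list[BlastResult]) -> bool:
    """Assumed rule: flag the query if any Best Match is a Select Agent."""
    return any(is_best_match(r) and r.is_select_agent for r in results)

results = [
    BlastResult("agent_A", 100.0, 45.0, True),  # full coverage, identity too low
    BlastResult("agent_B", 96.0, 88.0, True),   # passes both thresholds
    BlastResult("benign", 100.0, 99.0, False),  # Best Match, but not a Select Agent
]
print([is_best_match(r) for r in results])  # [False, True, True]
print(raises_hit(results[:1]))              # False
print(raises_hit(results))                  # True
```

The first result mirrors the case described in the text: 100% query coverage against a Select Agent, but at around 45% identity it no longer qualifies as a Best Match, so on its own it no longer raises a hit.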


2) Contribution to iGEM

We would hope that 2.9% of the sequences in the registry are not actually dangerous. After manually examining the results, we found that many of these hits were false hits. The main improvement needed to reduce the number of hits concerns the keyword list and the way we use it. Many of the remaining hits are caused by a single sub-sequence in a single amino-acid frame. Such sub-sequences appear purely by chance, and we suspect that in many cases the amino-acid frame where the hit occurs is not the one the end user intends to use. It remains an open question whether a hit should be raised in that case.
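The "amino-acid frames" mentioned above are the six possible reading frames of a DNA sequence: three offsets on the given strand and three on its reverse complement. The sketch below performs a six-frame translation with the standard genetic code. It is only an illustration of the concept; the function names are hypothetical and GenoTHREAT's own translation step may differ.

```python
# Standard genetic code; codons enumerated with bases ordered T, C, A, G
# at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna: str) -> str:
    """Translate complete codons; stop codons become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> dict[str, str]:
    """Return the protein translation of all six reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

print(six_frames("ATGGCCTAA"))
```

Only one of these frames usually corresponds to the protein a part is designed to express; a short motif that happens to resemble a protein of concern in one of the other five frames is the kind of chance, single-frame hit described above.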

After discarding the sequences where the hit came only from one sub-sequence in one frame, as well as those where keyword issues caused the hit, we were left with a single true hit: the sequence BBa_I10020.
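The triage above was done by hand, but the rule itself is simple enough to state as code. In this minimal sketch, the `ScreenHit` summary record, its fields, and the part names are all hypothetical; it only encodes the two exclusion criteria from the text (a single sub-sequence in a single frame, or a keyword issue).

```python
from dataclasses import dataclass, field

# Hypothetical summary of why a registry part was flagged.
@dataclass
class ScreenHit:
    part: str
    frames_hit: set[str] = field(default_factory=set)  # frames with a match, e.g. {"+1"}
    subsequences_hit: int = 0                           # number of matching sub-sequences
    keyword_issue: bool = False                         # hit caused by the keyword list

def is_true_hit(h: ScreenHit) -> bool:
    """Keep a hit unless it is a lone single-frame motif or a keyword artifact."""
    lone_motif = len(h.frames_hit) == 1 and h.subsequences_hit == 1
    return not (lone_motif or h.keyword_issue)

hits = [
    ScreenHit("part_A", {"+2"}, 1),                      # one motif in one frame
    ScreenHit("part_B", {"+1"}, 3, keyword_issue=True),  # keyword-list artifact
    ScreenHit("part_C", {"+1", "-3"}, 4),                # survives triage
]
print([h.part for h in hits if is_true_hit(h)])  # ['part_C']
```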

Fortunately, iGEM was already aware that this sequence is potentially dangerous: its main page in the registry carries a prominent WARNING saying so. This result shows that GenoTHREAT can identify potentially dangerous sequences in the registry. It could therefore be used as a tool to screen incoming sequences and help ensure the safety of the students using the registry, as well as those around them.


