Latest revision as of 12:49, 12 October 2010

Tests and Results

We performed a series of tests, each designed to examine one particular point. The first tests were basic checks of the software's efficiency. We then ran more sophisticated tests, for example on sequence length and on the keyword lists.

Database

Select Agents and Toxins and non-Select Agents and Toxins
Both Select Agents and Toxins (SAT) and non-Select Agents and Toxins were added to the database as test sequences. Click here to find out more about Select Agents and Toxins.

Screening these sequences showed the need to screen both amino-acid and nucleotide sequences. Amino-acid screening can add false positives, but each type of screening (amino-acid or nucleotide) flagged dangerous sequences that the other one missed.

Housekeeping
Housekeeping genes were added to the database for both Select and non-Select Agents and Toxins to determine whether the screening software can effectively screen them. Click here to find out more about housekeeping genes.

The housekeeping genes were found to have no appreciable effect on the efficiency of the screening.

Modifications

Intervening Sequence
An intervening sequence is an SAT sequence hidden within a larger, benign sequence. These modified sequences were constructed to determine whether the program can find hidden sequences. Click here to learn more about how these sequences were made.

Nearly all the sequences containing more than 200 bp of dangerous sequence were flagged, which shows how robustly our software finds hidden sequences. When the hidden part was under 150 bp, the sequences were usually not flagged. The rate of wrongly flagged sequences was, however, a little higher, which shows that our software risks producing too many false hits. Nonetheless, this errs on the side of caution and makes our software more secure.
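
As an illustration of how such a test construct can be assembled, here is a minimal sketch. It is not the team's actual procedure: the function names, the random placeholder sequences, and the chosen lengths and position are all assumptions made for the example.

```python
import random


def random_dna(length, rng):
    """Return a random DNA string; a placeholder for real test sequences."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def embed_fragment(host, fragment, position):
    """Insert `fragment` into `host` at `position` (0-based)."""
    if not 0 <= position <= len(host):
        raise ValueError("position must lie within the host sequence")
    return host[:position] + fragment + host[position:]


rng = random.Random(0)            # fixed seed so the construct is reproducible
host = random_dna(1000, rng)      # stands in for the larger, benign sequence
fragment = random_dna(200, rng)   # stands in for the fragment to be detected
construct = embed_fragment(host, fragment, 400)
```

Varying the fragment length (for example around the 150 bp and 200 bp marks discussed above) gives a series of constructs for checking where detection starts to fail.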

Random Mutations
A random mutation sequence contains a defined number of mutations to resemble single nucleotide polymorphisms (SNPs). They were designed to test how effectively the software screens when sequence alignment parameters are varied. Click here to learn more about how these sequences were made.

As expected, the number of hits decreased as the number of mutations increased. Both amino-acid and nucleotide screening showed this behavior.
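
A sequence with a defined number of point mutations can be generated along the following lines. This is only a sketch under our own assumptions (uniformly chosen positions, substitutions only, hypothetical function names), not the script used for the tests.

```python
import random


def mutate(seq, n_mutations, rng):
    """Return a copy of `seq` with exactly `n_mutations` substitutions."""
    if n_mutations > len(seq):
        raise ValueError("more mutations requested than positions available")
    bases = list(seq)
    # Sample distinct positions so no site is mutated twice.
    for pos in rng.sample(range(len(seq)), n_mutations):
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


original = "ACGT" * 25                       # 100 bp placeholder sequence
mutated = mutate(original, 10, random.Random(1))
```

Because each chosen position is changed to a different base, the Hamming distance between `original` and `mutated` equals the requested number of mutations.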

Degeneracy
Degenerate, or synonymous, sequences have different nucleotide sequences but encode the same protein. These sequences were screened to determine whether it is really necessary to align both the nucleotide sequence and the translated amino-acid sequence. Click here to learn more about how these sequences were made.

The degenerate sequences, in which the nucleotide sequence was changed while the primary reading frame was kept the same, were all properly flagged as SAT sequences. For all 20 sequences, the amino-acid sequences, which were unchanged from the originals, were identified as SAT sequences, while the nucleotide sequences, which were all changed from the originals, went undetected. This shows the importance of screening the amino-acid sequence as well. The algorithm is therefore robust to codon optimization.
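
The effect can be illustrated with a short sketch that swaps each codon for a synonymous one and checks that the translation is unchanged. It uses the standard genetic code; the function names and the rule for picking a replacement codon are assumptions made for the example and do not describe how the test sequences were actually built.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a DNA sequence in its first reading frame."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def recode(seq):
    """Replace every codon by a different synonymous codon when one exists."""
    out = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        out.append(synonyms[0] if synonyms else codon)  # e.g. ATG has no synonym
    return "".join(out)


original = "ATGGCTAAAGGT"        # placeholder sequence encoding M-A-K-G
degenerate = recode(original)
```

Here `degenerate` differs from `original` at the nucleotide level, yet both translate to the same peptide, which is why a nucleotide-only alignment can miss such a sequence while an amino-acid alignment still finds it.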

Tests

Keyword List
The keyword list is used to determine whether the “Best Match” is to an SAT or not. This test is designed to show how the keyword lists affect the outcome of our program. Click here to see more about the keyword list and the test.

This test showed us the efficiency of our keyword list. The limited keyword list, which is the more natural keyword list derived from the CCL list, missed some dangerous sequences and produced more false positives. We observed that the antikeyword list does not have a great impact on the screening. It only corrected some false positive hits, which is nonetheless valuable because the weakness of our software is the number of false positives it raises.

On the other hand, this showed the importance of how the keyword list is constructed. The number of sequences correctly flagged with the limited keyword list is only half the number correctly flagged with the extended keyword list. Moreover, the extended keyword list adds few false positives compared to the limited one, so a keyword list at least as detailed as ours should be used in sequence screening software. The construction of this list is therefore a crucial point for the operation of the software.
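
To make the role of the two lists concrete, here is a minimal sketch of how a best-match description could be classified. The matching rule (case-insensitive substring search, with antikeywords overriding keywords), the function name, and the placeholder terms are all assumptions; the actual logic in GenoTHREAT may differ.

```python
def is_sat_hit(description, keywords, antikeywords):
    """Return True if the best-match description should be flagged."""
    text = description.lower()
    # An antikeyword cancels the flag, which removes some false positives.
    if any(term.lower() in text for term in antikeywords):
        return False
    return any(term.lower() in text for term in keywords)


# Placeholder lists, not the real keyword lists.
keywords = ["agent-x"]
antikeywords = ["cloning vector"]

flagged = is_sat_hit("Agent-X capsid protein", keywords, antikeywords)
cleared = is_sat_hit("Cloning vector with Agent-X promoter", keywords, antikeywords)
unrelated = is_sat_hit("Housekeeping gene, actin", keywords, antikeywords)
```

With a rule of this kind, a shorter keyword list can only flag fewer descriptions, which is consistent with the limited list missing dangerous sequences that the extended list catches.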

BLAST Parameters
BLAST has parameters that affect the resulting sequence alignments. This test is used to identify the effects of various BLAST parameter combinations. Click here to learn about the various BLAST parameters and how they may affect sequence screening.

The tests were run separately for amino-acid and nucleotide sequences, as they did not behave in the same way. Indeed, the parameter sets have a greater effect on nucleotide screening than on amino-acid screening. For the nucleotide sequences, as the time spent screening increased, there was a corresponding increase in the number of hits. The amino-acid sequences did not exhibit a similar trend. For the nucleotide sequences screened, nucleotide parameter set 3 allowed the program to find significantly more hits than the nucleotide default parameters as the number of mutations increased. In contrast to the nucleotide sequences, no set of BLAST parameters tested affected the program’s ability to find hits as a function of time for amino-acid sequences.

Sequence Length
Sequence length has the potential to affect our program's speed. This test measures the impact of sequence length on the program's speed. Click here to learn more about how the test was performed.

As expected, the screening time increases with the sequence length. We observed a big improvement depending on the version used. The online version was the slowest, as the calls to BLAST could not be parallelised. The local version was already quite good, with an average of 6 min for 2,000 bp sequences and 25 min for 10,000 bp sequences. The version on Sirion was by far the fastest: a 2,000 bp sequence takes only 3 min to screen, and a 10,000 bp sequence takes 12 min. If the sequences are mainly under 10,000 bp, a system such as Sirion will allow a gene synthesis company to screen hundreds of sequences a day.
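
The "hundreds of sequences a day" estimate can be checked with simple arithmetic. The sketch below assumes that sequences are screened one after another, around the clock, at the average times quoted above; the function name is ours.

```python
MINUTES_PER_DAY = 24 * 60


def sequences_per_day(minutes_per_sequence):
    """Whole sequences screened in 24 h when run back to back."""
    return MINUTES_PER_DAY // minutes_per_sequence


# Average screening times in minutes, taken from the text above.
timings = {
    ("local", 2000): 6,
    ("local", 10000): 25,
    ("Sirion", 2000): 3,
    ("Sirion", 10000): 12,
}

throughput = {key: sequences_per_day(t) for key, t in timings.items()}
```

Under these assumptions Sirion screens between 120 (all 10,000 bp) and 480 (all 2,000 bp) sequences a day, while the local version manages between 57 and 240.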

We are in the process of writing a journal manuscript about GenoTHREAT, which gives much more detailed results. If you want to know more, you are invited to read it once it is published.
