Revision as of 14:54, 27 September 2010

Tests and Results

We performed a series of tests, each designed to examine one particular point. The first tests were basic checks of the software's efficiency. We then conducted more sophisticated tests, for example on sequence length and on the keyword lists.

Database

Select Agents and Toxins and non-Select Agents and Toxins
Both Select Agents and Toxins (SAT) and non-Select Agents and Toxins were added to the database as test sequences. Click here to find out more about Select Agents and Toxins.

Screening these sequences showed that both amino-acid and nucleotide sequences need to be screened. Amino-acid screening can add false positives, but each type of screening (amino-acid or nucleotide) flagged dangerous sequences that the other one missed.

Housekeeping
Housekeeping genes were added to the database for both Select and non-Select Agents and Toxins to determine whether the screening software can effectively screen them. Click here to find out more about housekeeping genes.

The housekeeping genes were found to have no noticeable effect on the efficiency of the screening.

Modifications

Intervening Sequence
An intervening sequence is an SAT sequence hidden within a larger, benign sequence. These modified sequences were constructed to determine whether the program can find hidden sequences. Click here to learn more about how these sequences were made.

Nearly all the sequences containing more than 200 bp of dangerous sequence were flagged, which shows that our software is robust at finding hidden sequences. When the hidden part was under 150 bp, the sequences were usually not flagged, although the rate of wrongly flagged sequences was slightly higher, which shows that our software risks producing too many false hits. Nonetheless, this behaviour errs on the side of security.
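
As an illustration of how such a test sequence can be assembled, here is a minimal sketch that hides a fragment inside a larger sequence. It is not the script used for these tests: the function names, the random placeholder sequences and the insertion position are all assumptions made for the example.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """Return a random placeholder sequence (stands in for real test data)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def embed_fragment(benign: str, fragment: str, position: int) -> str:
    """Insert `fragment` into `benign` at index `position`."""
    if not 0 <= position <= len(benign):
        raise ValueError("position must lie within the benign sequence")
    return benign[:position] + fragment + benign[position:]


rng = random.Random(0)                 # fixed seed so the example is repeatable
benign = random_dna(1000, rng)         # hypothetical benign host sequence
fragment = random_dna(200, rng)        # hypothetical 200 bp hidden fragment
hidden = embed_fragment(benign, fragment, 400)
print(len(hidden))                     # length of the combined test sequence
```

Varying the fragment length (for example around the 150 bp and 200 bp marks discussed above) gives a set of sequences for checking at which size the hidden part starts to be flagged.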

Random Mutations
A random mutation sequence contains a defined number of mutations to resemble single nucleotide polymorphisms (SNPs). They were designed to test how effectively the software screens when sequence alignment parameters are varied. Click here to learn more about how these sequences were made.

As expected, the number of hits decreased as the number of mutations increased. Both the amino-acid and the nucleotide screenings showed this behaviour.
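
A sequence with a defined number of random point mutations can be generated along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, the input is assumed to contain only the letters A, C, G and T, and each mutated position is distinct so that the number of differences equals the requested number of mutations.

```python
import random


def mutate(sequence: str, n_mutations: int, rng: random.Random) -> str:
    """Return a copy of `sequence` with exactly `n_mutations` substituted bases."""
    if not 0 <= n_mutations <= len(sequence):
        raise ValueError("n_mutations must be between 0 and the sequence length")
    bases = list(sequence)
    # Sample distinct positions so no position is mutated twice.
    for i in rng.sample(range(len(bases)), n_mutations):
        # Always pick a base different from the current one.
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


original = "ACGT" * 25                          # placeholder 100 bp sequence
mutated = mutate(original, 10, random.Random(1))
print(hamming(original, mutated))               # number of substituted positions
```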

Degeneracy
Degenerate, or synonymous, sequences have different nucleotide sequences but encode the same protein. These sequences were screened to determine whether it is really necessary to align both the nucleotide and translated amino acids. Click here to learn more about how these sequences were made.

The degenerate sequences, in which the nucleotide sequences were changed while keeping the primary reading frame the same, were all properly screened as SAT sequences. For all 20 sequences, the amino-acid screening, whose input was unchanged from the original sequences, identified them as SAT sequences, while the nucleotide screening, whose input was changed throughout, did not detect them. This shows the importance of screening the amino-acid sequence as well. The algorithm is therefore robust to codon optimization.
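
The idea behind a degenerate sequence can be sketched as follows: every codon is replaced by a different codon for the same amino acid wherever one exists, so the nucleotides change but the translation does not. This is an illustrative example using the standard genetic code, not the procedure used to build the test set; the function names and the short input sequence are assumptions.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq: str) -> str:
    """Translate complete codons in frame 1; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def recode(seq: str, rng: random.Random) -> str:
    """Swap each codon for a different synonymous codon where one exists."""
    usable = len(seq) - len(seq) % 3
    out = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        # Met (ATG) and Trp (TGG) have a single codon and stay unchanged.
        out.append(rng.choice(alternatives) if alternatives else codon)
    return "".join(out)


original = "ATGGCTAAAGGTTGGTAA"                 # placeholder coding sequence
recoded = recode(original, random.Random(0))
print(translate(original) == translate(recoded))
```

A nucleotide-only alignment sees `recoded` as a different sequence, whereas an alignment on the translated sequence sees an identical protein, which is the situation the test above was designed to probe.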

Tests

Keyword List
The keyword list is used to determine whether the “Best Match” is an SAT or not. This test is designed to show how the keyword lists affect the outcome of our program. Click here to see more about the keyword list and the test.

This test showed us how effective our keyword list is. The limited keyword list, which is the more natural keyword list derived from the ccl list, missed some dangerous sequences and produced more false positives. We observed that the antikeyword list does not have a great impact on the screening. It only corrected some false positive hits, which is nonetheless useful, since the weakness of our software is the number of false positives it raises.

On the other hand, this showed how important the construction of the keyword list is. The number of sequences correctly flagged with the limited keyword list is only half the number correctly flagged with the extended keyword list. Moreover, the extended keyword list does not add many false positives compared to the limited one, so a keyword list at least as detailed as ours should be used in sequence screening software. The construction of this list is therefore a crucial point for the operation of the software.
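
The role of the two lists can be illustrated with a small sketch. It assumes, purely for the example, that the decision is a case-insensitive substring match on the description of the best match and that an antikeyword overrides a keyword; the actual matching rules of the software are not described here, and the function name and the sample terms are hypothetical.

```python
def is_flagged(description: str, keywords: list[str], antikeywords: list[str]) -> bool:
    """Flag a best-match description that hits a keyword and no antikeyword."""
    text = description.lower()
    # Assumption: an antikeyword hit cancels the flag (corrects a false positive).
    if any(term.lower() in text for term in antikeywords):
        return False
    return any(term.lower() in text for term in keywords)


keywords = ["toxin"]            # placeholder keyword list
antikeywords = ["antitoxin"]    # placeholder antikeyword list

print(is_flagged("Example toxin precursor", keywords, antikeywords))
print(is_flagged("Example antitoxin protein", keywords, antikeywords))
```

With a scheme like this, a short keyword list misses descriptions that use terms it does not contain, which is consistent with the limited list flagging fewer sequences than the extended one.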

BLAST Parameters
BLAST has parameters that affect the resulting sequence alignments. This test is used to identify the effects of various BLAST parameter combinations. Click here to learn about the various BLAST parameters and how they may affect sequence screening.

The tests were run separately for amino-acid and nucleotide sequences, as the two did not behave in the same way: the parameter sets have a greater effect on nucleotide screening than on amino-acid screening. For the nucleotide sequences, as the time spent screening increased, there was a corresponding increase in the number of hits. The amino-acid sequences did not exhibit a similar trend. For the nucleotide sequences screened, nucleotide parameter 3 allowed the program to find significantly more hits as the number of mutations increased, compared to the nucleotide default parameters. In contrast to the nucleotide sequences, none of the BLAST parameter sets tested affected the program’s ability to find hits as a function of time for amino-acid sequences.

Sequence Length
Sequence length has the potential to affect our program's speed. This test measures the impact of sequence length on the program's speed. Click here to learn more about how the test was performed.

As expected, the screening time increases with the sequence length. We observed a large improvement depending on the version used. The online version was the slowest, as the calls to BLAST could not be parallelised. The local version was already quite good, with an average of 6 min for 2000 bp sequences and 25 min for 10000 bp sequences. The version on Sirion was by far the fastest: a 2000 bp sequence takes only 3 min to screen, and a 10000 bp sequence takes 12 min. If the sequences are mainly under 10000 bp, a system such as Sirion will allow a gene synthesis company to screen hundreds of sequences a day.
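
The daily throughput implied by these timings can be checked with a short calculation. The sketch below assumes that sequences are screened one after another, around the clock, with no overhead between runs; these assumptions and the function name are ours, and only the timings come from the measurements above.

```python
MINUTES_PER_DAY = 24 * 60


def sequences_per_day(minutes_per_sequence: int) -> int:
    """Whole sequences screened in 24 h when run back to back."""
    return MINUTES_PER_DAY // minutes_per_sequence


# Average screening times reported above: (version, length in bp) -> minutes.
timings = {
    ("local", 2000): 6,
    ("local", 10000): 25,
    ("Sirion", 2000): 3,
    ("Sirion", 10000): 12,
}

for (version, length), minutes in timings.items():
    print(f"{version:>6} {length:>6} bp: {sequences_per_day(minutes)} sequences/day")
```

Under these assumptions the Sirion version handles 480 sequences of 2000 bp or 120 sequences of 10000 bp per day, against 240 and 57 for the local version.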
