Team:VT-ENSIMAG/Result
From 2010.igem.org
(3 intermediate revisions not shown) | |||
Line 2: | Line 2: | ||
<br> | <br> | ||
- | We have performed different tests constructed in order to test one particular point each time. The first tests were basic tests to check the efficiency of the software. Then we conducted more | + | We performed a series of tests, each designed to examine one particular point. The first tests were basic checks of the efficiency of the software. We then conducted more sophisticated tests, such as tests on length, keywords, and so on. |
<br> | <br> | ||
Line 17: | Line 17: | ||
The screening of these sequences showed the need to screen both amino-acid and nucleotide sequences. Amino-acid screening can add false positives, but each type of screening (amino-acid or nucleotide) flagged dangerous sequences that the other one missed. | The screening of these sequences showed the need to screen both amino-acid and nucleotide sequences. Amino-acid screening can add false positives, but each type of screening (amino-acid or nucleotide) flagged dangerous sequences that the other one missed. | ||
- | + | ||
- | + | ||
<tr> | <tr> | ||
<td style="border-style: solid; border-width: 1px 1px 1px 1px"> | <td style="border-style: solid; border-width: 1px 1px 1px 1px"> | ||
Line 26: | Line 25: | ||
The housekeeping genes were considered to have no significant effect on the efficiency of the screening. | The housekeeping genes were considered to have no significant effect on the efficiency of the screening. | ||
- | + | ||
- | + | ||
<tr> | <tr> | ||
Line 39: | Line 37: | ||
Nearly all the sequences with over 200 bp of dangerous sequence were flagged, which shows the robustness of our software in finding hidden sequences. When the hidden part was under 150 bp, the sequences were usually not flagged, but the rate of wrongly flagged sequences was a little higher, which shows that our software risks producing too many false hits. Nonetheless, this makes our software more secure. | Nearly all the sequences with over 200 bp of dangerous sequence were flagged, which shows the robustness of our software in finding hidden sequences. When the hidden part was under 150 bp, the sequences were usually not flagged, but the rate of wrongly flagged sequences was a little higher, which shows that our software risks producing too many false hits. Nonetheless, this makes our software more secure. | ||
- | + | ||
- | + | ||
<tr> | <tr> | ||
Line 48: | Line 45: | ||
A random mutation sequence contains a defined number of mutations to resemble single nucleotide polymorphisms (SNPs). They were designed to test how effectively the software screens when sequence alignment parameters are varied. [[Team:VT-ENSIMAG/Table4|Click here]] to learn more about how these sequences were made. | A random mutation sequence contains a defined number of mutations to resemble single nucleotide polymorphisms (SNPs). They were designed to test how effectively the software screens when sequence alignment parameters are varied. [[Team:VT-ENSIMAG/Table4|Click here]] to learn more about how these sequences were made. | ||
- | As expected, the number of hits decreased as the number of mutations increased. Both amino-acid and nucleotide has this same | + | As expected, the number of hits decreased as the number of mutations increased. Both amino-acid and nucleotide screening showed this same behavior. |
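The idea of a random mutation sequence can be illustrated with a minimal Python sketch. It is not the team's actual procedure (that is described on the Table4 page); the function name `mutate`, the `seed` parameter, and the restriction to the four bases A, C, G, T are assumptions made for illustration.

```python
import random

BASES = "ACGT"

def mutate(seq, n_mutations, seed=None):
    """Return a copy of seq with exactly n_mutations substituted positions.

    Hypothetical helper: each chosen position receives a base that differs
    from the original one, mimicking a single nucleotide polymorphism.
    """
    if n_mutations > len(seq):
        raise ValueError("more mutations requested than positions available")
    rng = random.Random(seed)
    out = list(seq)
    # Pick distinct positions so the mutation count is exact.
    for pos in rng.sample(range(len(seq)), n_mutations):
        out[pos] = rng.choice([b for b in BASES if b != out[pos]])
    return "".join(out)

original = "ACGT" * 25
mutant = mutate(original, 10, seed=1)
differences = sum(a != b for a, b in zip(original, mutant))
print(differences)
```

Screening a series of such mutants with an increasing `n_mutations` is one way to observe how the number of hits falls as the sequences diverge from the original.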
- | + | ||
- | + | ||
<tr> | <tr> | ||
Line 60: | Line 56: | ||
The degenerate sequences, in which the nucleotide sequences were changed while keeping the primary reading frame of the sequence the same, were all properly screened as SAT sequences. For all 20 sequences, the amino-acid sequences, which remained the same as the originals, were identified as SAT sequences, while the nucleotide sequences, which were all changed from the originals, were not detected as SAT sequences. | The degenerate sequences, in which the nucleotide sequences were changed while keeping the primary reading frame of the sequence the same, were all properly screened as SAT sequences. For all 20 sequences, the amino-acid sequences, which remained the same as the originals, were identified as SAT sequences, while the nucleotide sequences, which were all changed from the originals, were not detected as SAT sequences. | ||
This shows the importance of screening the amino-acid sequences as well. The algorithm is therefore robust to codon optimization. | This shows the importance of screening the amino-acid sequences as well. The algorithm is therefore robust to codon optimization. | ||
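A degenerate sequence of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard genetic code, and the helper names `translate` and `degenerate` are hypothetical, not part of GenoTHREAT. Codons without a synonym (ATG and TGG) are necessarily left unchanged.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(seq):
    """Split seq into complete codons in the primary reading frame."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def translate(seq):
    """Translate a nucleotide sequence into one-letter amino-acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(seq))

def degenerate(seq):
    """Replace each codon with a different synonymous codon where one exists."""
    out = []
    for codon in codons(seq):
        synonyms = sorted(c for c, aa in CODON_TABLE.items()
                          if aa == CODON_TABLE[codon] and c != codon)
        out.append(synonyms[0] if synonyms else codon)
    return "".join(out)

original = "ATGGCTAAAGGTTGGTAA"
recoded = degenerate(original)
print(translate(original) == translate(recoded))  # protein is preserved
print(original != recoded)                        # nucleotides differ
```

Because the protein is unchanged while the nucleotides differ, a nucleotide-only search can miss the recoded sequence, whereas an amino-acid search still matches it.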
- | |||
- | |||
<tr> | <tr> | ||
Line 71: | Line 65: | ||
The keyword list is used to determine whether the “Best Match” is to an SAT or not. This test is designed to show how the keyword lists affect the outcome of our program. [[Team:VT-ENSIMAG/Table6|Click here]] to see more about the keyword list and the test. | The keyword list is used to determine whether the “Best Match” is to an SAT or not. This test is designed to show how the keyword lists affect the outcome of our program. [[Team:VT-ENSIMAG/Table6|Click here]] to see more about the keyword list and the test. | ||
- | This test showed us the efficiency of our keyword list. Indeed, the limited keyword list, which is the more natural keyword list | + | This test showed us the efficiency of our keyword list. Indeed, the limited keyword list, which is the more natural keyword list from the CCL list, missed some dangerous sequences and produced more false positives. |
- | We | + | We observed that the antikeyword list does not have a great impact on the screening. It only corrected some false positive hits, which is nonetheless a good point, as the weakness of our software is the number of false positives raised. |
+ | |||
+ | On the other hand, this showed the importance of the construction of the keyword list. The number of sequences correctly flagged with the limited keyword list is only half the number of sequences correctly flagged with the extended keyword list. Moreover, the extended keyword list does not add many false positives compared to the limited one, so a keyword list at least as detailed as ours should be used for sequence screening software. The construction of this list is therefore a crucial point for the operation of the software. | ||
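The role of the keyword and antikeyword lists can be sketched as a simple text match on the description of the best match. The exact matching rule used by the software is not given here, so the logic below is an assumption, and the function name `is_sat_hit` and the example terms are hypothetical placeholders.

```python
def is_sat_hit(description, keywords, antikeywords):
    """Decide whether a best-match description points to an SAT.

    Assumed rule (not confirmed by the source): a description is flagged
    when it contains at least one keyword and no antikeyword.
    Matching is case-insensitive substring search.
    """
    text = description.lower()
    # An antikeyword overrides any keyword and removes a false positive.
    if any(anti.lower() in text for anti in antikeywords):
        return False
    return any(key.lower() in text for key in keywords)

keywords = ["toxin"]          # hypothetical keyword list
antikeywords = ["antitoxin"]  # hypothetical antikeyword list

print(is_sat_hit("Putative toxin subunit A", keywords, antikeywords))
print(is_sat_hit("Antitoxin protein", keywords, antikeywords))
```

In this toy example the antikeyword only removes a description that the keyword alone would have flagged wrongly, which matches the limited, corrective effect reported for the antikeyword list.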
- | |||
- | |||
- | |||
<tr> | <tr> | ||
<td style="border-style: solid; border-width: 1px 1px 1px 1px"> | <td style="border-style: solid; border-width: 1px 1px 1px 1px"> | ||
Line 83: | Line 76: | ||
BLAST has parameters that affect the resulting sequence alignments. This test is used to identify the effects of various BLAST parameter combinations. [[Team:VT-ENSIMAG/Table7|Click here]] to learn about the various BLAST parameters and how they may affect sequence screening. | BLAST has parameters that affect the resulting sequence alignments. This test is used to identify the effects of various BLAST parameter combinations. [[Team:VT-ENSIMAG/Table7|Click here]] to learn about the various BLAST parameters and how they may affect sequence screening. | ||
- | The tests were | + | The tests were run separately for amino-acid and nucleotide sequences, as they did not behave the same way. Indeed, the parameter sets have a greater effect on nucleotide screening than on amino-acid screening. |
For the nucleotide sequences, as the time spent screening increased, there was a corresponding increase in the number of hits. The amino-acid sequences did not exhibit a similar trend. For the nucleotide sequences screened, nucleotide parameter 3 allowed the program to find significantly more hits than the nucleotide default parameters as the number of mutations increased. In contrast, for the amino-acid sequences, no set of BLAST parameters tested affected the program’s ability to find hits as a function of time. | For the nucleotide sequences, as the time spent screening increased, there was a corresponding increase in the number of hits. The amino-acid sequences did not exhibit a similar trend. For the nucleotide sequences screened, nucleotide parameter 3 allowed the program to find significantly more hits than the nucleotide default parameters as the number of mutations increased. In contrast, for the amino-acid sequences, no set of BLAST parameters tested affected the program’s ability to find hits as a function of time. | ||
- | |||
- | |||
<tr> | <tr> | ||
Line 95: | Line 86: | ||
As expected, the screening time increases with the sequence length. | As expected, the screening time increases with the sequence length. | ||
- | We | + | We observed a large improvement depending on the version used. The online version was the slowest, as the calls to BLAST could not be parallelised. The local version was already quite good, with an average of 6 min for 2,000 bp sequences and 25 min for 10,000 bp sequences. The version on Sirion was by far the fastest: a 2,000 bp sequence takes only 3 min to screen, and a 10,000 bp sequence takes 12 min. If the sequences are mainly under 10,000 bp, a system such as Sirion would allow a gene synthesis company to screen hundreds of sequences a day. |
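The daily throughput implied by the Sirion timings can be checked with a short calculation. This sketch assumes sequences are screened one after another, around the clock, on a single system; the function name `sequences_per_day` is introduced only for illustration.

```python
MINUTES_PER_DAY = 24 * 60

def sequences_per_day(minutes_per_sequence):
    """Whole sequences screened in 24 h when run back to back (assumption)."""
    return MINUTES_PER_DAY // minutes_per_sequence

# Sirion timings reported above: 3 min at 2,000 bp, 12 min at 10,000 bp.
for length_bp, minutes in [(2000, 3), (10000, 12)]:
    print(f"{length_bp} bp: {sequences_per_day(minutes)} sequences/day")
```

Under these assumptions the system would handle 480 sequences a day at 2,000 bp and 120 a day at 10,000 bp, consistent with the estimate of hundreds of sequences a day.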
- | + | ||
- | + | ||
</table> | </table> | ||
- | + | We are in the process of writing a journal manuscript about GenoTHREAT. Much more detailed results are given in this manuscript. If you want to know more, you are invited to read it once it is published. |
}} | }} |
Latest revision as of 12:49, 12 October 2010
Tests and Results
|
We are in the process of writing a journal manuscript about GenoTHREAT. Much more detailed results are given in this manuscript. If you want to know more, you are invited to read it once it is published. |