Team:VT-ENSIMAG/Result
From 2010.igem.org
Line 2: | Line 2: | ||
<br> | <br> | ||
- | We have performed different tests constructed in order to test one particular point each time. The first tests were basic tests to check the efficiency of the software. Then we conducted more | + | We performed a series of tests, each designed to examine one particular point. The first tests were basic checks of the software's efficiency. We then conducted more sophisticated tests, for example on sequence length and keywords. |
<br> | <br> | ||
Line 45: | Line 45: | ||
A random mutation sequence contains a defined number of mutations to resemble single nucleotide polymorphisms (SNPs). They were designed to test how effectively the software screens when sequence alignment parameters are varied. [[Team:VT-ENSIMAG/Table4|Click here]] to learn more about how these sequences were made. | A random mutation sequence contains a defined number of mutations to resemble single nucleotide polymorphisms (SNPs). They were designed to test how effectively the software screens when sequence alignment parameters are varied. [[Team:VT-ENSIMAG/Table4|Click here]] to learn more about how these sequences were made. | ||
- | As expected, the number of hits decreased as the number of mutations increased. Both amino-acid and nucleotide has this same | + | As expected, the number of hits decreased as the number of mutations increased. Both amino-acid and nucleotide sequences showed this behavior. |
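A minimal sketch of how a random mutation sequence with a defined number of SNP-like substitutions might be generated. The function name `mutate_sequence`, the `seed` parameter, and the restriction to the four bases A, C, G, T are assumptions made for illustration; the procedure actually used is the one described on the Table4 page.

```python
import random

NUCLEOTIDES = "ACGT"


def mutate_sequence(seq, n_mutations, seed=None):
    """Return a copy of seq with exactly n_mutations point substitutions.

    Hypothetical helper: each mutated position is chosen at random without
    replacement, and its base is replaced by a different nucleotide.
    """
    if n_mutations < 0 or n_mutations > len(seq):
        raise ValueError("n_mutations must be between 0 and len(seq)")
    rng = random.Random(seed)
    # Sample distinct positions so the mutation count is exact.
    positions = rng.sample(range(len(seq)), n_mutations)
    bases = list(seq)
    for pos in positions:
        # Exclude the current base so every substitution is a real change.
        choices = [b for b in NUCLEOTIDES if b != bases[pos]]
        bases[pos] = rng.choice(choices)
    return "".join(bases)
```

Screening the output for increasing values of `n_mutations` would give one way to reproduce the trend reported above, where the number of hits falls as mutations accumulate.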
Line 65: | Line 65: | ||
The keyword list is used to determine whether the “Best Match” is to an SAT or not. This test is designed to show how the keyword list affects the outcome of our program. [[Team:VT-ENSIMAG/Table6|Click here]] to see more about the keyword list and the test. | The keyword list is used to determine whether the “Best Match” is to an SAT or not. This test is designed to show how the keyword list affects the outcome of our program. [[Team:VT-ENSIMAG/Table6|Click here]] to see more about the keyword list and the test. | ||
- | This test showed us the efficiency of our keyword list. Indeed, the limited keyword list, which is the more natural keyword list | + | This test showed us the efficiency of our keyword list. Indeed, the limited keyword list, which is the more natural keyword list from the CCL list, missed some dangerous sequences and produced more false positives. |
- | We | + | We observed that the antikeyword list does not have a great impact on the screening. It only corrected some false-positive hits, which is nonetheless valuable, as the weakness of our software is the number of false positives it raises. |
- | On the other hand, this showed the importance of the construction of the keyword list. The number of sequences correctly flagged with the limited keyword list is only half the number of the sequences correctly flagged with the extended keyword list. Moreover, the extended | + | On the other hand, this showed the importance of how the keyword list is constructed. The number of sequences correctly flagged with the limited keyword list is only half the number correctly flagged with the extended keyword list. Moreover, the extended keyword list does not add many false positives compared to the limited one, so a keyword list at least as detailed as ours should be used for sequence screening software. The construction of this list is therefore crucial to the operation of the software. |
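The role of the keyword and antikeyword lists can be sketched as follows. This is only an illustration under stated assumptions: the function `is_flagged`, the substring matching, the rule that an antikeyword overrides a keyword, and the example lists are all hypothetical and are not taken from GenoTHREAT.

```python
def is_flagged(description, keywords, antikeywords=()):
    """Decide whether a best-match description should be flagged.

    Hypothetical rule: an antikeyword match clears the hit (to suppress
    false positives); otherwise any keyword match flags it.
    """
    text = description.lower()
    if any(anti.lower() in text for anti in antikeywords):
        return False
    return any(key.lower() in text for key in keywords)


# Illustrative lists only, not the team's actual keyword lists.
example_keywords = ["toxin"]
example_antikeywords = ["antitoxin"]

print(is_flagged("Putative toxin precursor", example_keywords, example_antikeywords))
print(is_flagged("Antitoxin component", example_keywords, example_antikeywords))
```

In this sketch a longer keyword list can only flag more descriptions, while the antikeyword list can only remove flags, which matches the qualitative roles described in the results above.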
<tr> | <tr> | ||
Line 76: | Line 76: | ||
BLAST has parameters that affect the resulting sequence alignments. This test is used to identify the effects of various BLAST parameter combinations. [[Team:VT-ENSIMAG/Table7|Click here]] to learn about the various BLAST parameters and how they may affect sequence screening. | BLAST has parameters that affect the resulting sequence alignments. This test is used to identify the effects of various BLAST parameter combinations. [[Team:VT-ENSIMAG/Table7|Click here]] to learn about the various BLAST parameters and how they may affect sequence screening. | ||
- | The tests were | + | The tests were run separately for amino-acid and nucleotide sequences, as they did not behave the same way. Indeed, the parameter sets have a greater effect on nucleotide than on amino-acid sequences. |
For the nucleotide sequences, as the time spent screening increased, there was a corresponding increase in the number of hits. The amino acid sequences did not exhibit a similar trend. For the nucleotide sequences screened, nucleotide parameter 3 allowed the program to find significantly more hits as the number of mutations increased as compared to the nucleotide default parameters. Compared to the nucleotide sequences, no set of BLAST parameters tested affected the program’s ability to find hits as a function of time for amino acid sequences. | For the nucleotide sequences, as the time spent screening increased, there was a corresponding increase in the number of hits. The amino acid sequences did not exhibit a similar trend. For the nucleotide sequences screened, nucleotide parameter 3 allowed the program to find significantly more hits as the number of mutations increased as compared to the nucleotide default parameters. Compared to the nucleotide sequences, no set of BLAST parameters tested affected the program’s ability to find hits as a function of time for amino acid sequences. | ||
Line 86: | Line 86: | ||
As expected, the screening time increases with the sequence length. | As expected, the screening time increases with the sequence length. | ||
- | We | + | We observed a large improvement depending on the version used. The online version was the slowest, as the calls to BLAST could not be parallelized. The local version was already quite good, averaging 6 min for 2000 bp sequences and 25 min for 10 000 bp sequences. The version on Sirion was by far the fastest: a 2000 bp sequence takes only 3 min to screen, and a 10 000 bp sequence 12 min. If sequences are mainly under 10 000 bp, a system such as Sirion would allow a gene synthesis company to screen hundreds of sequences a day. |
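The daily throughput implied by these timings can be checked with a short calculation. The sketch assumes sequences are screened one after another, around the clock, on a single system; that assumption and the helper name `sequences_per_day` are ours, not part of the reported tests.

```python
MINUTES_PER_DAY = 24 * 60


def sequences_per_day(minutes_per_sequence):
    """Whole sequences screened in 24 h at a fixed time per sequence."""
    return MINUTES_PER_DAY // minutes_per_sequence


# Timings reported above, in minutes per sequence.
timings = {
    ("local", "2000 bp"): 6,
    ("local", "10 000 bp"): 25,
    ("Sirion", "2000 bp"): 3,
    ("Sirion", "10 000 bp"): 12,
}

for (version, length), minutes in timings.items():
    print(version, length, sequences_per_day(minutes))
```

Under these assumptions the Sirion timings correspond to between 120 and 480 sequences a day, depending on sequence length.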
</table> | </table> | ||
- | We are in the process of writing a journal manuscript about GenoTHREAT. Much more | + | We are in the process of writing a journal manuscript about GenoTHREAT. Much more detailed results are given in this manuscript. If you want to know more, you are invited to read it once it is published. |
}} | }} |
Latest revision as of 12:49, 12 October 2010
Tests and Results
|
We are in the process of writing a journal manuscript about GenoTHREAT. Much more detailed results are given in this manuscript. If you want to know more, you are invited to read it once it is published. |