Team:VT-ENSIMAG/Table6



Revision as of 15:29, 18 August 2010



Keyword List








An integral feature of the sequence screening process is the set of keyword and anti-keyword lists used to determine whether the “Best Match” is to a Select Agent or Toxin. Two keyword lists and one anti-keyword list were developed to determine how different lists, working in tandem, affect the program’s ability to recognize sequences of concern in an order. The more limited keyword list is composed only of words found on the CDC Select Agent and Toxin List. The second, more extensive list includes alternative names for, and words related to, the entries on the CDC Select Agent and Toxin List. For toxins, related words include the names of enzymes closely associated with the toxin’s production and function, as well as organisms that directly produce the toxin. For organism and virus entries, related words include the names of diseases associated with the entries, in addition to any toxins or pathogenic agents uniquely produced by the entry. Discretion was used when developing the more extensive keyword list because an overly inclusive list could increase the number of false positive results.


Extract of the Keyword database



Certain strains or forms of Select Agents or Toxins have been recognized by the Government as not harmful to public or environmental health. These strains or forms are treated as exclusions from the keyword list. The anti-keyword list contains terms uniquely related to the strains or forms excluded from the Select Agents or Toxins. If the GenBank entry of the “Best Match” contains a keyword, it is cross-referenced with the anti-keyword list. If the GenBank entry is also found to contain an anti-keyword, the “Best Match” is not considered a sequence of concern.


Extract of the AntiKeyword database
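
The keyword and anti-keyword check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the GenoTHREAT implementation: it assumes plain case-insensitive substring matching, the function names are hypothetical, and the example terms are placeholders chosen for illustration rather than entries taken from the team's databases.

```python
# Placeholder lists for illustration only; the real databases are not reproduced here.
KEYWORDS = {"bacillus anthracis"}
ANTI_KEYWORDS = {"sterne"}


def contains_term(text, terms):
    """Return True if any term occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def is_sequence_of_concern(genbank_entry, keywords, anti_keywords):
    """Apply the keyword check first, then the anti-keyword exclusion."""
    if not contains_term(genbank_entry, keywords):
        # No keyword found: the Best Match is not flagged.
        return False
    # A keyword was found: the entry is flagged unless an anti-keyword excludes it.
    return not contains_term(genbank_entry, anti_keywords)


if __name__ == "__main__":
    for entry in (
        "Bacillus anthracis str. Ames, complete genome",
        "Bacillus anthracis str. Sterne, complete genome",
        "Escherichia coli K-12, complete genome",
    ):
        print(entry, "->", is_sequence_of_concern(entry, KEYWORDS, ANTI_KEYWORDS))
```

Substring matching is the simplest choice and shares the limitation noted in this page: it cannot tell a relevant mention from an incidental one.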


Because the keyword list is what identifies a sequence as dangerous, the success of the screening software relies heavily upon its content. An advantage of keyword matching is that it can be automated and the keyword list can be refined over time. A drawback is that it sees in black and white; it cannot make judgment calls as a human can. To test the effects of the keyword list, different combinations of the limited and extensive keyword lists with the anti-keyword list were set as parameters for the program. For the outcome of this experiment, see our results section.
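
As a rough sketch of how such a parameter sweep might be organized, the Python below runs the same descriptions through each keyword list, with and without the anti-keyword list. The exact combinations tested are an assumption here, as are the function names, the substring matching, the placeholder terms, and the choice to make the extensive list a superset of the limited one.

```python
from itertools import product

# Placeholder lists for illustration only; not the team's actual databases.
LIMITED_KEYWORDS = {"bacillus anthracis"}
EXTENSIVE_KEYWORDS = LIMITED_KEYWORDS | {"anthrax"}  # assumed superset
ANTI_KEYWORDS = {"sterne"}


def is_flagged(description, keywords, anti_keywords):
    """Flag a description that contains a keyword and no anti-keyword."""
    text = description.lower()
    if not any(keyword in text for keyword in keywords):
        return False
    return not any(anti in text for anti in anti_keywords)


def run_configurations(descriptions):
    """Return the flagged descriptions for every list combination."""
    keyword_options = {
        "limited": LIMITED_KEYWORDS,
        "extensive": EXTENSIVE_KEYWORDS,
    }
    anti_options = {
        "without anti-keywords": set(),
        "with anti-keywords": ANTI_KEYWORDS,
    }
    results = {}
    for (kw_name, kws), (anti_name, antis) in product(
        keyword_options.items(), anti_options.items()
    ):
        results[(kw_name, anti_name)] = [
            d for d in descriptions if is_flagged(d, kws, antis)
        ]
    return results


if __name__ == "__main__":
    sample = [
        "Anthrax toxin lethal factor",
        "Bacillus anthracis str. Sterne",
        "Escherichia coli K-12",
    ]
    for config, flagged in run_configurations(sample).items():
        print(config, "->", flagged)
```

Comparing the flagged sets across configurations shows how a broader keyword list catches more entries while the anti-keyword list removes excluded strains.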



