Team:Paris Liliane Bettencourt/Project/SIP/Downloads
From 2010.igem.org
Revision as of 15:33, 24 October 2010
Team List
- Team list 2007 (UNIX) | Team list 2007 (WIN32)
- Team list 2008 (UNIX) | Team list 2008 (WIN32)
- Team list 2009 (UNIX) | Team list 2009 (WIN32)
- Wiki data 2007 (ZIP)
- Wiki data 2008 (ZIP)
- Wiki data 2009 (ZIP)
- SIP words database 2007
- SIP words database 2008
- SIP words database 2009
To read databases, use [http://www.sqlite.org/ sqlite3].
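The page does not document the layout of the databases, so a reasonable first step is to ask SQLite itself which tables a file contains. The sketch below uses Python's standard `sqlite3` module; the helper names `list_tables` and `preview` are made up for this illustration, and no table or column names from the actual SIP databases are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables stored in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def preview(db_path, table, limit=5):
    """Return up to `limit` rows from `table` as a list of tuples."""
    # Quote the identifier so unusual table names do not break the query.
    quoted = '"' + table.replace('"', '""') + '"'
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT * FROM " + quoted + " LIMIT ?", (limit,)
        ).fetchall()
    finally:
        conn.close()
```

For example, after downloading one of the databases, `list_tables("path/to/downloaded.db")` shows what it contains, and `preview` can then be called on each table name it returns. The path here is a placeholder, not a real file name from this page.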
Warning: These files were generated using "links -dump" to strip the HTML, which speeds up processing. This step is optional, because SIP removes the HTML later anyway. With links, a few pages whose names contain special characters such as '(', ')' and ':' were not converted. We consider this minor, because it affects only a small number of pages, but you can regenerate the database without the HTML-parsing step.
You can also use html2text, but if that program encounters a special character, it does not remove the HTML.
There are also a few known bugs, for example in the 2009 files: Illinois-tools was not downloaded. Unfortunately, I did not fix them before the deadline.
Note about filters: These files are unfiltered, but you can apply whatever filters you like: remove common names, keep only [http://www.nlm.nih.gov/mesh/ MeSH] terms, etc. Choose what you need!
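As an illustration of the kind of filter mentioned above, here is a minimal sketch that drops common words and optionally keeps only terms from an allow-list. It assumes the word counts have already been loaded into a dictionary; the function name `filter_words` and the example words are hypothetical, and the allow-list shown is a placeholder, not an extract from MeSH.

```python
def filter_words(counts, stop_words=frozenset(), allowed_terms=None):
    """Filter a {word: count} mapping.

    stop_words    -- lowercase words to drop (e.g. common names)
    allowed_terms -- if given, keep only words in this lowercase set
                     (e.g. a term list you built yourself from MeSH)
    """
    kept = {}
    for word, n in counts.items():
        key = word.lower()  # compare case-insensitively
        if key in stop_words:
            continue
        if allowed_terms is not None and key not in allowed_terms:
            continue
        kept[word] = n
    return kept


# Illustrative data only; not taken from the SIP databases.
counts = {"the": 120, "Plasmid": 14, "team": 30, "promoter": 8}
common = {"the", "team"}
terms = {"plasmid"}  # placeholder allow-list

without_common = filter_words(counts, stop_words=common)
only_terms = filter_words(counts, stop_words=common, allowed_terms=terms)
```

Keeping the two filters as separate optional arguments means either one can be applied alone or both together.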