Team:Paris Liliane Bettencourt/Project/SIP/Downloads

From 2010.igem.org

(Difference between revisions)
Line 40:
* [[Media:Teamlist_2009-unix.txt|Team list 2009 (UNIX)]] <html><a href="#"> |</a></html> [[Media:Teamlist_2009-win32.txt|Team list 2009 (WIN32)]]
'''Wiki Data'''
- * [# Wiki data 2007 (ZIP)]
+ * [http://www.lsdlive.org/misc/wdata_2007.zip Wiki data 2007 (ZIP)]
- * [# Wiki data 2008 (ZIP)]
+ * [http://www.lsdlive.org/misc/wdata_2008.zip Wiki data 2008 (ZIP)]
- * [# Wiki data 2009 (ZIP)]
+ * [http://www.lsdlive.org/misc/wdata_2009.zip Wiki data 2009 (ZIP)]
'''SIP Database'''
- * [[Media:Wsip_2007.db.zip|SIP words database 2007]]
+ * [[Media:Wsip_2007.db.zip|SIP words database 2007 (SQLITE3)]]
- * [# SIP words database 2008]
+ * [http://www.lsdlive.org/misc/wsip_2008.db.zip SIP words database 2008 (SQLITE3)]
- * [# SIP words database 2009]
+ * [http://www.lsdlive.org/misc/wsip_2009.db.zip SIP words database 2009 (SQLITE3)]
<br />

Revision as of 17:23, 26 October 2010



SIP Wiki Analyser: Downloads





Team List

Wiki Data

SIP Database
To read the databases, unzip them and open them with sqlite3.
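As a minimal sketch of a first look at one of these files, the Python standard-library `sqlite3` module can list the tables and columns a database contains. The table layout of the SIP databases is not documented on this page, so the code only inspects the schema and assumes nothing about it. The file name `wsip_2009.db` in the usage comment is an assumption about what the archive unpacks to.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        schema = {}
        for name in names:
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


# Usage (file name is an assumption, adjust to the unzipped file):
#   for table, columns in list_tables("wsip_2009.db").items():
#       print(table, columns)
```

Once the table names are known, ordinary `SELECT` queries can be run through the same connection API or from the `sqlite3` command-line shell.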

Warning: These files were generated using "links -dump" to strip the HTML, which speeds up processing. This step is optional, because SIP removes the HTML later anyway. With links, some pages with special characters such as '(', ')' and ':' in their names are not converted. We consider this unimportant because only a small number of pages are affected, but you can regenerate the database without the HTML parsing step.
You can also use html2text, but when it encounters special characters, it does not remove the HTML.
Also, I know there are a few bugs, for example in the 2009 files: Illinois-tools is not downloaded. Unfortunately, I did not fix them before the deadline.
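If you regenerate the data and want to strip HTML without links or html2text, one possible alternative is a small tag stripper built on Python's standard `html.parser`. This is a sketch of a swapped-in technique, not the method used to build the files above. The names `TextExtractor` and `strip_html` are hypothetical. Because it works on page content held in a string, characters such as '(', ')' and ':' in page names should not affect it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Text chunks are joined with a space so words from adjacent elements do
    # not run together; this can also split a word broken by inline tags.
    return " ".join(" ".join(parser.parts).split())
```

The output is plain text suitable for word counting; it does not try to reproduce the page layout the way "links -dump" does.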

Notes about filters: These files are unfiltered, but you can apply whatever filters you want: remove common names, keep only MeSH terms, etc. Use whatever you need!
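The two filters mentioned above can be sketched as a single function over a word-count mapping. This assumes you have already loaded the words into a `{word: count}` dictionary; how that maps onto the database tables is not specified here. The function name `filter_words` and both word lists are hypothetical. A real MeSH term list would have to be obtained separately.

```python
def filter_words(counts, stop_words=frozenset(), keep_only=None):
    """Filter a {word: count} mapping.

    stop_words: lowercase words to drop (for example common names).
    keep_only:  if given, a set of lowercase words to keep (for example a
                MeSH term list); every other word is dropped.
    """
    result = {}
    for word, n in counts.items():
        key = word.lower()  # compare case-insensitively
        if key in stop_words:
            continue
        if keep_only is not None and key not in keep_only:
            continue
        result[word] = n
    return result


# Illustrative data only, not taken from the SIP databases.
counts = {"the": 120, "Plasmids": 7, "team": 40, "biofilm": 3}
without_common = filter_words(counts, stop_words={"the", "team"})
only_listed = filter_words(counts, keep_only={"plasmids"})
```

Both filters can be combined in one call; stop words are removed first, then the keep list is applied to what remains.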