Team:Paris Liliane Bettencourt/Project/SIP/Downloads

From 2010.igem.org


Revision as of 08:19, 27 October 2010



SIP Wiki Analyser : Downloads





Team List

Wiki Data SIP Database
To read databases, use sqlite3.
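
As a starting point, here is a minimal sketch that opens one of the database files with Python's standard sqlite3 module and lists the tables it contains. The file name in the usage comment is a placeholder, not the name of an actual download, and nothing is assumed about the schema: the table names are read from SQLite's own sqlite_master catalogue.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables stored in the SQLite file at db_path."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        con.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]


# Usage (placeholder path; substitute the file you downloaded):
#     for table in list_tables("sip_database.db"):
#         print(table)
```

The same listing is available from the sqlite3 command-line shell with the .tables command. Note that sqlite3.connect creates an empty file if the path does not exist, so an empty result may simply mean the path is wrong.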

Warning : These files were generated with "links -dump" to strip the HTML first, which speeds up processing. This step is optional, because SIP removes the HTML later anyway. With links, some pages with special characters such as '(', ')' and ':' in their names are not converted. We consider this minor, since it affects only a small number of pages, but you can regenerate the database without the HTML parsing step.
You can also use html2text, but when it encounters a special character, it does not remove the HTML.
Also, some teams were missing from the databases (28 for 2009 and 10 for 2008). To build the database, I replaced the '-' character with '_', because sqlite3 does not work with that character, but I forgot to keep the original name when downloading each team, so the URL was wrong and the files were not downloaded. As a result, no team with a '-' in its name was downloaded. Regenerating the whole database would take too much computing time, so I have only recomputed the teams with a '-' in their name (the dictionary table is unchanged).
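
The '-' problem described above can be avoided by keeping two separate names per team: the original one for the download URL and a converted one for the database. The sketch below is not the SIP code. The function names, the example team name and the URL pattern (modelled on the address of this wiki) are assumptions made for illustration. It also shows a possible alternative: SQLite accepts a '-' in an identifier when the identifier is wrapped in double quotes.

```python
import sqlite3


def table_name(team):
    """Database-side name: '-' replaced by '_' (hypothetical helper)."""
    return team.replace("-", "_")


def team_url(team, year):
    """Download URL built from the ORIGINAL team name, hyphen included.

    The pattern is an assumption based on this wiki's own address.
    """
    return f"http://{year}.igem.org/Team:{team}"


def quote_identifier(name):
    """Wrap a name in double quotes so SQLite takes it as an identifier."""
    return '"' + name.replace('"', '""') + '"'


team = "Illinois-Tools"  # example team name with a '-'
print(table_name(team))      # name to use inside the database
print(team_url(team, 2009))  # name to use for the download

# Alternative: keep the hyphen and quote the identifier instead.
con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE {quote_identifier(team)} (word TEXT)")
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master")]
con.close()
```

With either approach the name used to fetch the pages never passes through the '-' to '_' conversion, which is the step that produced the wrong URLs.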

Notes about filters : These files are unfiltered, but you can apply whatever filters you want: remove common names, keep only MeSH terms, etc. Use whatever fits your needs!
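
As an illustration of such a filter, here is a minimal sketch that works on a plain mapping from term to count. How you load that mapping from the database is up to you, since the schema is not described here. The function name, the drop list and the allowed-term list are all hypothetical examples; a real MeSH filter would need the actual MeSH vocabulary.

```python
def filter_terms(counts, drop=frozenset(), keep_only=None):
    """Return a filtered copy of a {term: count} mapping.

    drop      -- lowercase terms to remove (e.g. a list of common names)
    keep_only -- if given, lowercase terms that are the only ones kept
                 (e.g. a set of MeSH terms)
    """
    result = {}
    for term, count in counts.items():
        key = term.lower()
        if key in drop:
            continue
        if keep_only is not None and key not in keep_only:
            continue
        result[term] = count
    return result


# Made-up data, for illustration only.
counts = {"the": 120, "Plasmid": 14, "team": 30, "promoter": 9}
without_common = filter_terms(counts, drop={"the", "team"})
only_allowed = filter_terms(counts, keep_only={"plasmid"})
```

Matching is done on the lowercased term, so the drop and keep_only sets should contain lowercase entries; the original spelling is preserved in the result.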