Team:St Andrews/project/ethics/communication




Communication

Premise

Real-time Internet communication is increasingly common: the so-called Facebook generation is growing up acquainted with a dizzying array of instantaneous communication methods. The inception of email was heralded as a revolution in communication, yet today instant messaging and social network messaging have taken precedence over it. Combined with the vast quantities of blogs, forum posts, wikis and other forms of user-generated content, the volume of publicly accessible communication is immense. From a human practices perspective this provides a vast and frequently changing dataset which gives insight into how people communicate.

Technical Solution

Before one can reap the benefits of access to such a great pool of data, one must answer the challenge of collecting it. The web stores exabytes of data, so collecting and parsing the entirety of the available data is simply not an option. Nor is it required: when interested in a set of related terms such as {synthetic biology, synbio, igem}, one can disregard large portions of the web. Furthermore, if one is gathering data on social communication, a number of starting points quickly become apparent. Firstly, several social networks offer a fairly standard XML-based API; secondly, virtually every so-called Web 2.0 site organises data in some form of chronological hierarchy (be it through metadata or simply via the removal of old articles from the site's home page). These two features of the web allow us to deduce a simple algorithm for collecting data. The algorithm starts at a number of popular hubs of discourse (such as large news sites, social networks, newspapers, journals and blogs) and proceeds through every site linked from each page, so long as a term of interest is found.

crawler(searchterm):
 S = stack of seed links {facebook, twitter, bbcnews, cnn, foxnews, guardian, times, nytimes .. myawesomeblog}
 while S is not empty and an arbitrary threshold has not been reached:
   link = pop S
   crawlerparser(link, searchterm)
crawlerparser(link, searchterm):
 page = fetch link
 if page does not contain searchterm
   disregard page
 else
   for each result in page
     if result is old
       disregard result
     else
       push all hyperlinks in result onto S
       print result
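
To make the idea concrete, here is a minimal runnable sketch of the crawler in Python. It is illustrative rather than the team's actual implementation: the seed URLs and the MAX_PAGES threshold are placeholders, the "is old" test is elided (a real crawler would compare publication dates from feed metadata or HTTP headers), and the widely used requests library is assumed to be installed.

 import re
 import requests  # third-party HTTP client, assumed installed (pip install requests)
 
 # Placeholder seed hubs; the full seed list is elided above ("..")
 SEEDS = [
     "https://www.bbc.co.uk/news",
     "https://www.theguardian.com",
 ]
 
 LINK_RE = re.compile(r'href="(https?://[^"]+)"')  # crude hyperlink extraction
 MAX_PAGES = 100  # the "arbitrary threshold" from the pseudocode
 
 def crawler(searchterm):
     stack = list(SEEDS)   # S: links still to visit
     seen = set(SEEDS)     # never push the same link twice
     visited = 0
     while stack and visited < MAX_PAGES:
         link = stack.pop()
         visited += 1
         try:
             page = requests.get(link, timeout=10).text
         except requests.RequestException:
             continue  # unreachable pages are disregarded
         if searchterm.lower() not in page.lower():
             continue  # only follow pages mentioning the term of interest
         print(link)   # "print result" (recency check elided, see above)
         for url in LINK_RE.findall(page):
             if url not in seen:
                 seen.add(url)
                 stack.append(url)
 
 crawler("synthetic biology")

Using a plain list as a stack gives the depth-first behaviour implied by the pseudocode; popping from the front instead (for example with collections.deque) would give a breadth-first crawl that spreads attention more evenly across the seed hubs.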