The whole idea of the internet is that it’s transient, distributed, and decentralized. What if you don’t want something archived? And who should pay for the machines that archive this astronomically large, exponentially growing data set?
If you want a reliable archive of something, you’d better download it :)
It would be an interesting project, however, to track just certain pieces of it. Take the top search queries, grab the pages that show up in the first few results for each, and crawl those pages a level or two deep. You’d probably capture some of the most “relevant” pages, so the record of what was tracked at any given time would itself be interesting sociological information.
By Isaac Z. Schlueter on 06.10.08 3:31 pm
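The crawl proposed in the comment (seed from the first few results of popular searches, then follow links a level or two deep) can be sketched as a depth-limited breadth-first search. This is a minimal illustration, not anything the commenter wrote: it assumes the popular queries and their ranked results are already available as a plain dictionary, and that `get_links` stands in for fetching a page and extracting its links (nothing here touches the network). The names `seed_urls` and `crawl`, the `top_n` and `max_depth` parameters, and the `.example` URLs are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def seed_urls(search_results: Dict[str, List[str]], top_n: int = 3) -> List[str]:
    """Collect the first `top_n` result URLs for each query, dropping duplicates."""
    seeds: List[str] = []
    seen = set()
    for urls in search_results.values():
        for url in urls[:top_n]:
            if url not in seen:
                seen.add(url)
                seeds.append(url)
    return seeds


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
) -> Dict[str, int]:
    """Breadth-first crawl that follows links at most `max_depth` hops past the seeds.

    Returns each visited URL mapped to the hop count at which it was first reached.
    `get_links` is a hypothetical stand-in for "fetch this page and list its links".
    """
    depth: Dict[str, int] = {}
    queue: deque = deque()
    for url in seeds:
        if url not in depth:
            depth[url] = 0
            queue.append(url)
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in depth:
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth


# Toy data: hypothetical search rankings and a tiny in-memory link graph.
search_results = {
    "weather": ["a.example", "b.example", "c.example", "d.example"],
    "news": ["b.example", "e.example"],
}
links = {
    "a.example": ["a.example/1"],
    "a.example/1": ["a.example/2"],
    "a.example/2": ["a.example/3"],
}

snapshot = crawl(seed_urls(search_results), lambda u: links.get(u, []), max_depth=2)
```

With `top_n=3`, the fourth "weather" result is never seeded, and with `max_depth=2` the crawl stops two hops from the search results, so `a.example/3` is not reached. Running something like this on a schedule and storing each returned dictionary with its date would give the time-indexed record of what was tracked that the comment describes.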