Saturday, November 20, 2010

Unit 11, Web Search & OAI Protocol: Reading Notes

Web Search Engines: Parts 1 & 2

A very interesting pair of articles. Part 1 is about the infrastructure and crawling algorithms of search engines; part 2 discusses indexing and query processing algorithms. I enjoyed reading about spam rejection in the first article. Spamming a search engine is such a strange concept. I can understand in the abstract why it happens, but exactly what would motivate a person to be so deceptive is a mystery to me. Do these websites really benefit so much from misleading search engines? The tactics they use are very sneaky but also a little silly. Really, white text on a white background? That’s like using lemon juice as invisible ink.

I also enjoyed reading about the huge range of vocabulary on the Web. The way that technology shapes language is fascinating. I’d always thought of it as limiting our vocabulary to shorter, more widely known words. It makes sense, though, that a technology that’s had such a profound effect on how we communicate would also add to the languages we use to speak to each other.

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting

This article describes the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In part, it discusses how data providers and service providers interact through standardized metadata (Dublin Core). The OAI is a really interesting example of how open standards affect library science. Organizations from a huge range of disciplines use the protocol, but they must agree on a common, controlled vocabulary and willingly follow a set of standards for the community to thrive. This sort of cooperation seems like it would be difficult to manage but would have rewarding outcomes.
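
To make the data provider / service provider relationship a little more concrete for myself, here is a rough Python sketch of what a harvest might look like. It is only an illustration under stated assumptions, not something taken from the article: the base URL `http://example.org/oai` and the sample response are made up, and the helper names `list_records_url` and `extract_titles` are my own. The `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a service provider sends to a data provider.

    base_url is a hypothetical repository endpoint; "oai_dc" is the
    Dublin Core format that every OAI-PMH repository must support.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Collect the Dublin Core titles from a ListRecords response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        for title in record.iter(f"{{{DC_NS}}}title"):
            titles.append(title.text)
    return titles


# A made-up, heavily trimmed response, just to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Record</dc:title>
          <dc:creator>Example Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

The point of the sketch is that the service provider never needs to know how a repository stores its records. It only needs the shared verbs and the shared Dublin Core fields, which is why the agreement on standards matters so much.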

The Deep Web: Surfacing Hidden Value

I’m completely fascinated by the deep web. It’s a topic that has come up in many of my classes this semester, but I think that this article has done the best job of explaining why it exists and what it includes. The dissatisfaction most people feel when using web search engines could probably be eased if those search engines were able to navigate the deep web. The list of the 60 largest deep web sites was very helpful to my understanding of what exactly the term “deep web” means. These are all really interesting, popular websites. I think most people assume they can reach the information these sites contain through Google, even though that’s not really true.
