
Information Retrieval

Learning Objectives
• Define information retrieval.
• Define online information retrieval (IR) and provide examples.
• Define web search engine, web meta-engine, and web directory.
• Identify and describe web search engine specific tools.
• Define open access and provide examples.
• Define the deep web or “hidden web” and provide examples.
• Evaluate web-based resources.
• Identify fake websites and approaches to check authenticity.
• Define ontology and Semantic Web.

Information Retrieval (IR) involves obtaining relevant information resources to address a particular information need.

The term has become associated with online information, but can be applied to any type of information resource.

Web search engines like Google are the most visible information retrieval systems. However, there are many other options that provide better access to specific types of information.

A finding tool is any resource used to locate information in a library. These tools may include the library catalog, bibliographic databases, indexes, and abstracting services.

Online Information Retrieval

Online information retrieval involves identifying relevant information for an information need using a device connected to the Internet.

A user enters a query into the system. The system provides objects that match the query. In most cases the results are organized by their degree of relevancy.

An object is something that is represented by information in a database such as a text document, image, audio, video, or other item. In some cases, the objects are stored directly on the IR system. However, they are often simply represented with document surrogates or metadata. For instance, rather than playing the mp3 file for you, the system might display the title, type of file, length, and description.
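The query, match, and rank cycle described above can be sketched in a few lines. This is only an illustration under simple assumptions: the `Surrogate` record, the word-overlap `relevance` score, and the sample collection are all invented here, and real IR systems use far more sophisticated ranking.

```python
from dataclasses import dataclass


@dataclass
class Surrogate:
    """Metadata that stands in for an object the IR system may not store itself."""
    title: str
    file_type: str
    length: str
    description: str


def relevance(query: str, item: Surrogate) -> int:
    """Count how many query words appear in the surrogate's title or description."""
    text = f"{item.title} {item.description}".lower().split()
    return sum(1 for word in query.lower().split() if word in text)


def search(query: str, collection: list[Surrogate]) -> list[Surrogate]:
    """Return matching surrogates, most relevant first."""
    scored = [(relevance(query, item), item) for item in collection]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in matches]


# Hypothetical records, invented for illustration only.
collection = [
    Surrogate("Frog Calls", "mp3", "3:05", "field recording of frog calls at a pond"),
    Surrogate("Pond Life", "video", "12:40", "documentary about a pond and its frog population"),
    Surrogate("Banana Bread", "text", "2 pages", "recipe for banana bread"),
]

results = search("frog calls", collection)
for item in results:
    # Display the surrogate (title, type, length) rather than the object itself.
    print(item.title, "|", item.file_type, "|", item.length)
```

Notice that the search never touches the mp3 or video files. It works entirely from the surrogate records, which is how many catalogs and databases operate.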


Web Search Engine

A web search engine is a web-based software tool designed to search for information on the Internet.

The results of a keyword query are presented as Search Engine Result Pages (SERPs). The results often include a list of items, each with a title, URL, and description or snippet. It’s important to make use of the snippet provided. Users often overlook the valuable information a snippet provides about an item, which helps determine whether the item should be kept or ignored.

[Image: example of a Google SERP for the search "alaska"]

Although search engines may be maintained by human editors, they generally rely on an algorithm and a web crawler.

A web crawler is an automated system that browses the Web and indexes web pages. The most popular search engines use a combination of page content and web page metadata tags.
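To make the idea of crawling and indexing concrete, here is a minimal sketch. It assumes a tiny, made-up "web" held in memory (the `PAGES` dictionary and its `a.example` addresses are hypothetical), so no real network requests are made. Real crawlers also handle politeness rules, duplicates, and ranking, which are omitted here.

```python
from collections import deque

# A hypothetical three-page "web": URL -> (metadata keywords, page text, outgoing links).
PAGES = {
    "a.example/home": (["alaska", "travel"], "Plan a trip to Alaska", ["a.example/bears", "a.example/maps"]),
    "a.example/bears": (["wildlife"], "Brown bears live in Alaska", ["a.example/home"]),
    "a.example/maps": (["maps"], "Road maps and trail maps", []),
}


def crawl(start: str) -> dict[str, set[str]]:
    """Follow links breadth-first from start and build a word -> URLs index."""
    index: dict[str, set[str]] = {}
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        keywords, text, links = PAGES[url]
        # Index both the page content and its metadata keywords.
        for word in text.lower().split() + keywords:
            index.setdefault(word, set()).add(url)
        # Queue each linked page once so the crawl does not loop forever.
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("a.example/home")
print(sorted(index["alaska"]))
print(sorted(index["maps"]))
```

A keyword search then becomes a simple lookup in `index`, which is why a search engine can answer a query without revisiting every page.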

Popular Search Tools

Google is the most popular search engine. Others include Bing and Yahoo!.

Some search engines display results in interesting ways. For instance, Instagrok is a more visual tool.

Some search engines are focused on particular formats such as images, maps, audio, or video. Google Images is an example.

Some search engines provide results for a particular specialty area such as law or medicine.

Try It!
Conduct practice searches using the search tools on this page. Compare the results. Try searches for concrete objects such as frogs and bananas. Also, try searching for abstract concepts such as freedom and poverty. Compare the results.

Watch!
Watch the video How Search Works at YouTube.

A web meta-engine provides results from a number of popular search engines.

For instance, Dogpile includes results from Google, Yahoo!, and Yandex in its searches.
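One simple way to picture what a meta-engine does is to interleave the ranked lists returned by several engines and drop duplicates. The sketch below is an assumption made for illustration, not a description of how Dogpile actually combines results. The engine names and result lists are invented.

```python
from itertools import zip_longest


def merge_results(*ranked_lists: list[str]) -> list[str]:
    """Interleave several ranked URL lists, keeping each URL's first appearance."""
    merged: list[str] = []
    seen: set[str] = set()
    # Take the top result from every engine, then the second from every engine, and so on.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical result lists from three made-up engines.
engine_a = ["frogs.example", "pond.example", "toads.example"]
engine_b = ["pond.example", "frogs.example"]
engine_c = ["amphibians.example"]

merged = merge_results(engine_a, engine_b, engine_c)
print(merged)
```

Because each engine ranks pages differently, the merged list can surface items that a single engine would place lower or miss entirely.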

A web directory is a directory of resources found on the Web.

Most web directories are edited by humans who organize links into categories and subcategories.

The DMOZ.org open directory project is an example.

Search Engine Search Strategies

Try It!
Go to Google: Inside Search’s How Search Works. Explore this engaging, dynamic infographic.

Web search engines provide both basic and advanced searches.

Advanced searches allow users to make choices such as limiting results to a range of dates or choosing formats of interest.

Advanced searches usually allow the use of search operators to refine a search. Search operators involve the addition of symbols or words to a search statement to gain more control over the results.

For instance, some websites have poor search engines. It’s possible to use Google to search a specific website by using the site: search operator. To see more examples of search operators, go to Google Search Operators.
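As a small illustration of the site: operator, the sketch below builds a Google search URL that limits results to one website. The function name `site_search_url` and the sample search terms are made up for this example. The operator is simply added to the search statement, then the whole statement is encoded into the URL.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, site: str) -> str:
    """Build a Google search URL restricted to one website with the site: operator."""
    query = f"{terms} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical search: look for "tide tables" only on noaa.gov.
url = site_search_url("tide tables", "noaa.gov")
print(url)
```

Typing `tide tables site:noaa.gov` directly into the Google search box has the same effect.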


Google's Advanced Search

Go to Google Search Features. Explore the wide range of search options. Then, read the examples below:

In addition to finding information, Google is useful for answering basic questions.

Beyond Text Input

When you think of conducting a search, your first thought is probably to type words in the search box. However, there are other ways to input information for a search.

In Google, users can speak their search. This is useful for times when you don’t know how to spell a word or don’t want to type on your smartphone. It’s also helpful for those with disabilities. Just click on the microphone and begin speaking. Go to Google and give it a try!


When you’re in the Google Images Search, you have the option to search using an image. For instance, let’s say you find an image and aren’t sure who is in the photo. Drag the file, such as rose.jpg, into the search box. If Google recognizes the image, it sometimes even adds a search term, as it did with Helen Keller. Go to Google Images Search and give it a try!


Open Access

Once you begin locating information, you’ll discover that not all information is available to the public for free.

Open access is information made available for free through the Internet. Access to content may be available through a publisher’s online archive or through an open access repository.

Open access is also the practice of allowing unrestricted access to peer-reviewed scholarly research. This approach reduces costs for libraries and allows authors to maintain copyright and distribution rights on their research.

Deep or Hidden Web

The deep web is the part of the World Wide Web that is not indexed by standard search engines. The deep web is also called the hidden web or invisible web.

Most of the resources people find on the web come from “dragging a net across the surface,” according to Mike Bergman, founder of BrightPlanet. It’s only by digging deep that it’s possible to find these hidden resources.

Don’t confuse the deep, hidden, or invisible web with the darknet. The darknet refers to anonymous networks where IP addresses are not publicly shared. Sometimes referred to as the underground net, it hosts both illegal activity and dissident political communications.

Deep Web Examples

Over the past few years, Google has continued to work on providing access to more and more of the lesser-known collections and resources. However, here are a few examples of places where you can sometimes do a more effective search within the website to access lesser-known resources. You can still get to many of these resources through Google, but they may not rank high in the search results.

Spend some time exploring search tools, directories, and sources of deep web access. Compare your results with what you find with a search engine like Google.

Searching the hidden web is particularly helpful for professionals in areas such as medical research, law, and government work.

Try It!
Explore some of the resources above. Compare your results with Google.

Website Evaluation

Once web-based resources have been identified, they must be evaluated. This involves much more than a surface-level examination.

Rather than lumping all web-based resources together, consider the different types of online content.

When working with patrons, use the ABCDs to help library users remember the evaluation criteria:
• Authority and Accuracy
• Balance
• Currency
• Depth and Detail

Authority and Accuracy

Use the following questions when evaluating the authority and accuracy of a website:

Example: The NOAA government website has an ABOUT page with information about the agency and the website.


Balance

Use the following questions when evaluating the balance of a website:

Example: The Greenpeace website advocates for a particular point of view. When using a website that clearly represents a specific perspective, it’s important to check facts and be aware of the bias. Simply representing a point of view is not a reason to eliminate the site from use. Instead, it’s a reason to use caution.


Currency

Use the following questions when evaluating the currency of a website:

Although currency can be important in some cases, use of historical materials may also be important. The Wayback Machine allows users to go back and look at websites from earlier times. Try the Wayback Machine from Archive.org.
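For those who want to check for archived copies programmatically, the Internet Archive offers a Wayback Machine availability API. The sketch below only builds the request URL and reads a response. The sample response is made up for illustration, and its field layout is an assumption based on the public API, so check the Internet Archive documentation before relying on it. No network request is made here.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def availability_url(page: str, timestamp: str) -> str:
    """Build a request URL asking for the snapshot closest to timestamp (YYYYMMDD)."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page, "timestamp": timestamp})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived URL from a response body, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


request_url = availability_url("census.gov", "20100101")

# Illustrative response body; the snapshot values here are made up.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20100101000000/http://www.census.gov/",
            "timestamp": "20100101000000",
            "status": "200",
        }
    }
})

print(request_url)
print(closest_snapshot(sample))
```

To use this against the live service, the request URL would be fetched (for example with `urllib.request.urlopen`) and the returned text passed to `closest_snapshot`.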

Example: The United States Census Bureau maintains the latest statistics from the US Census.


Depth and Detail

Use the following questions when evaluating the depth and detail of a website:

While Wikipedia can provide a useful overview of information, use the external links or references at the bottom of a Wikipedia page for more in-depth information and access to original sources of information.

Example: The All About Birds website from The Cornell Lab of Ornithology provides in-depth information about birds.


Fact, Fiction, Fake

It’s important that library users can distinguish among fact, fiction, and fake websites.

Many library users are simply looking for the answer to a question and may receive misinformation if they don’t know how to identify fake sites.

Google is known for its April Fools’ Day fake websites such as Google Gulp and Google Nose.

Visit some of the following fake websites. Besides using common sense, what are some ways you can identify a fake?

Use the link: search operator in Google to find which websites link to the possible fake website.

If you’re not sure about the content of any website or email you receive, check Snopes. This website keeps track of mischief on the Web such as urban legends and rumors.

Try It!
Explore Snopes looking for examples of online mischief.

Ontology

In information science, an ontology represents a body of knowledge about a particular domain, expressed through a shared vocabulary.

Ontologies are frameworks for organizing data within a particular domain such as library science or the Semantic Web.

Most ontologies can be organized into individuals (e.g., items, objects, instances); classes (e.g., collections, sets, concepts); attributes (e.g., features, characteristics, properties); and relations (e.g., connections among individuals and classes).
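The four building blocks above can be modeled as simple data structures. This is a minimal sketch with invented names (`OntClass`, `Individual`, `Relation`, and the `writtenBy` relation are all hypothetical); actual ontology languages are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class OntClass:
    """A class (concept) that groups individuals."""
    name: str


@dataclass
class Individual:
    """An individual (instance) with attributes, belonging to one class."""
    name: str
    member_of: OntClass
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """A named connection from one individual to another."""
    subject: Individual
    predicate: str
    obj: Individual


# Hypothetical library-domain example.
book = OntClass("Book")
person = OntClass("Person")

novel = Individual("The Hunger Games", book, {"format": "print"})
author = Individual("Suzanne Collins", person)
relations = [Relation(novel, "writtenBy", author)]

for r in relations:
    print(f"{r.subject.name} ({r.subject.member_of.name}) {r.predicate} "
          f"{r.obj.name} ({r.obj.member_of.name})")
```

Here the classes are Book and Person, the individuals are one novel and one author, the attribute is the novel's format, and the relation connects the two individuals.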

Ontologies are useful in creating shared understandings of the structure of information. They are also helpful for knowledge analysis.

Dublin Core is a simple ontology for documents.
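As an illustration, the sketch below describes this page with a handful of the 15 elements in the Dublin Core Metadata Element Set and checks a record for element names that are not part of the set. The helper `unknown_elements` is made up for this example, and the `type` and `language` values are assumptions; the title, creator, date, and rights come from this page.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record: dict[str, str]) -> list[str]:
    """Return any keys in the record that are not Dublin Core elements."""
    return sorted(key for key in record if key not in DC_ELEMENTS)


# A record describing this page, using details from its footer.
record = {
    "title": "Information Retrieval",
    "creator": "Annette Lamb",
    "date": "2014",
    "type": "Text",      # assumed value
    "language": "en",    # assumed value
    "rights": "Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported",
}

print(unknown_elements(record))
print(unknown_elements({"title": "Untitled", "author": "Unknown"}))
```

The second check flags `author` because Dublin Core uses `creator` for that role. A shared vocabulary only works when everyone uses the same terms.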

The Semantic Web

The Semantic Web is a movement led by the World Wide Web Consortium (W3C), which also develops standards for HTML, CSS, and XML.

The Semantic Web is based on W3C’s Resource Description Framework (RDF). This family of specifications began with a model for metadata, but has been extended to a wide range of web resources and activities. The vision is that well-constructed data can be interpreted by computers, lessening the need for human effort.
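RDF expresses each statement as a triple: a subject, a predicate, and an object. The sketch below stores a few triples in a plain list and finds them by pattern. It is a simplified illustration, not an RDF library. The `PAGE` address and the `match` helper are hypothetical, while the `DC` prefix is the Dublin Core element namespace.

```python
from typing import Optional

Triple = tuple[str, str, str]

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
PAGE = "http://library.example/ir"       # hypothetical resource URI

# Each statement is (subject, predicate, object).
triples: list[Triple] = [
    (PAGE, DC + "title", "Information Retrieval"),
    (PAGE, DC + "creator", "Annette Lamb"),
    (PAGE, DC + "date", "2014"),
]


def match(store: list[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> list[Triple]:
    """Return triples that agree with every part of the pattern that is not None."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


creators = match(triples, predicate=DC + "creator")
print(creators[0][2])
```

Because the predicate is a shared, published identifier rather than a free-text label, a program can recognize that two different sites mean the same thing by "creator" without a person interpreting the pages.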

Semantics is the study of meaning, specifically understanding human expressions. The Semantic Web approach incorporates semantic (meaningful) content into web pages in an attempt to create a web of data that is more structured and useful.

The purpose of the Semantic Web is to help users find, combine, and share information more easily, effectively, and efficiently.

GoPubMed is a project that attempts to create Semantic Web solutions for biomedical texts. The combination of Gene Ontology and Medical Subject Headings (MeSH) helps structure millions of articles for instant access. It’s the next step in PubMed.

Conclusion

Online Information Retrieval (IR) involves much more than conducting a basic web search.

Many web search engines, meta-engines, and directories can be used to locate information on the Web.

The key to effective and efficient searching is the use of search operators and the advanced features of web search engines.

When evaluating websites, remember the ABCDs of evaluation.

Ontology and the study of the Semantic Web are two emerging areas in LIS.


2014 Annette Lamb (Adapted from earlier s401 materials)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.