• Define information retrieval.
• Define online information retrieval (IR) and provide examples.
• Define web search engine, web meta-engine, and web directory.
• Identify and describe web search engine specific tools.
• Define open access and provide examples.
• Define the deep web or “hidden web” and provide examples.
• Evaluate web-based resources.
• Identify fake websites and approaches to check authenticity.
• Define ontology and Semantic Web.
Information Retrieval (IR) involves obtaining relevant information resources to address a particular information need.
The term has become associated with online information, but can be applied to any type of information resource.
Web search engines like Google are the most visible information retrieval systems. However, there are many other options that provide better access to specific types of information.
A finding tool is any resource used to locate information in a library. These tools may include the library catalog, bibliographic databases, indexes, and abstracting services.
Online information retrieval involves identifying relevant information for an information need using a device connected to the Internet.
A user enters a query into the system. The system provides objects that match the query. In most cases the results are organized by their degree of relevancy.
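The query, match, and rank cycle described above can be sketched in a few lines. This is only a toy illustration, assuming a tiny in-memory collection and a naive relevance score (the number of query terms that also appear in a document). Real IR systems use far more sophisticated measures, and the names `documents` and `rank` are hypothetical.

```python
# Toy illustration: match a query against a small collection and
# order the hits by a naive relevance score (shared-term count).
# The collection and the scoring rule are assumptions for demonstration.

documents = {
    "doc1": "frogs are amphibians that live near ponds",
    "doc2": "bananas are a tropical fruit",
    "doc3": "tree frogs live in tropical forests",
}


def rank(query, docs):
    """Return (doc_id, score) pairs for matching documents, best first."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        doc_terms = set(text.lower().split())
        score = len(query_terms & doc_terms)
        if score > 0:  # keep only documents that match the query
            scored.append((doc_id, score))
    # Highest score first; ties are broken alphabetically by doc_id
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


results = rank("tropical frogs", documents)
print(results)
```

The document that shares both query terms is listed first, followed by the documents that share only one.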
An object is something that is represented by information in a database such as a text document, image, audio, video, or other item. In some cases, the objects are stored directly on the IR system. However, they are often simply represented with document surrogates or metadata. For instance, rather than playing the mp3 file for you, the system might display the title, type of file, length, and description.
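A document surrogate can be pictured as a small record that stands in for the stored object. The sketch below assumes four hypothetical fields (title, file type, length, and description) based on the mp3 example above; actual systems define their own metadata fields.

```python
from dataclasses import dataclass


@dataclass
class Surrogate:
    """Metadata that stands in for a stored object (hypothetical fields)."""
    title: str
    file_type: str
    length_seconds: int
    description: str

    def display(self):
        """Format the record the way a results list might show it."""
        minutes, seconds = divmod(self.length_seconds, 60)
        return (
            f"{self.title} [{self.file_type}, {minutes}:{seconds:02d}] "
            f"- {self.description}"
        )


# Example record with made-up values
record = Surrogate(
    title="Pond Sounds",
    file_type="mp3",
    length_seconds=185,
    description="Field recording of frogs at dusk",
)
summary = record.display()
print(summary)
```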
Web Search Engine
A web search engine is a web-based software tool designed to search for information on the Internet.
The results of a keyword query are presented as Search Engine Result Pages (SERPs). The results often include a list of items including the title, URL, and description or snippet. It’s important to make use of the snippet provided. Users often overlook the valuable information a snippet provides about an item that will help determine whether the item should be kept or ignored.
Below is an example of a Google SERP.
Although some search engines are maintained by human editors, most rely on algorithms that rank the pages gathered by a web crawler.
A web crawler is an automated system that browses the Web and indexes web pages. The most popular search engines use a combination of page content and web page metadata tags.
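To make the idea of indexing page content and metadata tags more concrete, here is a minimal sketch of what a crawler might pull from a single page. It uses Python's standard `html.parser` module on a made-up page rather than a real download. The class name `PageIndexer` and the sample page are assumptions for illustration and do not describe how any particular search engine works.

```python
from html.parser import HTMLParser


class PageIndexer(HTMLParser):
    """Collect the title, named meta tags, and links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # e.g. <meta name="description" content="...">
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])  # pages a crawler could visit next

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Made-up page used in place of a real download
SAMPLE_PAGE = """
<html><head>
<title>Tree Frogs</title>
<meta name="description" content="Facts about tree frogs">
<meta name="keywords" content="frogs, amphibians">
</head><body>
<a href="/habitat.html">Habitat</a>
<a href="/diet.html">Diet</a>
</body></html>
"""

indexer = PageIndexer()
indexer.feed(SAMPLE_PAGE)
indexer.close()
print(indexer.title, indexer.meta, indexer.links)
```

A real crawler would repeat this step for each link it discovers and store the extracted text in an index.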
Popular Search Tools
Some search engines display results in interesting ways. For instance, Instagrok presents results in a more visual form.
Some search engines are focused on particular formats such as images, maps, audio, or video. Google Images is an example.
Some search engines provide results for a particular specialty area such as law or medicine.
Conduct practice searches using the search tools on this page. Compare the results. Try searches for concrete objects such as frogs and bananas. Also, try searching for abstract concepts such as freedom and poverty. Compare the results.
Watch the video How Search Works at YouTube.
A web meta-engine provides results from a number of popular search engines.
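One way to picture a meta-engine is as a program that merges several ranked lists into one. The sketch below assumes two hypothetical engines that each return a ranked list of URLs, and it combines them with a simple reciprocal-rank score. This is one possible merging rule chosen for illustration, not a description of how any actual meta-engine ranks results.

```python
def merge_results(*result_lists):
    """Merge ranked URL lists, best combined position first."""
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            # Reciprocal-rank style credit: earlier positions earn more
            scores[url] = scores.get(url, 0.0) + 1.0 / (position + 1)
    # Highest combined score first; ties are broken alphabetically
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists from two hypothetical engines
engine_a = ["example.org/frogs", "example.com/ponds", "example.net/toads"]
engine_b = ["example.com/ponds", "example.org/frogs", "example.edu/amphibians"]

merged = merge_results(engine_a, engine_b)
print(merged)
```

Items returned by both engines rise to the top, and each duplicate appears only once in the merged list.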
A web directory is a directory of resources found on the Web.
Most web directories are edited by humans who organize links into categories and subcategories.
The DMOZ.org open directory project is an example.
Search Engine Search Strategies
Go to Google: Inside Search’s How Search Works. Explore this engaging, dynamic infographic.
Web search engines provide both basic and advanced searches.
Advanced searches allow users to make choices such as limiting results to a range of dates or choosing formats of interest.
Advanced searches usually allow the use of search operators to refine a search. Search operators involve the addition of symbols or words to a search statement to gain more control over the results.
For instance, some websites have poor search engines. It’s possible to use Google to search a specific website by using the site: search operator. To see more examples of search operators, go to Google Search Operators.
Google's Advanced Search
Go to Google Search Features. Explore the wide range of search options. Then, read the examples below:
- Use site:jstor.org invasive species to search the JSTOR site.
- Use site:gov cancer research to only locate government resources.
- Use the tilde (~) for similar words such as ~healthy recipes
- Use quotes for an exact word or phrase such as “to be or not to be”
- Use a dash to exclude a word such as thunderbird -car or board games -chess -checkers or vikings -Minnesota or depression -great. Do not put a space between the dash and the excluded word.
- Use related: to locate websites that are similar to another site such as related:nytimes.com short stories
- Use link:eduscapes.com to see who is linking to eduscapes.com.
This is useful when searching for fake sites. Try link:zapatopi.net/treeoctopus/
- Use a number range (two periods between the numbers) such as superbowl winners 1960..1970
- Use book Lord of the Rings if you’re seeking a book.
- Use define: technology if you’re seeking a definition on any topic.
- Use population or unemployment if you’re seeking statistics such as population Seattle.
- Use weather, time, sunrise, or sunset such as time Seattle.
- Use tectonic plates filetype:ppt if you’re looking for PowerPoint presentations.
- Use location:australia when searching by location
- Use places such as parks 50322 for maps in a ZIP code.
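Search statements like the ones above can also be assembled by a short script, which is handy when you want to repeat the same kind of search many times. The sketch below is a minimal example that assumes the operators behave as described in this section. The helper names `build_query` and `search_url` are hypothetical, and the script only builds the search address without contacting Google.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, exclude=(), filetype=None, phrase=None):
    """Assemble a search statement from plain terms and operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")  # restrict to one site or domain
    if phrase:
        parts.append(f'"{phrase}"')  # exact word or phrase
    parts.append(terms)
    # Excluded words take a dash with no space after it
    parts.extend(f"-{word}" for word in exclude)
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Encode the search statement as a Google search address."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q1 = build_query("invasive species", site="jstor.org")
q2 = build_query("board games", exclude=["chess", "checkers"])
q3 = build_query("tectonic plates", filetype="ppt")
url1 = search_url(q1)
print(q1, q2, q3, url1, sep="\n")
```

Paste the resulting address into a browser to run the search.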
In addition to finding information, Google is useful for answering basic questions.
- You can enter an equation like 3*4+6/7 or a conversion like 5 miles to km or flight information like flight from Seattle to Denver or a map like map 50322.
- Some terms automatically provide information rather than a list of results. For example, Claritin provides medication information, GOOG provides stock quotes, and a tracking number provides tracking information.
- Some search operators assist with specialty search engines such as Google’s Image Search.
- Use frog imagesize:150x150 to search for a particular image size.
- Use frog filetype:jpg to search for a particular filetype.
- Use astronaut source:life to look in a specific source such as the Life photo collection.
- Many more options are available when you choose SEARCH TOOLS within the search results.
Beyond Text Input
When you think of conducting a search, your first thought is probably to type words in the search box. However, there are other ways to input information for a search.
In Google, users can speak their search. This is useful for times when you don’t know how to spell a word or don’t want to type on your smartphone. It’s also helpful for those with disabilities. Just click on the microphone and begin speaking. Go to Google and give it a try!
When you’re in the Google Images Search, you have the option to search using an image. For instance, let’s say you find an image and aren’t sure who is in the photo. Drag a file such as rose.jpg into the search box. If Google recognizes the image, it sometimes even adds a search term, as it did with Helen Keller. Go to Google Images Search and give it a try!
Once you begin locating information, you’ll discover that not all information is available to the public for free.
Open access is information made available for free through the Internet. Access to content may be available through a publisher’s online archive or through an open access repository.
Open access is also the practice of allowing unrestricted access to peer-reviewed scholarly research. This approach reduces costs for libraries and allows authors to maintain copyright and distribution rights on their research.
Deep or Hidden Web
The deep web is the part of the World Wide Web that is not indexed by standard search engines. The deep web is also called the hidden web or invisible web.
Most of the resources found through a standard web search come from “dragging a net across the surface,” according to Mike Bergman, founder of BrightPlanet. It’s only by digging deep that it’s possible to find these hidden resources.
Don’t confuse the deep, hidden, or invisible web with the darknet. The darknet refers to anonymous networks where IP addresses are not publicly shared. Sometimes referred to as the underground net, it’s often associated with illegal activity and dissident political communications.
Deep Web Examples
Over the past few years, Google has continued to work on providing access to more and more of the lesser-known collections and resources. However, here are a few examples of places where you can sometimes do a more effective search within the website to access lesser-known resources. You can still get to many of these resources through Google, but they may not appear high in the search engine results.
Spend some time exploring search tools, directories, and sources of deep web access. Compare your results with what you find with a search engine like Google:
- Catalog of US Government Publications
- Directory of Open Access Journals
- Virtual Library
Searching the hidden web is particularly helpful for professionals in areas such as medical research, law, and government work.
Explore some of the resources above. Compare your results with Google.
Once web-based resources have been identified, they must be evaluated. This involves much more than a surface level examination.
Rather than lumping all web-based resources together, consider the different types of online content.
When working with patrons, use the ABCDs to help library users remember the evaluation criteria:
- Authority & Accuracy
- Balance
- Currency
- Depth & Detail
Authority and Accuracy
Use the following questions when evaluating the authority and accuracy of a website:
- Who wrote the information?
- Who sponsors the information?
- What does the URL tell you about the sponsor?
- What is the author or publisher's reputation?
- Are they credible?
- What makes the author an expert in this area: education, profession, hobby?
- What else has the author written?
- Can you find others who have cited this author?
- Is the content accurate?
- Is the site free of both technical and grammatical errors?
- Does the ABOUT THIS SITE page provide useful information about the authority?
Example: The NOAA government website has an ABOUT page with information about the agency and the website.
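The question about what the URL tells you about the sponsor can be explored with a short script. The sketch below pulls the top-level domain out of an address and looks it up in a small rule-of-thumb table. The table `SPONSOR_HINTS` and the function `sponsor_hint` are assumptions made for illustration. Because anyone can register a .com or .org address, treat the output as a starting point rather than proof of authority.

```python
from urllib.parse import urlparse

# Rough rule-of-thumb mapping; an assumption, not an authoritative list
SPONSOR_HINTS = {
    "gov": "government agency",
    "edu": "educational institution",
    "org": "organization (often nonprofit)",
    "com": "commercial business",
}


def sponsor_hint(url):
    """Return the top-level domain and a hint about the likely sponsor."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return tld, SPONSOR_HINTS.get(tld, "unknown; investigate further")


hint = sponsor_hint("https://www.noaa.gov/")
print(hint)
```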
Balance
Use the following questions when evaluating the balance of a website:
- Is the function of the website clear?
- Does it serve its purpose?
- Is the website objective?
- What's the purpose of the content? Is it for public service, education, or promotion?
- How do sources compare: do they complement or conflict with each other?
- Are various perspectives provided?
- Are the sources likely to be biased? How?
- Is the content balanced or does it advocate a particular perspective?
Example: The Greenpeace website advocates for a particular point of view. When using a website that clearly represents a specific perspective, it’s important to check facts and be aware of the bias. Simply representing a point of view is not a reason to eliminate the site from use. Instead, it’s a reason to use caution.
Currency
Use the following questions when evaluating the currency of a website:
- Is the date of the content or website update clearly stated?
- How timely is the information based on the copyright or the dates cited in the article?
- How current is the information in comparison to other sources?
- Is currency important with this topic? Why or why not?
- How likely is it that information on this topic has changed recently or will change in the near future?
Although currency can be important in some cases, use of historical materials may also be important. The Wayback Machine allows users to go back and look at websites from earlier times. Try the Wayback Machine from Archive.org.
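Archived pages in the Wayback Machine are commonly reached through addresses that combine a date stamp with the original page address. The sketch below builds such an address, assuming that commonly seen pattern. The function name `wayback_url` is hypothetical, and the script does not check whether a snapshot actually exists for the chosen date.

```python
from datetime import date


def wayback_url(page_url, on_date):
    """Build a Wayback Machine address for a page near a given date."""
    timestamp = on_date.strftime("%Y%m%d")  # date stamp in YYYYMMDD form
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


snapshot = wayback_url("http://www.census.gov/", date(2005, 6, 1))
print(snapshot)
```

Open the resulting address in a browser to see whether the archive holds a capture from around that date.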
Example: The United States Census Bureau maintains the latest statistics from the US Census.
Depth and Detail
Use the following questions when evaluating the depth and detail of a website:
- Does the source provide an overview or surface level exploration or in-depth information?
- Are specific examples provided?
- Do the examples clarify the information?
- Are arguments provided and evidence cited?
- Are the sources for factual information cited?
- Can individual authors and cited works be verified through examining citations (e.g., Google Scholar)?
- Are the links relevant and operational?
While Wikipedia can provide a useful overview of information, use the external links or references at the bottom of a Wikipedia page for more in-depth information and access to original sources of information.
Example: The All About Birds website from The Cornell Lab of Ornithology provides in-depth information about birds.
Fact, Fiction, Fake
It’s important that library users can distinguish among fact, fiction, and fake websites.
Many library users are simply looking for the answer to a question and may receive misinformation if they don’t know how to identify fake sites.
Visit some of the following fake websites. Besides using common sense, what are some ways you can identify a fake?
Use the link: search operator in Google to search for what websites link to the possible fake website.
If you’re not sure about the content of any website or email you receive, check Snopes. This website keeps track of mischief on the Web such as urban legends and rumors.
Explore Snopes looking for examples of online mischief.
In information science, ontology represents a set of knowledge related to a particular domain that shares a vocabulary.
Ontologies are frameworks for organizing data within a particular domain such as library science or the Semantic Web.
Most ontologies can be organized into individuals (e.g., items, objects, instances); classes (e.g., collections, sets, concepts); attributes (e.g., features, characteristics, properties); and relations (e.g., connections among individuals and classes).
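The four building blocks listed above can be sketched with a few small data structures. This is a simplified illustration, not a real ontology language. The class names `OntClass` and `Individual`, the relation name `written_by`, and the sample entries are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OntClass:
    """A class (concept) that groups individuals."""
    name: str


@dataclass
class Individual:
    """An individual (instance) that belongs to a class and has attributes."""
    name: str
    member_of: OntClass
    attributes: dict = field(default_factory=dict)


# Classes
book = OntClass("Book")
person = OntClass("Person")

# Individuals, with attributes stored as name/value pairs
tolkien = Individual("J. R. R. Tolkien", person)
hobbit = Individual("The Hobbit", book, {"year": 1937, "language": "English"})

# Relations connect individuals: (subject, relation, object)
relations = [(hobbit.name, "written_by", tolkien.name)]


def related(subject, relation):
    """Return everything linked to `subject` by `relation`."""
    return [
        obj for subj, rel, obj in relations
        if subj == subject and rel == relation
    ]


authors = related("The Hobbit", "written_by")
print(authors)
```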
Ontologies are useful in creating shared understandings of the structure of information. They are also helpful for knowledge analysis.
Dublin Core is a simple ontology for documents.
The Semantic Web
The Semantic Web is a movement led by the World Wide Web Consortium (W3C), the organization that also develops standards for HTML, CSS, and XML.
The Semantic Web is based on W3C’s Resource Description Framework (RDF). This family of specifications began with a model for metadata, but has been extended to a wide range of web resources and activities. The vision is that well-constructed data can be interpreted by computers, lessening the need for human effort.
Semantics is the study of meaning, specifically understanding human expressions. The Semantic Web approach incorporates semantic (meaningful) content into web pages in an attempt to create a web of data that is more structured and useful.
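RDF describes resources through statements made up of a subject, a predicate, and an object, often called triples. The sketch below imitates that idea with plain Python tuples rather than a real RDF library or file format. The predicate names come from the Dublin Core terms namespace, while the resource address, the values, and the helper `match` are made up for illustration.

```python
# Dublin Core terms namespace, used here for the predicate names
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical resource identifier for illustration
REPORT = "http://example.org/reports/invasive-species"

# Each statement is a (subject, predicate, object) triple
triples = [
    (REPORT, DCTERMS + "title", "Invasive Species Report"),
    (REPORT, DCTERMS + "creator", "Example Agency"),
    (REPORT, DCTERMS + "date", "2014"),
]


def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in data
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Ask: what is the title of the report?
titles = [obj for _, _, obj in match((REPORT, DCTERMS + "title", None), triples)]
print(titles)
```

Because every statement has the same three-part shape, a program can answer simple questions about the data without a person reading the page.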
The purpose of the Semantic Web is to help users find, combine, and share information more easily, effectively, and efficiently.
GoPubMed is a project that attempts to create Semantic Web solutions for biomedical texts. The combination of Gene Ontology and Medical Subject Headings (MeSH) helps structure millions of articles for instant access. It’s the next step in PubMed.
Online Information Retrieval (IR) involves much more than conducting a basic web search.
Many web search engines, meta engines, and directories can be used to locate information on the Web.
The key to effective and efficient searching is the use of search operators and the advanced features of web search engines.
When evaluating websites, remember the ABCDs of evaluation.
Ontology and the study of the Semantic Web are two emerging areas in LIS.