Searching the Internet
Searching for information on the Internet can be a very frustrating
experience until you learn a few techniques to let the Internet do the
work for you. The most important tool for finding information on the
Internet is a collection of databases usually referred to as 'Search
Engines'.
A VERY Large Haystack
The Internet is a very large information resource, and it continues to
grow at a tremendous rate. Not only does the number of documents run into the
millions, but this information is also spread across thousands of different
computers all over the world.
This includes information developed by universities or by
governments as part of a public mandate, information developed
by groups or individuals to benefit the community (or
simply to make grandmother's favourite mitten pattern available to anyone
who likes to knit), and a rapidly growing body of information
developed by businesses for commercial purposes.
Who's in charge?
Although it may seem strange, no one is in charge of
cataloguing the information available. The only practical approach is to
computerize the process, and even this method is so costly that most cataloguing
projects exist as partnerships between universities, computer hardware
manufacturers and, increasingly, businesses.
These projects are research projects for the universities, proving
grounds for the hardware manufacturers and an electronic billboard on
which businesses can advertise their products.
There are a couple of dozen catalogues that make a serious attempt
to provide a broad index of materials on the Internet. Among the first
were Webcrawler and Yahoo!, but there are now many others, and new
projects are formed on a regular basis.
Each project has tended to focus on a slightly different way of
collecting and cataloguing the data, or on different resources. As
you become more familiar with individual search tools, you may find
yourself using specific tools for specific searches.
How do 'Search Engines' work?
Gathering the information
The cataloguing projects tend to work in one of two ways:
- they use software (often called spiders or robots) to search out
information. This approach usually leads to an index of many resources.
- they accept 'nominations', allowing groups or individuals to submit
the addresses of resources that should be included. This approach generally
results in a directory with resources listed under various
categories.
A third group provides 'reviews' of selected sites but also offers
a general index of sites that have not yet been reviewed.
Each of these general approaches has its strengths (and weaknesses), and
all projects are struggling to find better ways to help you find the
information you're looking for.
Introduction to using a search engine
Search engines generally consist of a database and an
interface that allows users to submit a request. The easiest access to these
'search engines' is usually through a data input form that you fill out and
submit over the World Wide Web.
Most 'search engines' employ forms that are similar in many respects. They
ask you to enter one or more words that will be used to define the search,
and they provide a 'button' that is used to "Submit" the request.
After you submit a request, the search engine will produce a list of documents
that match the search you have specified.
Meta-search engines are resources that allow you to submit an inquiry to more
than one search engine at a time. Some of them let you select specific
search engines or a type of search engine.
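The idea behind a meta-search engine can be sketched in a few lines of Python. This is only an illustration under stated assumptions: engine_a and engine_b are hypothetical stand-ins that return fixed lists (real engines are reached over the Web), and the merging rule (take each engine's first hit, then each engine's second hit, and so on, dropping duplicates) is just one plausible choice, not how any particular meta-search service works.

```python
from itertools import zip_longest

# Hypothetical stand-ins for real search engines: each takes a query
# string and returns a ranked list of document addresses.
def engine_a(query):
    return ["http://example.com/trains", "http://example.org/models"]

def engine_b(query):
    return ["http://example.org/models", "http://example.net/railways"]

def meta_search(query, engines):
    """Send one query to several engines and merge their result lists.

    Results are interleaved (first hit from each engine, then the second
    hit from each, and so on) and duplicates are dropped, so a document
    found by more than one engine is listed only once.
    """
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    for round_hits in zip_longest(*result_lists):
        for url in round_hits:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("electric train", [engine_a, engine_b]))
```

Here the page found by both engines appears once, in the position where it was first seen.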
Controlling the search
Most search engines let you define the search in ways that
restrict the documents that are returned. This makes the
search more specific and eliminates unwanted items.
Simple Controls - AND, OR, NOT
The simplest way of controlling a search is to combine two words, effectively
saying... "Find all documents that contain both the word 'electric' and
the word 'train'." This should eliminate documents that refer to
electric toothbrushes (unless someone has a document that talks about
training someone to use an electric toothbrush).
This type of search uses the word AND as a "logical operator" to control the
search. "AND" is one of several operators that can be very valuable when you
are searching for information. Others include "OR" and "NOT".
electric OR train effectively asks for all documents that contain
either the word "electric" OR the word "train". This type of search usually
finds many more documents that match the search parameters.
electric NOT train asks for documents that contain the word "electric"
but not the word "train".
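One way to picture these three operators is to treat the result of each single-word search as a set of matching documents: AND keeps the documents in both sets, OR keeps those in either set, and NOT removes the second set from the first. The short Python sketch below shows this on a tiny, invented collection of four documents; the names documents and docs_containing are hypothetical, and real search engines work on enormous indexes, but the basic logic carries over.

```python
# A toy collection of documents, invented for illustration.
documents = {
    1: "my electric train set",
    2: "a new electric toothbrush",
    3: "how to train your dog",
    4: "knitting mittens",
}

def docs_containing(word):
    """Return the ids of documents whose text contains the word."""
    return {doc_id for doc_id, text in documents.items()
            if word in text.lower().split()}

electric = docs_containing("electric")   # {1, 2}
train = docs_containing("train")         # {1, 3}

print(sorted(electric & train))   # electric AND train -> [1]
print(sorted(electric | train))   # electric OR train  -> [1, 2, 3]
print(sorted(electric - train))   # electric NOT train -> [2]
```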
Caution!
Although there is a group of basic 'operators' that most search engines
support, there may be subtle (but important) differences in the way
they are used, and some search engines support a different selection of operators.
One such difference is whether the search engine expects
operators to be typed in CAPITAL LETTERS. Sometimes this is important!
Complex Searches
Some search engines will allow you to combine search terms and operators to
create complex inquiries. As an example... Webcrawler will accept
Homer NOT (Simpson OR Alaska)
In this case parentheses are used to combine terms and operators into a
complex inquiry.
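To show how the parentheses change the meaning of a query, here is a small Python sketch of a query evaluator. It is not Webcrawler's actual method; it rests on several assumptions: operators are recognised only in CAPITAL LETTERS (one of the differences mentioned under Caution!), operators are applied left to right with no precedence, NOT between two terms means "and not", and the four documents are invented. The names DOCUMENTS, matching and evaluate are hypothetical.

```python
import re

# Toy document collection, invented for illustration.
DOCUMENTS = {
    1: "homer simpson eats a donut",
    2: "homer wrote the iliad",
    3: "homer is a town in alaska",
    4: "bart simpson",
}

# Assumption: operators are only recognised in CAPITAL LETTERS.
OPERATORS = {"AND", "OR", "NOT"}

def matching(word):
    """Return the ids of documents containing the word (case-insensitive)."""
    return {i for i, text in DOCUMENTS.items() if word.lower() in text.split()}

def evaluate(query):
    """Evaluate a query such as 'Homer NOT (Simpson OR Alaska)'.

    Operators are applied left to right with no precedence; parentheses
    group sub-queries so they are evaluated first.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ends too early")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1  # consume the closing ")"
            return result
        return matching(token)

    def parse_expression():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            pos += 1
            right = parse_term()
            if op == "AND":
                result = result & right
            elif op == "OR":
                result = result | right
            else:  # NOT between two terms means "AND NOT"
                result = result - right
        return result

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected word: " + tokens[pos])
    return result

print(sorted(evaluate("Homer NOT (Simpson OR Alaska)")))  # [2]
```

In this sketch, typing "homer and alaska" in lower case raises an error, because "and" is not recognised as an operator; a real search engine might instead treat it as an ordinary search word or ignore it.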
TIP - you should always review the on-line Help/Tips for a search engine
before you start a complex search using that resource.
Search Options
In addition, most search engines allow you to change certain aspects of
the search to suit your purposes. These options often include:
- 1) The ability to specify the number of references returned...
- A broad search on a common word or phrase may find hundreds,
thousands or even hundreds of thousands of documents that match the search
parameters. Search engines usually return between 15 and 25
references unless you request more (or fewer).
- 2) Control over the ranking...
- Some 'search engines' attempt to find the 'most relevant' documents and
usually sort the list to place them at the top. Determining which documents
are most relevant is not always easy: a document may appear more relevant
(perhaps because of the number of times a keyword or phrase appears in it)
when in fact the program that assessed it was fooled by a paragraph that
contained your keyword twenty or thirty times. Some search engines allow you
to direct the ranking by adding a secondary word or phrase.
- 3) One or more ways of displaying the results...
- Some search engines produce results in more than one format. Usually
this relates to how much information is provided about each document that
matches the search parameters. For example, a search engine may offer a
choice of "Brief, Normal or Verbose" output.
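The ranking problem described in the list above can be reproduced with a deliberately naive relevance score. In this Python sketch (an illustration only, not any real search engine's formula), a document scores first by how many of the query words it contains and then by how many times they appear in total. With one word, a page that repeats "train" eight times wins; adding the secondary word "electric" moves the genuinely useful pages ahead of it. The documents and the names DOCUMENTS, score and rank are invented for illustration.

```python
# Toy documents, invented for illustration. "stuffed" repeats its
# keyword eight times, the trick described above.
DOCUMENTS = {
    "guide": "a guide to model trains and electric train layouts",
    "stuffed": "train " * 8 + "sale today",
    "history": "the history of the electric train in canada",
}

def score(text, words):
    """Naive relevance score: (distinct query words present, total occurrences)."""
    tokens = text.lower().split()
    counts = [tokens.count(w.lower()) for w in words]
    return (sum(1 for c in counts if c > 0), sum(counts))

def rank(words):
    """Return names of matching documents, most 'relevant' first."""
    scored = {name: score(text, words) for name, text in DOCUMENTS.items()}
    matches = [name for name, s in scored.items() if s[1] > 0]
    return sorted(matches, key=scored.get, reverse=True)

print(rank(["train"]))              # ['stuffed', 'guide', 'history']
print(rank(["train", "electric"]))  # ['guide', 'history', 'stuffed']
```

Python's sort is stable, so documents with equal scores keep their original order.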
You should experiment with different search options to determine which
(combination of) options produces the desired output.
Factors That Affect the Results
Controlling the number of documents found
There is nothing like experience to improve your search results. Experimenting
with various search terms and search options is the best way to gain that
experience.
You can expect that at first you will either get thousands and thousands of
"document/search matches"... OR... almost nothing at all... ;-(.
As you become more familiar with a particular search engine, you should find
that you can conduct a search in only one or two inquiries. Sometimes you will
find that your search is quicker if you plan it as two or three
inquiries. Experienced searchers often try a very specific search first and
then broaden the search if they don't get the desired results.
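The "specific first, then broader" strategy can be written down as a simple rule. The Python sketch below is an illustration under assumptions: it runs a narrow AND search first and falls back to the broader OR search when fewer than min_results documents are found. The documents, the threshold and the names DOCUMENTS, search and specific_then_broad are all hypothetical.

```python
# Toy documents, invented for illustration.
DOCUMENTS = {
    1: "electric train sets for beginners",
    2: "model railway scenery",
    3: "electric toothbrush review",
    4: "train timetables",
}

def docs_with(word):
    """Return the ids of documents containing the word."""
    return {i for i, text in DOCUMENTS.items() if word in text.split()}

def search(words, mode):
    """Combine single-word results with AND (intersection) or OR (union)."""
    sets = [docs_with(w.lower()) for w in words]
    return set.intersection(*sets) if mode == "AND" else set.union(*sets)

def specific_then_broad(words, min_results=2):
    """Try the narrow AND search first; broaden to OR if it finds too little."""
    hits = search(words, "AND")
    if len(hits) >= min_results:
        return "AND", sorted(hits)
    return "OR", sorted(search(words, "OR"))

print(specific_then_broad(["electric", "train"]))  # ('OR', [1, 3, 4])
```

Only one document contains both words, which is below the threshold of two, so the sketch broadens to OR and reports three documents.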
Picking the right search engine
As noted previously, not all search engines are created equal, so get to know
the personality of the various search engines.
Speed
A phrase that turns up on the Internet is "Your Mileage May Vary". There are
three main factors that will affect the speed of an Internet search.
- How many people are using the search engine...
- How busy the connection is between your computer and the search engine...
- How complex a search you are requesting...
New & Hot Usually = Busy!
There is often a direct relationship between the first two items above. However,
because so many people use the latest and hottest search engine, the older
services will often seem a little quicker, at least in the short term.
If possible, use a search engine that specializes in the resource you are
interested in (news, software, etc.).
Off-Peak Hours are Fastest!
There is a tremendous difference in search speeds between peak usage and
off-peak hours. The morning hours are far better if you are using a search
engine in North America! As the day progresses, the Internet becomes very busy
and data transmission can slow noticeably.
Complex Searches
Although a complex search will take a little longer, this is probably not as
significant a factor as the time of day when you do the search.
The more familiar you are with search engines, the easier it will become to find
material. Experiment with different search engines: run the same search on
several and compare the results.
Happy hunting!