Search Engines, Directories, Etc.
By James Harvey Stout (deceased). This material is now in the public
domain. The complete collection of Mr. Stout's writing is now at
http://stout.mybravenet.com/public_html/h/
Jump to the following topics:
- What are search engines, directories, etc.?
- The various types of search engines, directories, etc.
- Registering with search engines, etc.
- Keywords.
- Meta tags.
- Other suggestions for search engines, directories, etc.
What are search engines, directories, etc.?
They are collections of information about websites; the information
includes the URL and description of each site. People visit these
search engines, etc., to find websites regarding a particular topic,
e.g., "internet marketing" or "baseball." We want to list our website
in these search engines, directories, etc., because most people use
these resources to find websites regarding a topic. And we want to be
ranked among the top 20 in our category, so that people will see our
listing; if we are ranked #31,546, we might as well not be listed at
all. This chapter explains the fundamentals for gaining a high
ranking; however, it does not attempt to give details for each search
engine, etc., because (1) each one has its own criteria for
determining the ranking of sites; (2) those criteria are constantly
changing; (3) some of the criteria are kept secret; and (4) this
constantly changing information is better delivered by experts who
actively track the changes in search engines and provide the data
through consultations, websites, and mailing lists. The following
information is generalized; for each search engine, etc., you need to
decide which of the ideas will apply.
The various types of search engines, directories, etc.
- Search engines. These are collections of links which gather
their information via two means: (1) application forms where we
can describe our site, and (2) "robots" or "spiders" or
"crawlers," which are software programs which search the web for
new sites (and they also search for changes in existing sites).
Therefore, our site might be listed even if we do not submit an
application. Some of the biggest search engines are HotBot, Lycos,
InfoSeek, and WebCrawler.
- Directories. These are collections of links, but they gather
their information only through application forms; they do not use
spiders. Therefore, our site will not be listed unless we submit
an application. The biggest directory is Yahoo.
- Specialized search engines and directories. These sites focus
on a particular topic, e.g., a particular type of business,
particular industries, particular geographical areas, etc.
- Free-for-all sites and classified-ad sites. These sites accept
virtually any listing which is submitted; thus, the quality of
listings is usually low. At some sites, the listings are not even
categorized.
Registering with search engines, etc.
- Before registering, we can make certain that our site is not
listed already. It is possible that a robot has already visited
(and indexed) our site.
- We can prepare our information in advance. Before going to the
search engines, etc., to register our site, we can write a
description of our website (and the goods or services which are
sold there). Each search engine, etc., allows a different number
of words for our description, so we can prepare descriptions of
various lengths: 20 words, 25 words, 50 words, etc. This
description is very important, so it should not be improvised;
instead, we need to give it as much care as we would give to our
most-important ad copy (with rewriting, proofreading, and
spell-checking). The description can be filled with our "keywords"
(which are explained later), but we cannot simply use a list of
keywords; instead, we need to write them into logical sentences.
We will copy-and-paste this description into each application
form.
- Read the rules for the application. Every search engine has
different rules.
- Fill in every essential field in the application. Our
application could be rejected simply for being incomplete.
- Be truthful in the application. After we submit an application
to a search engine or directory, our site will be visited by a
spider or by a human. If the website does not match our
description, the site will not be listed in the search engine.
- Submit only one application for the entire site -- or submit
each page individually. This is a gray area, depending on many
factors:
- We submit only the main page to a directory. Some
directories (e.g., Yahoo) need only the URL of the main page. A
human will visit the site, to gather information regarding the
other pages, for the indexing of our site.
- We submit only the main page to a search engine. If we
submit the main page, the robots will use our pages' links to
find (and index) the other pages of our site. However, some
robots look at only the first page, or only the first few
pages; the other pages will not be indexed. (One solution is to
have a link from our main page directly to all of our important
pages, so that the spider has to go down only one level to find
those pages.)
- We register the most-important pages. If we do this, we
increase the possibility that those pages will be indexed. Even
though our website has a particular overall theme, individual
pages address different issues which should be indexed in a
different category.
- We register all of our pages. Some people take this option.
- Don't submit too many pages (or too many in a single
day). The rules will indicate the maximum number of pages
which will be accepted.
- Indicate the pages which are not to be indexed. If there are
any pages which we do not want to be in the search engines, we can
use various means to tell the robots to skip those pages.
- A robots.txt file. This text file is put into the root
directory of our website to give instructions to robots; the
file lists the paths of the pages and directories which are not
to be indexed. We have various options.
- The file can give instructions regarding the entire site
-- or regarding a single subdirectory, by listing that
subdirectory's path. (Robots look for the file only in the
root directory; a robots.txt file in a subdirectory is
ignored.)
- The file can give instructions to particular robots.
(Search engines are not the only entities which operate
robots; there are also robots for link-checking software,
and for shopping agents, etc.) Our traffic software has a
list of robots which have visited our site.
- A robots meta tag. The four possible instructions are index,
noindex, follow, and nofollow. We are telling the robot whether
to index the page, and whether to follow the links from that
page to other pages on our site. We use this syntax: <META
NAME="robots" CONTENT="noindex, nofollow">. This meta tag
(like other meta tags) goes between the <HEAD> and </HEAD>
tags on the web page.
- We can re-submit pages later. As time goes by, our rank will
gradually slip, as other people's websites are submitted to the
search engines. (In some cases, a website simply "disappears" from
a search engine, for unknown reasons.) We can re-submit the site
every few months. (To avoid being accused of spamming the search
engine, we should make some significant changes in the pages before
re-submitting them.)
- We can submit new pages. After our original application, we
will probably continue to expand our website. These new pages can
be submitted to the search engines.
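As a rough sketch of how a well-behaved robot reads the robots.txt
instructions described earlier in this section, the following Python
example parses a sample file with the standard library's
urllib.robotparser and asks which pages may be fetched. It also builds
the robots meta tag in the syntax shown above. The file contents, the
paths, the robot names ("AnyRobot" and "ExampleLinkChecker"), the
helper name robots_meta_tag, and the example.com URLs are all invented
for illustration; real robots may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as it would be served from the
# root of the site. All paths and robot names are made up.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html

User-agent: ExampleLinkChecker
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything from the web."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def robots_meta_tag(index: bool, follow: bool) -> str:
    """Build the robots meta tag for one page (hypothetical helper)."""
    content = ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<META NAME="robots" CONTENT="{content}">'


parser = build_parser(ROBOTS_TXT)

# Any robot may fetch the main page, but not the excluded directory.
print(parser.can_fetch("AnyRobot", "http://www.example.com/index.html"))      # True
print(parser.can_fetch("AnyRobot", "http://www.example.com/private/a.html"))  # False

# The link checker named in the file is excluded from the whole site.
print(parser.can_fetch("ExampleLinkChecker", "http://www.example.com/index.html"))  # False

print(robots_meta_tag(index=False, follow=False))
```

The same parser can be pointed at a draft robots.txt before it is
uploaded, to confirm that the pages we want indexed are not excluded
by accident.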
Keywords.
"Keywords" are the words which are typed into a search engine's form
to find websites on a particular topic; for example, if we want to go
to websites regarding Corvettes, we would type in the word
"Corvettes." We need to use keywords very carefully on our website,
because these are the words by which our potential customers will
find us when they visit a search engine. The following guidelines
apply to the use of keywords.
- We use keywords in the site-description which is requested on
a search engine's submission form. In the description, the
keywords should be written into logical sentences rather than just
a list of the words.
- We can use specific terms. If we use a general term (e.g.,
"clothing"), our listing will be buried among millions of other
listings in that category; virtually no one will find us in a
search. Instead, we need to use focused terms (e.g., "leather
gloves").
- We can use keywords frequently throughout the site. "Keyword
density" refers to how often our keywords appear on the site;
search engines actually count the keywords. (Some webmasters
calculate keyword density as the percentage of the words on a page
which are keywords; they might strive for 10%.) However, the
following techniques would be considered "spamming." (The possible
penalty for this "keyword spamming": our website can be banned
from the search engine.)
- A simple list of keywords. Instead, use the keywords
in actual sentences and phrases.
- Excessive use of keywords. Each search engine has a limit
on the number of allowable repetitions of keywords; in some
cases, we will be penalized if we use a keyword more than three
or four times on a page.
- Hidden keywords. Some people hide their keyword repetitions
by putting the type into the same color as the background, so
that the text is not visible. However, the search engines are
aware of this trick, so they look for it.
- We can put keywords into these locations on our website:
- The top portion of every page. Keywords which are in the
top portion will attract the most attention from search
engines. (If we have tables and graphics in the top portion, we
leave less room for keywords.) Some search engines will create
our listing simply by reading the first few sentences at the
top of a page; we need to be certain that the page doesn't
start with fluff, e.g., "Thank you for coming to my site."
- The title. This is not the heading of the page; it
is the phrase which appears above our page in the
browser's "title" field; in html, it is between the
<TITLE> and </TITLE> tags. We can use as many as 64
characters; if we use more than 64 characters, the extra
characters will be cut off in our search-engine listing. (The
title is important for another reason: these are the words
which will show up in a browser's bookmark if our visitors
bookmark the site.)
- Text. Keywords need to be used throughout the text of our
web pages, in virtually every paragraph.
- Photo captions. Keywords can be used here, too.
- Alt tags. An alt tag (the ALT attribute of an <IMG> tag) is
the text which is displayed when a graphic is absent from its
place on the page (perhaps because our visitors have turned off
the graphics in their browsers). Some search engines make note
of the text in our alt tags, so these tags should contain our
keywords.
- Headings. These headings include the main heading on the
page, and all of the subheadings. Search engines consider
headings to be very important, so we should have some keywords
here.
- Links to external sites. If we have links to other people's
sites, the words in our link description will be picked up by
search engines. For example, I wanted to find the URL of a
software company. I had forgotten that my "freeware page" has
a link to that company. When I went to a search engine,
and typed in the name of the software, my freeware page came
up! I had not submitted that page to the search engine (nor had
the software company); apparently a robot had found the page
and had indexed it.
- Guestbooks and discussion boards. The keywords will be
indexed by robots.
- We can use variations of words:
- We can use some single words. For example, "books" or
"jewelry."
- We can use some phrases. For example, "science fiction
books" or "emerald jewelry." At search engines, most people use
two or more words in their searches. Also, phrases help to
narrow our focus; for example, instead of being virtually
invisible in a list of 10,000 jewelry dealers, we stand out in
a list of 10 emerald jewelry dealers.
- We can use plural forms of the words, if possible.
For example, if our keyword is "lamp," our site will come up in
a search of "lamp," but not "lamps." But if we use the plural,
"lamps," our site will come up in a search of "lamp" or
"lamps."
- We can use longer terms. This is similar to the idea of
using plural forms of words; for example, "lamps" shows up in a
search for "lamp" (but not vice versa) -- and "engineering"
shows up in a search for "engineer" (but not vice versa).
- We can use synonyms. For example, we might think that we
sell "sofas," but some people will search for "couches." We can
get synonyms from a thesaurus -- in a book, or in most
word-processing software.
- We can use misspellings. For example, if we sell water
faucets, we might want to add a misspelling such as "fawcets"
so that our site will come up when people misspell the word.
- We can use upper case and lower case. Some search engines
differentiate between capitalized letters and small letters;
i.e., the search engines are "case-sensitive." For example, if
our keyword is "Videocassettes," it might not show up in a
search which uses a lower-case "v": "videocassettes."
- We can use other variations. For example, if we sell water
skis, we can use the following keywords: water skis, waterskis,
waterskiing, water skiing, etc.
- We can use keywords which are based on the real-life use of
the product or service.
- Benefits and features. For example, our time-saving,
money-saving product can use these keywords: time management,
personal finances, etc.
- Our audience. For example, if we sell rototillers, one of
our keywords can be "gardeners."
- Parts or ingredients. For example, if we sell snacks, we
might use "chocolate" as a keyword.
- We can look at the keywords on our competitors' websites.
Particularly if those websites have high rankings in a search, we
can see the words, and how they are used throughout the page. We
won't plagiarize the text itself, but we can get some ideas for
our own keywords.
- We can check our traffic logs to see which keywords are being
used to find our site. We can add more of those words to our
pages.
- We should use different keywords for each page. Some keywords
will be used on virtually all of our pages, but each page has a
different emphasis. For example, if our entire site pertains to
vacation travel, one page might use "Bahamas vacation" as a
keyword phrase, while another page uses "Puerto Rico vacation" as
a keyword phrase.
- We can use the most-popular words at the search engines. Some
websites publish lists of the commonly used search words.
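Two of the ideas in this section can be sketched in a few lines of
Python: the keyword-density percentage, and the way a longer keyword
(such as a plural) can match a shorter search term. This is only an
illustration under simplifying assumptions. The function names are
hypothetical, words are split on anything other than letters, digits,
and apostrophes, and the matching is assumed to be a plain,
case-insensitive substring comparison; each real search engine
applies its own (often secret) rules.

```python
import re
from typing import List


def split_words(text: str) -> List[str]:
    """Lower-case the text and split it into simple words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, keyword: str) -> float:
    """Percentage of the words in `text` taken up by `keyword`.

    `keyword` may be one word or a phrase; occurrences are counted
    left to right without overlapping.
    """
    page = split_words(text)
    key = split_words(keyword)
    if not page or not key:
        return 0.0
    size = len(key)
    hits = 0
    i = 0
    while i <= len(page) - size:
        if page[i:i + size] == key:
            hits += 1
            i += size   # skip past this occurrence
        else:
            i += 1
    return 100.0 * hits * size / len(page)


def substring_match(search_term: str, page_keyword: str) -> bool:
    """Assumed model: a search finds any keyword that contains it."""
    return search_term.lower() in page_keyword.lower()


sample = "Leather gloves for cold winter days: our leather gloves last."
print(keyword_density(sample, "leather gloves"))   # 40.0
print(keyword_density(sample, "gloves"))           # 20.0

print(substring_match("lamp", "lamps"))            # True
print(substring_match("lamps", "lamp"))            # False
print(substring_match("engineer", "engineering"))  # True
```

A density as high as the 40% in this toy sentence would probably
count as excessive repetition under the guidelines above; the sample
is kept short only so that the arithmetic is easy to follow.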
Meta tags.
Meta tags are html tags which describe a website. At a few of the
major search engines, the robots look at our meta tags to determine
the content of our website for the purpose of indexing. Meta tags
are not essential; only two or three search engines refer to them --
and if we do not have meta tags, those search engines will index our
site by referring to the text of the page. (Meta tags are not visible
on the page, but they can be viewed in "Page Source.")
- There are two types of meta tags which describe our site.
- "Description" meta tag. This tag is a description of our
site, in 200 characters or less. We should use as many keywords
as possible -- but we are writing actual sentences, not merely
a list of keywords. The syntax is: <META NAME="description"
CONTENT="This is where we put the description of our site">.
- "Keywords" meta tag. This is a list of keywords, in 1,000
characters or less (including punctuation and spaces). Search
engines will reject our page if we repeat a keyword too many
times in this meta tag; the limit varies, but we are safe if we
do not repeat a keyword more than three to five times. The
words are separated by commas, with no spaces. The syntax is:
<META NAME="keywords"
CONTENT="psychology,self-improvement,happiness">.
- Meta tags are placed at the top of a page. They are between
the <HEAD> and </HEAD> tags, on every page of our
site. If we have javascript on our page, the meta tags should be
higher on the page than the javascript.
- Use different meta tags for each page of the site. Each page
is different, so it requires its own meta tags.
- Don't use competitors' names as keywords. We might be tempted
to use a competitor's name or product, so that a search for that
name or product would bring up our site, too. However, this
practice might be considered trademark infringement.
- Don't use keywords which are unrelated to the site. The
practice is prohibited by search engines. And it will attract
unqualified visitors, who are looking for the "sex" that we put
into our meta tags, when our site is actually about electronic
supplies.
- Put meta tags into frames, if you have frames. The tags go
into the <FRAMESET> page.
- Use a "meta tag generator" if you need help in creating meta
tags. There are two kinds:
- Sites which have meta tag generators.
- Free software which creates meta tags.
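As a rough idea of what a meta tag generator does, the following
Python sketch builds the "description" and "keywords" tags from plain
text and applies the limits mentioned in this section (200
characters, 1,000 characters, and a cautious maximum of three
repetitions of any word). The function names and the exact checks are
assumptions for illustration; they are not taken from any particular
generator.

```python
from collections import Counter
from html import escape
from typing import Iterable

MAX_DESCRIPTION = 200  # character limit quoted in the text
MAX_KEYWORDS = 1000    # character limit quoted in the text
MAX_REPEATS = 3        # cautious end of the repetition range in the text


def description_tag(description: str) -> str:
    """Build the description meta tag, collapsing extra whitespace."""
    text = " ".join(description.split())
    if len(text) > MAX_DESCRIPTION:
        raise ValueError(f"description is {len(text)} characters long")
    return f'<META NAME="description" CONTENT="{escape(text, quote=True)}">'


def keywords_tag(keywords: Iterable[str]) -> str:
    """Build the keywords meta tag: commas between entries, no spaces."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    # Count each word across all entries, so that "lamps" in
    # "desk lamps" and "floor lamps" counts as two repetitions.
    counts = Counter(w for k in cleaned for w in k.lower().split())
    repeated = sorted(w for w, n in counts.items() if n > MAX_REPEATS)
    if repeated:
        raise ValueError(f"repeated too often: {', '.join(repeated)}")
    content = ",".join(cleaned)
    if len(content) > MAX_KEYWORDS:
        raise ValueError(f"keyword list is {len(content)} characters long")
    return f'<META NAME="keywords" CONTENT="{escape(content, quote=True)}">'


print(description_tag("Practical essays on psychology and happiness."))
print(keywords_tag(["psychology", "self-improvement", "happiness"]))
```

Because each page needs its own meta tags, functions like these would
be called once per page, with a description and keyword list written
for that page.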
Other suggestions for search engines, directories, etc.
- We can report dead links in search engines. If we are ranked
#10, but 2 of the preceding listings are dead links, we can report
those dead links to the search engine's management. After those
links are removed, we will automatically move up to #8. We are in
the same relative position, but we are reducing the frustration of
people who are using the search engine.
- We can report websites which violate the search engine's
rules. If we turn in violators who out-rank us, we move up in the
rankings. Those companies got their high ranking by cheating; that
is not fair to the honest people. (However, there is a possibility
that the company submitted that page when the current rules did
not yet exist.)
- We can check our rankings periodically. The ranking will
change as new websites are submitted, and dead ones are
eliminated. Sometimes our website will disappear entirely from the
search engine; when this happens, we have to re-submit the site.
Some free services will reveal our rankings in the search
engines.
- We can increase the number of links from other people's sites
to our site. Search engines consider our "link popularity" when
determining our rank; if there are many links to our site, we
might receive a higher ranking.
- We can make certain that our site is always online. After our
site has been accepted by a search engine, the robot will return
occasionally to see whether the site is still here. (The site
might be offline if our server is down for maintenance or
repairs.) If the robot doesn't find our site, the search engine
might remove our listing.
- We can avoid "bait and switch" tactics. This is the practice
of submitting a website to the search engines, and then changing
the site immediately after the site has been indexed. (The initial
page might be designed merely for search-engine placement, while
the replacement page is designed for actual sales.) However, this
practice is not effective, because the robots can return at any
time -- and they will then index our replacement page. There are
variations of the bait-and-switch:
- Some people have tried to use cgi scripts which can detect
robots, so that the original page can be served. Apparently,
this method does not work.
- Another type of bait-and-switch (which is now prohibited)
is done with a refresh page; the visitor comes to a page which
is filled with keywords, but then the visitor is immediately
transferred from that page to the actual main page.
- We can refrain from using similar pages with different URLs.
Some people create more than one copy of their main page (or
another page); perhaps they want to tailor the page for compliance
with a particular search engine's criteria. However, this practice
is prohibited by search engines. (In particular, the search
engines look for filenames which indicate that there is more than
one index page; for example, the filenames might be index1.htm,
index2.htm, etc.)
- We can refrain from using free webhosts (e.g., GeoCities,
Tripod). Some of the search engines reject any website which is
based at a free webhosting service.
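The "link popularity" idea in this section can be illustrated with a
short Python sketch which counts how many pages on other people's
sites link to ours. The function name, the host names, and the sample
data are hypothetical, and real search engines weigh links by their
own undisclosed criteria.

```python
from typing import Dict, List
from urllib.parse import urlparse


def link_popularity(our_host: str, links_by_page: Dict[str, List[str]]) -> int:
    """Count pages on other hosts that link to `our_host` at least once.

    `links_by_page` maps the URL of a page to the URLs it links to.
    """
    our_host = our_host.lower()
    count = 0
    for page_url, links in links_by_page.items():
        if urlparse(page_url).hostname == our_host:
            continue  # links from our own pages are not counted
        if any(urlparse(link).hostname == our_host for link in links):
            count += 1
    return count


# Made-up data: one outside page links to us, one does not, and one
# page is our own.
sample_links = {
    "http://www.gardening-links.example/tools.html": [
        "http://www.our-site.example/rototillers.html",
        "http://www.other.example/",
    ],
    "http://www.hobby.example/links.html": [
        "http://www.other.example/page.html",
    ],
    "http://www.our-site.example/index.html": [
        "http://www.our-site.example/rototillers.html",
    ],
}

print(link_popularity("www.our-site.example", sample_links))  # 1
```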