Traffic Analysis
By James Harvey Stout (deceased). This material is now in the public domain. The complete collection of Mr. Stout's writing is at http://stout.mybravenet.com/public_html/h/ .
This article covers the following topics:
-
What is traffic analysis?
-
We can acquire many types of data.
-
We can use different means for traffic analysis.
-
Features of traffic analysis software or services.
-
Factors which spoil traffic analysis.
What is traffic analysis?
It is any means by which we acquire information about the people who visit our site. Traffic analysis is also called tracking or log analysis; the traffic records are called logs, referrer logs, stats, or access stats.
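A log is usually a plain text file with one line per request. As an illustration, here is a minimal Python sketch that parses one line into the fields used throughout this article (visitor address, time, page, referrer, browser). It assumes the NCSA "combined" log format, which many servers use; a given host's logs may use a different layout, and the sample line is hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed format: the NCSA "combined" log format, one request per line:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


@dataclass
class Hit:
    host: str       # visitor's address (or their ISP's proxy)
    time: datetime  # when the request arrived
    path: str       # page requested
    status: int     # HTTP status code, e.g. 200
    referrer: str   # page the visitor came from ("-" if none)
    agent: str      # browser identification string


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format log line; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return Hit(
        host=m["host"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        path=m["path"],
        status=int(m["status"]),
        referrer=m["referrer"],
        agent=m["agent"],
    )


line = ('192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://www.example.com/links.html" "Mozilla/4.08 [en] (Win98; I)"')
hit = parse_line(line)
print(hit.path, hit.referrer)
```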
We can acquire many types of data.
-
The number of unique hits on each page.
-
This information tells us how many visitors we have received, so that we
can judge:
-
The popularity of the page.
-
The amount of money which we can charge our advertisers on the page.
-
The source of our visitors. Where were they before they came to our site?
(The information is usually in the form of a URL from which our visitor jumped
to our site.)
-
This information tells us the effectiveness of a particular banner, link,
email ad, Usenet message, or other place where our URL has appeared.
-
The path of our visitors through our website. For example, they might have
jumped from page A to page D to page H.
-
This information tells us the flow of our visitors. We might want to redirect that flow to a more important page by various means (e.g., placing a larger link button to that page on our other pages).
-
The type of browser which is being used by our visitors. The information is usually given as percentages, e.g., 40% Netscape 4.x, 25% Internet Explorer 3.x, etc.
-
This information tells us whether we need to make changes in our site to accommodate particular browsers, such as Netscape 1.1 or non-graphical browsers.
-
The time of day (and the day of the week) when the hits occurred.
-
With this information, we can plan our updates of new material, our chat sessions, our downtime for maintenance, etc.
-
The average amount of time spent on a page (or at the entire site).
-
A short average time might indicate that this page is (1) uninteresting, or (2) being visited for only one brief purpose, e.g., registering for a contest, or clicking on a banner merely to earn money for the webmaster of the site where our banner appears.
-
The average number of pages visited at our site.
-
A low average might indicate that the site is (1) uninteresting, or (2) being visited for only one brief purpose, e.g., registering for a contest, or clicking on a banner merely to earn money for the webmaster of the site where our banner appears.
-
The primary entrance page. Some people won't come in through our main page.
-
If many visitors enter through a page other than our main page, we need to put some introductory information (and navigational aids) on that page, so that new visitors will know something about the website as a whole.
-
The primary exit page. Which page are our visitors viewing when they leave
to go to a different website?
-
If many visitors leave from a particular page, that page might be (1) uninteresting, or (2) displaying a banner which draws away our customers.
-
The number of new visitors and returning visitors.
-
This information tells us whether we are attracting new customers, or merely
recycling the previous ones.
-
The search engines which sent these visitors to us.
-
This information tells us that we might need to spend more time improving
our position at a particular search engine.
-
The keywords which were used to find our site at a search engine.
-
This information indicates the keywords which are most effective. We can
emphasize these keywords in our search-engine listings, and on our pages.
-
The nation of our visitors.
-
This information is useful if our advertising is targeting the people of
that nation.
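Several of the counts above can be computed directly from the log records. The following Python sketch assumes each hit has already been reduced to a hypothetical (address, page, referrer, user agent) tuple. It treats one address plus one browser string as one "unique" visitor, which is only an approximation (a shared ISP proxy or a changing dial-up address will distort it), and it sorts browsers into families with a deliberately crude, assumed test.

```python
from collections import Counter, defaultdict

# Hypothetical sample hits: (address, page, referrer, user agent).
# "-" means the log recorded no referrer (e.g., a typed-in URL or a bookmark).
hits = [
    ("192.0.2.1", "/index.html", "http://www.example.com/links.html",
     "Mozilla/4.08 [en] (Win98; I)"),
    ("192.0.2.1", "/index.html", "-", "Mozilla/4.08 [en] (Win98; I)"),
    ("192.0.2.2", "/index.html", "http://search.example.net/?q=traffic",
     "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"),
    ("192.0.2.3", "/ads.html", "http://www.example.com/links.html",
     "Lynx/2.8.3rel.1 libwww-FM/2.14"),
]


def unique_hits_per_page(hits):
    """Distinct visitors per page; one address + user agent counts as one visitor."""
    visitors = defaultdict(set)
    for addr, page, _ref, agent in hits:
        visitors[page].add((addr, agent))
    return {page: len(seen) for page, seen in visitors.items()}


def referrer_counts(hits):
    """Hits per referring URL, i.e., where visitors were just before our page."""
    return Counter(ref if ref != "-" else "(none)" for _a, _p, ref, _ua in hits)


def browser_family(agent):
    """Crude, assumed classification; real user-agent strings are messier."""
    if "MSIE" in agent:
        return "Internet Explorer"  # IE also starts with "Mozilla", so test it first
    if agent.startswith("Mozilla"):
        return "Netscape"
    return "Other"


def browser_share(hits):
    """Percentage of hits from each browser family."""
    counts = Counter(browser_family(agent) for *_rest, agent in hits)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


print(unique_hits_per_page(hits))
print(browser_share(hits))
```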
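The path through the site, the entrance and exit pages, the pages per visit, and the time on a page all depend on grouping hits into visits. A common approach, assumed here rather than taken from the text, is to start a new visit when the same visitor has been idle for more than 30 minutes. This Python sketch uses hypothetical (visitor, time, page) records; it estimates time on a page as the gap until the visitor's next request, so the last page of each visit cannot be timed.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 idle minutes (a common convention, not a standard).
TIMEOUT = timedelta(minutes=30)


def sessions(hits):
    """Group (visitor, time, page) hits into visits: lists of (time, page) in time order."""
    current = {}    # visitor -> the visit currently being built
    last_seen = {}  # visitor -> time of that visitor's latest hit
    visits = []
    for visitor, when, page in sorted(hits, key=lambda h: h[1]):
        if visitor not in current or when - last_seen[visitor] > TIMEOUT:
            current[visitor] = []
            visits.append(current[visitor])
        current[visitor].append((when, page))
        last_seen[visitor] = when
    return visits


def summarize(visits):
    """Entrance pages, exit pages, average pages per visit, average seconds on a page."""
    entrances = Counter(visit[0][1] for visit in visits)
    exits = Counter(visit[-1][1] for visit in visits)
    avg_pages = sum(len(visit) for visit in visits) / len(visits)
    # Time on a page = gap until the next hit in the same visit; the last page is untimed.
    gaps = [(nxt[0] - cur[0]).total_seconds()
            for visit in visits for cur, nxt in zip(visit, visit[1:])]
    avg_seconds = sum(gaps) / len(gaps) if gaps else None
    return entrances, exits, avg_pages, avg_seconds


t0 = datetime(2000, 10, 10, 13, 0)
hits = [  # hypothetical sample data
    ("192.0.2.1", t0, "/a.html"),
    ("192.0.2.1", t0 + timedelta(minutes=2), "/d.html"),
    ("192.0.2.1", t0 + timedelta(minutes=3), "/h.html"),
    ("192.0.2.2", t0 + timedelta(minutes=1), "/d.html"),
    ("192.0.2.1", t0 + timedelta(hours=2), "/index.html"),
]
visits = sessions(hits)
print([[page for _t, page in visit] for visit in visits])
print(summarize(visits))
```

The first visitor's path, /a.html to /d.html to /h.html, mirrors the "page A to page D to page H" example; that visitor's request two hours later is counted as a separate visit.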
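The search engine and the keywords usually arrive inside the referrer URL: the host names the engine, and the query string holds the search words. This sketch assumes a hypothetical list of parameter names (q, p, query) that might carry those words; real engines vary, and some do not pass the search words at all. The referrer values are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical parameter names that might carry the search words; engines differ.
KEYWORD_PARAMS = ("q", "p", "query")


def search_keywords(referrer):
    """Return (search host, search words) if the referrer holds a query, else None."""
    parts = urlparse(referrer)
    params = parse_qs(parts.query)  # also decodes "+" and %XX escapes
    for name in KEYWORD_PARAMS:
        if name in params:
            return parts.netloc, params[name][0]
    return None


referrers = [  # hypothetical referrer fields from a log
    "http://search.example.net/search?q=traffic+analysis",
    "http://search.example.org/bin/query?p=web+stats",
    "http://www.example.com/links.html",
    "http://search.example.net/search?q=traffic+analysis",
]
engines, keywords = Counter(), Counter()
for ref in referrers:
    found = search_keywords(ref)
    if found:
        host, words = found
        engines[host] += 1
        keywords[words] += 1
print(engines.most_common())
print(keywords.most_common())
```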
We can use different means for traffic analysis.
-
We can install software onto our server, or onto the server of our
website-hosting service. Some of the software packages are freeware.
-
We can use the software which is already available from our website-hosting service. Many ISPs and website hosts provide counters or complete traffic analysis.
-
We can use the traffic analysis which is provided by some banner exchanges.
In the regular business of tracking the activities of a banner, some of these
exchanges will share their traffic-analysis data with us.
-
We can get traffic analysis from a company which specializes in this service.
We pay for the stats via a monthly fee (e.g., $5/month), or by putting the
company's banner onto our site.
-
We can gather information via forms. We learn about our visitors when they
fill in forms at our website for registrations, surveys, credit-card payments,
contest entries, mailing-list membership forms, etc.
-
We can use a simple counter (from our webhost, a script on our server, a
company which manages counters for people's websites, or another source).
However, counters are not very useful for traffic analysis:
-
They give only a number. They do not give other types of information regarding our visitors (e.g., their origin, their path through our website, etc.).
-
They are distrusted by visitors. We all know that counters can be altered,
and they can be given a "starting number" of 10,000 or more.
-
They can be embarrassing if our number is very low, and/or if it increases
very slowly day-by-day.
-
They can cause a page to load more slowly, particularly if they use Java.
-
We can use codes.
-
We can code our web pages. For example, my main page is http://www.james-harvey-stout.com. But if I want to know how many responses I am receiving from a particular ad, I can make a copy of that main page and give the copy its own URL by adding an extension to the address. For example, the ad could give this URL: http://www.james-harvey-stout.com/ad345; when I receive hits to this page, I know that they all originated from that ad (which I have designated as "ad345"). However, some search engines will penalize us for having more than one copy of a page, even when the copies are not intended for search-engine spamming.
-
We can code our emails.
-
We can use different email aliases. For example, we might use
sales12@james-harvey-stout.com for our ad in one directory, and
sales13@james-harvey-stout.com for our ad in a different directory. When
we see one of these email addresses, we know where the email originated.
-
We can add an extension to our email address in an email link. The extension
is "?subject=", and it is used like this:
mailto:feedback@james-harvey-stout.com?subject=traffic-analysis. When someone
clicks on this email link, "traffic-analysis" is automatically put into the
"subject" field of the email. And when we look at the subject field, we know
the origin of the email.
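Once each ad has its own coded URL, counting its responses is a matter of tallying the requests for that path in the log. A minimal Python sketch, assuming the codes follow the "ad" plus digits pattern of the ad345 example and that the requested paths (hypothetical here) have already been pulled from the log:

```python
import re
from collections import Counter

# Assumption: coded URLs look like the text's example, /ad345 (optionally with a slash).
AD_PATH = re.compile(r"^/(ad\d+)/?$")


def ad_responses(paths):
    """Count requests for each coded ad URL; all other paths are ignored."""
    counts = Counter()
    for path in paths:
        m = AD_PATH.match(path)
        if m:
            counts[m.group(1)] += 1
    return counts


requested = ["/ad345", "/index.html", "/ad345/", "/ad346", "/ad345"]  # hypothetical
print(ad_responses(requested))
```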
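The email codes can be handled the same way. This sketch builds a tagged mailto link, percent-encoding the subject with urllib.parse.quote so that spaces and other special characters survive, and tallies incoming mail by alias. The messages are hypothetical (recipient, subject) pairs, not the output of any real mail program.

```python
from collections import Counter
from urllib.parse import quote

DOMAIN = "james-harvey-stout.com"


def tagged_mailto(address, tag):
    """mailto link whose subject line records where the link was placed."""
    return f"mailto:{address}?subject={quote(tag)}"


def count_by_alias(messages):
    """Tally (recipient, subject) pairs by the part of the address before the @."""
    return Counter(recipient.split("@", 1)[0] for recipient, _subject in messages)


incoming = [  # hypothetical received mail
    (f"sales12@{DOMAIN}", "Question about prices"),
    (f"sales13@{DOMAIN}", "Order"),
    (f"sales12@{DOMAIN}", "Hello"),
    (f"feedback@{DOMAIN}", "traffic-analysis"),
]
print(tagged_mailto(f"feedback@{DOMAIN}", "traffic-analysis"))
print(count_by_alias(incoming))
```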
Features of traffic analysis software or services.
-
Price. We might be paying for our own traffic-analysis software, or for the
services of a company which monitors our traffic. Some counter providers
and traffic analysis services allow us to "pay" by putting their banner onto
our website.
-
Ease of use. The software should be fairly simple to learn.
-
Tech support. By phone or email. Quick responses? 24x7?
-
Variety of stats. Refer to the list above; for example, we want to know the
number of visitors, their origin, their path through our site, etc.
-
Graphical display of the stats. Our stats might be displayed in numbers only,
or they might also be displayed in tables and graphs.
-
Easy access to our stats. We might have to download the information and then
run it through some analysis software. Or the stats might be available simply
by going to the website of the traffic-analysis provider.
-
Privacy. Some counters and stats are easily accessible to our visitors (and
our competitors); others require a password.
Factors which spoil traffic analysis.
-
Caching. When people come to our website, their ISP has to download our pages. If dozens of the ISP's subscribers are coming to our website, the ISP might reduce its bandwidth usage by caching our site on its own server (so that it can deliver our pages to its subscribers from this internal cache instead of fetching them across the internet). When caching occurs, we do not get reliable stats; people are viewing our pages from the ISP's cache, so their requests never reach our server and are never recorded in our logs. We can try various solutions to prevent the unreliable stats which are caused by caches:
-
We can ask our visitors to click on their browsers' "reload" or "refresh"
button to force a reload.
-
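We can add the following code to the head of each of our web pages: <META HTTP-EQUIV="expires" CONTENT="0"> . This code declares that any stored copy of the page has already expired, so a cache which honors it must come back to our site to get a new copy whenever anyone requests our URL. (Unfortunately, this code also interferes with our visitor's own browser cache; whenever they go to another page at our site and then come back to this one, the entire page must reload from our server.) However, ISP caching servers rarely read the HTML inside a page, so they generally ignore this tag; to reach them, our server must send the equivalent HTTP headers (such as Expires and Cache-Control).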
-
Fibbing on surveys. When we ask questions, some people will lie (perhaps
just for fun); for example, a young man might say that he is an 85-year-old
woman.
-
Untraceable visitors. Our traffic-analysis software might not be able to
detect some of the information regarding our visitors. For example, some
software does not indicate the keyword which our visitor used at a search
engine.
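Caching servers judge whether a stored copy is still fresh from the HTTP response headers, so the server itself can send the expiry information. This is a minimal sketch using Python's standard http.server module; the header values (Cache-Control: no-cache, and an Expires date in the past) are a common choice rather than anything prescribed by the text, and they ask caches to check back with our server before reusing a copy rather than forbidding them to store it. The page content and port are placeholders.

```python
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder page; a real site would serve its own files.
PAGE = b"<html><head><title>Example</title></head><body>Hello</body></html>"


def no_cache_headers():
    """Response headers asking caches to check back with our server before reuse."""
    return {
        # HTTP/1.1 caches: may store the page but must revalidate before each reuse.
        "Cache-Control": "no-cache",
        # HTTP/1.0 caches: an Expires date in the past marks the copy as already stale.
        "Expires": formatdate(0, usegmt=True),
    }


class NoCacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        for name, value in no_cache_headers().items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PAGE)


def serve(port=8000):
    """Serve PAGE for every path until interrupted (port 8000 is arbitrary)."""
    HTTPServer(("", port), NoCacheHandler).serve_forever()
```

Calling serve() and loading http://localhost:8000/ should return the page with both headers. Caches which ignore these headers will still distort the stats.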