Search Engine Optimization: SEO Book
Avoiding Things That Search Engines Hate
Dealing with Frames
Frames were very popular a few years ago. A framed site is one in which
the browser window is broken into two or more frames, each of which
holds a Web page. Frames cause a number of problems. Some browsers don't
handle them well - in fact, the first frame-enabled browsers weren't
that capable and often crashed when loading frames. In addition, many
designers created framed sites without properly testing them. They built
the sites on large, high-resolution screens, so they didn't realize
that they were creating sites that would be almost unusable on small,
low-resolution screens.
From a search engine perspective, frames create the following problems:
· Some search engines have trouble getting through the frame-definition,
or frameset, page to the actual Web pages.
· If the search engine gets through, it indexes individual pages,
not framesets. Each page is indexed separately, so pages that make sense
only as part of the frameset end up in the search engines as independent
pages.
· You can't point to a particular page in your site. This may
be a problem in the following situations:
- Linking campaigns: Other sites can link only to the front of your
site; they can't link to specific pages during link campaigns.
- Pay-per-click campaigns: If you are running a pay-per-click campaign,
you can't link directly to a page related to a particular product.
- Placing your products in shopping directories: In this case, you need
to be able to link to a particular product page.
Search engines index URLs - single pages - and by definition, a framed
site is a collection of URLs, so search engines don't know how to
properly index the pages.
The HTML Nitty-Gritty of Frames
Here's an example of a frame-definition, or frameset, document:

<FRAMESET ROWS="110,*">
<FRAME SRC="navbar.htm">
<FRAME SRC="main.htm">
</FRAMESET>
This document describes how the frames should be created. It tells the
browser to create two rows, one 110 pixels high and the other * high
- that is, taking whatever room is left over. It also tells the browser to
grab the navbar.htm document and place it in the first frame - the top
row - and place main.htm into the bottom frame. Most of the bigger search
engines can find their way through the frameset to the navbar.htm and
main.htm documents, so Google, for instance, indexes those documents.
Some older systems may not, however, effectively making the site
invisible to them.
Providing Search Engines With The Necessary Information
The first thing you can do is to provide information in the frame-definition
document for the search engines to index. First, add a TITLE and your
meta tags, like this:
<TITLE>Rodent Racing - Scores, Mouse Events, Rat Events, Gerbil
Events - Everything about Rodent Racing</TITLE>
<meta name="description" content="Rodent Racing -
Scores, Schedules, everything Rodent Racing. Whether you're into mouse
racing, stoat racing, rats, or gerbils, our site provides everything
you'll ever need to know about Rodent Racing and caring for your racers.">
<META NAME="keywords" CONTENT="Rodent Racing, Racing
Rodents, Gerbils, Mice, Mouse, Rodent Races, Rat Races, Mouse Races,
Stoat, Stoat Racing, Rats, Gerbils">
Then at the bottom of the FRAMESET, add <NOFRAMES> tags - <NOFRAMES>
tags were originally designed to enclose information that would be displayed
by a browser that couldn't handle frames - with <BODY> tags and information
inside, like this:
<NOFRAMES>
<BODY>
<H1>Rodent Racing - Everything You Ever Wanted to Know about Rodent
Racing Events and the Rodent Racing Lifestyle</H1>
<p>[This site uses frames, so if you are reading this, your browser
doesn't handle frames.]</p>
<p>This is the world's top rodent-racing Web site. You won't find
more information about the world's top rodent-racing events anywhere
else.</p>
</BODY>
</NOFRAMES>
The <NOFRAMES></NOFRAMES> tags were originally intended
to display text for browsers that don't handle frames. Although few
people still use such browsers, you can use the NOFRAMES tags to provide
information that the search engines can index. For example, you can
take the information from main.htm and place it into the NOFRAMES area.
Provide 200 to 400 words of keyword-rich text to give the search engines
something to work with. Make sure that the content between the <NOFRAMES>
tags is about your site, and is descriptive and useful to visitors.
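Putting the pieces together, a frame-definition document that gives the search engines something to index might look like this sketch (the file names and text are illustrative, not a prescribed template):

```html
<HTML>
<HEAD>
<TITLE>Rodent Racing - Scores, Mouse Events, Rat Events</TITLE>
<META NAME="description" CONTENT="Rodent Racing scores and schedules.">
</HEAD>
<FRAMESET ROWS="110,*">
<FRAME SRC="navbar.htm">
<FRAME SRC="main.htm">
<NOFRAMES>
<BODY>
<H1>Rodent Racing - Everything about Rodent Racing</H1>
<P>Two to four hundred words of keyword-rich text about the site goes
here, along with simple text links to the other pages in the site.</P>
</BODY>
</NOFRAMES>
</FRAMESET>
</HTML>
```

A browser that handles frames ignores the NOFRAMES block, while a searchbot reading the page top to bottom finds the title, the meta tags, and the indexable text.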
Providing A Navigation Path
You can easily provide a navigation path in the NOFRAMES area: simply
add links in your text to other pages in the site. Include a simple
text-link navigation system on the page, and remember to link to your
most important pages. Remember also to do the following:
· Give all your pages unique <TITLE> and meta tags. Many
designers don't bother to do this for pages in frames because browsers
read only the TITLE in the frame-definition document. But search engines
index these pages individually, not as part of a frameset, so they should
all have this information.
· Give all your pages simple text navigation systems so that
a search engine can find its way through your site.
You'll run into one problem using these links inside the pages. The
links will work fine for people who arrive at a page directly through
the search engines, and any link that simply points at another page
will also work fine for someone who arrives at your home page and sees
the pages in the frames. But any link that points at a frame-definition
document rather than at another page won't work properly if someone is
viewing the page inside a frame.
Opening Pages in a Frameset
Given the way search engines work, pages in a Web site will be indexed
individually. If you've created a site using frames, though, you want the
site displayed in frames - you don't want individual pages pulled out of
the frameset and displayed on their own. You can place a small JavaScript
into each page so that when the browser loads the page, it reads the
script, which forces the browser to load the frameset. Of course, this
won't work for people whose browsers don't run JavaScript. The script
looks like this:

<script language="JavaScript" type="text/javascript">
if (top == self) self.location.href="index.html";
</script>

This approach has drawbacks, though:
· The browser loads the frameset defined in index.html, which
may not include the page that was indexed in the search engine. Visitors
may have to use the navigation to find the page that, presumably, had
the content they were looking for.
· The Back button doesn't work correctly, because each time the
browser returns to the indexed page, the script loads the frameset again.
Another option is to have a programmer create a customized script that
loads the frame-definition document and drops the specific page into
the correct frame. If you work for a company with a Web-development department,
this is a real possibility.
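One common way such a customized script works is to pass the indexed page's name to the frameset in the URL, so the frameset can drop that page into its content frame. Here is a minimal sketch of the idea; the file name index.html, the page parameter, and both function names are illustrative assumptions, not part of any standard:

```javascript
// Sketch: carry the target page along to the frameset in the URL,
// so the frameset can load the indexed page rather than its default.
// All names here (index.html, "page") are illustrative.
function framesetUrl(page) {
  return 'index.html?page=' + encodeURIComponent(page);
}

// In each content page, a line like the following (browser-only, shown
// commented out) would redirect to the frameset when loaded alone:
// if (top == self) self.location.href = framesetUrl('main.htm');

// In index.html, the frameset reads the parameter back out of
// location.search and falls back to a default content page:
function pageFromQuery(search) {
  // search looks like "?page=main.htm"
  var match = /[?&]page=([^&]*)/.exec(search);
  return match ? decodeURIComponent(match[1]) : 'main.htm';
}
```

With this in place, arriving at a deep page sends the visitor to the frameset with that page preserved, instead of dropping them at the site's default view.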
The iframe is a special type of frame. It's an Internet Explorer
feature, and it isn't as common as normal frames. An iframe is
an inline floating frame: it allows you to grab content from one page
and drop it into another, in the same way that you can grab an image and
drop it into a page. The tag looks like this:

<iframe src="page.html"></iframe>
It has problems similar to those of regular frames. In particular, some
search engines don't see the content in the iframe, and the ones that do
index it separately. You can add a link within the <IFRAME> tag so that
older searchbots will find the document, like this:
<iframe src="page.html"><a href="page.html">
Click here for more Rodent Racing information if your browser doesn't
display content in this internal frame.</a></iframe>
Fixing Invisible Navigation Systems
Navigation systems that never show up on search engines' radar screens
are a common problem. Many web sites use navigation systems that are
invisible to search engines. A web page is compiled in two places -
on the server and in the browser. If the navigation system is created
in the browser, it's probably not visible to a search engine. Navigation
systems of this kind include the following:
· Java applets
· JavaScript menus
· Macromedia Flash
How can you tell if your navigation has this problem? If you created
the pages yourself, you probably know how you built the navigation,
although you may be using an authoring tool that did it all for you.
So if that's the case, or if you are examining a site that was built
by someone else, here a few ways to figure out how the navigation is
· If navigation is created with a Java applet, when the page
loads you probably see a gray box where the navigation sits for a moment
or two, with a message such as Loading Java Applet.
· Look in the page's source code and see how its navigation is
and then reload the page to see if the navigation is still there.
Looking At the Source Code
Take a look at the source code of the document to see how it's created.
Open the raw HTML file or choose View→Source from the browser's main
menu, and then dig through the file looking for the navigation. If the
page is large and complex, or if your HTML skills are correspondingly
small and simple, you may want to try the technique described under
"Turning Off Scripting and Java."
Here's an example. Suppose that you find the following code where the
navigation should be:
<applet code="MenuApplet" width="160" height="400"></applet>

Search engines don't read applet files, so they won't see the navigation.
Turning Off Scripting and Java
You can also turn off scripting and Java in the browser and then look
at the pages. If the navigation has simply disappeared, or it's there
but doesn't work anymore, you've found the problem. Here's how to disable
these settings in Internet Explorer:
· Choose Tools→Internet Options from the main menu.
· Click the Security tab.
· Click the Custom Level button to open the Security dialog box.
· Select the Microsoft VM→Java Permissions→Disable Java option button.
· Select the Active Scripting→Disable option button.
· Click the OK button, answer Yes in the message box, and click
the OK button again in the Internet Options dialog box.
Fixing the Problem
If you want, you can continue to use these invisible menus and navigation
tools. They can be attractive and effective. The search engines won't
see them, but that's okay, because you can add a secondary form of
navigation - one that duplicates the top navigation. You can duplicate
the navigation structure by using simple text links at the bottom of the
page. If you have long pages or extremely cluttered HTML, you may want
to place small text links near the top of the page to make sure search
engines get to them, perhaps in the leftmost table column.
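For instance, a plain text-link duplicate of a script-based menu might sit near the bottom of the page (the page names here are illustrative):

```html
<!-- Text-link duplicate of the script-based menu; file names are
     illustrative examples only -->
<p>
<a href="index.html">Home</a> |
<a href="scores.html">Scores</a> |
<a href="events.html">Events</a> |
<a href="about.html">About Rodent Racing</a>
</p>
```

Searchbots can follow these plain <a> links even when they cannot run the script that builds the main menu.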
Reducing The Clutter in Your Web Pages
Simple is good; cluttered is bad. The more cluttered your pages, the
more work it is for search engines to dig through them. What do I mean
by clutter? As an example, consider a site whose home page's HTML source
document had 21,414 characters, of which 19,418 were characters other than
spaces. However, the home page did not contain a lot of text: 1,196
characters, not including the spaces between the words. So if 1,196
characters were used to create the words on the page, what were the other
18,222 characters used for? Things like this:
· The top navigation bar: 6,018 characters
· Text used to embed a Flash animation near the top of the page
The rest is the normal clutter that you always have in HTML: tags used
to format text, create tables, and so on. The problem with this page
was that a search engine had to read 17,127 characters before it ever
reached the page content. The page did not have much content, and what
was there was hidden away below all that HTML. This clutter above the
page content means that some search engines may not reach it.
Use External JavaScripts
JavaScripts should be placed in an external file - a tag in the Web page
"calls" a script that is pulled from another file on the Web server - for
various reasons:
· They are actually safer outside the HTML file. They are less
likely to be damaged while making changes to the HTML.
· They are easier to manage externally. Why not have a nice library
of all the scripts in your site in one directory?
· The download time is slightly less. If you use the same script
in multiple pages, the browser downloads the script once and caches it.
· They are easier to reuse. You don't need to copy scripts from
one page to another and fix all the pages when you have to make a change
to the script. Just store the script externally and change the external
file to automatically change the script in any number of pages.
· Doing so removes clutter from your pages!
Use document.write to Remove Problem Code
If you have a complicated top navigation bar - one with text colors
in the main bar that change when you point at a menu, and drop-down
lists, also with changing colors - you can easily end up with code
running to 5,000 or 6,000 characters. That's a lot of characters! Add
some Flash animation, and you are probably up to 7,000 characters, which
can easily be a significant portion of the page's overall code. You can
use JavaScript's document.write function to write all that text into the
page from an external file instead. Here's how:
· In an external text file, type this text: document.write("")
· Grab the entire code you want to remove from the HTML page,
and then paste it between the quotation marks in the document.write("")
line.
· Save this file and place it on your Web server.
· Call the file from the HTML page by adding a src= attribute
to your <SCRIPT> tag to refer to the external file.
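Putting those steps together, a minimal sketch might look like this; the file name nav.js and the menu contents are illustrative assumptions:

```javascript
// nav.js - a hypothetical external script file (all names illustrative).
// The bulky menu markup lives here instead of cluttering the HTML page.
function navHtml(links) {
  // links is an array of [href, label] pairs; returns the menu markup.
  return links.map(function (pair) {
    return '<a href="' + pair[0] + '">' + pair[1] + '</a>';
  }).join(' | ');
}

// In a browser, document.write injects the markup at the point where
// the <script src="nav.js"> tag appears in the page:
if (typeof document !== 'undefined' && document.write) {
  document.write(navHtml([
    ['index.html', 'Home'],
    ['scores.html', 'Scores'],
    ['events.html', 'Events']
  ]));
}
```

The page then calls it with a single short tag such as <script language="JavaScript" src="nav.js" type="text/javascript"></script>, so a searchbot reads one line instead of thousands of characters of menu code.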
Use External CSS Files
It may not surprise you that you can do the same thing - drop stuff into
a file that is then referred to in the HTML file proper - with Cascading
Style Sheet information. For some reason, many designers place CSS information
directly into the page, despite the fact that the ideal use of a style
sheet is external. Just think about it - one of the basic ideas behind
style sheets is to allow you to make formatting changes to an entire
site very quickly. If you want to change the size of the body text or
the color of the heading text, you make one small change in the CSS
file, and it affects the whole site immediately. If you have your CSS
information in each page, though, you have to change each and every page.
Here's how to remove CSS information from the main block of HTML code.
Simply place the targeted text in an external file - everything between
and including the <STYLE></STYLE> tags - and then call the
file in your HTML pages by using the <LINK> tag, like this:
<link rel="stylesheet" href="sites.css" type="text/css">
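The external file itself then contains nothing but style rules. As an illustration, a minimal sites.css might hold rules like these (the rules themselves are purely examples):

```css
/* sites.css - illustrative style rules moved out of the HTML page */
body { font-family: Verdana, Arial, sans-serif; font-size: small; }
h1   { color: #336699; }
a    { color: #993300; }
```

One small change here restyles every page that links to the file, and none of these characters clutters the pages the searchbots read.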
Move Image Maps to the Bottom of the Page
Image maps are images that contain multiple links. One way to clean
up clutter in a page is to move the code that defines the links to the
bottom of the Web page, right before the </BODY> tag. Doing so removes
the clutter between the top of the page and the page content,
making it more likely that search engines will reach the content.
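The image itself stays where it is; only the map definition moves. A sketch (the file names and coordinates are illustrative):

```html
<!-- The image stays at the top of the page: -->
<img name="main" src="images/main.gif" usemap="#m_main">

<!-- ...page content... -->

<!-- The map definition moves down here, just before the closing
     </BODY> tag (coordinates and file names are examples only): -->
<map name="m_main">
<area shape="rect" coords="238,159,350,183" href="page1.html">
<area shape="rect" coords="204,189,387,214" href="page2.html">
</map>
```

The browser renders the page identically, because the usemap="#m_main" reference finds the map wherever it sits in the document.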
Don't Copy and Paste from MS Word
Don't copy text directly from Microsoft Word and drop it into a Web
page. You'll end up with all sorts of formatting clutter in your page!
Here's one way to get around the problem:
· Save the file as an HTML file.
· In your HTML-authoring program, look for a Word-cleaning tool.
Word has such a bad reputation that HTML programs are now starting to
add tools to help you clean the text before you use it. Dreamweaver
has such a thing, and even Microsoft's own HTML-authoring tool,
FrontPage, has one.
Managing Dynamic Web Pages
Pages pulled from databases are known as dynamic pages, as opposed to
normal static pages that don't come from a database. They are
dynamic because they are created on the fly when requested: the page
doesn't exist until a browser requests it, at which point the data is
grabbed from a database and put together by a CGI, an ASP, or a PHP
program. Dynamic pages can create problems; even the best search engines
sometimes don't read them. By the time the searchbot receives the page,
the page is already complete, so why don't search engines always read
dynamic pages? Because search engines often don't want to read them.
Here are a few of the problems searchbots can run into when reading
dynamic pages:
· Dynamic pages often have only minor changes in them. A searchbot
reading these pages may end up with hundreds of pages that are almost
exactly the same, with nothing more than minor differences to distinguish
one from the other.
· The search engines are concerned that database-generated pages might
change frequently, making search results inaccurate.
· Searchbots sometimes get stuck in the dynamic system, going
from page to page among tens of thousands of pages. This happens when
a Web programmer hasn't properly written the link code, and the database
continually feeds data to the search engine, potentially even crashing
your server.
· Hitting a database for thousands of pages can slow down the
server, so searchbots often avoid getting into situations in which that
is likely to happen.
· Sometimes URLs can change, so even if the search engine does
index the page, the next time someone tries to get there, it'll be
gone - and search engines don't want to index dead links.
Finding Out If Your Dynamic Site Is Scaring Off Search Engines
You can often tell if search engines are likely to omit your pages just
by looking at the URL. Go deep into the site; if it's a product catalog,
then go to the furthest subcategory you can find. Then look at the URL.
Suppose you have a URL like this (a generic example):

http://www.yourdomain.com/rodents/scores.html

This is a normal URL that should cause few problems. It's a static page -
or at least it looks like a static page, which is what counts. Compare
this URL with the next one:

http://www.yourdomain.com/scores.php?prod=12&cat=7&id=4545

If you have a clean URL with no parameters, the search engines should
be able to get to it. If you have a single parameter, it's probably okay
for the major search engines, though not necessarily for older systems.
If you have two parameters, it may be a problem, or it may not, although
two parameters are more likely to be a problem than a single parameter,
and three parameters are certainly a problem. You can also find out
if a page in your site is indexed by using the following techniques:
· If you have the Google Toolbar, open the page you want to check,
click the i button, and select Cached Snapshot of Page. Or go
to Google and type cache:YourURL, where YourURL is the URL of the page
you are interested in. If Google displays a cached page, it's in the
index. If Google doesn't display it, move to the next technique.
· Go to Google and type the URL of the page into the text box
and click Search. If the page is in the index, Google displays some
information about it.
· Use similar techniques with other search engines if you want
to check them for your page.
Fixing Your Dynamic Web Page Problem
So how do you get search engines to take a look at your state-of-the-art
dynamic Web site? Here are a few ideas:
· Find out whether the database program has a built-in way to create
static HTML pages.
· Modify URLs so they don't look like they are pointing to dynamic
pages.
· Use a URL rewrite trick - a technique for changing the way a Web
server handles requests for pages.
In other words, this technique allows you to use what appear to be static
URLs, yet still grab pages from a database. This is complicated stuff,
so if your server administrator doesn't understand it, it may take him
or her a few days to figure it all out.
· Find out if the programmer can create static pages from the
database. Rather than creating a single Web page each time it's
requested, the database could "spit out" the entire site periodically.
· You can get your information into some search engines by using
an XML feed, often known as a trusted feed.
· You can get pages into search engines by using a paid-inclusion
program.
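As a sketch of the URL rewrite idea, here is what a rule might look like on an Apache server with mod_rewrite enabled; the file and parameter names are entirely illustrative, and other servers use different mechanisms:

```apache
# .htaccess sketch - assumes Apache with mod_rewrite enabled.
RewriteEngine On
# Serve what looks like a static page from the dynamic script, e.g.
#   /rodents/scores.html  ->  /catalog.php?cat=rodents&page=scores
RewriteRule ^([a-z-]+)/([a-z-]+)\.html$ /catalog.php?cat=$1&page=$2 [L]
```

Visitors and searchbots see only the clean, parameter-free URL, while the server still builds every page from the database.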
Using Session IDs in URLs
A session ID identifies a particular person visiting the site at a
particular time, which enables the server to track what pages the visitor
looks at and what actions the visitor takes during the session. If you
request a page from a Web site - by clicking a link on a Web page, for
instance - the Web server that has the page sends it to your browser.
Then if you request another page, the server sends that page, too, but
the server doesn't know that you are the same person. If the server
needs to know who you are, it needs a way to identify you each time you
request a page. It does that by using session IDs. Session IDs are used for a
variety of reasons, but the main purpose is to allow web developers
to create various types of interactive sites. For example, if the developers
have created a secure environment, they may want to force visitors to
go through the home page first. Or the developers may want a way to
pick up a session where it left off. By setting cookies on the visitor's
computer containing the session ID, the developers can see where the
visitor was in the site at the end of the visitor's last session.
Session IDs are common when running a software application that has
any kind of security, needs to store variables, or wants to defeat
the browser cache.
Session IDs can be created in two ways:
· They can be stored in cookies.
· They can be displayed in the URL itself.
Some systems are set up to store the session IDs in a cookie but then
use a URL session Id if the user's browser is set to not accept cookies.
If a search engine recognizes a URL as including a session ID, it
probably won't read the referenced page, because of how the server may
handle the session ID when the searchbot returns. Each time the searchbot
returns to your site, the session ID will have expired, so the server
could do either of the following:
· Display an error page, rather than the indexed page, or perhaps
the site's default page.
· Assign a new session ID.
Dealing with session IDs can be like a magic trick. Sites that were
invisible to search engines suddenly become visible! If your site puts
session IDs into its URLs, you can do various things:
· Instead of using session IDs in the URL, store session information
in a cookie on the user's computer. Each time a page is requested, the
server can check the cookie to see if session information is stored there.
· Get your programmer to omit session IDs if the device requesting
a Web page from the server is a searchbot. The server will deliver the
same page to the searchbot but won't assign a session ID, so the
searchbot can travel throughout the site without using session IDs.
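As a rough sketch of that idea - the bot names, the URL format, and the function names are illustrative assumptions, and a real site would implement this in its server-side language:

```javascript
// Sketch: skip session IDs for known searchbots (names illustrative).
// Returns true when the request's User-Agent looks like a searchbot.
function isSearchbot(userAgent) {
  var bots = ['googlebot', 'slurp', 'msnbot', 'teoma'];
  var ua = String(userAgent || '').toLowerCase();
  return bots.some(function (b) { return ua.indexOf(b) !== -1; });
}

// Ordinary visitors get the session ID appended to each URL the server
// generates; searchbots get the clean URL instead.
function pageUrl(path, sessionId, userAgent) {
  if (isSearchbot(userAgent)) return path;
  return path + ';jsessionid=' + sessionId;
}
```

The searchbot then crawls session-free URLs that stay stable between visits, while human visitors are still tracked normally.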
Examining Cookie-Based Navigation
Cookies - the small text files that a Web server can store on a site
visitor's hard drive - can often prove as indigestible to search engines
as dynamic Web pages and session IDs. Cookies are sometimes used for
navigation purposes. You may have seen crumb trails, a series of links
showing where you have been as you travel through the site. Crumb trails
look something like this:

Home > Rodents > Rats > Racing
This is generally information being stored in a cookie that is read
each time you load a new page. Or the server may read the cookie to
determine how many times you have visited the site, or what you did the
last time you were on the site, and direct you to a particular page
based on that information. If you are using Internet Explorer on
Microsoft Windows, follow these steps to see what these cookie files
look like:
· Choose Tools→Internet Options from the main menu.
· In the Internet Options dialog box, make sure that the General
tab is selected.
· Click the Settings button in the Temporary Internet Files area.
· In the Settings dialog box, click the View Files button.
· Double-click any of the cookie files to view the file's
contents; a warning message appears, but ignore it and click Yes.
There is nothing wrong with using cookies, unless they are required
in order to navigate through your site. A server can be set up to simply
refuse to send a Web page to a site visitor if the visitor's browser
doesn't accept cookies. That's a problem, for several reasons:
· A few browsers simply don't accept cookies.
· A small number of people have changed their browser settings
to refuse to accept cookies.
· Searchbots can't accept cookies.
That's all there is to it! The searchbot will request a page, your server
will try to set a cookie, and the searchbot won't be able to accept it.
The server won't send the page, so the searchbot won't index it. How can
you check whether your site has this problem? Change your browser's
cookie settings and see if you can travel through the Web site. Here's
how in Internet Explorer:
· Choose Tools→Internet Options from the main menu.
· In the Internet Options dialog box, click the Privacy tab.
· On the Privacy tab, click the Advanced button.
· Select the Override Automatic Cookie Handling check box, if
it's not already selected.
· Select both of the Prompt option buttons.
· Click OK to close the Advanced Privacy Settings dialog box.
· In the Internet Options dialog box, click the General tab.
· On the General tab, click the Delete Cookies button. Note that
some sites won't recognize you when you revisit them, until you log in
again and they reset their cookies.
· Click the OK button in the confirmation message box.
· Click the OK button to close the dialog box.
Now go to your Web site and see what happens. Each time the site tries
to set a cookie, you see a message box. Block the cookie and then see
if you can still travel around the site. If you can't, the searchbots
can't navigate it either.
How do you fix this problem?
· Don't require cookies: Ask your site programmers to find some
other way to handle what you are doing with cookies, or do without the
fancy navigation trick.
· As with session IDs, you can use a User-Agent script that treats
searchbots differently. If the server sees a normal visitor, it requires
cookies; if it sees a searchbot, it doesn't.
Fixing Bits and Pieces
Forwarded pages, image maps, and special characters can also cause
problems for search engines.
Search engines don't want to index pages that automatically forward
to other pages. You've undoubtedly seen pages telling you that something
has moved to another location and that you can click a link or wait
a few seconds for the page to automatically forward the browser to another
page. This is often done with a REFRESH meta tag, like this:
<meta http-equiv="refresh" content="0; url=http://yourdomain.com">
This meta tag forwards the browser immediately to yourdomain.com. Quite
reasonably, search engines don't like these pages. Why index a page
that doesn't contain information but forwards visitors to the page with
the information? Why not index the target page? That's just what search
engines do. If you use the REFRESH meta tag, you can expect search
engines to ignore the page.
An image map is an image that has multiple links. You can create the
image like this:
<img name="main" src="images/main.gif" usemap="#m_main">
The usemap= parameter refers to the map instructions. You can create
the information defining the hotspots on the image - the individual links
- by using a <MAP> tag, like this:

<map name="m_main">
<area shape="rect" coords="238,159,350,183" href="page1.html">
<area shape="rect" coords="204,189,387,214" href="page2.html">
<area shape="rect" coords="207,245,387,343" href="page3.html">
</map>

Will search engines follow these links? Many search engines don't read
image maps. Use additional simple text links in the document.
Don't use special characters, such as accented letters, in your text.
To use unusual characters, you have to use special codes in HTML, and
the search engines generally don't like these codes. If you want to
write the word olé, for example, you can do it in three ways: type the
é character directly into the page, use the &eacute; character entity,
or use the &#233; numeric code. The third method displays okay in
Internet Explorer but not in a number of other browsers. But you
probably should not use any of these forms, because the search engines
don't like them and therefore won't index these words. Stick to basic
characters.