Search engines

Search engines have become central to how many people use the internet, and search engine optimisation is now a significant part of developing a web site.

Search engines have developed enormously since the early days of the world wide web. Much of the mystique around how they work arises less from any attempt to keep web developers in the dark than from the continuous innovation in the field.

Search engines work by using programs, called robots or spiders, to trawl the internet looking for information, which is then stored in large databases. These programs find web sites either by following links from other sites, or because web designers have specifically submitted a site for inclusion. There is no guarantee that a site will be included in a search engine’s database, and inclusion can take a while: I suspect that this is because a site which has no links to it is probably valued less than one which has.
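The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the tiny in-memory "web" (FAKE_WEB), the example addresses, and the crawl and LinkExtractor names are all invented for the purpose, and a real robot would fetch pages over the network, respect robots.txt, and store far more than raw HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own address.
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed_urls, fetch):
    """Breadth-first crawl. fetch(url) returns the page's HTML, or None."""
    index = {}                # url -> stored page content (the "database")
    queue = deque(seed_urls)  # seeds play the role of submitted sites
    while queue:
        url = queue.popleft()
        if url in index:
            continue          # already visited
        html = fetch(url)
        if html is None:
            continue          # page could not be retrieved
        index[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        queue.extend(parser.links)  # follow links found on this page
    return index


# Hypothetical four-page web, held in memory instead of fetched over HTTP.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": "<p>About us</p>",
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://orphan.example/": "<p>Nobody links here</p>",
}

found = crawl(["http://a.example/"], FAKE_WEB.get)
print(sorted(found))
```

In this toy example the page at orphan.example is never discovered, because nothing links to it; it would only be indexed if it were passed in as a seed, which corresponds to submitting the site by hand.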

Today, Google dominates search engine technology. Holding indexed records of the whole of the world wide web involves a great deal of computing, which makes it hard for new search engines to come into being.

Google’s technology looks both at the information gleaned from a site and at how the site is referenced elsewhere: part of their PageRank algorithm seems to take special notice of links with few or no other links around them. The logic is that a link which is just one of many may not mean much, whereas a single link in a page of text is a strong recommendation. Google also does some sophisticated processing of words that are in close proximity: if you type something in double quotes, then exactly that phrase is looked for; otherwise it looks for the words typed, reasonably close together. Part of the art of search engine optimisation is to ensure that the actual content of a site is clear to a search engine using this approach.
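The difference between a quoted phrase and words that are merely close together can be illustrated with a small sketch. This is not Google's method, which is not public; the function names, the simple tokeniser and the window of eight words are all assumptions chosen to make the idea concrete.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_match(query, document):
    """True if the query words appear consecutively and in order."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def proximity_match(query, document, window=8):
    """True if all query words occur within some run of `window` words."""
    q, d = set(tokenize(query)), tokenize(document)
    if not q:
        return False
    last_start = max(1, len(d) - window + 1)
    return any(q <= set(d[i:i + window]) for i in range(last_start))


exact = "Every search engine indexes pages."
nearby = "The engine behind a good search matters."
distant = "search " + "filler " * 20 + "engine"

print(phrase_match("search engine", exact), proximity_match("search engine", exact))
print(phrase_match("search engine", nearby), proximity_match("search engine", nearby))
print(phrase_match("search engine", distant), proximity_match("search engine", distant))
```

The first text satisfies both tests, the second only the proximity test, and the third neither, which is roughly the distinction between typing a query with and without double quotes.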

Just as it’s not normally possible to force a site to be included in Google, it’s also not normally possible to force one to be excluded, though deleted pages get dropped after a while, and a robots.txt file or a noindex meta tag can ask well-behaved robots to skip particular pages. Google make no promises about how often sites are checked for updates, but server logs imply that the Google robot visits sites quite frequently. Careful use of Google Sitemaps can encourage pages that change frequently to be indexed more often.
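A sitemap is an XML file listing a site's pages, optionally with a hint about how often each one changes. The sketch below builds one following the public sitemaps.org format; the build_sitemap function and the example addresses are invented for illustration, and the changefreq value is only a hint which a search engine is free to ignore.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages is a list of (url, changefreq) pairs; returns the XML as text."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Allowed values include "hourly", "daily", "weekly" and "monthly".
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site: a news page that changes daily, an about page that rarely does.
xml_text = build_sitemap([
    ("http://www.example.com/news", "daily"),
    ("http://www.example.com/about", "yearly"),
])
print(xml_text)
```

The resulting file is normally saved as sitemap.xml at the top of the site and then submitted to the search engine or referenced from robots.txt.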

A major source of income for all search engines is advertising. Google’s AdWords program is one of the ways in which people can pay to appear among sponsored links. Like all advertising, this can be very productive, but it needs to be used with care.