A Helpful Guide to Understanding How Search Engines Work
Before I really get into many of the SEO guides that are forthcoming on this blog, I thought I’d take us back to basics. For SEO purposes, that means understanding search engines, their purpose, and how they are useful to us.
A Brief History
Search engines were born out of the idea that the Internet is big – very BIG! In the beginning, there was really no easy way to find information on the web, and many directories that did exist were maintained by hand.
Finding information on the Internet was very inefficient and manual, creating the need for a program that could act as a WWW resource-discovery tool to combine the three essential features of a search engine (crawling, indexing, and searching). Originally, because of the limited resources available, indexing and hence searching were limited to the titles and headings found in the web pages the crawler encountered.
One of the first “full text” crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it let users search for any word on any web page, which has been the standard for all major search engines ever since. Around that time, several other search engines appeared and vied for popularity, including Magellan, Excite, Infoseek, Inktomi, Northern Light, AltaVista, and Yahoo.
Around 2000, Google’s search engine rose to prominence, largely thanks to an innovative way of producing better results called “PageRank.” The rest is history.
How Search Engines Work
When you sit down at your computer and perform a search query in Google, Yahoo, or Bing, you’re presented with a list of results from all over the web. The big questions are:
- How do search engines find web pages that match your query?
- How do they determine the order in which to display the search results?
In its most basic sense, you can think of searching the web as looking through the index of a really, really large book that tells you exactly where to find everything. Similarly, when you perform a search query, search engines check their index and return (or “serve”) the results most relevant to your query.
There are three key processes in delivering search results to a user:
- Crawling – Do search engines know about your site? Can they find it?
- Indexing – Is your site indexable?
- Serving – Does the site have useful content that is highly relevant to user search queries?
#1. Crawling
Crawling is the process by which search engines discover new and updated pages to be added to the search index.
Search engines use a vast network of computers and servers across the world to fetch (or “crawl”) billions of pages on the web. The program that does the fetching is called a “spider,” “robot,” or “bot” (Google calls theirs “Googlebot”).
A web spider is an automated web browser whose job is to follow the links on a site, scan the information on each page, and bring it back to the search engine’s servers for indexing.
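To make the “follow the links” part a little more concrete, here’s a minimal Python sketch of the link-extraction step a spider performs on a single page. It uses only the standard library’s `html.parser` and `urllib.parse`; the `extract_links` helper and the sample page are hypothetical, and a real crawler also has to deal with fetching, redirects, `nofollow`, and much more.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop
                # #fragments, which point inside a page rather than to a new one.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    """Hypothetical helper: return every link found on one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p>See <a href="/about">About</a> and <a href="https://example.org/#top">Example</a>.</p>'
print(extract_links("https://example.com/blog/", page))
```

Each URL this returns is a candidate for the spider’s “to crawl” list.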
Each search engine uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site.
A search engine’s crawl process typically begins with a list of web page URLs, usually generated from previous crawls, which is then augmented with sitemap data provided by webmasters. Webmasters can use services such as Google Webmaster Tools to submit their sitemaps directly to Google.
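Here’s a rough sketch of how that starting list might be assembled, assuming the standard XML sitemap format from the sitemaps.org protocol. The `urls_from_sitemap` and `seed_frontier` helpers, the sample sitemap, and the “previous crawl” list are all hypothetical; how any particular search engine really merges these sources isn’t public.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> URLs listed in an XML sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def seed_frontier(previous_crawl, sitemap_xml):
    """Combine known URLs with sitemap URLs, keeping order and dropping duplicates."""
    frontier, seen = [], set()
    for url in list(previous_crawl) + urls_from_sitemap(sitemap_xml):
        if url not in seen:
            seen.add(url)
            frontier.append(url)
    return frontier


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/new-post</loc><lastmod>2010-05-01</lastmod></url>
</urlset>"""

frontier = seed_frontier(
    ["https://example.com/", "https://example.com/old-post"], sitemap
)
print(frontier)
```

The sitemap adds `new-post`, which the earlier crawl hadn’t seen, while the duplicate home page URL is kept only once.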
As a web spider visits each of these websites, it detects the links on each page and adds them to its list of pages to crawl. New websites, changes to existing websites, and dead links are noted and used to update the search engine’s index.
Major search engines do not accept payment to crawl a site more frequently (or to rank it higher), but they do offer paid services such as Pay-Per-Click advertising alongside their natural (organic) search results.
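The “visit a page, collect its links, add the new ones to the list” loop is essentially a breadth-first traversal. Below is a minimal sketch over a hypothetical in-memory “web” (`TOY_WEB`), where `None` stands in for a URL that no longer resolves. Real crawlers add politeness delays, recrawl scheduling, and prioritization on top of this, so treat it as an illustration of the idea rather than how any search engine actually implements it.

```python
from collections import deque

# A toy, in-memory "web": each URL maps to the links found on that page.
# None stands in for a URL that no longer resolves (a dead link).
TOY_WEB = {
    "example.com/": ["example.com/blog", "example.com/about"],
    "example.com/blog": ["example.com/about", "example.com/old-post"],
    "example.com/about": ["example.com/", "example.org/"],
    "example.org/": [],
    "example.com/old-post": None,
}


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: visit seeds, then queue newly discovered links."""
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)        # URLs already queued, so nothing is crawled twice
    crawled, dead = [], []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        links = fetch(url)
        if links is None:    # fetch failed: record it as a dead link
            dead.append(url)
            continue
        crawled.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled, dead


crawled, dead = crawl(["example.com/"], TOY_WEB.get)
print("crawled:", crawled)
print("dead:", dead)
```

Passing `TOY_WEB.get` as the `fetch` function keeps the sketch offline; swapping in a real HTTP fetch plus the link extraction step would turn it into a (very naive) crawler.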
#2. Indexing
Search engines systematically process each page their crawlers fetch in order to compile a massive index of all the words they see and where each word appears on each page.
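An index of “words and their location on each page” is usually called an inverted index. Here’s a minimal positional version in Python; the sample pages and the simple lowercase-alphanumeric tokenizer are assumptions for illustration, and production indexes also handle stemming, compression, and far more data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: [positions where the word appears]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    return index


pages = {
    "example.com/coffee": "How to brew coffee at home",
    "example.com/tea": "How to brew green tea",
}
index = build_index(pages)
print(dict(index["brew"]))
print(dict(index["tea"]))
```

Storing positions (not just which pages contain a word) is what lets an engine later reward pages where query words appear close together.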
In addition, search engines process information included in key content tags and attributes, such as title tags, meta tags, and ALT attributes.
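As a rough illustration of pulling those tags out of a page, this sketch uses Python’s standard `html.parser` to collect the `<title>`, any named `<meta>` tags, and `<img>` ALT text. The `KeyTagParser` class and sample page are hypothetical; how much weight a search engine gives each of these fields is not something this code (or I) can tell you.

```python
from html.parser import HTMLParser


class KeyTagParser(HTMLParser):
    """Pulls the <title>, named <meta> tags, and <img alt> text from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.alt_texts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values can be None, e.g. <meta name>
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = (
    "<html><head><title>Brewing Coffee at Home</title>"
    '<meta name="description" content="A beginner guide to brewing coffee.">'
    "</head><body>"
    '<img src="pour-over.jpg" alt="Pour-over coffee setup">'
    "</body></html>"
)
parser = KeyTagParser()
parser.feed(page)
parser.close()
print(parser.title, parser.meta, parser.alt_texts)
```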
#3. Serving Results
This is the part where you, the user, come in. When you visit a site such as Google and enter a search query, Google’s machines search their index for web pages that match your query.
They then return (or “serve”) the matching results in the natural section of their Search Engine Results Page (SERP), ordered from most relevant to least relevant.
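Here’s a toy version of that “find matches, then order them” step. It assumes a page matches only if it contains every query word, and it scores pages by how often the query words appear. Both rules are simplifying assumptions of mine; real engines use far richer relevance signals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


PAGES = {
    "example.com/coffee": "how to brew coffee at home coffee tips",
    "example.com/tea": "how to brew green tea",
    "example.com/espresso": "espresso is strong coffee",
}

# Inverted index: word -> {url: number of times the word appears}
INDEX = {}
for url, text in PAGES.items():
    for word, count in Counter(tokenize(text)).items():
        INDEX.setdefault(word, {})[url] = count


def serve(query):
    """Return matching URLs, most relevant first, under a toy scoring rule."""
    terms = tokenize(query)
    if not terms:
        return []
    # Only pages that contain every query term are candidates.
    matches = set(INDEX.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(INDEX.get(term, {}))

    def score(url):
        # Toy relevance: total occurrences of the query terms on the page.
        return sum(INDEX[term][url] for term in terms)

    # Highest score first; ties broken alphabetically so output is stable.
    return sorted(matches, key=lambda url: (-score(url), url))


print(serve("coffee"))
print(serve("brew coffee"))
```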
#4. Determining Relevancy
So how does a search engine determine the relevancy of a group of web pages when serving up the pages in its SERPs? Most search engines have what is called an algorithm, which is a set of rules (or unique formula) for determining the significance and relevancy of web pages.
Each search engine’s algorithm is unique, and every search engine handles web pages a little differently. However, the premise of every algorithm is the same: to find information on the web that somebody will find relevant to their search query.
Google’s algorithm includes over 200 different factors, among them PageRank, which was named after Google co-founder Larry Page.
PageRank measures the importance of a page based on the incoming links from other pages. In simple terms, each link to a page on your site from another page adds to that page’s PageRank, and links from pages that themselves have a high PageRank count for more.
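For the curious, here’s a sketch of the classic PageRank calculation (power iteration with the commonly cited damping factor of 0.85). The four-page link graph is made up, and Google’s production ranking goes well beyond this textbook formula, so read it as an illustration of the idea rather than what Google runs today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, and so on.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C comes out on top because three pages link to it, while D, which nobody links to, keeps only the baseline share.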
However, not all links are equal. Google works hard to improve the user experience by identifying spam links and other practices that negatively impact search results. The best types of links are those that are given based on the quality of your content.
In order for your site to rank well in SERPs, it’s important to make sure that Google can crawl and index your site correctly. Check out Google’s Webmaster Guidelines, which outline best practices that can help you avoid common SEO pitfalls and improve your site’s ranking.
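One quick sanity check on crawlability is your robots.txt file, since a stray `Disallow` rule can keep Googlebot away from pages you want indexed. Python’s standard library includes `urllib.robotparser`, which can answer “may this user agent fetch this URL?” for a given set of rules. The rules and URLs below are made up, and Python’s parser uses simpler matching than Googlebot does (for example, it doesn’t support wildcards in paths), so treat this as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/blog/seo-basics", "/drafts/next-post", "/private/admin"]:
    url = "https://example.com" + path
    print(path, "Googlebot allowed:", parser.can_fetch("Googlebot", url))
```

Note that Googlebot matches its own group here, so only the `/drafts/` rule applies to it; other bots fall back to the `*` group.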
#5. Universal Search
Universal, or “blended,” search has been around since 2007, when Google announced a significant change to how its search engine displays results. Yahoo and Bing have since followed suit in integrating other types of media into their search results.
To find the most relevant results regardless of media type, universal search lets search engines return more than just the traditional text results. They pull in images, news, local listings, shopping, video, blog posts, and/or social media results and inject them right into the SERP.
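Search engines haven’t published exactly how they blend these sources, but the basic idea can be sketched as merging several ranked lists into one. This toy version assumes (hypothetically) that every vertical scores its results on the same 0 to 1 relevance scale, so the scores can be compared directly; the `blend` helper and all the sample results are invented for illustration.

```python
import heapq


def blend(verticals, limit=5):
    """Merge per-vertical result lists into one ranked results page.

    Assumes every vertical uses the same relevance scale (a big simplification).
    """
    candidates = (
        (score, kind, title)
        for kind, results in verticals.items()
        for title, score in results
    )
    # Keep the highest-scoring results, whatever media type they are.
    top = heapq.nlargest(limit, candidates, key=lambda item: item[0])
    return [(kind, title) for _score, kind, title in top]


verticals = {
    "web": [("How to brew coffee", 0.92), ("Coffee history", 0.61)],
    "images": [("Pour-over diagram", 0.78)],
    "video": [("Brewing coffee in 2 minutes", 0.85)],
    "news": [("Coffee prices rise", 0.40)],
}
for kind, title in blend(verticals):
    print(f"[{kind}] {title}")
```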
#6. Personalized Search
In late 2009, Google introduced the concept of personalized search results for all users. Previously, they’d only used personalized search for signed-in users.
This means that when you search using Google, your results are tailored to be as relevant as possible based on a number of factors, including your previous search behavior. This change enables Google to customize search results based on 180 days of search activity linked to an anonymous cookie in your browser.
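Google doesn’t disclose how it uses that history, so the following is purely a hypothetical sketch of the general idea: re-rank results by giving a small boost to sites you’ve engaged with during the last 180 days. The flat `boost` value, the `(timestamp, domain)` history format, and the sample data are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical model: only activity from the last 180 days counts.
HISTORY_WINDOW = timedelta(days=180)


def personalize(results, history, now, boost=0.2):
    """Re-rank (url, score) pairs using recent search activity.

    history is a list of (timestamp, domain) pairs, standing in for the
    activity tied to an anonymous browser cookie. The flat boost is an
    illustrative assumption, not Google's actual formula.
    """
    recent_domains = {
        domain for when, domain in history if now - when <= HISTORY_WINDOW
    }

    def adjusted_score(item):
        url, score = item
        domain = url.split("/")[0]  # URLs here are written without a scheme
        return score + (boost if domain in recent_domains else 0.0)

    return sorted(results, key=adjusted_score, reverse=True)


now = datetime(2010, 6, 1)
history = [
    (datetime(2010, 5, 20), "coffeegeek.example"),  # 12 days ago: counts
    (datetime(2009, 1, 5), "teablog.example"),      # older than 180 days: ignored
]
results = [
    ("teablog.example/brew", 0.80),
    ("wiki.example/brewing", 0.75),
    ("coffeegeek.example/brew", 0.70),
]
for url, score in personalize(results, history, now):
    print(url, score)
```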
Google engineers Bryan Horling and Robby Bryant also walk through personalized search in a short video.
I hope this article gives you a good understanding of how search engines work in 2010!