Never Heard of Googlebot?

Well, that’s not a big problem, because this blog post will do its best to familiarize you with this piece of SEO terminology right away. Don’t be alarmed by the other terms that crop up along the way; they are here to help you understand Googlebot better. To begin with, Googlebot is simply the search bot software used by Google to collect documents from the web and build a searchable index for its search engine.

An Internet bot, also called a web robot, WWW robot or simply ‘bot’, is a software application that performs automated, often repetitive tasks at a speed unattainable by human beings. Bots are most commonly used in web spidering or web crawling, where an automated script retrieves (or, more technically, ‘fetches’), analyzes and files information from web servers in a fraction of a second. Each server can carry a file called robots.txt, which sets out the rules a bot is expected to follow when spidering or crawling that server. Now that you have first-hand knowledge of bots and web robots, we may proceed to the next part of our Googlebot introduction.
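To make the robots.txt idea concrete, here is a minimal sketch of what such a file might look like. The paths and sitemap URL are hypothetical, invented purely for illustration; the directives themselves (User-agent, Disallow, Sitemap) are the standard ones that crawlers such as Googlebot read before fetching pages.

```
# Hypothetical robots.txt served at the root of a website.
# Crawlers read this file before fetching pages from the server.

User-agent: Googlebot
Disallow: /private/        # ask Googlebot not to crawl this directory

User-agent: *              # rules for all other bots
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```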

To put it plainly, Googlebot is Google’s web crawling bot, where crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index. Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often to crawl them, and how many pages to ‘fetch’ from each site. Googlebot’s crawl starts with a list of webpage URLs generated from earlier crawls and augmented with Sitemap data supplied by webmasters. As Googlebot visits each of these websites, it detects links (SRC and HREF) on each page and adds them to its list of pages to crawl; new sites, changes to existing sites, and dead links are all noted and used to update the Google index.
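The link-detection step above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Google’s actual code: it pulls the HREF and SRC attributes out of a page, exactly the values a crawler would queue up for future crawls. The sample page is invented for the example.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the URL-bearing attributes (HREF and SRC) that a
    crawler like Googlebot would add to its list of pages to visit."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

# A tiny made-up page with two links and one image
page = ('<a href="/about">About</a> '
        '<img src="/logo.png"> '
        '<a href="https://example.com/news">News</a>')

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)
# → ['/about', '/logo.png', 'https://example.com/news']
```

A real crawler would then fetch each of these URLs in turn, repeating the process, which is how the crawl frontier grows from one seed list.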

What Google’s Matt Cutts Says About Googlebot

One of the most tenacious blackhat webspam techniques we continue to see is hacked sites. I wanted to remind site owners that our free “Fetch as Google” tool can be a really helpful way to see whether you’ve successfully cleaned up a hacked site. For example, recently a well-known musician’s website was hacked. The management firm for the musician wrote in to say that the site was clean now. Here’s the reply I sent back:

Unfortunately when our engineers checked this morning, the site was still hacked. I know the page looks clean to you, but when we send Googlebot to fetch www.[domain].com this morning, we see <title>Generic synthroid bad you :: Canadian Pharmacy</title> on the page. What the hackers are doing is sneaky but unfortunately pretty common. When you surf directly to the website, you see normal content. But when a search engine (or a visitor from a search engine) visits the website, they see hacked drug-related content. The reason that the hackers do it this way is so that the hacked content is harder to find/remove and so that hacked content stays up longer.
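The cloaking trick Matt describes works because the hacked site checks the visitor’s User-Agent header and serves spam only to search engines. A rough way to check your own site for this, assuming the hack keys on the User-Agent string, is to fetch the same page twice with different User-Agent values and compare what comes back. This is a self-diagnosis sketch, not Google’s tooling; the URL in the commented-out lines is a placeholder.

```python
import re
import urllib.request

# Googlebot's published User-Agent string
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def extract_title(html):
    """Pull the <title> text out of a page, or None if absent."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def fetch_title(url, user_agent):
    """Fetch a URL while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return extract_title(response.read().decode("utf-8", "replace"))

# Compare what a browser sees with what Googlebot sees (placeholder URL):
# browser_title = fetch_title("https://example.com/", "Mozilla/5.0")
# bot_title = fetch_title("https://example.com/", GOOGLEBOT_UA)
# if browser_title != bot_title:
#     print("Titles differ -- the site may be serving cloaked content")
```

Note that more sophisticated hacks cloak by IP address rather than User-Agent, which is exactly why the “Fetch as Googlebot” tool described below is more reliable: it makes the request from Google’s own crawler.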

The fix in this case is to go deeper to clean the hack out of your system. See http://support.google.com/webmasters/bin/answer.py?hl=en&answer=163634 for some tips on how to do this, but every website is different. One important tool Google provides to help in assessing whether a site is cleaned up is our “Fetch as Googlebot” feature in our free webmaster console at http://google.com/webmasters/ . That tool lets you actually send Googlebot to your website and see exactly what we see when we fetch the page. That tool would have let you know that the website was still hacked. I hope that helps give an idea of where to go next. Something I love about “Fetch as Googlebot” is that it’s self-service: you don’t even need to talk to anyone at Google to diagnose whether your hacked site looks clean.