A Quick & Comprehensive Guide to:

Search Engine Crawling & Indexing

In this guide, you’ll learn everything about search engine crawling and indexing, including:

   What crawling and indexing are
   How Google crawls and indexes your site
   How to improve the process so you can rank higher
   How to optimize your crawl budget

Contents

Chapter 1: Crawling

Chapter 2: Indexing

Chapter 3: Crawl Budget

Chapter 4: Actionable Plan

Introduction

Imagine you’ve built a nice, sleek website, but no visitors are coming to it. It turns out your website doesn’t come up for the relevant search queries.

What is it that you’re not doing, or doing wrong?

In this guide, we’ll learn how to make our website visible to and known by search engines.

Let’s jump into it!

Chapter 1

Crawling 101

It all begins with crawling. But,

What is crawling?
What are the ways crawlers can discover your website?
When do crawlers fail to find your page?

The Process Begins

A search engine has three main functions: crawling the web, indexing the pages discovered during that crawl, and ranking those pages.

And if we, as website owners, inhibit that process at any point, we won’t be able to rank.

What is Crawling?

Crawling is the discovery process in which Google’s spiders actively look for new web pages, or for new content on old pages. It’s as simple as that.

There is no central registry of websites, so search engines have to actively look for websites to index and rank in the SERPs. Google’s spiders, collectively known as Googlebot, are constantly looking for worthy websites to index and rank.

How do spiders or bots discover your website?

OK, so now you know you have to be found by the spiders, or Googlebot. But how? Well, there are two ways:

1. By submitting a Sitemap to Google Search Console

2. When Google discovers your links on a known website

Sitemap

A sitemap is a file that lists all your important pages, images, videos, and other content.

When you submit this file to Google Search Console, Google’s spiders can crawl your site more effectively.

How to Submit a Sitemap

You can use the Yoast SEO plugin if you’re using WordPress, or choose any of these 5 ways to generate and submit one.
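At its core, a sitemap is just an XML file listing your URLs. As a minimal sketch (the domain and paths below are made-up placeholders, not a real site), you could generate one with Python’s standard library:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a minimal sitemap.xml string from a list of page URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for page in urls:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = page
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages -- replace with your own.
sitemap = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/services",
])
print(sitemap)
```

Save the result as `sitemap.xml` at your site root, then point Google Search Console at it.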

Links

The best part? Even if you don’t submit a sitemap, Google is still able to discover and crawl your website.

Right now, while I am writing this article, I don’t have a sitemap submitted to GSC for our site Local SEO Lads.

But guess what? 7 of my pages are already indexed.

So Googlebot must have picked up my site from one of the indexed or known web pages.

But this site is relatively new. And I haven’t acquired any backlinks (except one citation).

Let’s have a look!

So I did make a Yelp profile a week ago. 

And guess what? They found me right there!

I didn’t create any other citations because I wanted to see whether a single link from a high-authority domain could send the crawlers my way.

And voila! It could! It took only six days for the spiders to find my page on Yelp!

Pages That Aren't Crawled

Pages blocked via the robots.txt file
Pages behind a login or other authorization
Duplicate pages

These pages are usually inaccessible to bots, or bots tend to avoid them.
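You can check whether a given page is blocked for a crawler using Python’s built-in robots.txt parser. A quick sketch (the rules and URLs below are illustrative, not a real site’s file):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt rules -- in practice this file lives at
# https://yoursite.com/robots.txt and can be fetched with parser.read().
rules = """
User-agent: *
Disallow: /admin/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))  # blocked
print(parser.can_fetch("Googlebot", "https://example.com/blog/post-1"))     # allowed
```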

Chapter 2

Indexing 101

Done with crawling. Now,

   What is indexing?
   How does Google index websites through crawling?


What is Indexing?

“Collecting and organizing the content found while crawling. Once a page is in the index, it’s ready to be displayed as a result to relevant queries.” – Moz.com

Think of a subway map of NYC. It’s a great analogy for the Google crawler, because the web works a lot like that map.

Imagine that each train station is a web page, and the colorful lines or tracks are links.

Similarly, Google indexes the web by sending out its spiders, or Googlebot, to known or trusted sites.

It sends them to one of the indexed sites, crawls those pages, collects links from those pages, and follows those links.

It then crawls the pages at the end of those links, collects more links from those pages, follows those links, crawls more pages, and continues that cycle.

Finally, it takes all of that information, stores it (indexes it), and makes it easy to retrieve.
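The crawl-and-follow cycle described above is essentially a graph traversal. Here’s a toy sketch over a made-up, in-memory “web” (the page names and links are invented for illustration):

```python
from collections import deque

# A made-up web: each page maps to the links found on it.
web = {
    "seed.com": ["a.com", "b.com"],
    "a.com":    ["c.com"],
    "b.com":    ["a.com", "d.com"],
    "c.com":    [],
    "d.com":    ["seed.com"],
}

def crawl(seed):
    """Start from a known site, follow links, and 'index' every page found."""
    index = set()
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if page in index:
            continue                     # already crawled this page
        index.add(page)                  # store (index) the page
        queue.extend(web.get(page, []))  # collect its links and follow them
    return index

print(sorted(crawl("seed.com")))
```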

Google hasn’t crawled the entire web, but it’s thought that they’ve collected about five trillion megabytes or so. Google doesn’t share exact figures.

Chapter 3

Crawl Budget 101

Crawling isn’t free for Google, which is where crawl budget comes in. But,

What is crawl budget?
What factors affect your crawl budget?
How can you optimize it?


Why Crawl Budget?

Google knows of more than a hundred and thirty trillion pages. But knowing of these pages doesn’t mean they’re ranking, or even indexing, all of them.

Sites with fewer than a few thousand URLs are typically crawled effectively and don’t have to pay too close attention to crawl budget. But larger sites do have a couple of considerations.

Crawl budget is made up of two factors. 
  1. Crawl Rate Limit
  2. Crawl Demand

Crawl Rate Limit

Crawl rate limit is how often Googlebot allows itself to crawl your pages, so that it doesn’t overload your server or harm the user experience in any way.

The crawl rate limit is affected by crawl health. Poor crawl health can result from slow load times on your website, broken pages, server errors, or other technical issues.

All of those factors result in poor crawl health, which can lower your crawl rate limit.

Crawl Demand

The second factor is crawl demand: the demand for Google to crawl your pages, both new and old.

It’s determined by the popularity of your site and its pages, and by Google’s attempt to prevent URLs in its index from becoming stale.

So even if you have crawl rate left over, if the demand doesn’t call for more crawling, Google’s spiders won’t necessarily crawl more.
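One way to picture the crawl rate limit: a crawler adapts its pace to how healthy your server looks. This is a toy sketch of that idea, not Googlebot’s actual algorithm, and the thresholds are invented for illustration:

```python
def next_crawl_delay(current_delay, response_ms, status):
    """Back off when the server looks unhealthy; speed up when it's fast.

    current_delay: seconds the crawler currently waits between requests.
    response_ms:   how long the last page took to respond.
    status:        HTTP status code of the last response.
    """
    if status >= 500 or response_ms > 2000:  # errors or slow pages: crawl less
        return min(current_delay * 2, 60.0)
    if response_ms < 200:                    # healthy and fast: crawl more
        return max(current_delay / 2, 0.5)
    return current_delay                     # otherwise, hold steady

delay = 4.0
delay = next_crawl_delay(delay, response_ms=3000, status=200)  # slow page -> back off
print(delay)  # 8.0
```

The takeaway for site owners: faster, error-free pages let crawlers visit more often.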

Optimizing Your Crawl Budget

Crawling your site costs Google time and money. We don’t often think of Google as having any limit when it comes to time and resources and money, but they do.

Google wants to be active and efficient, so they will prune pages that are low quality or aren’t adding value. Those low-value URLs can negatively affect your crawl budget.

You might think that just a few low-value pages on your site will fly under the radar, and you might be right.

But Google has to maintain a budget as well, and they do pay attention to these things.

5 Important Factors That Affect Crawl Budget

Here are some factors that affect your crawl budget:

  1. Faceted Navigation

    Filtering, especially on e-commerce websites, can be helpful for users.

    But it can cause problems for crawlers by creating many URL combinations with duplicative content.

  2. Duplicate Content

    When user and tracking information is stored in URL parameters, duplicate content can arise.

  3. Soft Error Pages

    A soft error occurs when a web server responds with a 200 status code but the page doesn’t exist.
    It looks like there’s a page there, but there isn’t.

  4. Infinite Spaces

    Large numbers of URLs that provide little or no value for Google to index,
    like a calendar with a “next month” link.

  5. Low Quality Content

    Any content that doesn’t provide enough value for users.
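Tracking parameters are a classic source of duplicate URLs: `/shoes?utm_source=mail` and `/shoes` are one page to a user but two URLs to a crawler. As a sketch, here’s how you might canonicalize such URLs with Python’s standard library (the parameter list is a common convention, not exhaustive):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Common tracking/session parameters -- extend for your own site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url):
    """Drop tracking parameters so duplicate URLs collapse into one."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(canonicalize("https://example.com/shoes?utm_source=mail&color=red"))
# https://example.com/shoes?color=red
```

In practice you’d signal the same thing to Google with a `rel="canonical"` tag or parameter handling, but the idea is identical: many URL variants, one canonical page.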

Chapter 4

Actionable Plan

Now in this final chapter, we’re going to see how to make Google love our website.

Although their expectations are quite high, we have to start somewhere, right? So, let’s see what we’ve got here!


How to Make Search Engines Love You

As a website owner, your job is not to work against the search engines or manipulate the algorithms, but to work with them.

Here we list some of the ways you can make Google love your site:

  1. Sitemap

    When you have multiple pages to index, submit a sitemap. For a specific page, use the URL Inspection tool.

  2. Robots.txt

    Prevent bots from entering particular pages using the robots.txt file.

  3. URLs

    Get rid of all broken links and redirect loops. Maintain a user-friendly URL structure.

  4. Duplicate Content

    Avoid multiple URLs for a single page, and similar content on multiple pages.

  5. Schema

    Use Schema markup to give search engines a better idea of what your pages are about.

  6. Clear Navigation

    Have clear navigation that allows both search engines and users to find the right content quickly.

  7. Mobile Responsive

    Optimize your content and make your site as responsive as possible.
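Schema markup is typically embedded in a page as a JSON-LD `<script>` tag. As a minimal sketch, here’s how you might build a LocalBusiness snippet in Python (the business details are placeholders, not a real listing):

```python
import json

# Placeholder business details -- replace with your own.
schema = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Local Business",
    "url": "https://www.example.com",
    "telephone": "+1-555-0100",
}

# This JSON goes inside <script type="application/ld+json"> ... </script>
snippet = json.dumps(schema, indent=2)
print(snippet)
```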

Now What?

That’s all from my side. I hope you now have a clear idea of what these terms mean and where they apply.

Now it’s time for you to take action. Don’t just wait and hope Google will crawl and index your website.

You can’t control the SERP, but you can most certainly improve your chances of being on the top.

If you have any questions or suggestions, contact us. We’ll be happy to help!


Author

Fahim Kabir

Local SEO Specialist & Consultant
