Search engines use crawlers to find new and updated content. Content can range from webpages to PDF files and images. What all content has in common, however, is that it’s discovered by links.
The most important crawler is Googlebot, Google’s crawler, because more than 90% of searches are performed via Google. It works by collecting several web pages and then following the links on them to find new pages. By following this link path, Googlebot finds new and updated content and adds it to an index. When someone types in a query and relevant information is available in this index, it is retrieved from there.
Clicks and Rankings
When you do a search, Google will check its index for relevant content and rank it in order to solve your query. This is basically what ranking is. In theory, the highest-ranked result will be most relevant to your query. In practice, that’s often not the case.
Our SEO data scientist set out to establish how Googlebot crawls affect page ranking, as well as the dependency between crawl frequency and URL performance. Here is what they discovered.
The Study Dataset
The analysis is based on real data from a UK-based site with an average of 1.2M monthly organic search impressions and 3.5K indexed pages.
For this analysis, the team combined filtered Googlebot crawl events (GET requests from google.com/bot) with Google Search Console’s daily URL performance data for a period of 15 days in September 2020. For each URL, they calculated the number of crawls per day. The crawl frequency for each URL was calculated by dividing 15 (the total number of days) by the number of days on which there was at least one crawl. A URL crawled on 5 of the 15 days, for example, has a frequency of once in 3 days.
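The frequency calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual pipeline: the `crawl_frequency` function, the event format, and the sample dates are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

STUDY_DAYS = 15  # length of the observation window, as in the study


def crawl_frequency(events, study_days=STUDY_DAYS):
    """Return {url: crawled once in N days} for every crawled URL.

    `events` is an iterable of (url, crawl_date) pairs, one per Googlebot
    GET request (a hypothetical format chosen for this sketch).
    """
    days_with_crawl = defaultdict(set)
    for url, crawl_date in events:
        # Several hits on the same day count as one day with a crawl.
        days_with_crawl[url].add(crawl_date)
    return {url: study_days / len(days) for url, days in days_with_crawl.items()}


# Hypothetical sample: /a is crawled on 3 distinct days, /b on all 15 days.
events = [
    ("/a", date(2020, 9, 1)),
    ("/a", date(2020, 9, 1)),
    ("/a", date(2020, 9, 6)),
    ("/a", date(2020, 9, 11)),
] + [("/b", date(2020, 9, day)) for day in range(1, 16)]

frequency = crawl_frequency(events)
print(frequency)  # {'/a': 5.0, '/b': 1.0}
```

A value of 1.0 corresponds to "every day" and 5.0 to "once in 5 days". URLs with no crawl in the window do not appear in the result at all.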
Predictably, the more frequently a URL was crawled, the more daily impressions and clicks it received, and the better it ranked. In this study, 81 URLs were crawled every day. They had 211 daily impressions and 3.59 clicks on average, and an average position of 15. At the other end, 1866 URLs were crawled once every two weeks. They had two daily impressions, 0.01 clicks, and a position of 27 on average.
Average Crawl Frequency | Number of URLs | Avg Daily Impressions | Avg Daily Clicks | Avg Position |
---|---|---|---|---|
Every day | 81 | 211 | 3.59 | 15 |
Once in 2 days | 540 | 25 | 0.42 | 19 |
Once in 3 days | 162 | 11 | 0.08 | 22 |
Once in 4 days | 343 | 4 | 0.02 | 27 |
Once in 5 days | 677 | 3 | 0.02 | 28 |
Once in a week | 1068 | 2 | 0.02 | 26 |
Once in 2 weeks | 1866 | 2 | 0.01 | 27 |

Introducing Pearson’s Coefficient of Linear Correlation
You’ll agree that with age, children tend to get taller. A statistician would say there was a very strong positive relationship between age and height. The Pearson coefficient of correlation would range from 0.70 to 1 depending on the study sample. Likewise, the higher the speed of a car, the shorter the traveling time; if you drive faster, you’ll arrive at your destination sooner. We then speak of a very strong negative relationship between speed and travel time. The coefficient would range from -0.70 to -1.
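To make the coefficient more concrete, here is a small Python sketch that computes it from scratch for the two examples above. The `pearson` helper and all of the numbers are made up for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant sequence has zero variance and would raise ZeroDivisionError.
    return cov / sqrt(var_x * var_y)


# Made-up illustration data.
ages = [4, 6, 8, 10, 12]
heights_cm = [102, 115, 128, 138, 149]
speeds_kmh = [40, 60, 80, 100, 120]
travel_minutes = [150, 100, 75, 60, 50]  # a fixed 100 km trip

r_height = pearson(ages, heights_cm)        # close to +1: strong positive relationship
r_travel = pearson(speeds_kmh, travel_minutes)  # close to -1: strong negative relationship
print(r_height, r_travel)
```

The travel-time coefficient does not reach exactly -1 because travel time falls with speed along a curve rather than a straight line, and Pearson's coefficient only measures linear association.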
The team calculated the Pearson correlation coefficient between the number of days with crawls and the average daily impressions, a metric that counts how many times a piece of content (here, a URL) is shown to users in search results. The coefficients between the number of crawls and clicks, and between the number of crawls and position (ranking), were calculated too.
Metric | Pearson Correlation With Number of Crawls |
---|---|
Avg impressions | 0.69 |
Daily clicks | 0.66 |
Position | -0.69 |
The coefficient between the number of crawls and average impressions was 0.69, which shows a strong positive relationship between them. It was 0.66 between the number of crawls and the clicks per day and -0.69 between the number of crawls and the position of the URL. This last negative value shows that a high crawl frequency is associated with a low average position number. In other words, URLs that are crawled more often tend to sit closer to the top of the SERP.
The team compared two types of URLs: high-ranking URLs with low impressions (average position 1-5 and an average of <=5 daily impressions) and low-ranking URLs with high impressions (position >20 and >=50 daily impressions). They found that the low-ranking URLs with more impressions were crawled almost twice as often: 8.7 vs. 4.6. This led them to conclude that impressions were a more important factor for crawl frequency than position.
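The comparison of the two URL groups can be sketched as follows. The thresholds come from the text above; the row layout, the URLs, and the crawl counts are hypothetical, and the unit of the crawl figure is an assumption, since the study does not state it.

```python
from statistics import mean

# Hypothetical per-URL rows: (url, avg_position, avg_daily_impressions, crawl_count)
rows = [
    ("/top-but-unseen-1", 2.0, 3, 4),
    ("/top-but-unseen-2", 4.5, 5, 5),
    ("/deep-but-seen-1", 24.0, 80, 9),
    ("/deep-but-seen-2", 31.0, 55, 8),
    ("/neither", 12.0, 20, 6),
]


def is_high_rank_low_impressions(position, impressions):
    # Average position 1-5 and at most 5 daily impressions.
    return 1 <= position <= 5 and impressions <= 5


def is_low_rank_high_impressions(position, impressions):
    # Average position worse than 20 and at least 50 daily impressions.
    return position > 20 and impressions >= 50


high_rank = [c for _, p, i, c in rows if is_high_rank_low_impressions(p, i)]
low_rank = [c for _, p, i, c in rows if is_low_rank_high_impressions(p, i)]

print(mean(high_rank), mean(low_rank))  # 4.5 8.5
```

URLs that fall into neither group, like `/neither` here, are simply left out of the comparison.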
How Are Crawl Frequency and Page Rank Connected?
To determine how Googlebot affects page rank in the days after a crawl, the researchers combined data from three sources: a list of crawls, the crawl frequency of each URL over a period of two weeks in September 2020, and the daily position by keyword for the corresponding period. They determined the position of each URL and keyword on the day before the last crawl, on the day of the crawl, and on each of the five days after the crawl.
They filtered out URLs which were crawled more frequently than once every three days, because these could have been affected by previous crawls, potentially resulting in data inaccuracy. They also filtered out keywords which had no position data on the day before the crawl.
The remaining keyword and URL data, spanning 153 rows, showed an above-average position change on the day of the crawl and on the two days after it. The biggest change was observed on the day right after the crawl.
Position Change | Avg DtD Change | Day of Last Crawl | 1 Day After Last Crawl | 2 Days After Last Crawl | 3 Days After Last Crawl | 4 Days After Last Crawl | 5 Days After Last Crawl |
---|---|---|---|---|---|---|---|
Avg DtD change | 1.1 | 1.6 | 3.5 | 2.2 | 1.3 | 1.2 | 1.6 |
Avg % DtD change | 3.71% | 5% | 6% | 4% | 3% | 1% | 3% |
Avg change to the day before crawl | - | 1.6 | 3.0 | 3.5 | 3.2 | 2.4 | 2.9 |
Avg % change to the day before crawl | - | 5% | 8% | 11% | 11% | 8% | 10% |
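The two kinds of change in the table (day-to-day, and relative to the day before the crawl) can be computed per URL and keyword roughly as shown below. This is a sketch under assumptions: the function name, the day-index format, and the sample positions are hypothetical, and changes are taken as absolute values, which the study does not state explicitly.

```python
def changes_around_crawl(positions, crawl_day, days_after=5):
    """Return (offset, day_to_day_change, change_vs_day_before_crawl) tuples.

    `positions` maps a day index to the ranking position on that day.
    Offset 0 is the day of the crawl. Changes are absolute values
    (an assumption about the study's method).
    """
    baseline = positions[crawl_day - 1]  # position on the day before the crawl
    result = []
    for offset in range(days_after + 1):
        day = crawl_day + offset
        day_to_day = abs(positions[day] - positions[day - 1])
        vs_baseline = abs(positions[day] - baseline)
        result.append((offset, day_to_day, vs_baseline))
    return result


# Hypothetical positions for one URL/keyword pair; the crawl happens on day 3.
positions = {2: 20, 3: 19, 4: 15, 5: 16, 6: 16, 7: 17, 8: 16}

changes = changes_around_crawl(positions, crawl_day=3)
for offset, day_to_day, vs_baseline in changes:
    print(offset, day_to_day, vs_baseline)
```

Averaging these tuples over all remaining URL and keyword rows, offset by offset, would give figures of the kind shown in the table.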
The share of URLs whose position improved after the crawl was 64%. For these URLs, the position on the third day after the crawl was higher than before the crawl. The share of URLs whose position deteriorated after the crawl was 36%.
High-ranking URLs were less volatile. For first, second, and third-ranked URLs, the average day-to-day change was just 0.13, compared to 2.06 for URLs ranking 21st or lower.
Avg Position | Avg DtD Change in Position |
---|---|
Avg position 1-3 | 0.13 |
Avg position 4-10 | 0.15 |
Avg position 11-20 | 0.64 |
Avg position 21+ | 2.06 |
What Do These Findings Mean?
If your site isn’t crawled and indexed, it won’t appear on search results pages. If you run a site, check how many of its pages are indexed. This tells you whether Google is crawling and locating all of the pages that need to be crawled, or wasting crawl budget on URLs that you don’t want crawled. To check whether all of your important pages have been indexed, use Google Search Console.
You can help raise a URL’s crawl priority by increasing the quantity and quality of internal and external links pointing to it. You can also avoid wasting crawl budget on low-priority pages by reducing the internal links pointing to them, adding a nofollow attribute to those links, or adding a Disallow rule for these URLs to the robots.txt file. These steps point Googlebot in the right direction when it crawls your web content, which gives you more control over what ultimately appears in the index.
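Before deploying robots.txt rules, it can help to check which paths they actually block. The sketch below uses Python's standard `urllib.robotparser` module for that. The robots.txt content and the paths are placeholders invented for this example, and Python's parser is only an approximation of how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /internal-search/
Disallow: /cart/

User-agent: *
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in memory, no network request


def googlebot_may_crawl(path):
    """True if the rules above allow Googlebot to fetch the given path."""
    return parser.can_fetch("Googlebot", path)


print(googlebot_may_crawl("/blog/crawl-frequency"))     # True
print(googlebot_may_crawl("/internal-search/results"))  # False
print(googlebot_may_crawl("/cart/checkout"))            # False
```

Running a list of your important URLs through a check like this is a quick way to confirm that a new Disallow rule does not accidentally block pages you want crawled.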