Indexing is the process by which a search engine crawls the web to discover pages and stores what it finds in an organized database called an index.
Google discovers new web pages as it crawls the web and adds them to its index. To do this, it uses a crawler called Googlebot.
To understand SEO, webmasters need to know how search engines work and what happens between the moment content goes online and the moment it appears in Google's results.
Here are a few frequently asked questions:
What is Google indexing?
How long does it take to index a site on Google?
My site is indexed on Google, so why do I get no traffic?
What is Googlebot?
First, we need to understand how search engines work.
Search engines work by crawling hundreds of billions of pages with their own web crawlers, commonly called robots or spiders. They navigate the web by downloading pages and following the links they contain to discover new ones.
They have three main functions:
Crawling: the first step, browsing the web in search of content and scanning the code of each URL found (site pages, images, videos, PDFs, etc.).
Indexing: storing and organizing on a server the content found during crawling. Once a page is in Google's index, it can be displayed for relevant user queries.
Ranking: the last step, presenting in the search results the content that best answers the user's query, ordered by relevance according to a series of specific rules and algorithms.
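The three functions above can be sketched in a few lines of Python. This is a toy model over an invented in-memory "web" (the URLs and texts are made up); a real search engine fetches pages over HTTP and operates at a vastly larger scale.

```python
# Toy illustration of the three steps: crawl, index, rank.
# The mini "web" below is hypothetical.
from collections import deque

WEB = {
    "/home":  {"text": "seo guide for beginners", "links": ["/blog", "/about"]},
    "/blog":  {"text": "how google indexing works", "links": ["/home"]},
    "/about": {"text": "about this seo blog", "links": []},
}

def crawl(start):
    """Step 1 - crawling: follow links from a start URL, breadth-first."""
    seen, queue = set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(WEB[url]["links"])
    return seen

def index(urls):
    """Step 2 - indexing: map each word to the pages that contain it."""
    idx = {}
    for url in urls:
        for word in WEB[url]["text"].split():
            idx.setdefault(word, set()).add(url)
    return idx

def rank(idx, query):
    """Step 3 - ranking: order pages by how many query words they match."""
    scores = {}
    for word in query.split():
        for url in idx.get(word, ()):
            scores[url] = scores.get(url, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

pages = crawl("/home")
idx = index(pages)
print(rank(idx, "google indexing"))  # → ['/blog']
```

Real ranking weighs hundreds of signals rather than a simple word count, but the pipeline shape (discover, store, score) is the same.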
Google's goal is to provide its users with the best possible results in terms of relevance and speed. Hundreds of billions of pages are stored on its servers, and its algorithms, updated several hundred times a year, try to offer the most relevant results for each user's search intent.
To keep results optimal, Google sets aside duplicate content, content deemed uninteresting, and sites that abuse techniques designed to manipulate search results (spam).
Google's spiders or crawlers, also called "Googlebots", crawl the entire web, scanning each page (billions of documents) and following its hyperlinks in order to store this data in one or more indexes.
This process continues until the spider has found, analyzed, and indexed virtually all of the visible web content.
The best way for Google to find your site, and to keep coming back to it, is by following links from other sites that point to yours.
Search engines see and analyze each web page independently. A website is simply a collection of pages connected to each other by hyperlinks.
The web itself, as a network of sites, is built on links and on following them.
Once a web page has been crawled, Google analyzes and stores its code in huge data centers (which host Google's index), ensuring that the data can be presented to users quickly.
Google assigns a unique identifier to each web page and indexes its content to accurately identify its constituent elements.
This huge database contains all the content that Google has discovered and that it considers relevant enough to offer to Internet users.
Google manages an additional index, which is used to store sites suspected of spam, sites with duplicate content and those that are difficult to analyze (problems of size or structural errors).
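The "unique identifier plus indexed content" idea described above is essentially an inverted index. A minimal sketch, with invented URLs and contents:

```python
# Minimal sketch of assigning a unique ID to each page and building an
# inverted index over its words. URLs and texts are invented examples.
docs = [
    ("https://example.com/a", "fresh content about indexing"),
    ("https://example.com/b", "duplicate content gets filtered"),
]

# Unique identifier per page, here simply its position in the list.
doc_ids = {url: i for i, (url, _) in enumerate(docs)}

# Inverted index: each word maps to the set of document IDs containing it.
inverted = {}
for url, text in docs:
    for word in text.split():
        inverted.setdefault(word, set()).add(doc_ids[url])

print(inverted["content"])  # → {0, 1}: both pages contain "content"
```

Looking up a query word then returns document IDs directly, which is what makes serving results from an index of billions of pages fast.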
Ranking in Google results
The algorithms aim to present a relevant set of high quality search results that answer the user's query or question as quickly as possible.
When a user enters a query, all pages deemed relevant are identified in the index, and an algorithm ranks them into an ordered set of results.
The algorithms used to rank the most relevant results differ from one engine to another: a page that ranks in a specific place for a query on Google may not rank the same for the same query on Bing.
To assess relevance and importance, engines use complex algorithms that weigh hundreds of signals determining a page's relevance and popularity.
Relevance: how well a page's content matches the user's search intent (intent being what the searcher is trying to accomplish, which is no small feat for search engines, or for SEOs, to work out).
Popularity: The popularity and authority of a domain is determined by many factors, including the quality and quantity of existing inbound links.
In addition to the query, search engines use other relevant data to return results:
Location: Some queries depend on location and geolocation.
Language detected: They return content in the user's language.
Previous search history: They return different results for a query depending on the user's browsing history.
Device: A different set of results can be returned depending on the device (PC, mobile, tablet) from which the query was made.
To deliver results to the end user, engines must perform a few critical steps:
Interpretation of the user's query intent.
Identification of the pages in the index associated with the query.
Display of the results, ranked in order of relevance and popularity.
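The three steps above can be sketched as a toy search function. The index, pages, and popularity scores below are invented for illustration; a real engine combines hundreds of signals.

```python
# Toy version of the three steps: interpret the query, identify candidate
# pages in the index, rank by relevance weighted by popularity.
# All pages and scores are hypothetical.

index = {
    "seo":    {"/guide", "/news"},
    "basics": {"/guide"},
}
popularity = {"/guide": 0.9, "/news": 0.4}  # e.g. derived from inbound links

def search(query):
    # 1. Interpret the user's query intent (here: just normalize and split).
    terms = query.lower().split()
    # 2. Identify the pages in the index associated with the query.
    candidates = set().union(*(index.get(t, set()) for t in terms))
    # 3. Rank by relevance (matched terms) times popularity.
    def score(page):
        relevance = sum(page in index.get(t, ()) for t in terms)
        return relevance * popularity[page]
    return sorted(candidates, key=score, reverse=True)

print(search("SEO basics"))  # → ['/guide', '/news']
```

Multiplying relevance by popularity is just one simple way to combine the two families of signals the article describes; real ranking formulas are proprietary and far more elaborate.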
Google has to crawl billions of new and updated pages. To avoid wasting resources, it allocates each site a crawl budget that determines the number of pages it will crawl each day. By optimizing crawl priority and budget, and preventing Googlebot from exploring unnecessary pages, the engine's resources are focused on a site's most important content.
SEO-oriented log analysis gives a better understanding of Googlebot's behavior, and of the errors it encounters, when crawling the site on your server.
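A minimal sketch of such log analysis: counting Googlebot hits and crawl errors per URL from access-log lines in the common combined format. The log lines and URLs below are invented examples.

```python
# Sketch of SEO log analysis: which URLs does Googlebot fetch, and where
# does it hit errors? Log lines are hypothetical examples.
import re

LOG = """\
66.249.66.1 - - [10/May/2024:10:00:00 +0000] "GET /product-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/May/2024:10:00:02 +0000] "GET /old-page HTTP/1.1" 404 320 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.9 - - [10/May/2024:10:00:05 +0000] "GET /product-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
"""

pattern = re.compile(r'"GET (?P<url>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*Googlebot')

hits, errors = {}, {}
for line in LOG.splitlines():
    m = pattern.search(line)
    if not m:  # ignore visits that are not Googlebot
        continue
    url, status = m.group("url"), m.group("status")
    hits[url] = hits.get(url, 0) + 1
    if status.startswith(("4", "5")):
        errors[url] = errors.get(url, 0) + 1

print(hits)    # crawl-budget view: which URLs Googlebot actually fetches
print(errors)  # crawl errors worth fixing
```

Note that the user-agent string can be spoofed; a serious analysis also verifies that the requesting IP really belongs to Google (via reverse DNS lookup) before trusting the "Googlebot" label.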
Google can index a new page in different ways, depending on the method used to discover it.
There are many ways to let Google know about a new page:
Googlebot discovers it on your site via internal links.
The page is submitted via a sitemap.
An indexing request is made via Search Console, Google's webmaster tool.
The page receives a link from another site.
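For the sitemap option above, a sitemap is simply an XML file listing your URLs. A minimal sketch of generating one with Python's standard library (the URLs are placeholders):

```python
# Sketch: build a minimal sitemap.xml with the standard library.
# The URLs are hypothetical placeholders.
import xml.etree.ElementTree as ET

urls = ["https://www.example.com/", "https://www.example.com/new-page"]

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
root = ET.Element("urlset", xmlns=NS)
for u in urls:
    url_el = ET.SubElement(root, "url")
    ET.SubElement(url_el, "loc").text = u  # one <loc> entry per URL

sitemap = ET.tostring(root, encoding="unicode")
print(sitemap)
```

The resulting file is typically served at the site root (e.g. /sitemap.xml) and declared in Search Console or in robots.txt so that Google can find it.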
Indexing delays vary widely depending on your site's popularity, the method used to submit the new page, its depth in your site (number of clicks from the home page), and the priority Google assigns to it.
The delay can range from 30 minutes to several days. Do not confuse the indexing delay with the ranking delay, which is much longer, depends on your SEO work, and is never guaranteed.