Search engines follow a three-step process to surface the most relevant results for a user's query. In order, these steps are Crawling, Indexing, and Ranking and Serving.
- Crawling: The discovery of web pages by search engine bots as they reach content that already exists on the internet. As they scan, bots follow the links embedded in pages and reach new ones; by following these links, bots end up visiting billions of pages online.
- Indexing: The process of bots saving the pages they have visited to a data storage system. Indexing is the second step, coming right after crawling.
- Ranking and Serving: The final output on the Search Engine Results Page (SERP), which lists the websites most relevant to the user's search, ordered from most relevant to least.
Indexing is the entire process by which search engine bots process the data from the pages they scan and save it in a storage system. Bots try to analyze and make sense of the content on each scanned page. During this analysis, elements such as keywords, visuals, content, and the overall structure of the page are classified. The information obtained through the analysis is added to the index and stored in the search engine's database, ready to be served to users.
Pages that are not indexed by bots do not appear on the Search Engine Results Page because they are not present in the database, and therefore they receive no organic traffic. That is why, during SEO work, indexing is crucial for pages that are expected to attract organic traffic.
A process also known as querying the Google index allows us to see how many pages of a specific website are indexed and not indexed on Google. There are two different methods to check the number of indexed pages and which pages they are.
If we type "site:example.com" (example.com being the domain name) into the search bar, we can see the number of pages indexed by Google. If there are no results on the SERP, this indicates that zero pages are indexed.
After logging in to the related website's Google Search Console account, we can click on the "Coverage" report located under the "Index" section. Here, the number shown under "Valid" is the number of indexed pages, and the "Details" section provides more information about them. If "Valid" shows zero, this means that no pages are indexed. The number of indexing errors can be found under the "Errors" section, and more information about these errors can also be found under "Details".
Also referred to as adding a site to Google, submitting an indexing request means informing Google of your website's pages and asking for them to be indexed. Submitting pages to Google does not mean that Google will index them quickly, nor that the pages will immediately show up at the top of the SERP. Indexing requests only inform Google of new or modified pages on a website that are not yet indexed. How and when the pages are indexed is up to the Google bots.
In order to submit a Google index request, the first step is logging in to the related website's Google Search Console account and entering the URL of the chosen page in the "URL Inspection" section. After a few seconds, Search Console retrieves the Google index data and shows the current indexing status of the page in question. On the right-hand side of the same screen, clicking "REQUEST INDEXING" submits an indexing request for that URL.
Also known as deleting pages from the Google index, removing pages from the Google index is the act of informing Google about certain pages on a website and requesting that they be removed. Informing Google about these pages signals the bots to prioritize them; however, once again, how and when these pages are removed from the index is up to the Google bots.
The first step is logging in to the related website's Google Search Console account and clicking "Removals" under the "Index" section. A removal request is then created by clicking the "NEW REQUEST" button on the right-hand side of the page.
Sometimes it is preferable not to have every page on a website indexed. There can be different reasons to check and/or change pages' index statuses, including the following:
- Pages that are unfit for indexing can be excluded in order to optimize the crawl budget (e.g. static pages).
- Pages that are still being tested and do not yet provide original, quality content can be left out of indexing in order to protect the website's authority and prevent user access.
In such situations, the indexing status of pages can be controlled by directing the search engine bots.
Robots Meta Directives are instructions given to bots in order to control the indexing status of the pages on a website. Robots Meta Directives are divided into two types: Robots Meta Tags and X-Robots-Tags.
Robots Meta Tags are snippets of code placed in the HTML head section of pages that can guide some or all crawlers. The most common types of Robots Meta Tags are index/noindex, follow/nofollow, and noarchive.
- Index/noindex tags instruct search engine bots whether or not to index pages. Index instructs that a page be indexed and shown on the SERP; noindex, in contrast, instructs that the page not be indexed or shown on the SERP.
Unless noindex is specified, search engines index all pages by default. That is why explicitly specifying index is unnecessary.
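As an illustration, a robots meta tag is a single line placed in a page's HTML head. The snippet below is a minimal sketch; the directive values shown are the standard ones:

```html
<!-- Placed inside <head>: tells bots not to index this page,
     but still to follow the links on it -->
<meta name="robots" content="noindex, follow">

<!-- This is the default behavior; writing it out explicitly
     is unnecessary -->
<meta name="robots" content="index, follow">
```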
X-Robots-Tags are used as part of the HTTP response header. The instructions they send are the same as with Robots Meta Tags; they are simply an alternative method.
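Both mechanisms can also be detected programmatically. The sketch below is a simplified illustration, not a production parser: the function name is our own, and the regular expression only handles the common case where the name attribute precedes content. It checks a page's HTML and its HTTP response headers for a noindex directive:

```python
import re

def is_noindexed(html: str, headers: dict) -> bool:
    """Return True if the page opts out of indexing via either mechanism."""
    # 1) X-Robots-Tag HTTP response header (header names are case-insensitive)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # 2) <meta name="robots" content="..."> tag in the HTML
    #    (simplified: assumes the name attribute comes before content)
    match = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html,
        re.IGNORECASE,
    )
    return bool(match and "noindex" in match.group(1).lower())
```

In practice, the headers dict would come from an HTTP client response, for example `requests.get(url).headers`.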
Pages that have been indexed by bots can also be removed from the index without the intervention of webmasters (i.e. without webmasters using the "noindex" meta tag to instruct the bots). The removal of indexed pages from search engines can be the result of the following:
- The pages in question returning 4XX client errors or 5XX server errors
- Violating the search engines' terms and conditions
- The related pages requiring access permission and therefore not being accessible to everyone
Canonical Tags are codes that inform bots whether a related page has a preferred version. If a page contains a canonical tag, bots assume that there is a preferred, alternative version of that page, and the URL placed in the canonical tag is treated as the authoritative page. However, if the page does not contain a canonical tag, bots assume that there are no alternative versions of that page and index the page as the original.
Canonical tags prevent original pages from losing their value to alternative versions. The main point of caution here is that canonical tags do not directly intervene in pages' index statuses. To intervene in the index status of a page, the index/noindex meta tags should be used.
- Canonical tags are used when a page contains elements such as filtering or sorting, in order to direct URLs with parameters to versions without parameters.
- Canonical tags should also be provided in order to prevent the duplicate content problems that can be caused by similar versions of pages.
- Canonical tags should be used on each original page in order to inform bots of the existing original pages within a website.
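For instance, a canonical tag is a single link element in the page's head. The sketch below uses a made-up domain and paths; a parameterized filter URL points to its parameter-free version as the preferred one:

```html
<!-- On https://example.com/shoes?color=red&sort=price -->
<!-- Tells bots that the parameter-free URL is the preferred version -->
<link rel="canonical" href="https://example.com/shoes">

<!-- On https://example.com/shoes itself: a self-referencing canonical
     marks this page as the original -->
<link rel="canonical" href="https://example.com/shoes">
```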
Optimizing Google indexing helps improve the crawl budget, which is why indexing optimization is very important for SEO work. While optimizing indexing, we must make sure to apply the following:
- Using a correct robots.txt file (e.g. placing a "disallow" directive on pages that are meant to gain organic traffic is a faulty use, because disallowed pages will be unavailable for crawling and therefore will not be indexed)
- Having a correct and well-organized on-site link architecture
- Conducting backlink analysis
- Using a sitemap
- Using robots meta tags and canonical tags correctly
- Ensuring the website is mobile-friendly
- Providing up-to-date, quality, and original content to users, and indirectly to the bots
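Several of the items above come together in the robots.txt file, which lives at the root of the domain. The sketch below is illustrative only; the domain and paths are placeholders, not a recommendation for any specific site:

```text
# https://example.com/robots.txt
User-agent: *
# Keep bots away from pages that should not spend crawl budget
Disallow: /search/
Disallow: /cart/
# Note: never disallow pages that are expected to gain organic traffic

# Point bots to the sitemap
Sitemap: https://example.com/sitemap.xml
```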