What Google Search Console Can (and Cannot) Tell You About Indexation
A framework for reading GSC indexation reports: which statuses are technical directives, which are Google quality judgments, and how to validate before acting.
Google Search Console is the most critical diagnostic tool for understanding how Google sees your website. But its indexation reports are a map, not the territory. Teams that misinterpret these signals waste months chasing the wrong fixes, de-indexing valuable content, or trying to force the indexation of pages Google will never value.
This is a diagnostic framework for reading those signals correctly. You'll learn to distinguish between a technical directive you set and a quality signal from Google, understand the tool's limitations, and know when to seek more evidence before you act.
What GSC CAN Tell You: The Certainties
For certain technical issues, Google Search Console provides clear, definitive answers. These signals relate directly to technical directives you control and can generally be trusted without significant secondary validation.
- Robots.txt Blocks: If a URL is listed as 'Blocked by robots.txt', you can be certain that Googlebot was prevented from crawling it. The tool is reporting the direct outcome of a rule in your robots.txt file.
- 'Noindex' Directives: When GSC reports a page is 'Excluded by ‘noindex’ tag', it has successfully crawled the page, found the meta robots tag or X-Robots-Tag HTTP header, and is respecting your directive. This is a confirmation of your intent.
- Server Errors (5xx): GSC's reporting on server errors directly reflects what Googlebot experienced. A spike in 5xx errors means your server was failing to respond correctly during a crawl attempt.
- Not Found Errors (404): The 'Not found (404)' status is a reliable indicator that Googlebot requested a URL and received a 404 HTTP status code. It confirms the page is gone from Google's perspective.
- Google-Selected Canonical: The URL Inspection Tool explicitly tells you which URL Google has chosen as the canonical for a given page or set of duplicates. This is the choice Google is currently acting on, regardless of the canonical tag you declared.
These are the certainties—reports on the status of technical rules and server responses. When you see them, your first step is to decide if the reported status matches your intention.
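To check whether a reported status matches what a URL actually serves, the directive-level signals above can be read straight from an HTTP response. The sketch below is a minimal illustration, not a crawler: the function name `technical_signals` is hypothetical, the caller is assumed to have already fetched the status code, headers, and HTML, and the meta tag regex assumes the `name` attribute appears before `content`. Robots.txt rules are not covered here because they live in a separate file.

```python
import re

# Simplified pattern: assumes name="robots" (or "googlebot") precedes content="...".
META_ROBOTS = re.compile(
    r'<meta[^>]+name=["\'](?:robots|googlebot)["\'][^>]*content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)

def technical_signals(status_code, headers, html):
    """Return the directive-level indexation signals visible in one response."""
    signals = []
    if 500 <= status_code <= 599:
        signals.append("server error (5xx)")
    elif status_code == 404:
        signals.append("not found (404)")
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            signals.append("noindex via X-Robots-Tag header")
    meta = META_ROBOTS.search(html)
    if meta and "noindex" in meta.group(1).lower():
        signals.append("noindex via meta robots tag")
    return signals
```

An empty list means none of these directives were found in that response, which is the precondition for treating a non-indexed page as a possible quality issue rather than a configuration one.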
What GSC CANNOT Tell You: The Ambiguities
GSC becomes less of a clear-cut manual and more of an interpretive guide when dealing with quality assessments. The tool provides signals, but it rarely provides the complete 'why'.
- The Specific Reason for a Quality Issue: The 'Crawled - currently not indexed' status is the most misunderstood in GSC. It confirms Google has seen your page but has chosen not to index it. GSC will not tell you precisely why. It won't say 'this content is too thin' or 'this page duplicates three others too closely'. It only signals the outcome of its quality algorithms, not the specific inputs.
- Real-Time Indexation Status: The main Coverage report is not live; it reflects data that is often several days old. A page you fixed yesterday might still appear as having an error. The URL Inspection Tool provides a much more current view, but only for a single URL at a time.
- The Full Crawl Path: GSC might tell you a page was discovered via a sitemap or a linking page, but it doesn't provide a comprehensive map of how internal link equity flows to a URL. It won't show you all 500 internal links pointing to a page, only a small sample.
- Guaranteed Ranking Potential: A 'Valid' status simply means the page is indexed. It is not a promise of performance. A page can be perfectly valid and technically sound but have zero chance of ranking for any competitive query.
A Diagnostic Framework: Mapping Questions to Reports
Effective indexation debugging starts by asking the right question. Don't just open the Coverage report and look for red alerts. Instead, frame your inquiry and use the reports to find the answer.
Question: "Is Google finding my new URLs?"
- Where to Look: 'Discovered - currently not indexed' status; Sitemap coverage.
- Interpretation: A high number of URLs here means Google knows the pages exist but hasn't allocated the crawl budget to fetch them yet. This can point to issues with site authority, internal linking, or server capacity. If the pattern persists, diagnose whether it is actually a crawlability problem before assuming crawl budget is the cause.
Question: "Is a technical rule I set blocking my pages?"
- Where to Look: The 'Excluded' list.
- Interpretation: Look for statuses like 'Blocked by robots.txt' and 'Excluded by ‘noindex’ tag'. These are typically the result of your own configurations.
Question: "Has Google seen my page but decided it's not worth indexing?"
- Where to Look: 'Crawled - currently not indexed' status.
- Interpretation: This is the critical quality signal. Technical barriers are gone, but the page failed to meet Google's quality threshold for indexation. This is where you investigate content thinness, duplication, or low perceived user value.
Question: "Which version of a duplicate page did Google choose?"
- Where to Look: 'Alternate page with proper canonical tag' and 'Duplicate, Google chose different canonical than user'. Use the URL Inspection Tool on a specific URL to see the definitive 'Google-selected canonical'.
- Interpretation: This tells you whether your canonicalization strategy is working as intended. If Google keeps overriding your choice, work through the canonical signal conflict diagnostic framework to find which signal is contradicting your tag.
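The canonical question above comes down to comparing two values: the canonical you declared and the one Google selected. The sketch below assumes the input is a dictionary shaped like the index status portion of a URL Inspection result, with `userCanonical` and `googleCanonical` keys; the function name `canonical_verdict` and the example URLs are hypothetical.

```python
def canonical_verdict(index_status):
    """Compare the declared canonical with the Google-selected canonical."""
    user = index_status.get("userCanonical")
    google = index_status.get("googleCanonical")
    if not google:
        return "no Google-selected canonical reported"
    if not user:
        return "no declared canonical; Google selected " + google
    # Exact comparison on purpose: /page and /page/ are different URLs.
    if user == google:
        return "match"
    return "conflict: declared " + user + ", Google selected " + google
```

For example, `canonical_verdict({"userCanonical": "https://example.com/a", "googleCanonical": "https://example.com/b"})` reports a conflict, which is the cue to look for the signal contradicting your tag.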
'Excluded' vs. 'Crawled - Not Indexed': Directive vs. Judgment
A common mistake is treating every non-indexed page as an error. The distinction between the 'Excluded' bucket and the 'Crawled - currently not indexed' status is crucial.
'Excluded' is mostly about Indexation Control. URLs land here primarily because of a directive you set. This is often intentional and correct. For example, a URL with a canonical tag pointing elsewhere will appear as 'Alternate page with proper canonical tag'. This is good; your canonicalization is working. The 'Excluded' report is not a to-do list. It's a summary of pages that aren't indexed, many for good reasons you've defined. Your job is to review the reasons and ensure they align with your strategy.
'Crawled - currently not indexed' is a Google Judgment Call. This status is different. It means no technical directives prevented indexation. Googlebot successfully requested and rendered the page, but an algorithmic assessment decided it wasn't valuable enough to be added to the index. This is a powerful signal about perceived quality and should prompt a content audit, not a technical fix.
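The directive-versus-judgment split can be written down as a small lookup so that every status in an export gets the same first response. This is a sketch only: it covers just the statuses this article classifies explicitly, the name `triage` is hypothetical, and anything else is deliberately left for manual review.

```python
DIRECTIVE = "directive you set"
JUDGMENT = "Google judgment"

# Only statuses the article classifies explicitly; extend with care.
STATUS_KIND = {
    "Blocked by robots.txt": DIRECTIVE,
    "Excluded by 'noindex' tag": DIRECTIVE,
    "Alternate page with proper canonical tag": DIRECTIVE,
    "Crawled - currently not indexed": JUDGMENT,
}

NEXT_STEP = {
    DIRECTIVE: "confirm the status matches your intent",
    JUDGMENT: "audit content quality; this is not a technical fix",
}

def triage(status):
    """Return (kind, next_step) for one GSC status string."""
    # Exports may use curly quotes around noindex; normalise them first.
    normalised = status.replace("\u2018", "'").replace("\u2019", "'").strip()
    kind = STATUS_KIND.get(normalised)
    if kind is None:
        return ("unclassified", "review manually")
    return (kind, NEXT_STEP[kind])
```

Keeping the table short is intentional: a status that is not listed should prompt a look at the report, not a guessed category.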
A Practical Workflow: How to Validate GSC Signals
Never make a significant change based on a single GSC report. The data is powerful but aggregated and delayed. Always validate your hypothesis with more immediate tools.
- Identify a Pattern in GSC: Don't focus on a single URL. Look for trends. Is an entire page template or site section showing up with the same status? For example, are all /product-reviews/ pages being flagged as 'Crawled - not indexed'?
- Spot-Check with the URL Inspection Tool: Take 3-5 example URLs from the pattern and run them through the URL Inspection Tool. This gives you more current data on crawl time, referring pages, and the Google-selected canonical.
- Confirm with a site: Search: Use the site:yourdomain.com/example-url search operator. While not 100% reliable, it's a very fast, real-world check of whether a URL is in the index right now. If GSC says a page is indexed but a site: search shows nothing, you have a discrepancy to investigate.
- Form a Hypothesis: After gathering these data points, form a hypothesis. For instance: "GSC shows a pattern of 'Crawled - not indexed' for our /product-reviews/ pages. The URL Inspection Tool and site: searches confirm they are not indexed. Our hypothesis is that these pages are being flagged as thin content."
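The first step, spotting a pattern instead of a single URL, is easy to do on an exported list of URLs and statuses. The sketch below assumes you already have `(url, status)` pairs from an export; the function names and the choice to group by the first path segment are illustrative assumptions, and a site with deeper templates may need a different grouping key.

```python
from collections import Counter
from urllib.parse import urlsplit

def section_of(url):
    """Return the first path segment as a section key, e.g. '/product-reviews/'."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return "/" + parts[0] + "/" if parts else "/"

def status_patterns(rows):
    """Count (section, status) pairs across an iterable of (url, status) rows."""
    return Counter((section_of(url), status) for url, status in rows)
```

Calling `status_patterns(rows).most_common(5)` surfaces the section and status combinations that account for most affected URLs, which are the ones worth spot-checking in the URL Inspection Tool.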
Workflow Example: Debugging a 'Crawled - Not Indexed' Pattern
Let's apply this to a realistic scenario.
- The Signal: An e-commerce site manager notices a steady increase in the 'Crawled - currently not indexed' count. Clicking into the report, they see the vast majority of affected URLs follow the pattern domain.com/products/filter?color=blue&size=large.
- Initial Analysis: These are faceted navigation URLs created when users apply filters. Each one shows the same products as the main category page, just in a different order or with some items removed.
- Cross-Referencing:
  - URL Inspection: They inspect a few examples. GSC confirms Google has crawled them. The 'Google-selected canonical' for each filtered URL is shown to be the main category page (domain.com/products/).
  - site: Search: A search for site:domain.com/products/filter?color=blue returns no results, confirming it's not indexed. A search for the canonical site:domain.com/products/ shows the main category page is indexed correctly.
- Diagnosis & Action: The GSC status, while alarming at first glance, describes a system working correctly. Google is crawling the filtered URLs, recognizing they are duplicates of the main category page, and choosing not to index them in favor of the canonical version. The 'Crawled - not indexed' status is technically accurate but not a problem to be solved. The correct action here is to do nothing, or perhaps use robots.txt to block crawling of these parameters to conserve crawl budget on a very large site.
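If you do opt for the robots.txt route, it is worth confirming that a rule blocks the filtered URLs without touching the category page before deploying it. The sketch below uses Python's standard library parser with a hypothetical prefix rule for the example paths above. It is an approximation: the standard library parser does simple prefix matching and does not implement wildcard patterns, so treat the result as a sanity check, not as proof of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule for the example site; a plain path prefix, no wildcards.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/filter
"""

def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With this rule, `is_crawlable(ROBOTS_TXT, "https://domain.com/products/filter?color=blue&size=large")` is expected to be false while the same check for `https://domain.com/products/` stays true. Keep in mind that once a URL is blocked from crawling, Google can no longer read any canonical or noindex directive on it.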
From Signal to Diagnosis
Google Search Console does not give you a simple list of errors to fix. It provides signals that act as a starting point for intelligent investigation. The difference between an effective SEO and an ineffective one is the ability to distinguish between a technical directive and a quality judgment, and to build a process for validating signals before taking action.
Treat GSC data as the beginning of your diagnosis, not the end. Cross-reference every pattern before you change a single canonical tag or robots.txt rule. That discipline will focus your efforts on changes that matter.
Frequently Asked Questions
What is the difference between 'Crawled - currently not indexed' and 'Discovered - currently not indexed'? 'Discovered' means Google knows the URL exists but hasn't crawled it yet, often a crawl budget or internal linking issue. 'Crawled' means Google has visited the page but decided against indexing it, which is typically a signal related to content quality or duplication.
If a page is 'Excluded' in GSC, does that mean it's a problem I need to fix? Not necessarily. The 'Excluded' category contains all pages not indexed for any reason, many of which are intentional (e.g., pages with a 'noindex' tag or canonicals). Review the specific reason for exclusion to see if it matches your intent.
How long does it take for GSC to update its indexation data? The main Coverage report can lag by several days. For the most current information on a single URL, always use the URL Inspection Tool.
Why does the URL Inspection Tool say my page is indexed, but the Coverage report says it isn't? This is usually due to the time lag. The URL Inspection Tool provides near-real-time data, while the Coverage report is based on an older, aggregated dataset. Trust the URL Inspection Tool for the current status of a specific page.
Can GSC tell me exactly why Google thinks my page is low quality? No. The 'Crawled - currently not indexed' status signals that Google's algorithms have deemed the page unsuitable for indexation, but GSC will not provide a specific checklist of the quality factors that failed.
If I fix a page that was 'Crawled - currently not indexed', how do I ask Google to reconsider it? After significantly improving the content, use the URL Inspection Tool for that page and click 'Request Indexing'. For a large-scale fix, use the 'Validate Fix' button in the main Coverage report after the improvements are live.
Does a 'Valid' status in GSC guarantee my page will rank? No. 'Valid' simply means the page has been indexed and is eligible to appear in search results. It is not a measure of quality or ranking potential, which depends on hundreds of other factors.
Related articles
Canonical Tags: Find Signal Conflicts Before They Break Indexation
Why Google may ignore your declared canonicals, and a framework for finding signal conflicts across redirects, sitemaps, internal links, and CMS templates.
Soft 404s: How to Diagnose Thin or Mismatched Pages Without False Positives
Stop losing rankings to false positives. Learn how to diagnose and resolve Google Search Console soft 404 errors using a state-dependent decision tree.
How to Diagnose Crawlability Problems Without Guessing
An evidence-first workflow to separate crawlability, indexation, and rendering problems using GSC and targeted crawls — before you blame crawl budget.