
Crawling & Indexing: Technical SEO Basics That Drive Revenue (Case Study)

https://rozhon.com/blog/crawling-indexing-technical-seo-basics-that-drive-revenue/

Getting the technical aspects of SEO right can be rewarding and can have a significant impact on your bottom line. The bigger the site, the more important technical SEO becomes. I’m going to show you how deindexing over 80% of our indexed pages led to great results.
Just to be clear, this is not an ultimate guide to technical SEO; covering that would take several days.

KPI? Revenue!

It’s nice to improve crawling speed and similar metrics that will make you a rockstar at conferences and on SEO podcasts, but revenue is what matters most to every ecommerce website.
In 2017, our primary focus was getting crawling and indexing under control, with the goal of increasing revenue from organic traffic to category pages. We didn’t want to rely on Google’s judgment, which is often right but can also be wrong.
We started making technical changes in early spring and saw our first results in the summer: category pages (product listing pages, PLPs) generated 23.5% more revenue from organic traffic than during the same period the previous year.
Revenue kept growing as the year progressed, reaching a 54.9% increase in revenue from organic traffic to category pages during the holidays. The positive trend has continued, and we are up over 70% YoY in the first weeks of 2018.
Yes, we saw an increase in organic traffic, but that’s not your ultimate goal, so I’m not going to talk about it at all.
Revenue from organic traffic to category pages 2016 vs. 2017

Content is not everything. Manage crawling and indexing.

At the beginning of the year, Google had over 500,000 of our URLs indexed. During the year, we deindexed over 400,000 URLs (mainly category pages) and finished the year with only about 100,000 URLs indexed.
We deindexed over 80% of the URLs.
Why did we do that? Because search engines indexed tons of useless and duplicate category URLs.
We wanted to help search engines understand the structure of the site.
Total indexed pages in Google (screenshot taken from Google Search Console)
Before you decide to deindex URLs, look into Google Analytics to see if these URLs drive organic traffic.
How did we know it was the right thing to do? We simply looked at the percentage of indexed URLs that generate organic traffic, and the number was depressing: only 8.55% of indexed URLs had generated at least one organic session in a month. That’s a painfully low number.
After several months of hard work, the percentage has grown to 49.7%. There’s still work to be done and our goal is somewhere around 60%, but we’re getting there.
The sweet spot differs from site to site. This website is strongly affected by seasonality: part of the assortment targets summer and another part targets winter, so it would be foolish to expect that number to reach 90%. However, 80-90% is a realistic target for certain types of non-seasonal businesses.
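The metric above is a simple ratio. Here is a minimal Python sketch of it, assuming you have a list of URLs you believe are indexed and a month of organic landing-page sessions exported from your analytics tool. The function name and sample data are hypothetical, and both exports must normalize URLs the same way for the lookup to work.

```python
def share_with_organic_traffic(indexed_urls, organic_sessions, min_sessions=1):
    """Percentage of indexed URLs with at least `min_sessions` organic sessions.

    indexed_urls: URLs believed to be indexed (hypothetical export).
    organic_sessions: dict of landing-page URL -> organic sessions in the month
                      (hypothetical analytics export).
    """
    indexed = set(indexed_urls)
    if not indexed:
        return 0.0
    # Indexed URLs missing from the analytics export count as zero sessions
    with_traffic = sum(1 for url in indexed if organic_sessions.get(url, 0) >= min_sessions)
    return 100.0 * with_traffic / len(indexed)


indexed = ["/shoes", "/shoes?color=red", "/shoes?sort=price", "/shirts"]
sessions = {"/shoes": 120, "/shirts": 3, "/blog/some-post": 50}
print(f"{share_with_organic_traffic(indexed, sessions):.2f}%")  # 50.00%
```

Landing pages that appear only in analytics (like the blog post in the sample) don’t affect the ratio, because the denominator is the set of indexed URLs.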

Wild Query Strings

Like most ecommerce websites, the site used URL parameters extensively. After digging into Google Analytics data, we discovered that some category pages had 58 unique URLs. Most of them were crawlable and indexable, with no clear canonical strategy.
If that’s your situation, the first thing to do is collect all parameters used on the site. Gather as much data as you can (Google Search Console, Google Analytics, log files, Screaming Frog, etc.) and extract every parameter. Then sit down with your developers and document what each one does. We found almost 150 URL parameters.
URL Parameters Tool in Google Search Console
You will find that some of them are unnecessary, some are leftovers from legacy systems, and so on.
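Extracting parameter names from a pile of URLs is easy to script. The sketch below uses only Python’s standard `urllib.parse` and counts how many URLs use each parameter; the sample URLs stand in for a hypothetical merged export from the sources listed above.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def collect_parameters(urls):
    """Count in how many URLs each query-string parameter name appears."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so that "?ref=" still registers "ref" as used
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical URLs merged from Search Console, Analytics, log files and a crawler export
urls = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?color=blue",
    "https://example.com/shirts?sessionid=abc&ref=",
]
for name, count in collect_parameters(urls).most_common():
    print(name, count)
```

The resulting list is a convenient starting point for the session with your developers, since every name on it needs a documented purpose.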
For each parameter, ask: Does this parameter change the content the user sees?
If the answer is no, you probably don’t need that parameter, because there are better ways to track things. Usually, you don’t want these URLs to be crawled and indexed by search engines.
If the answer is yes and the change is meaningful, these URLs should be crawlable and indexable. If the change is insignificant, such as merely reordering products on a category page, you don’t want search engines to discover these parameters.
URL parameters decision tree
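The two questions above can be encoded as a tiny classifier. This is a minimal sketch that covers only the questions described in the text, not the full decision tree; the enum names and the sample inventory (including which changes count as meaningful) are hypothetical and depend on your site.

```python
from enum import Enum


class Handling(Enum):
    NOT_NEEDED = "probably unnecessary; track it another way and keep it out of crawlable URLs"
    CRAWL_AND_INDEX = "crawlable and indexable"
    DO_NOT_EXPOSE = "changes content only slightly; don't let search engines discover it"


def classify_parameter(changes_content, change_is_meaningful):
    """Apply the two questions from the text to one parameter."""
    if not changes_content:
        return Handling.NOT_NEEDED
    if change_is_meaningful:
        return Handling.CRAWL_AND_INDEX
    return Handling.DO_NOT_EXPOSE


# Hypothetical inventory: parameter -> (changes content?, meaningful change?)
inventory = {
    "utm_source": (False, False),  # tracking only
    "color": (True, True),         # produces a distinct product selection
    "sort": (True, False),         # only reorders products
}
for param, answers in inventory.items():
    print(param, "->", classify_parameter(*answers).name)
```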
Once mapping and classification were done, we configured all URL parameters in Google Search Console (Crawl >> URL Parameters) and Bing Webmaster Tools (Configure My Site >> Ignore URL Parameters).
These tools are powerful and let you give these search engines clear instructions in no time. But treat them as a short-term solution; I still recommend fixing these issues directly in the site’s code.
There’s no one-size-fits-all solution. It may make sense to block crawling of a new parameter that search engines haven’t discovered yet. If there are already thousands of indexed URLs with that parameter, consider using a “noindex” or canonical tag instead of crawling restrictions, because search engines can only see those tags on pages they are allowed to crawl.
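That rule of thumb can be sketched as a small helper. It assumes you have some list of URLs known to be indexed (a hypothetical input), and it treats any indexed URL as a reason not to block crawling; the cutoff of zero is a simplification of the “thousands” mentioned above.

```python
from urllib.parse import parse_qsl, urlsplit


def param_names(url):
    """Set of query-string parameter names used in a URL."""
    return {name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}


def recommend_control(param, indexed_urls):
    """Suggest a way to keep a low-value parameter out of the index (rule of thumb only)."""
    already_indexed = sum(1 for url in indexed_urls if param in param_names(url))
    if already_indexed == 0:
        # Not discovered yet: a crawl block stops search engines from finding these URLs.
        return "block crawling via robots.txt"
    # Already indexed: search engines must still be able to fetch these pages to
    # see a noindex or canonical tag, so don't block crawling.
    return f"use noindex or canonical tags ({already_indexed} URLs already indexed)"


indexed = [
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?color=red&size=42",
]
print(recommend_control("sort", indexed))   # block crawling via robots.txt
print(recommend_control("color", indexed))  # use noindex or canonical tags (2 URLs already indexed)
```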

Faceted Navigation

Faceted navigation is another common troublemaker on ecommerce websites, and we had to deal with it too.
Every combination of facets and filters creates a unique URL. This is good and bad at the same time: it creates tons of great landing pages, but also tons of hyper-specific landing pages no one cares about.
You can easily get thousands of URLs by applying facets and filters to a category page. Take these refinements:
  • Seven brands (Adidas, Nike, Puma …)
  • Four genders (Men, Women, Boys, Girls)
  • Five average ratings (1 star, 2 stars, 3 stars …)
  • Ten colors (White, Black, Red …)
This simple example doesn’t even include refinements based on specific product features, and it already creates 1,400 URLs (7 × 4 × 5 × 10) when one value is selected from each refinement.
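The arithmetic is easy to verify. This short sketch reproduces the 1,400 figure and, as an added illustration, also counts the case where any refinement may be left unselected; it still assumes one value per refinement and a fixed parameter order, so multi-select and reordered parameters would push the real number higher.

```python
from itertools import product
from math import prod

# Number of values per refinement, from the example above
sizes = {"brand": 7, "gender": 4, "rating": 5, "color": 10}

# One value chosen from each of the four refinements
one_from_each = prod(sizes.values())

# Brute-force check of the same count
enumerated = sum(1 for _ in product(*(range(n) for n in sizes.values())))

# If each refinement can also be left unselected: (7+1)(4+1)(5+1)(10+1),
# minus 1 for the plain, unrefined category page
any_combination = prod(n + 1 for n in sizes.values()) - 1

print(one_from_each, enumerated, any_combination)  # 1400 1400 2639
```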
We used the following tree to decide if a specific refinement should be a facet or a filter.
Filters vs. facets
Knowing the negative impact of having all of these URLs indexed, we made sure that facets and filters were treated differently.
Facets
  • Are discoverable, crawlable, and indexable by search engines;
  • Contain self-referencing canonical tags;
  • Are not discoverable if multiple items from the same facet are selected (e.g. Adidas and Nike t-shirts).
Faceted navigation
Filters
  • Are not discoverable;
  • Contain a “noindex” tag;
  • Use URL parameters that are configured in Google Search Console and Bing Webmaster Tools.
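Handling this in the site’s code, rather than only in webmaster tools, might look like the sketch below. Which refinements are facets and which are filters is a hypothetical split here, and the returned dictionary is an illustrative stand-in for whatever your templates use to render links, the meta robots tag and the canonical tag.

```python
from urllib.parse import urlencode

# Hypothetical split; decide per refinement with your own facet/filter tree.
FACETS = {"brand", "gender"}
FILTERS = {"rating", "color"}


def refined_url(base_url, selection):
    """Build URLs in one fixed order so a selection never yields duplicate URLs."""
    pairs = sorted((name, value) for name, values in selection.items() for value in values)
    return f"{base_url}?{urlencode(pairs)}" if pairs else base_url


def page_directives(base_url, selection):
    """Return linking and indexing directives for a refined category page.

    selection: dict of refinement name -> list of selected values.
    """
    unknown = set(selection) - FACETS - FILTERS
    if unknown:
        raise ValueError(f"unclassified refinements: {sorted(unknown)}")
    url = refined_url(base_url, selection)
    if any(name in FILTERS for name in selection):
        # Filters: no crawlable links and a noindex tag.
        return {"url": url, "link": False, "meta_robots": "noindex", "canonical": None}
    # Facets: crawlable, indexable, self-referencing canonical. Not linked when
    # several values of one facet are combined (e.g. Adidas + Nike); the post
    # doesn't specify their tags, so this sketch keeps them unchanged.
    multi_value = any(len(values) > 1 for values in selection.values())
    return {"url": url, "link": not multi_value, "meta_robots": "index, follow", "canonical": url}


print(page_directives("https://example.com/t-shirts", {"brand": ["Nike"]}))
print(page_directives("https://example.com/t-shirts", {"brand": ["Nike", "Adidas"]}))
print(page_directives("https://example.com/t-shirts", {"brand": ["Nike"], "color": ["Red"]}))
```

Sorting the parameters when building URLs is a design choice worth copying: it keeps the same selection from producing several URLs that differ only in parameter order.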
One could argue that a canonical tag pointing to the parent category is a better solution, but it didn’t work for us: we had other issues with canonical tags, and Google tended to ignore them.
Because thousands of filtered URLs were already indexed, we couldn’t block crawling of these URLs; search engines would never have seen the noindex tags. If search engines have yet to discover such URLs, you can block crawling via robots.txt.
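Google’s robots.txt rules support `*` and a trailing `$` in paths, while Python’s standard `urllib.robotparser` only does prefix matching, so it won’t evaluate such patterns the way Google documents them. Below is a simplified matcher for sanity-checking Disallow rules before deploying them; it ignores Allow lines, user-agent groups and longest-match precedence, and the `sort` rules are hypothetical.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern using Google-style '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, joining the pieces around each '*' with '.*'
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow rule matches from the start of `path` (path plus query string).

    Simplified: ignores Allow lines, user-agent groups and longest-match precedence.
    """
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)


# Hypothetical rules for a "sort" parameter that search engines haven't discovered yet:
#   Disallow: /*?sort=
#   Disallow: /*&sort=
rules = ["/*?sort=", "/*&sort="]
for path in ["/shoes?sort=price", "/shoes?color=red&sort=price", "/shoes?color=red"]:
    print(path, "blocked" if is_blocked(path, rules) else "crawlable")
```

Two rules are needed because the parameter can appear first (after `?`) or later (after `&`) in the query string.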
Every website is different, and there’s no solution that fits all.

TL;DR

Tons of websites try to apply advanced technical SEO when they should be getting the basics right first.
Getting rid of duplicate pages and consolidating signals to one canonical URL is not rocket science and doesn’t sound as sexy as structured data, RankBrain or voice search, but it’s still a great way to improve rankings, traffic, and ultimately revenue.
