This guide will help you get the most out of Screaming Frog. Used correctly, it can be a great asset for breaking down your website’s inner workings, but with so many useful features it can be hard to tell what this web crawler is actually doing. Below you will find out what it is, how it works, and some great tips for auditing your site.
This is written as a Screaming Frog beginner’s guide, but it also covers some advanced techniques, so most readers will pick up a few SEO lessons along the way. If you would like to find out more about Google’s ranking factors, you should check out my SEO checklist.
What is Screaming Frog?
Screaming Frog is a web crawler first and foremost. This means that the software starts from a single page and collects data as it crawls from one page to another. Typically this will begin with the homepage of a website and then move outwards. As it moves from one page to another the tool collects information with the intent of helping you analyse the website to improve SEO performance.
The free version of the tool crawls up to 500 URLs; beyond that you need a licence. If you have never used the tool before, you should definitely download it here before working through this guide.
How Does Screaming Frog Work?
The first part of this tutorial includes an explanation of how the tool works. This isn’t integral for understanding how to use Screaming Frog. However, for those that are interested, these are my observations of the SEO spider and how it behaves.
As mentioned above, the crawler bounces between the pages of a website, collecting data that can later be analysed. What many people do not notice, however, is that the crawler works using depth-first crawling.
What is Depth First Crawling?
If an SEO spider uses depth-first crawling, it starts from the homepage and collects all the URLs on that page. The page you start on is Level 0, and all links from that page are Level 1. The bot then follows one of these links, collects all the Level 2 links from there, and so forth. Once the bot has exhausted all the links within that journey, it returns to the root and follows the next path.
This means that Screaming Frog finds the deepest pages of each branch first. You will notice the link level jumping around, rather than rising steadily, as the crawl works its way through the site.
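To make the crawl order concrete, here is a minimal sketch of depth-first crawling over a toy link graph. The site structure and page names are entirely hypothetical, and a real crawler fetches and parses HTML rather than reading a dictionary:

```python
# Hypothetical link graph: each page maps to the links found on it.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": [],
}

def crawl_depth_first(start):
    visited, order = set(), []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        # Push links in reverse so the crawler dives down one branch
        # completely before returning to the root and taking the next one.
        for link in reversed(site.get(page, [])):
            stack.append(link)
    return order

print(crawl_depth_first("/"))
# → ['/', '/about', '/team', '/blog', '/blog/post-1']
```

Note how the Level 2 page /team is reached before the Level 1 page /blog: the deepest pages of each branch come first.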
What is Breadth First Crawling?
Breadth-first crawling works horizontally, gradually working through all the links level by level. The bot takes longer to reach your deepest pages, but it retrieves the link levels in the correct order, which reveals site structure more clearly than depth-first crawling. For a crawler such as Googlebot, however, finding your deepest pages this way would require regularly crawling your entire website.
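Here is a minimal sketch of breadth-first crawling over the same kind of hypothetical toy link graph. Using a queue instead of a stack means every Level 1 page is visited before any Level 2 page, so link levels come out in order:

```python
from collections import deque

# Hypothetical link graph: each page maps to the links found on it.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": [],
}

def crawl_breadth_first(start):
    visited = {start}
    order = []
    queue = deque([(start, 0)])  # (page, link level)
    while queue:
        page, level = queue.popleft()
        order.append((page, level))
        # Queue every unseen link at the next level; the whole current
        # level is crawled before any of these are visited.
        for link in site.get(page, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, level + 1))
    return order

print(crawl_breadth_first("/"))
# → [('/', 0), ('/about', 1), ('/blog', 1), ('/team', 2), ('/blog/post-1', 2)]
```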
This means that crawling your website regularly becomes costly. By crawling depth-first, Google can reach your deepest pages each time it enters your site from a different location. Since the internet is a series of connections between different sites, Google doesn’t always enter through the homepage.
The Beginner’s Screaming Frog Guide
The first section of this article covered the basics of Screaming Frog: what it is and how it works. Now we will break down the foundations of the tool, looking at where to find information, how to change the layout, and how to save and export data. This part of the Screaming Frog guide is designed for entry-level users who have not used the tool before.
By reading this section of the Screaming Frog tutorial you will learn, at its simplest, how to collect data with an SEO spider. Analysts already familiar with the web crawler may wish to skip ahead to the advanced section and learn more about how to utilise and analyse the data.
The first step to using this tool is understanding how everything is broken down. Below is a list of the different sections, followed by two images – one for Windows and another for Macintosh – with labels that correspond to the list:
- Menu Bar
- Crawl Bar
- Tab Bar
- Main Window
- Detailed Window
I will not cover every tab for the sake of brevity, but I will include the most popular tabs within the software. Learning where these are and how to use them will help you grow your SEO analysis and move towards the advanced tutorial further below.
Protocol – the protocol of a website can be filtered to include all pages, HTTP pages and HTTPS pages. This tab is great for quickly checking both internal and outbound links. If you have recently acquired an SSL or TLS security certificate, this tab is going to be useful for you.
Status Codes – these are the codes your server returns when a page is requested. This tab only includes pages that can be found through your internal linking. Typically, the server will return a 2XX, 3XX, 4XX or 5XX status – which mean OK, redirected, client error, and server error respectively. If large volumes of user profiles or user-generated content bloat your website’s crawl, you can exclude those sections to make crawling faster.
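As a quick illustration, those status classes can be mapped to their labels with a few range checks. This is a sketch for reference, not Screaming Frog’s own code:

```python
def describe_status(code):
    """Classify an HTTP status code by its hundreds digit."""
    if 200 <= code < 300:
        return "OK"            # 2XX: the page was returned successfully
    if 300 <= code < 400:
        return "redirected"    # 3XX: the URL points somewhere else
    if 400 <= code < 500:
        return "client error"  # 4XX: e.g. 404 Not Found
    if 500 <= code < 600:
        return "server error"  # 5XX: the server failed to respond properly
    return "unknown"

print(describe_status(301))  # → redirected
print(describe_status(404))  # → client error
```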
Page Titles – this is where you can quickly access the page titles across your site. If a page has more than one page title, it will say so. This tab also has filter options to find duplicate, short, long, and missing tags, plus an option to check whether your title matches your H1 tag.
Meta Description – this is where you can find your meta description, and filter by duplicate, short, long, and missing descriptions.
H1 Tags – this is where you will find all of your H1 tags. It includes filters for duplicate, multiple, and missing H1 tags, which is useful for analysing the H1s across your whole site.
H2 Tags – this is a repeat of the above section but applies to H2 tags. It’s important to note that whilst there is an option to find multiple H2 tags, having more than one is no longer an important factor.
Images – this is where you will find a list of all the images found during the crawl. It includes each image’s URL path and the number of times it is used, and can be filtered by size and alt attribute. This section is very basic, reflecting how difficult it is for crawlers to read images.
Save & Load a Crawl
Once you have taken the trouble to crawl a large site, you may not wish to repeat it each time. Simply save the crawl using the menu option. The crawl can be opened again later by double-clicking it like a normal file, or by opening Screaming Frog and selecting load crawl from the same dropdown you saved it from.
Exclude Tool
The exclude tool functions much like an additional robots.txt file that applies only to your crawl. It helps when analysing a client site that is not optimised: by excluding an entire path of the website you will speed up your crawl, which is useful when a site includes large amounts of isolated duplicate pages. For example, if RowanSEO.com had thousands of category pages, you would exclude as follows:
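Screaming Frog’s exclude rules are regular expressions. As an illustration (the exact path below is hypothetical), excluding every category page might look like this:

```
https://rowanseo.com/category/.*
```

Here `.*` matches anything after the category path, so every URL underneath it is skipped during the crawl.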
The period is an important part of making this work, so you should always include it at the end. If you are only blocking a single page for other reasons, you can instead remove it after the crawl: simply right-click the URL and choose ‘remove’.
Bulk Export Tool
The bulk export tool is useful for creating a spreadsheet with a single type of information. My favourite bulk options include exporting all anchor text, all 3XX redirects, and all 4XX errors. There are also options for internal links, images and directives, but when I audit a website I find the above to be particularly useful.
The Advanced Screaming Frog Guide
This is the start of the advanced section of this Screaming Frog guide. It is more in-depth and covers techniques for analysing your website. If you have only just started SEO, note that this section skips the basic tips; you may wish to revisit the beginner’s section first.
From this point onwards, it is assumed you have either read the basic section or already know everything within it.