Audit and Analyze External Links

Getting the number of unique domains that a website links to gives a good idea of which other websites it considers important. Next, we can check status codes, identify and locate any broken links, and take a look at redirects.
links
analytics
python
advertools
adviz
gsc
crawling
Author

Elias Dabbas

Published

June 14, 2024

Modified

July 5, 2024

This is a guide on how to audit the external links on a website: how to check which domains it values the most, how to run a status code check to find and locate broken links, and how to examine redirected external links.

Crawl a website

This is a straightforward crawl with no special options other than following links and saving the crawl logs to a separate file. Keeping the logs in their own file is useful when you need to debug something in your crawl.

import advertools as adv
import adviz
import pandas as pd

adv.crawl(
    url_list='https://supermetrics.com/',
    output_file='supermetrics_crawl.jl',
    follow_links=True,  # keep following the links found on the site's pages
    custom_settings={
        'LOG_FILE': 'supermetrics_crawl.log'  # write the crawl logs to this file
    })
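Since the logs go to supermetrics_crawl.log, a quick first check is to count log records per level and pull out any errors. The sketch below is not part of advertools: summarize_log is a hypothetical helper, and it assumes the log uses Scrapy's default format (timestamp, logger name in brackets, level, message).

```python
import re
from collections import Counter

# Scrapy's default LOG_FORMAT is "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<logger>[^\]]+)\] (?P<level>[A-Z]+): (?P<message>.*)$"
)


def summarize_log(lines):
    """Count log records per level and collect ERROR/CRITICAL messages."""
    levels = Counter()
    problems = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines (e.g. tracebacks) don't start a new record
            continue
        levels[match["level"]] += 1
        if match["level"] in ("ERROR", "CRITICAL"):
            problems.append(match["message"])
    return levels, problems


# Usage, assuming the crawl above has written its log file:
# with open("supermetrics_crawl.log") as f:
#     levels, problems = summarize_log(f)
```

A line-by-line regex keeps memory use flat even for large log files, and skipping non-matching lines avoids double-counting multi-line tracebacks.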

Read the crawl file

Quickly read a few rows just to get an overview. We only need the url column, as well as all the links_* columns.

crawl_df = pd.read_json('supermetrics_crawl.jl', lines=True, nrows=3)
crawl_df
(3 rows × 62 columns; only a few columns are shown here)
url title depth status
0 https://supermetrics.com/ Supermetrics: Turn your marketing data into opportunity - Supermetrics 0 200
1 https://supermetrics.com/affiliate Become a Supermetrics Affiliate - Supermetrics 1 200
2 https://supermetrics.com/about About Supermetrics - Supermetrics 1 200

The full set of columns is: url, title, meta_desc, viewport, charset, h1, h2, h3, canonical, og:image, og:title, og:description, twitter:card, body_text, size, download_timeout, download_slot, download_latency, depth, status, links_url, links_text, links_nofollow, nav_links_url, nav_links_text, nav_links_nofollow, header_links_url, header_links_text, header_links_nofollow, footer_links_url, footer_links_text, footer_links_nofollow, img_alt, img_src, img_fetchpriority, img_width, img_height, img_decoding, img_loading, img_srcset, ip_address, crawl_time, the resp_headers_* columns (Date, Content-Type, X-Content-Type-Options, X-Xss-Protection, Content-Security-Policy, X-Frame-Options, Strict-Transport-Security, Cache-Control, Etag, Vary, Via, Cf-Cache-Status, Server, Cf-Ray), the request_headers_* columns (Accept, Accept-Language, User-Agent, Accept-Encoding, Referer), and img_sizes.

Read only the columns of interest

links_crawl_df = adv.crawlytics.jl_subset(
    filepath='supermetrics_crawl.jl',
    columns=['url'], # select one or more columns to read
    regex='^links_') # and/or add a regex when you have grouped columns
links_crawl_df.head(3)
url links_url links_text links_nofollow
0 https://supermetrics.com/ https://supermetrics.com/@@https://supermetrics.com/marketing-intelligence-cloud@@https://supermetrics.com/marketing... .css-ozeoft{width:205px;}@@.css-ayr2p0{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-we... False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@Fals...
1 https://supermetrics.com/affiliate https://supermetrics.com/@@https://supermetrics.com/marketing-intelligence-cloud@@https://supermetrics.com/marketing... .css-ozeoft{width:205px;}@@.css-ayr2p0{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-we... False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@Fals...
2 https://supermetrics.com/about https://supermetrics.com/@@https://supermetrics.com/marketing-intelligence-cloud@@https://supermetrics.com/marketing... .css-ozeoft{width:205px;}@@.css-ayr2p0{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-we... False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@False@@Fals...
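The links_url column holds all the links on a page joined with "@@". To count the unique external domains, one approach is to split and explode that column so each link gets its own row, then extract each link's hostname. The sketch below does this with plain pandas; external_domain_counts is a hypothetical helper, and it assumes that anything other than the site's own domain and its subdomains counts as external.

```python
from urllib.parse import urlsplit

import pandas as pd


def external_domain_counts(links_crawl_df, site_domain):
    """Count how many links point to each external domain.

    Assumes links_url holds each page's links joined with "@@",
    as in the crawl output above.
    """
    links = (
        links_crawl_df[["url", "links_url"]]
        .dropna(subset=["links_url"])
        .assign(link=lambda df: df["links_url"].str.split("@@"))
        .explode("link")  # one row per (page, link) pair
        .loc[:, ["url", "link"]]
        .reset_index(drop=True)
    )
    # hostname is lowercased; it is None for relative or mailto: links
    links["domain"] = links["link"].map(lambda u: urlsplit(u).hostname or "")
    # Treat the site's own domain and its subdomains as internal
    is_internal = links["domain"].eq(site_domain) | links["domain"].str.endswith(
        "." + site_domain
    )
    external = links[~is_internal & links["domain"].ne("")]
    return external["domain"].value_counts()
```

For this crawl you would call external_domain_counts(links_crawl_df, 'supermetrics.com'); the length of the result is the number of unique external domains, and the top rows are the domains the site links to most often.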

Get status codes

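# external_links is assumed to be the crawl output for the external URLs
# (e.g. from a separate adv.crawl run with follow_links=False).
# Requests without a response have no status (NaN), so drop them before converting to int.
fig = adviz.status_codes(
    external_links['status'].dropna().astype(int),
    title='Status Code - External Links<br>SuperMetrics.com',
    theme='flatly')
# Override the theme's default colors
fig.layout.template.layout.colorway = ('#2C3E50', '#b71e1d', '#1bbd9b', '#ec9d0e', '#78b8ff')
fig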
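If you also want the breakdown as numbers rather than a chart, for example to compare crawls over time, a minimal pandas sketch could group the codes into classes. status_classes is a hypothetical helper; like the code above, it assumes missing statuses are requests that got no response and drops them before the int conversion.

```python
import pandas as pd


def status_classes(status):
    """Group HTTP status codes into classes such as '2xx' and '4xx'."""
    # NaN statuses make the column float, so drop them before converting to int
    codes = status.dropna().astype(int)
    return (codes // 100).map(lambda c: f"{c}xx").value_counts()


# Usage: status_classes(external_links['status'])
```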
# Keep rows that didn't return 200, or that have a value in the errors column
errors = external_links[
    external_links['status'].ne(200) | external_links['errors'].notna()
][['url', 'status', 'errors']]
errors.head(3)
url status errors
9 https://optout.networkadvertising.org/?c=1 403.0 NaN
23 https://community.profitabledashboards.com/c/tips/what-you-need-to-know-about-ga4-api-quota-limits-and-looker-studio... 404.0 NaN
28 https://developers.facebook.com/docs/marketing-api/insights/parameters/v15.0?_fb_noscript=1 404.0 NaN
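Finding broken links is half the job; to fix them you also need the pages that contain them. One way to locate them is to merge the errors table with a table of (linking page, link) pairs, such as an exploded version of links_url from the site crawl. locate_broken_links and the column names it expects are hypothetical.

```python
import pandas as pd


def locate_broken_links(links, errors):
    """Return the pages that link to each broken external URL.

    links:  one row per (url, link) pair, where url is the linking page.
    errors: error rows with the external URL in `url` and its `status`.
    """
    broken = (
        errors.rename(columns={"url": "link"})[["link", "status"]]
        .drop_duplicates("link")
    )
    # Inner join keeps only links that appear in the errors table
    located = links.merge(broken, on="link", how="inner")
    return located.rename(columns={"url": "linking_page"})[
        ["link", "status", "linking_page"]
    ]
```

Matching on the URL alone assumes the link as written on the page is the URL that failed. Links that redirected before failing (as in the redirect table below, where the requested and crawled URLs differ) need the originally requested URL to be matched.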

Are redirected pages going to error pages?

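Each redirect chain has one row per hop, and all hops keep the index of the original row. So we can find the index values of rows whose status code starts with a “4” or “5”, and then select every row with those index values to see each failing chain, from the requested URL to the final error page.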

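# redirect_df is assumed to hold one row per redirect hop for the external links
# (e.g. from adv.crawlytics.redirects), with all hops of a chain sharing one index value.
# Statuses can be numbers or strings (e.g. 'meta refresh'), so compare the first character.
index_gt_400 = redirect_df[redirect_df['status'].fillna('').astype(str).str[0].isin(['4', '5'])].index.unique()
redirect_df.loc[index_gt_400, :].head(9)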
url status order type download_latency redirect_times
28 https://developers.facebook.com/docs/marketing-api/insights/parameters/v15.0#:~:text=which%20returns%20a%20maximum%2... meta refresh 1 requested 0.439176 1
28 https://developers.facebook.com/docs/marketing-api/insights/parameters/v15.0?_fb_noscript=1 404.0 2 crawled 0.439176 1
768 https://acquire.io/blog/technology-drive-commerce-success/ 301 1 requested 0.491819 1
768 https://acquire.io/blog/technology-drive-commerce-success 404.0 2 crawled 0.491819 1
841 https://www.brightonseo.com/news/jess-spate-sampling-and-sample-size-in-google-analytics/ 308 1 requested 0.141226 2
841 https://brightonseo.com/news/jess-spate-sampling-and-sample-size-in-google-analytics/ 308 2 intermediate 0.141226 2
841 https://brightonseo.com/news/jess-spate-sampling-and-sample-size-in-google-analytics 404.0 3 crawled 0.141226 2
1252 https://www.shrushti.com/seoblog/white-label-link-building-services/ 301 1 requested 1.864162 1
1252 https://www.shrushti.com/seoblog/benefits-of-white-label-link-building/ 404.0 2 crawled 1.864162 1
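The table above lists every hop, which is good for inspection but long for reporting. The sketch below collapses each chain into one row (requested URL, final URL, final status, number of redirects), relying on the same shared index. summarize_redirect_chains is a hypothetical helper, and it assumes the order and redirect_times columns behave as in the table above.

```python
import pandas as pd


def _last(values):
    """Last value in the group, even if missing (GroupBy.last skips NaN)."""
    return values.iloc[-1]


def summarize_redirect_chains(redirect_df):
    """Collapse each redirect chain into one row: requested URL -> final URL."""
    # Sort hops by position in the chain; the index still identifies the chain
    chains = redirect_df.sort_values("order", kind="stable").groupby(level=0)
    return pd.DataFrame({
        "requested_url": chains["url"].first(),
        "final_url": chains["url"].agg(_last),
        "final_status": chains["status"].agg(_last),
        "hops": chains["redirect_times"].first(),
    })
```

With summary = summarize_redirect_chains(redirect_df), the filter summary[summary['final_status'].astype(str).str[0].isin(['4', '5'])] gives one row per redirect that ends in an error.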

These are the steps you can take to audit and analyze the external links on a website. They reveal which domains the website values the most, and whether those links have errors or other issues.