A developer account has to be created on the Google Developer Console, along with a project. The credentials you get from that project will be used to authenticate with the GSC API.
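A minimal sketch of that flow using the searchconsole package (whose fluent query API the cells below rely on); the file names and property URL are placeholders:

import searchconsole

# first run opens a browser consent flow; `serialize` saves the credentials for reuse
account = searchconsole.authenticate(
    client_config='client_secrets.json',
    serialize='credentials.json')
webproperty = account['https://example.com/']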
Get the data
This is just an example; you can get any data, in any other combination of metrics and/or dimensions, and for any duration. In this case, I retrieved monthly data for January through September 2023, and got query data by country.
Code
import calendar
import advertools as adv
import adviz
import pandas as pd
import plotly.express as px

# last day of each month, January-September 2023: '2023-01-31', ..., '2023-09-30'
last_days = ['-'.join(['2023', str(month).zfill(2), str(calendar.monthrange(2023, month)[1])]) for month in range(1, 10)]
query_month_reports = []
for date in last_days:
    # months=-1 requests the one-month window ending on `date`
    temp_report = webproperty.query.range(date, months=-1).dimension('country', 'query').get()
    tempdf = temp_report.to_dataframe().assign(date=date)
    query_month_reports.append(tempdf)
query_df = pd.concat(query_month_reports, ignore_index=True)
query_df.to_csv('country_query_month.csv')
Get URL data segmented by country
Code
page_month_reports = []
for date in last_days:
    temp_report = webproperty.query.range(date, months=-1).dimension('page', 'country').get()
    tempdf = temp_report.to_dataframe().assign(date=date)
    page_month_reports.append(tempdf)
page_df = pd.concat(page_month_reports, ignore_index=True)
page_df.to_csv('page_per_month.csv')
Read the saved data
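A minimal sketch of loading the saved files back (assuming the CSV names used above; index_col=0 drops the unnamed index column that to_csv wrote):

query_df = pd.read_csv('country_query_month.csv', index_col=0)
page_df = pd.read_csv('page_per_month.csv', index_col=0)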
Data: monthly queries by country (clicks, impressions, ctr, position)
| | country | query | clicks | impressions | ctr | position | date |
|---|---|---|---|---|---|---|---|
| 0 | usa | advertools | 40 | 60 | 0.666667 | 1.016667 | 2023-01-31 |
| 1 | ind | advertools | 21 | 35 | 0.600000 | 1.000000 | 2023-01-31 |
| 2 | ind | log analysis using python | 17 | 28 | 0.607143 | 1.000000 | 2023-01-31 |
| 3 | gbr | advertools | 15 | 25 | 0.600000 | 1.000000 | 2023-01-31 |
| 4 | deu | advertools | 10 | 20 | 0.500000 | 1.000000 | 2023-01-31 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 205890 | zmb | video sitemaps | 0 | 1 | 0.000000 | 63.000000 | 2023-09-30 |
| 205891 | zwe | crawling seo | 0 | 1 | 0.000000 | 95.000000 | 2023-09-30 |
| 205892 | zwe | download sitemap | 0 | 1 | 0.000000 | 12.000000 | 2023-09-30 |
| 205893 | zwe | get sitemap xml | 0 | 1 | 0.000000 | 47.000000 | 2023-09-30 |
| 205894 | zwe | seo library | 0 | 1 | 0.000000 | 64.000000 | 2023-09-30 |

205895 rows × 7 columns
Monthly metrics by page and country
| | page | country | clicks | impressions | ctr | position | date |
|---|---|---|---|---|---|---|---|
| 0 | https://advertools.readthedocs.io/ | usa | 51 | 391 | 0.130435 | 47.304348 | 2023-01-31 |
| 1 | https://advertools.readthedocs.io/en/master/advertools.logs.html | ind | 49 | 688 | 0.071221 | 21.985465 | 2023-01-31 |
| 2 | https://advertools.readthedocs.io/ | ind | 28 | 108 | 0.259259 | 27.962963 | 2023-01-31 |
| 3 | https://advertools.readthedocs.io/en/master/advertools.logs.html | usa | 21 | 2350 | 0.008936 | 30.180851 | 2023-01-31 |
| 4 | https://advertools.readthedocs.io/ | gbr | 16 | 53 | 0.301887 | 32.056604 | 2023-01-31 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 24698 | https://advertools.readthedocs.io/en/master/readme.html | tun | 0 | 1 | 0.000000 | 1.000000 | 2023-09-30 |
| 24699 | https://advertools.readthedocs.io/en/master/readme.html | tur | 0 | 6 | 0.000000 | 16.666667 | 2023-09-30 |
| 24700 | https://advertools.readthedocs.io/en/master/readme.html | twn | 0 | 4 | 0.000000 | 42.250000 | 2023-09-30 |
| 24701 | https://advertools.readthedocs.io/en/master/readme.html | ukr | 0 | 2 | 0.000000 | 1.000000 | 2023-09-30 |
| 24702 | https://advertools.readthedocs.io/en/master/readme.html | vnm | 0 | 9 | 0.000000 | 15.333333 | 2023-09-30 |

24703 rows × 7 columns
Create interesting subsets of the data
Most frequent words
Code
top_words = adv.word_frequency(query_df['query'], query_df['impressions'])
# top_words.head(20).style.bar(subset=['wtd_freq'], color='steelblue').format({'wtd_freq': '{:,}'})
adviz.style_table(
    top_words.head(20),
    column_types=['text', 'text', 'bar', 'text'],
    column_widths=[0.15, 0.1, 0.4, 0.15],
    theme='flatly',
    width=900,
    height=600)
The wtd_freq column shows the total impressions of all queries containing the respective word (word_frequency returns word, abs_freq, wtd_freq, and rel_value columns). For example, “seo” appeared in many different queries, whose combined impressions were 123,266.
We can now take the top words, define a regular expression for each, and create a boolean column per term.
Code
term_regex = [
    ('brand', 'advertool'),
    ('seo', 'seo'),
    ('python', 'python'),
    ('sitemap', 'sitemap|xml'),
    ('analysis', 'analy[sz]|analyt'),
    ('robots', 'robots'),
    ('crawl', 'crawl|scrap[ei]|spider'),
    ('log', r'\blog(file)?'),  # raw string, so \b is a word boundary, not a backspace
    ('search', 'search'),
    ('serp', 'serp'),
    ('google', 'google')
]
pd.DataFrame(term_regex, columns=['term', 'regex'])
| | term | regex |
|---|---|---|
| 0 | brand | advertool |
| 1 | seo | seo |
| 2 | python | python |
| 3 | sitemap | sitemap\|xml |
| 4 | analysis | analy[sz]\|analyt |
| 5 | robots | robots |
| 6 | crawl | crawl\|scrap[ei]\|spider |
| 7 | log | \blog(file)? |
| 8 | search | search |
| 9 | serp | serp |
| 10 | google | google |
Code
| | query | brand_term | seo_term | python_term | sitemap_term | analysis_term | robots_term | crawl_term | log_term | search_term | serp_term | google_term |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | advertools | True | False | False | False | False | False | False | False | False | False | False |
| 1 | advertools | True | False | False | False | False | False | False | False | False | False | False |
| 2 | log analysis using python | False | False | True | False | True | False | False | False | False | False | False |
| 3 | advertools | True | False | False | False | False | False | False | False | False | False | False |
| 4 | advertools | True | False | False | False | False | False | False | False | False | False | False |
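The collapsed cell above assigns one boolean column per term; a minimal sketch of how it might do so (the str.contains approach is an assumption):

for term, regex in term_regex:
    # True if the query matches the term's regex
    query_df[term + '_term'] = query_df['query'].str.contains(regex, regex=True)
query_df.filter(regex=r'^query$|_term$').head()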
Top terms/topics across all queries
Code
| | % |
|---|---|
| python_term | 19.4% |
| robots_term | 16.8% |
| crawl_term | 14.7% |
| analysis_term | 11.7% |
| seo_term | 11.3% |
| sitemap_term | 10.4% |
| serp_term | 6.8% |
| search_term | 5.1% |
| google_term | 4.6% |
| brand_term | 0.5% |
| log_term | 0.0% |
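One plausible computation for these shares (an assumption; they could instead be weighted by impressions or clicks) is the mean of each boolean column. Note that they can sum to more than 100%, since a query can match several terms:

# share of queries matching each term, as a percentage
query_df.filter(regex='_term$').mean().sort_values(ascending=False).mul(100).round(1)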
Convert average positions to SERP page
Code
| | query | position | serp_page |
|---|---|---|---|
| 0 | etaospider | 32.000000 | 4 |
| 1 | word_tokenize | 7.000000 | 1 |
| 2 | revers dns lookup | 91.000000 | 10 |
| 3 | robot txt test | 47.000000 | 5 |
| 4 | robots file meaning | 96.000000 | 10 |
| 5 | get all urls from sitemap | 21.000000 | 3 |
| 6 | reverse dns lookup command | 63.500000 | 7 |
| 7 | seo web crawler | 27.428571 | 3 |
| 8 | xml sitemap | 81.500000 | 9 |
| 9 | keywords for permutations and combinations | 62.000000 | 7 |
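A minimal sketch of one way to derive serp_page, assuming ten results per SERP; the mapping agrees with the sample above (position 32 → page 4, position 63.5 → page 7):

import numpy as np

# positions 1-10 map to page 1, 11-20 to page 2, and so on
query_df['serp_page'] = np.ceil(query_df['position'] / 10).astype(int)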
Create pivot tables
Code
# total monthly clicks and impressions per country
month_country_ranks = pd.pivot_table(query_df, index=['date', 'country'], values=['clicks', 'impressions'], aggfunc='sum').reset_index()
# rank countries by clicks within each month (1 = most clicks)
month_country_ranks['monthly_rank_clicks'] = month_country_ranks.groupby('date')['clicks'].rank(ascending=False)
# emoji flag for each country code, used as chart labels
month_country_ranks['flag'] = [adviz.flag(cc) for cc in month_country_ranks['country']]
# month_country_ranks.head()
All reports and charts are made using the full dataset. You can easily use any subset, based on the available and newly created columns/criteria.
Country comparison (monthly impressions)
- Click a flag in the legend to add/remove that country
Code
fig = px.line(
    monthly_impressions[monthly_impressions['country'].isin(top20_countries)],
    x='date',
    y='impressions',
    color='flag',
    template='flatly',
    height=700,
    hover_name='country')
fig.layout.legend.title.text = 'country'
for trace in fig.data:
    trace.visible = 'legendonly'
    trace.line.width = 4
fig.layout.legend.font.size = 30
fig.layout.legend.title.font.size = 15
fig.layout.hovermode = 'x unified'
fig.layout.hoverlabel.font.size = 15
fig.layout.xaxis.showgrid = False
# fig.layout.yaxis.showgrid = False
fig
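monthly_impressions and top20_countries come from a cell not shown; a plausible reconstruction (the names and aggregation logic are assumptions):

# monthly impressions per country, with emoji flags for legend labels
monthly_impressions = query_df.groupby(['date', 'country'], as_index=False)['impressions'].sum()
monthly_impressions['flag'] = [adviz.flag(cc) for cc in monthly_impressions['country']]
# the twenty countries with the most total impressions
top20_countries = monthly_impressions.groupby('country')['impressions'].sum().nlargest(20).index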
Weighted word frequency - monthly (clicks)
- Total clicks for all keywords containing <word>.
- e.g.: in January, keywords containing “python” received a total of 323 clicks
Code
word_freq_dfs = []
for month in query_df['date'].drop_duplicates():
    tempdf = query_df[query_df['date'].eq(month)]
    query_metric = pd.pivot_table(tempdf, index='query', values='clicks', aggfunc='sum').reset_index()
    word_freq_df = adv.word_frequency(query_metric['query'], query_metric['clicks']).head(20)[['word', 'wtd_freq']]
    word_freq_df.insert(0, 'date', tempdf['date'].iloc[0])
    word_freq_dfs.append(word_freq_df)
word_freq_click = pd.concat(word_freq_dfs)
Code
n = 15
fig = adviz.racing_chart(
    word_freq_click[['word', 'wtd_freq', 'date']],
    n=n,
    height=800,
    title=f'GSC Top {n} words per month - clicks',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 20
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig.layout.xaxis.title = 'weighted frequency'
fig
Weighted bigram frequency - monthly (clicks)
Code
word_freq_dfs_bi = []
for month in query_df['date'].drop_duplicates():
    tempdf = query_df[query_df['date'].eq(month)]
    query_metric = pd.pivot_table(tempdf, index='query', values='clicks', aggfunc='sum').reset_index()
    word_freq_df = adv.word_frequency(query_metric['query'], query_metric['clicks'], phrase_len=2).head(20)[['word', 'wtd_freq']]
    word_freq_df.insert(0, 'date', tempdf['date'].iloc[0])
    word_freq_dfs_bi.append(word_freq_df)
word_freq_clicks_bi = pd.concat(word_freq_dfs_bi)
Code
n = 15
fig = adviz.racing_chart(
    word_freq_clicks_bi[['word', 'wtd_freq', 'date']],
    n=n,
    height=800,
    title=f'GSC Top {n} bigrams per month - clicks',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 20
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig.layout.yaxis.title = 'bigram'
fig.layout.xaxis.title = 'weighted frequency'
fig
Racing chart - impressions
Code
n = 15
fig = adviz.racing_chart(
    month_country_ranks[['flag', 'impressions', 'date']],
    n=n,
    height=800,
    # width=900,
    title=f'GSC Top {n} countries per month - impressions',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 25
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig
Monthly impressions - map
Code
fig = px.choropleth(
    month_country_ranks,
    color='impressions',
    locations=month_country_ranks['country'].str.upper(),
    animation_frame='date',
    hover_name='flag',
    template='flatly',
    title='Google Search Console - Monthly Impressions by Country 2023',
    projection='natural earth',
    color_continuous_scale=flatly_custom_scale,
    height=800)
fig.layout.geo.showframe = False
fig.layout.geo.lataxis.range = [-53, 76]
fig.layout.geo.lonaxis.range = [-137, 168]
fig.data[0].marker.line.color = 'gray'
fig
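flatly_custom_scale is defined in a cell not shown; a plausible stand-in is a simple two-color continuous scale ending in the Flatly theme's navy primary color (the exact colors here are assumptions):

# hypothetical continuous scale: light gray to Flatly navy
flatly_custom_scale = ['#ecf0f1', '#2c3e50']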
Racing chart - clicks
Code
n = 15
fig = adviz.racing_chart(
    month_country_ranks[['flag', 'clicks', 'date']],
    n=n,
    height=800,
    title=f'GSC Top {n} countries per month - clicks',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 25
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig
Monthly clicks - map
Code
fig = px.choropleth(
    month_country_ranks,
    color='clicks',
    locations=month_country_ranks['country'].str.upper(),
    animation_frame='date',
    hover_name='flag',
    template='flatly',
    title='Google Search Console - Monthly Clicks by Country 2023',
    projection='natural earth',
    color_continuous_scale=flatly_custom_scale,
    height=800)
fig.layout.geo.showframe = False
fig.layout.geo.lataxis.range = [-53, 76]
fig.layout.geo.lonaxis.range = [-137, 168]
fig.data[0].marker.line.color = 'gray'
fig
Monthly impressions ~ clicks by country
- Zoom in to any part of the chart to focus on a certain set of countries (simply drag your mouse across the region/flags of interest)
Code
df = month_country_ranks[month_country_ranks['monthly_rank_clicks'].lt(21)]
fig = px.scatter(
    df,
    x='clicks',
    y='impressions',
    animation_frame='date',
    title='Monthly impressions ~ clicks by country (top 20)',
    text='flag',
    template='flatly',
    range_x=(0, df['clicks'].max() * 1.1),
    range_y=(0, df['impressions'].max() * 1.1),
    height=600)
for frame in fig.frames:
    frame.data[0].textfont.size = 30
fig.frames[0].data[0].textfont.size = 30
for button in fig.layout.updatemenus[0].buttons:
    button.visible = False
fig
Code
imp_clicks = pd.DataFrame({
    'impressions': query_df.groupby('date')['impressions'].sum(),
    'clicks': query_df.groupby('date')['clicks'].sum(),
}).reset_index().assign(ctr=lambda df: df['clicks'].div(df['impressions']))
# number of query rows per month
imp_clicks['count'] = query_df.groupby('date').size().tolist()
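imp_clicks now holds monthly totals together with an overall CTR column; as a usage example (this chart is an illustration, not part of the original):

px.line(imp_clicks, x='date', y='ctr', template='flatly', title='Monthly overall CTR - 2023')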
Code
# split URLs into components and keep only the directory columns
url_dir_df = adv.url_to_df(page_df['page']).filter(regex=r'dir_|last_dir')
page_url_df = pd.concat([page_df, url_dir_df], axis=1)
# monthly impressions per page segment (the last directory of each URL)
monthly_page_imp = pd.pivot_table(page_url_df, index=['date', 'last_dir'], values='impressions', aggfunc='sum').reset_index()
monthly_page_imp['last_dir'] = monthly_page_imp['last_dir'].str.replace(r'\.html$', '', regex=True)
Top pages monthly (impressions)
- This chart can be much more useful with larger sites (this one has only ~50 pages)
- With the URLs split into components, we can see how each segment's traffic has evolved over time: by dir_1, dir_2, or any combination (see the example below)
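For illustration, splitting one of the site's URLs shows where the directory columns come from (dir_1 is the first path directory, last_dir the final path element):

adv.url_to_df(['https://advertools.readthedocs.io/en/master/advertools.logs.html']).filter(regex=r'dir_|last_dir')
# dir_1: 'en', dir_2: 'master', last_dir: 'advertools.logs.html'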