A developer account has to be created on the Google Developer Console, along with a project. The credentials you get from that project will be used to authenticate with the GSC API.
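A minimal sketch of that flow using the searchconsole package (whose fluent query API the cells below rely on); the file names and property URL are placeholders:

import searchconsole

# first run opens a browser consent flow; `serialize` saves the credentials for reuse
account = searchconsole.authenticate(
    client_config='client_secrets.json',
    serialize='credentials.json')
webproperty = account['https://example.com/']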
Get the data
This is just an example; you can get any data, in any other combination of metrics and/or dimensions, and for any duration. In this case, I retrieved monthly data for January through September 2023, and got query data by country.
Code
import calendar
import advertools as adv
import adviz
import pandas as pd
import plotly.express as px

# last day of each month, January-September 2023: '2023-01-31', ..., '2023-09-30'
last_days = ['-'.join(['2023', str(month).zfill(2), str(calendar.monthrange(2023, month)[1])]) for month in range(1, 10)]
query_month_reports = []
for date in last_days:
    # months=-1 requests the one-month window ending on `date`
    temp_report = webproperty.query.range(date, months=-1).dimension('country', 'query').get()
    tempdf = temp_report.to_dataframe().assign(date=date)
    query_month_reports.append(tempdf)
query_df = pd.concat(query_month_reports, ignore_index=True)
query_df.to_csv('country_query_month.csv')
Get URL data segmented by country
Code
page_month_reports = []
for date in last_days:
    temp_report = webproperty.query.range(date, months=-1).dimension('page', 'country').get()
    tempdf = temp_report.to_dataframe().assign(date=date)
    page_month_reports.append(tempdf)
page_df = pd.concat(page_month_reports, ignore_index=True)
page_df.to_csv('page_per_month.csv')
Read the saved data
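A minimal sketch of loading the saved files back (assuming the CSV names used above; index_col=0 drops the unnamed index column that to_csv wrote):

query_df = pd.read_csv('country_query_month.csv', index_col=0)
page_df = pd.read_csv('page_per_month.csv', index_col=0)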
Data: monthly queries by country (clicks, impressions, ctr, position)
| | country | query | clicks | impressions | ctr | position | date |
|---|---|---|---|---|---|---|---|
| 0 | usa | advertools | 40 | 60 | 0.666667 | 1.016667 | 2023-01-31 |
| 1 | ind | advertools | 21 | 35 | 0.600000 | 1.000000 | 2023-01-31 |
| 2 | ind | log analysis using python | 17 | 28 | 0.607143 | 1.000000 | 2023-01-31 |
| 3 | gbr | advertools | 15 | 25 | 0.600000 | 1.000000 | 2023-01-31 |
| 4 | deu | advertools | 10 | 20 | 0.500000 | 1.000000 | 2023-01-31 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 205890 | zmb | video sitemaps | 0 | 1 | 0.000000 | 63.000000 | 2023-09-30 |
| 205891 | zwe | crawling seo | 0 | 1 | 0.000000 | 95.000000 | 2023-09-30 |
| 205892 | zwe | download sitemap | 0 | 1 | 0.000000 | 12.000000 | 2023-09-30 |
| 205893 | zwe | get sitemap xml | 0 | 1 | 0.000000 | 47.000000 | 2023-09-30 |
| 205894 | zwe | seo library | 0 | 1 | 0.000000 | 64.000000 | 2023-09-30 |

205895 rows × 7 columns
Monthly metrics by page and country
| | page | country | clicks | impressions | ctr | position | date |
|---|---|---|---|---|---|---|---|
| 0 | https://advertools.readthedocs.io/ | usa | 51 | 391 | 0.130435 | 47.304348 | 2023-01-31 |
| 1 | https://advertools.readthedocs.io/en/master/advertools.logs.html | ind | 49 | 688 | 0.071221 | 21.985465 | 2023-01-31 |
| 2 | https://advertools.readthedocs.io/ | ind | 28 | 108 | 0.259259 | 27.962963 | 2023-01-31 |
| 3 | https://advertools.readthedocs.io/en/master/advertools.logs.html | usa | 21 | 2350 | 0.008936 | 30.180851 | 2023-01-31 |
| 4 | https://advertools.readthedocs.io/ | gbr | 16 | 53 | 0.301887 | 32.056604 | 2023-01-31 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 24698 | https://advertools.readthedocs.io/en/master/readme.html | tun | 0 | 1 | 0.000000 | 1.000000 | 2023-09-30 |
| 24699 | https://advertools.readthedocs.io/en/master/readme.html | tur | 0 | 6 | 0.000000 | 16.666667 | 2023-09-30 |
| 24700 | https://advertools.readthedocs.io/en/master/readme.html | twn | 0 | 4 | 0.000000 | 42.250000 | 2023-09-30 |
| 24701 | https://advertools.readthedocs.io/en/master/readme.html | ukr | 0 | 2 | 0.000000 | 1.000000 | 2023-09-30 |
| 24702 | https://advertools.readthedocs.io/en/master/readme.html | vnm | 0 | 9 | 0.000000 | 15.333333 | 2023-09-30 |

24703 rows × 7 columns
Create interesting subsets of the data
Most frequent words
Code
top_words = adv.word_frequency(query_df['query'], query_df['impressions'])
# top_words.head(20).style.bar(subset=['wtd_freq'], color='steelblue').format({'wtd_freq': '{:,}'})
adviz.style_table(
    top_words.head(20),
    column_types=['text', 'text', 'bar', 'text'],
    column_widths=[0.15, 0.1, 0.4, 0.15],
    theme='flatly',
    width=900,
    height=600)
The wtd_freq column shows the total impressions of all queries containing the respective word (word_frequency returns word, abs_freq, wtd_freq, and rel_value columns). For example, “seo” appeared in many different queries, whose combined impressions were 123,266.
We can now take the top words, define a regular expression for each, and create a boolean column per term.
Code
term_regex = [
    ('brand', 'advertool'),
    ('seo', 'seo'),
    ('python', 'python'),
    ('sitemap', 'sitemap|xml'),
    ('analysis', 'analy[sz]|analyt'),
    ('robots', 'robots'),
    ('crawl', 'crawl|scrap[ei]|spider'),
    ('log', r'\blog(file)?'),  # raw string, so \b is a word boundary, not a backspace
    ('search', 'search'),
    ('serp', 'serp'),
    ('google', 'google')
]
pd.DataFrame(term_regex, columns=['term', 'regex'])
| | term | regex |
|---|---|---|
| 0 | brand | advertool |
| 1 | seo | seo |
| 2 | python | python |
| 3 | sitemap | sitemap\|xml |
| 4 | analysis | analy[sz]\|analyt |
| 5 | robots | robots |
| 6 | crawl | crawl\|scrap[ei]\|spider |
| 7 | log | \blog(file)? |
| 8 | search | search |
| 9 | serp | serp |
| 10 | google | google |
Code
| | query | brand_term | seo_term | python_term | sitemap_term | analysis_term | robots_term | crawl_term | log_term | search_term | serp_term | google_term |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | advertools | True | False | False | False | False | False | False | False | False | False | False |
| 1 | advertools | True | False | False | False | False | False | False | False | False | False | False |
| 2 | log analysis using python | False | False | True | False | True | False | False | False | False | False | False |
| 3 | advertools | True | False | False | False | False | False | False | False | False | False | False |
| 4 | advertools | True | False | False | False | False | False | False | False | False | False | False |
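The collapsed cell above assigns one boolean column per term; a minimal sketch of how it might do so (the str.contains approach is an assumption):

for term, regex in term_regex:
    # True if the query matches the term's regex
    query_df[term + '_term'] = query_df['query'].str.contains(regex, regex=True)
query_df.filter(regex=r'^query$|_term$').head()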
Top terms/topics across all queries
Code
| | % |
|---|---|
| python_term | 19.4% |
| robots_term | 16.8% |
| crawl_term | 14.7% |
| analysis_term | 11.7% |
| seo_term | 11.3% |
| sitemap_term | 10.4% |
| serp_term | 6.8% |
| search_term | 5.1% |
| google_term | 4.6% |
| brand_term | 0.5% |
| log_term | 0.0% |
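One plausible computation for these shares (an assumption; they could instead be weighted by impressions or clicks) is the mean of each boolean column. Note that they can sum to more than 100%, since a query can match several terms:

# share of queries matching each term, as a percentage
query_df.filter(regex='_term$').mean().sort_values(ascending=False).mul(100).round(1)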
Convert average positions to SERP page
Code
| | query | position | serp_page |
|---|---|---|---|
| 0 | etaospider | 32.000000 | 4 |
| 1 | word_tokenize | 7.000000 | 1 |
| 2 | revers dns lookup | 91.000000 | 10 |
| 3 | robot txt test | 47.000000 | 5 |
| 4 | robots file meaning | 96.000000 | 10 |
| 5 | get all urls from sitemap | 21.000000 | 3 |
| 6 | reverse dns lookup command | 63.500000 | 7 |
| 7 | seo web crawler | 27.428571 | 3 |
| 8 | xml sitemap | 81.500000 | 9 |
| 9 | keywords for permutations and combinations | 62.000000 | 7 |
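A minimal sketch of one way to derive serp_page, assuming ten results per SERP; the mapping agrees with the sample above (position 32 → page 4, position 63.5 → page 7):

import numpy as np

# positions 1-10 map to page 1, 11-20 to page 2, and so on
query_df['serp_page'] = np.ceil(query_df['position'] / 10).astype(int)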
Create pivot tables
Code
# total monthly clicks and impressions per country
month_country_ranks = pd.pivot_table(query_df, index=['date', 'country'], values=['clicks', 'impressions'], aggfunc='sum').reset_index()
# rank countries by clicks within each month (1 = most clicks)
month_country_ranks['monthly_rank_clicks'] = month_country_ranks.groupby('date')['clicks'].rank(ascending=False)
# emoji flag for each country code, used as chart labels
month_country_ranks['flag'] = [adviz.flag(cc) for cc in month_country_ranks['country']]
# month_country_ranks.head()
All reports and charts are made using the full dataset. You can easily use any subset, based on the available and newly created columns/criteria.
Country comparison (monthly impressions)
- Click a flag in the legend to add/remove that country
Code
fig = px.line(
    monthly_impressions[monthly_impressions['country'].isin(top20_countries)],
    x='date',
    y='impressions',
    color='flag',
    template='flatly',
    height=700,
    hover_name='country')
fig.layout.legend.title.text = 'country'
for trace in fig.data:
    trace.visible = 'legendonly'
    trace.line.width = 4
fig.layout.legend.font.size = 30
fig.layout.legend.title.font.size = 15
fig.layout.hovermode = 'x unified'
fig.layout.hoverlabel.font.size = 15
fig.layout.xaxis.showgrid = False
# fig.layout.yaxis.showgrid = False
fig
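monthly_impressions and top20_countries come from a cell not shown; a plausible reconstruction (the names and aggregation logic are assumptions):

# monthly impressions per country, with emoji flags for legend labels
monthly_impressions = query_df.groupby(['date', 'country'], as_index=False)['impressions'].sum()
monthly_impressions['flag'] = [adviz.flag(cc) for cc in monthly_impressions['country']]
# the twenty countries with the most total impressions
top20_countries = monthly_impressions.groupby('country')['impressions'].sum().nlargest(20).index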
Weighted word frequency - monthly (clicks)
- Total clicks for all keywords containing <word>.
- e.g.: in January, keywords containing “python” received a total of 323 clicks
Code
word_freq_dfs = []
for month in query_df['date'].drop_duplicates():
    tempdf = query_df[query_df['date'].eq(month)]
    query_metric = pd.pivot_table(tempdf, index='query', values='clicks', aggfunc='sum').reset_index()
    word_freq_df = adv.word_frequency(query_metric['query'], query_metric['clicks']).head(20)[['word', 'wtd_freq']]
    word_freq_df.insert(0, 'date', tempdf['date'].iloc[0])
    word_freq_dfs.append(word_freq_df)
word_freq_click = pd.concat(word_freq_dfs)
Code
n = 15
fig = adviz.racing_chart(
    word_freq_click[['word', 'wtd_freq', 'date']],
    n=n,
    height=800,
    title=f'GSC Top {n} words per month - clicks',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 20
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig.layout.xaxis.title = 'weighted frequency'
fig
Weighted bigram frequency - monthly (clicks)
Code
word_freq_dfs_bi = []
for month in query_df['date'].drop_duplicates():
    tempdf = query_df[query_df['date'].eq(month)]
    query_metric = pd.pivot_table(tempdf, index='query', values='clicks', aggfunc='sum').reset_index()
    word_freq_df = adv.word_frequency(query_metric['query'], query_metric['clicks'], phrase_len=2).head(20)[['word', 'wtd_freq']]
    word_freq_df.insert(0, 'date', tempdf['date'].iloc[0])
    word_freq_dfs_bi.append(word_freq_df)
word_freq_clicks_bi = pd.concat(word_freq_dfs_bi)
Code
n = 15
fig = adviz.racing_chart(
    word_freq_clicks_bi[['word', 'wtd_freq', 'date']],
    n=n,
    height=800,
    title=f'GSC Top {n} bigrams per month - clicks',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 20
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig.layout.yaxis.title = 'bigram'
fig.layout.xaxis.title = 'weighted frequency'
fig
Racing chart - impressions
Code
n = 15
fig = adviz.racing_chart(
    month_country_ranks[['flag', 'impressions', 'date']],
    n=n,
    height=800,
    # width=900,
    title=f'GSC Top {n} countries per month - impressions',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 25
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig
Monthly impressions - map
Code
fig = px.choropleth(
    month_country_ranks,
    color='impressions',
    locations=month_country_ranks['country'].str.upper(),
    animation_frame='date',
    hover_name='flag',
    template='flatly',
    title='Google Search Console - Monthly Impressions by Country 2023',
    projection='natural earth',
    color_continuous_scale=flatly_custom_scale,
    height=800)
fig.layout.geo.showframe = False
fig.layout.geo.lataxis.range = [-53, 76]
fig.layout.geo.lonaxis.range = [-137, 168]
fig.data[0].marker.line.color = 'gray'
fig
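flatly_custom_scale is defined in a cell not shown; a plausible stand-in is a simple two-color continuous scale ending in the Flatly theme's navy primary color (the exact colors here are assumptions):

# hypothetical continuous scale: light gray to Flatly navy
flatly_custom_scale = ['#ecf0f1', '#2c3e50']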
Racing chart - clicks
Code
n = 15
fig = adviz.racing_chart(
    month_country_ranks[['flag', 'clicks', 'date']],
    n=n,
    height=800,
    title=f'GSC Top {n} countries per month - clicks',
    theme='flatly')
fig.layout.yaxis.tickfont.size = 25
# for frame in fig.frames:
#     frame.data[0].marker.color = 'snow'
# fig.data[0].marker.color = 'snow'
fig
Monthly clicks - map
Code
fig = px.choropleth(
    month_country_ranks,
    color='clicks',
    locations=month_country_ranks['country'].str.upper(),
    animation_frame='date',
    hover_name='flag',
    template='flatly',
    title='Google Search Console - Monthly Clicks by Country 2023',
    projection='natural earth',
    color_continuous_scale=flatly_custom_scale,
    height=800)
fig.layout.geo.showframe = False
fig.layout.geo.lataxis.range = [-53, 76]
fig.layout.geo.lonaxis.range = [-137, 168]
fig.data[0].marker.line.color = 'gray'
fig
Monthly impressions ~ clicks by country
- Zoom in to any part of the chart to focus on a certain set of countries (simply drag your mouse across the region/flags of interest)
Code
df = month_country_ranks[month_country_ranks['monthly_rank_clicks'].lt(21)]
fig = px.scatter(
    df,
    x='clicks',
    y='impressions',
    animation_frame='date',
    title='Monthly impressions ~ clicks by country (top 20)',
    text='flag',
    template='flatly',
    range_x=(0, df['clicks'].max() * 1.1),
    range_y=(0, df['impressions'].max() * 1.1),
    height=600)
for frame in fig.frames:
    frame.data[0].textfont.size = 30
fig.frames[0].data[0].textfont.size = 30
for button in fig.layout.updatemenus[0].buttons:
    button.visible = False
fig
Code
imp_clicks = pd.DataFrame({
    'impressions': query_df.groupby('date')['impressions'].sum(),
    'clicks': query_df.groupby('date')['clicks'].sum(),
}).reset_index().assign(ctr=lambda df: df['clicks'].div(df['impressions']))
# number of query rows per month
imp_clicks['count'] = query_df.groupby('date').size().tolist()
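imp_clicks now holds monthly totals together with an overall CTR column; as a usage example (this chart is an illustration, not part of the original):

px.line(imp_clicks, x='date', y='ctr', template='flatly', title='Monthly overall CTR - 2023')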
Code
# split URLs into components and keep only the directory columns
url_dir_df = adv.url_to_df(page_df['page']).filter(regex=r'dir_|last_dir')
page_url_df = pd.concat([page_df, url_dir_df], axis=1)
# monthly impressions per page segment (the last directory of each URL)
monthly_page_imp = pd.pivot_table(page_url_df, index=['date', 'last_dir'], values='impressions', aggfunc='sum').reset_index()
monthly_page_imp['last_dir'] = monthly_page_imp['last_dir'].str.replace(r'\.html$', '', regex=True)
Top pages monthly (impressions)
- This chart can be much more useful with larger sites (this one has only ~50 pages)
- With the URLs split into components, we can see how each segment's traffic has evolved over time: by dir_1, dir_2, or any combination (see the example below)
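For illustration, splitting one of the site's URLs shows where the directory columns come from (dir_1 is the first path directory, last_dir the final path element):

adv.url_to_df(['https://advertools.readthedocs.io/en/master/advertools.logs.html']).filter(regex=r'dir_|last_dir')
# dir_1: 'en', dir_2: 'master', last_dir: 'advertools.logs.html'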