| | Country | Device category | Users | New users | Engaged sessions | Engagement rate | Engaged sessions per user | Average engagement time | Event count | Key events | User key event rate | Total revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | desktop | 966 | 951 | 660 | 0.428016 | 0.683230 | 46.704969 | 8356 | 0 | 0 | 0 |
| 1 | India | desktop | 338 | 328 | 346 | 0.553600 | 1.023669 | 77.393491 | 3905 | 0 | 0 | 0 |
| 2 | United Kingdom | desktop | 249 | 240 | 271 | 0.558763 | 1.088353 | 116.955823 | 3708 | 0 | 0 | 0 |
The problem
Sometimes a chart holds so much data that it becomes busy and difficult to read. One way to improve it is to show a subset of the chart, with limits on the X and Y axes, that showcases the data we care about. We still don’t want to lose context, so we also want to show how this subset relates to the full dataset.
The solution
Create two charts
- The full data (busy, difficult to read)
- The subset of the full data, placed next to, above, or below the original chart
- Add a guide to the full chart (a rectangle for example) to clarify the focus area
Create the full chart
This is a simple chart showing Google Analytics data, plotting the number of users by country. There are 218 countries in this dataset, and a chart that plots them all can be difficult to read.
Setting limits on X and/or Y axes
Plotly Express provides two handy parameters for setting those limits: `range_x` and `range_y`.
The code below displays only the region between 3 and 7 on the X axis, and between 50 and 70 on the Y axis:
import plotly.express as px
import random

px.scatter(
    x=list(range(1, 11)),
    y=[random.randint(1, 100) for _ in range(10)],
    range_x=[3, 7],
    range_y=[50, 70]
)
Setting the limits on our full chart.
fig = px.scatter(
    ga,
    x=list(range(1, len(ga)+1)),
    y='Users',
    hover_name='Country',
    template='plotly_white',
    height=550,
    labels={'x': 'rank'},
    range_x=[0, 25.5],  # <-- set this, you can also set `range_y`
    title='<b>Website Users by country</b>')
fig.layout.title.subtitle.text = 'Easier to see the top 25, but can be misleading without the full data'
fig
fig = px.scatter(
    ga,
    x=list(range(1, len(ga)+1)),
    y='Users',
    hover_name='Country',
    template='plotly_white',
    height=550,
    labels={'x': 'rank'},
    range_x=[0, 25.5],  # <-- set this, you can also set `range_y`
    title='<b>Website Users by country</b>')
# Line along the bottom of the chart, spanning ranks 1 to 25
fig.add_shape(type='line', x0=1, x1=25, y0=0, y1=1)
# Thick vertical line at x=0, spanning y=47 to y=966
fig.add_shape(type='line', x0=0, x1=0, y0=47, y1=966, line={'width': 4})
# Hide the default zero lines so that only the two lines above are drawn
fig.update_layout(xaxis_zeroline=False, yaxis_zeroline=False, xaxis_zerolinecolor='white')
fig.layout.title.subtitle.text = 'Easier to see the top 25, but can be misleading without the full data'
fig
Putting it all together
To make the charts easier to relate to one another, we can use Plotly’s subplots module to create a multi-plot chart. The first chart, with the full data, can take a smaller portion of the figure, just enough to make clear what we are visualizing, while the zoomed-in chart takes the larger portion.
We can also use shapes to highlight the data subset that we are focusing on, and use the same highlight color in the zoomed-in chart to make the two easier to relate.
We will also give the axes and subplots proper labels.
Code
from plotly.subplots import make_subplots

# Two stacked subplots: a short overview on top, a taller zoomed-in view below
fig = make_subplots(
    cols=1,
    rows=2,
    row_heights=[0.3, 0.7],
    subplot_titles=['<b>Website Users by Country</b>', 'Top 25 countries']
)

# Row 1: the full dataset
fig.add_scatter(
    x=list(range(1, len(ga)+1)),
    y=ga['Users'],
    row=1, col=1,
    name='All data',
    mode='markers',
    marker={'color': '#2c3e50'},
    hovertext=ga['Country']
)

# Labeled rectangle that marks the focus area on the full chart
fig.add_shape(
    row=1, col=1,
    type='rect',
    x0=0, y0=0, x1=25, y1=1000,
    layer='below',
    fillcolor='lightgray', line={'width': 0},
    label={
        'text': 'Top 25<br>countries',
        'textposition': 'top center',
    }
)

# Row 2: the same data; the x-axis range set below zooms in on the top 25
fig.add_scatter(
    x=list(range(1, len(ga)+1)),
    y=ga['Users'],
    row=2, col=1,
    name='Top 25',
    mode='markers',
    marker={'color': '#2c3e50'},
    hovertext=ga['Country']
)

# Same highlight color behind the zoomed-in chart
fig.add_shape(
    row=2, col=1,
    type='rect',
    x0=0, y0=0, x1=25.5, y1=1000,
    layer='below',
    fillcolor='lightgray', line={'width': 0}
)

# Axis ranges and ticks for each subplot
fig.update_xaxes(range=[0, 25.5], row=2, col=1, tickvals=[5, 10, 15, 20, 25], ticktext=[5, 10, 15, 20, 25])
fig.update_xaxes(range=[0, 225], row=1, col=1, tickvals=[50, 100, 150, 200], ticktext=[50, 100, 150, 200])

# Figure-wide styling and axis labels
fig.layout.height = 800
fig.layout.template = "plotly_white"
fig.layout.showlegend = False
fig.update_xaxes(zeroline=False, title='Rank')
fig.update_yaxes(zeroline=False, title='Users')
fig
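The rectangle corners and axis limits in this code are typed in by hand (`25.5`, `1000`). One possible way to derive them from the data instead is a small helper. `focus_bounds` and its `pad` parameter are hypothetical names for this sketch, not part of Plotly, and it assumes the user counts are sorted in descending order.

```python
def focus_bounds(users, top_n=25, pad=0.5):
    """Return a dict of rectangle corners covering the top `top_n` ranks.

    `users` is a sequence of user counts sorted in descending order,
    so rank 1 corresponds to users[0].
    """
    top_n = min(top_n, len(users))
    if top_n == 0:
        raise ValueError('users must not be empty')
    return {
        'x0': 0,
        'y0': 0,
        'x1': top_n + pad,         # half a rank of padding so the last marker is not clipped
        'y1': max(users[:top_n]),  # tallest point inside the focus area
    }


# Example with the three preview rows only
bounds = focus_bounds([966, 338, 249], top_n=2)
print(bounds)
```

The result can be unpacked into the shape call, for example `fig.add_shape(type='rect', row=1, col=1, **bounds)`, and `[bounds['x0'], bounds['x1']]` can serve as the range of the zoomed-in X axis. Here `y1` is exactly the largest value; round it up if you prefer some headroom above the top marker, as the hand-picked `1000` provides.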