import datetime as dt
from io import BytesIO
from urllib2 import urlopen
import numpy as np
import pandas as pd
import holoviews as hv
from matplotlib.image import imread
from mpl_toolkits.basemap import Basemap
hv.notebook_extension('bokeh', width=90)
In this little demo we'll have a look at using the HoloViews DataFrame support and the Bokeh backend to explore some real-world data. This demo first appeared on Philipp Rudiger's blog, but this official example will be kept up to date.
First we extract shape coordinates for the continents and countries from matplotlib's basemap toolkit and put them inside a Polygons and a Contours Element, respectively.
basemap = Basemap()
kdims = ['Longitude', 'Latitude']
continents = hv.Polygons([poly.get_coords() for poly in basemap.landpolygons],
group='Continents', kdims=kdims)
countries = hv.Contours([np.array(country) for path in basemap._readboundarydata('countries')
for country in path if not isinstance(country, int)],
group='Countries', kdims=kdims)
Additionally we can load a satellite image of the Earth. Unfortunately, embedding large images in the notebook using Bokeh quickly balloons the size of the notebook, so we'll downsample the image by a factor of 5 here:
img = basemap.bluemarble()
blue_marble = hv.RGB(np.flipud(img.get_array()[::5, ::5]),
bounds=(-180, -90, 180, 90), kdims=kdims)
Finally we download a month's worth of earthquake data from the US Geological Survey (USGS), which provides a convenient web API, and read it into a pandas DataFrame. For a full reference of the USGS API look here.
# Generate a valid query to the USGS API and let pandas handle the loading and parsing of dates
query = dict(starttime="2014-12-01", endtime="2014-12-31")
query_string = '&'.join('{0}={1}'.format(k, v) for k, v in query.items())
query_url = "http://earthquake.usgs.gov/fdsnws/event/1/query.csv?" + query_string
df = pd.read_csv(BytesIO(urlopen(query_url).read()),
parse_dates=['time'], index_col='time',
infer_datetime_format=True)
df['Date'] = [str(t)[:19] for t in df.index]
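# As an aside (not part of the original example): recent pandas versions can
# fetch and parse the CSV directly from the URL, so the BytesIO/urlopen step
# above could hypothetically be replaced by a single call:
# df = pd.read_csv(query_url, parse_dates=['time'], index_col='time')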
# Pass the earthquake dataframe into the HoloViews Element
earthquakes = hv.Points(df, kdims=['longitude', 'latitude'],
vdims=['place', 'Date', 'depth', 'mag', 'rms'],
group='Earthquakes')
Let's have a look at what this data looks like:
df.head(2)
And get a summary overview of the data:
df.describe()
That's almost 9,000 data points, which should be no problem to load and render in memory. In a future blog post we'll look at loading and dynamically displaying several years' worth of data using dask's out-of-core DataFrames.
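To give a flavor of what that could look like, here is a minimal, hypothetical sketch using dask.dataframe (the file pattern is an assumption, standing in for monthly USGS CSV dumps saved to disk; dask only loads chunks into memory when a computation is triggered):
import dask.dataframe as dd
# Hypothetical: lazily read a collection of monthly USGS CSV files
ddf = dd.read_csv('usgs_quakes_*.csv', parse_dates=['time'])
# Aggregations stay lazy until .compute() is called, so the full dataset
# never needs to fit in memory at once
print(ddf['mag'].mean().compute())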
Next we define some style options; in particular, we map the size and color of our points to the magnitude.
%output size=150
%opts Overlay [width=800]
%opts Points.Earthquakes [color_index=5 size_index=5 scaling_factor=1.5] (cmap='hot_r' size=1)
%opts Polygons.Continents (color='k')
%opts Contours.Countries (color='white')
We'll overlay the earthquake data on top of the 'Blue Marble' image we loaded previously, and we'll also enable the hover tool so we can access more information on each point:
%%opts Points.Earthquakes [tools=['hover']]
blue_marble * earthquakes
Using groupby we can split our DataFrame up by day and, using datetime, we can generate date strings which we'll use as keys in a HoloMap, allowing us to visualize earthquakes for each day.
daily_df = df.groupby([df.index.year, df.index.month, df.index.day])
daily_earthquakes = hv.HoloMap(kdims=['Date'])
for date, data in daily_df:
date = str(dt.date(*date))
daily_earthquakes[date] = (continents * countries *
hv.Points(data, kdims=['longitude', 'latitude'],
vdims=['mag'], group='Earthquakes'))
If you're trying this out in a live notebook you can set:
%output widgets='live'
here to update the data dynamically. Since we're embedding the data here, we'll only display every third date.
%%output holomap='scrubber'
%%opts Overlay [width=800] Points.Earthquakes [color_index=2 size_index=2]
daily_earthquakes[::3]
Using some pandas magic we can also resample the data and smooth it a little bit to see the frequency of earthquakes over time.
%%opts Curve [width=600] Spikes [spike_length=4] (line_width=0.1)
df['count'] = 1
hourly_counts = pd.rolling_mean(df.resample('3H', how='count'), 5).reset_index()
hv.Curve(hourly_counts, kdims=['time'], vdims=['count']) *\
hv.Spikes(df.reset_index(), kdims=['time'], vdims=[])
Another feature I've been playing with is automatic sharing of data across plots, which enables linked brushing and selection. Here's a first quick demo of what this can look like. The only thing we need to do when adding a linked Element such as a Table is to ensure it draws from the same DataFrame as the other Elements we want to link it with. Using the 'lasso_select' tool we can select just a subregion of points and watch our selection get highlighted in the Table. In reverse, we can also highlight rows in the Table and watch our selection appear in the plot; even editing is allowed.
%%opts Points.Earthquakes [tools=['lasso_select']] Overlay [width=800 height=400] Table [width=800]
(blue_marble * earthquakes + hv.Table(earthquakes.data, kdims=['Date', 'latitude', 'longitude'], vdims=['depth', 'mag'])).cols(1)
Linking plots like this is a very powerful way to explore high-dimensional data. Here we'll add an Overlay, split into tabs, plotting the magnitude, RMS and depth values against each other. By linking that with the familiar map, we can easily explore how the geographical location relates to these other values.
%%opts Points [height=250 width=400 tools=['lasso_select', 'box_select']] (unselected_color='indianred')
%%opts Overlay [width=500 height=300] Overlay.Combinations [tabs=True]
from itertools import combinations
dim_combos = combinations(['mag', 'depth', 'rms'], 2)
(blue_marble * earthquakes +
hv.Overlay([hv.Points(earthquakes.data, kdims=[c1, c2], group='%s_%s' % (c1, c2))
for c1, c2 in dim_combos], group='Combinations')).cols(2)
That's it for this demo.