import datetime as dt
from io import BytesIO
from urllib2 import urlopen
import numpy as np
import pandas as pd
import holoviews as hv
from matplotlib.image import imread
from mpl_toolkits.basemap import Basemap
hv.notebook_extension('bokeh', width=90)
In this little demo we'll have a look at using the HoloViews DataFrame support and the Bokeh backend to explore some real-world data. This demo first appeared on Philipp Rudiger's blog, but this official example will be kept up to date.
First we extract shape coordinates for the continents and countries from matplotlib's basemap toolkit and put them inside a Polygons and a Contours Element, respectively.
basemap = Basemap()
kdims = ['Longitude', 'Latitude']
continents = hv.Polygons([poly.get_coords() for poly in basemap.landpolygons],
group='Continents', kdims=kdims)
countries = hv.Contours([np.array(country) for path in basemap._readboundarydata('countries')
for country in path if not isinstance(country, int)],
group='Countries', kdims=kdims)
Additionally we can load a satellite image of the Earth. Unfortunately, embedding large images in the notebook using Bokeh quickly balloons the size of the notebook, so we'll downsample the image by a factor of 5 here:
img = basemap.bluemarble()
blue_marble = hv.RGB(np.flipud(img.get_array()[::5, ::5]),
bounds=(-180, -90, 180, 90), kdims=kdims)
Finally we download a month's worth of earthquake data from the US Geological Survey (USGS), which provides a convenient web API, and read it into a pandas DataFrame. For a full reference of the USGS API look here.
# Generate a valid query to the USGS API and let pandas handle the loading and parsing of dates
query = dict(starttime="2014-12-01", endtime="2014-12-31")
query_string = '&'.join('{0}={1}'.format(k, v) for k, v in query.items())
query_url = "http://earthquake.usgs.gov/fdsnws/event/1/query.csv?" + query_string
df = pd.read_csv(BytesIO(urlopen(query_url).read()),
parse_dates=['time'], index_col='time',
infer_datetime_format=True)
df['Date'] = [str(t)[:19] for t in df.index]
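# As an aside (not part of the original example): recent pandas versions can
# fetch and parse the CSV directly from the URL, so the BytesIO/urlopen step
# above could hypothetically be replaced by a single call:
# df = pd.read_csv(query_url, parse_dates=['time'], index_col='time')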
# Pass the earthquake dataframe into the HoloViews Element
earthquakes = hv.Points(df, kdims=['longitude', 'latitude'],
vdims=['place', 'Date', 'depth', 'mag', 'rms'],
group='Earthquakes')
Let's have a look at what this data looks like:
df.head(2)
And get a summary overview of the data:
df.describe()
That's almost 9,000 data points, which should be no problem to load and render in memory. In a future blog post we'll look at loading and dynamically displaying several years' worth of data using dask's out-of-core DataFrames.
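To give a flavor of what that could look like, here is a minimal, hypothetical sketch using dask.dataframe (the file pattern is an assumption, standing in for monthly USGS CSV dumps saved to disk; dask only loads chunks into memory when a computation is triggered):
import dask.dataframe as dd
# Hypothetical: lazily read a collection of monthly USGS CSV files
ddf = dd.read_csv('usgs_quakes_*.csv', parse_dates=['time'])
# Aggregations stay lazy until .compute() is called, so the full dataset
# never needs to fit in memory at once
print(ddf['mag'].mean().compute())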
Next we define some style options; in particular, we map the size and color of our points to the magnitude.
%output size=150
%opts Overlay [width=800]
%opts Points.Earthquakes [color_index=5 size_index=5 scaling_factor=1.5] (cmap='hot_r' size=1)
%opts Polygons.Continents (color='k')
%opts Contours.Countries (color='white')
We'll overlay the earthquake data on top of the 'Blue Marble' image we loaded previously, and we'll also enable the hover tool so we can access more information on each point:
%%opts Points.Earthquakes [tools=['hover']]
blue_marble * earthquakes
Using groupby we can split our DataFrame up by day and, using datetime, we can generate date strings which we'll use as keys in a HoloMap, allowing us to visualize earthquakes for each day.
daily_df = df.groupby([df.index.year, df.index.month, df.index.day])
daily_earthquakes = hv.HoloMap(kdims=['Date'])
for date, data in daily_df:
date = str(dt.date(*date))
daily_earthquakes[date] = (continents * countries *
hv.Points(data, kdims=['longitude', 'latitude'],
vdims=['mag'], group='Earthquakes'))
If you're trying this out in a live notebook you can set:
%output widgets='live'
here to update the data dynamically. Since we're embedding the data here, we'll only display every third date.
%%output holomap='scrubber'
%%opts Overlay [width=800] Points.Earthquakes [color_index=2 size_index=2]
daily_earthquakes[::3]
Using some pandas magic we can also resample the data and smooth it a little bit to see the frequency of earthquakes over time.
%%opts Curve [width=600] Spikes [spike_length=4] (line_width=0.1)
df['count'] = 1
hourly_counts = pd.rolling_mean(df.resample('3H', how='count'), 5).reset_index()
hv.Curve(hourly_counts, kdims=['time'], vdims=['count']) *\
hv.Spikes(df.reset_index(), kdims=['time'], vdims=[])
Another feature I've been playing with is automatic sharing of data across plots, which enables linked brushing and selection. Here's a first quick demo of what this can look like. The only thing we need to do when adding a linked Element such as a Table is to ensure it draws from the same DataFrame as the other Elements we want to link it with. Using the 'lasso_select' tool we can select just a subregion of points and watch our selection get highlighted in the Table. In reverse, we can also highlight rows in the Table and watch our selection appear in the plot; even editing is allowed.
%%opts Points.Earthquakes [tools=['lasso_select']] Overlay [width=800 height=400] Table [width=800]
(blue_marble * earthquakes + hv.Table(earthquakes.data, kdims=['Date', 'latitude', 'longitude'], vdims=['depth', 'mag'])).cols(1)
Linking plots like this is a very powerful way to explore high-dimensional data. Here we'll add an Overlay, split into tabs, plotting the magnitude, RMS and depth values against each other. By linking that with the familiar map, we can easily explore how the geographical location relates to these other values.
%%opts Points [height=250 width=400 tools=['lasso_select', 'box_select']] (unselected_color='indianred')
%%opts Overlay [width=500 height=300] Overlay.Combinations [tabs=True]
from itertools import combinations
dim_combos = combinations(['mag', 'depth', 'rms'], 2)
(blue_marble * earthquakes +
hv.Overlay([hv.Points(earthquakes.data, kdims=[c1, c2], group='%s_%s' % (c1, c2))
for c1, c2 in dim_combos], group='Combinations')).cols(2)
That's it for this demo.