In [1]:
import datetime as dt
from io import BytesIO
from urllib.request import urlopen

import numpy as np
import pandas as pd
import holoviews as hv

from matplotlib.image import imread
from mpl_toolkits.basemap import Basemap

hv.notebook_extension('bokeh', width=90)
HoloViewsJS, BokehJS successfully loaded in this cell.

In this little demo we'll have a look at using the HoloViews DataFrame support and the Bokeh backend to explore some real-world data. This demo first appeared on Philipp Rudiger's blog, but this official example will be kept up to date.

Loading data

First we extract the shape coordinates of the continents and countries from matplotlib's basemap toolkit and put them inside Polygons and Contours Elements, respectively.

In [2]:
basemap = Basemap()
kdims = ['Longitude', 'Latitude']
continents = hv.Polygons([poly.get_coords() for poly in basemap.landpolygons],
                         group='Continents', kdims=kdims)
countries  = hv.Contours([np.array(country) for path in basemap._readboundarydata('countries')
                         for country in path if not isinstance(country, int)],
                         group='Countries', kdims=kdims)

Additionally, we can load a satellite image of Earth. Unfortunately, embedding large images in the notebook with Bokeh quickly balloons the notebook's size, so we'll downsample by a factor of 5 here:

In [3]:
img = basemap.bluemarble()
blue_marble = hv.RGB(np.flipud(img.get_array()[::5, ::5]),
                     bounds=(-180, -90, 180, 90), kdims=kdims)
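The `[::5, ::5]` step-slice is what performs the downsampling, keeping every fifth pixel along each axis. A quick sanity check on a dummy array (the shape here is illustrative, not the actual Blue Marble resolution):

```python
import numpy as np

# A dummy RGB image standing in for the Blue Marble texture
img = np.zeros((1000, 2000, 3))

# Step-slicing keeps every fifth row and column
small = img[::5, ::5]
print(small.shape)  # (200, 400, 3)
```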

Finally, we download a month's worth of earthquake data from the US Geological Survey (USGS), which provides a convenient web API, and read it into a pandas DataFrame. For a full reference, see the USGS fdsnws event API documentation.

In [4]:
# Generate a valid query to the USGS API and let pandas handle the loading and parsing of dates 
query = dict(starttime="2014-12-01", endtime="2014-12-31")
query_string = '&'.join('{0}={1}'.format(k, v) for k, v in query.items())
query_url = "https://earthquake.usgs.gov/fdsnws/event/1/query.csv?" + query_string
df = pd.read_csv(BytesIO(urlopen(query_url).read()),
                 parse_dates=['time'], index_col='time')
df['Date'] = [str(t)[:19] for t in df.index]

# Pass the earthquake dataframe into the HoloViews Element
earthquakes = hv.Points(df, kdims=['longitude', 'latitude'],
                        vdims=['place', 'Date', 'depth', 'mag', 'rms'],
                        group='Earthquakes')
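As an aside, the manual string join used to build the query can be replaced by the standard library's `urllib.parse.urlencode`, which also handles escaping:

```python
from urllib.parse import urlencode

query = dict(starttime="2014-12-01", endtime="2014-12-31")
query_string = urlencode(query)
print(query_string)  # starttime=2014-12-01&endtime=2014-12-31
```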

Let's have a look at what this data looks like:

In [5]:
df.head(2)
Out[5]:
| time | latitude | longitude | depth | mag | magType | nst | gap | dmin | rms | net | ... | place | type | horizontalError | depthError | magError | magNst | status | locationSource | magSource | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-12-30 23:55:24.640 | 35.953833 | -117.736 | 1.86 | 1.26 | ml | 11 | 84.000000 | 0.06565 | 0.16 | ci | ... | 22km ESE of Coso Junction, California | earthquake | NaN | 0.9 | 0.149 | 9 | reviewed | ci | ci | 2014-12-30 23:55:24 |
| 2014-12-30 23:53:45.000 | 63.259100 | -150.563 | 118.00 | 1.50 | ml | 10 | 86.399993 | NaN | 0.41 | ak | ... | 82km W of Cantwell, Alaska | earthquake | 0.9 | 0.7 | NaN | NaN | reviewed | ak | ak | 2014-12-30 23:53:45 |

2 rows × 22 columns

And get a summary overview of the data:

In [6]:
df.describe()
Out[6]:
|       | latitude | longitude | depth | mag | nst | gap | dmin | rms | horizontalError | depthError | magError | magNst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8578.000000 | 8578.000000 | 8578.000000 | 8578.000000 | 5337.000000 | 6995.000000 | 5951.000000 | 8539.000000 | 8126.000000 | 8396.000000 | 5760.000000 | 6066.000000 |
| mean | 38.933693 | -103.397180 | 29.703699 | 1.863685 | 17.294922 | 123.306146 | 0.868665 | 0.346618 | 2.654624 | 3.178976 | 0.173132 | 17.500165 |
| std | 21.575080 | 77.974554 | 66.568730 | 1.305513 | 14.233448 | 68.831121 | 2.890818 | 0.332336 | 4.067740 | 5.690687 | 0.125748 | 32.494206 |
| min | -70.404900 | -179.993600 | -3.500000 | -0.970000 | 2.000000 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | 0.100000 | 0.000000 | 0.000000 |
| 25% | 35.559300 | -148.170950 | 3.811350 | 0.930000 | 8.000000 | 71.000000 | 0.022765 | 0.080000 | 0.300000 | 0.450000 | 0.097000 | 4.000000 |
| 50% | 38.822334 | -121.154000 | 9.005000 | 1.500000 | 13.000000 | 105.000000 | 0.092340 | 0.210000 | 0.600000 | 0.900000 | 0.152000 | 9.000000 |
| 75% | 54.184450 | -116.004750 | 24.100000 | 2.400000 | 21.000000 | 161.499994 | 0.308561 | 0.550000 | 4.100000 | 3.849750 | 0.216000 | 18.000000 |
| max | 78.922500 | 179.991000 | 639.640000 | 6.600000 | 138.000000 | 352.800000 | 44.512000 | 3.000000 | 50.500000 | 86.000000 | 1.080000 | 548.000000 |

That's almost 9,000 data points, which should be no problem to load and render in memory. In a future post we'll look at loading and dynamically displaying several years' worth of data using dask's out-of-core DataFrames.
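As a minimal taste of out-of-core processing (using only pandas' chunked CSV reader rather than dask, purely as an illustration), a file too large for memory can be aggregated chunk by chunk:

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a multi-year earthquake file
csv = io.StringIO("mag\n1.2\n3.4\n2.0\n5.1\n0.8\n")

# Process the file in fixed-size chunks, never holding it all in memory
total = 0
for chunk in pd.read_csv(csv, chunksize=2):
    total += len(chunk)
print(total)  # 5
```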

Styling our plots

Next we define some style options, in particular we map the size and color of our points to the magnitude.

In [7]:
%output size=150
%opts Overlay [width=800]
%opts Points.Earthquakes [color_index=5 size_index=5 scaling_factor=1.5] (cmap='hot_r' size=1)
%opts Polygons.Continents (color='k')
%opts Contours.Countries (color='white')

Explore the data

We'll overlay the earthquake data on top of the 'Blue Marble' image we loaded previously. We'll also enable the hover tool so we can access more information on each point:

In [9]:
%%opts Points.Earthquakes [tools=['hover']]
blue_marble * earthquakes
Out[9]:

Earthquakes by day

Using groupby we can split our DataFrame up by day, and with datetime we can generate date strings to use as keys in a HoloMap, allowing us to visualize the earthquakes for each day.

In [10]:
daily_df = df.groupby([df.index.year, df.index.month, df.index.day])
daily_earthquakes = hv.HoloMap(kdims=['Date'])
for date, data in daily_df:
    date = str(dt.date(*date))
    daily_earthquakes[date] = (continents * countries *
                               hv.Points(data, kdims=['longitude', 'latitude'],
                                         vdims=['mag'], group='Earthquakes'))
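The date keys above come straight from the standard library: `datetime.date` renders as an ISO-formatted string, which makes it a naturally sortable HoloMap key. For instance:

```python
import datetime as dt

# The groupby yields (year, month, day) tuples; unpack one into a date
key = str(dt.date(*(2014, 12, 5)))
print(key)  # 2014-12-05
```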

If you're running this notebook live you can set:

%output widgets='live'

here to update the data dynamically. Since we're embedding the data here, we'll only display every third date.

In [11]:
%%output holomap='scrubber'
%%opts Overlay [width=800] Points.Earthquakes [color_index=2 size_index=2] 
daily_earthquakes[::3]
Out[11]:



Using some pandas magic we can also resample the data and smooth it a little bit to see the frequency of earthquakes over time.

In [12]:
%%opts Curve [width=600] Spikes [spike_length=4] (line_width=0.1)
df['count'] = 1
hourly_counts = df['count'].resample('3H').count().rolling(5).mean().reset_index()
hv.Curve(hourly_counts, kdims=['time'], vdims=['count']) *\
hv.Spikes(df.reset_index(), kdims=['time'], vdims=[])
Out[12]:

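The resample-then-smooth pattern is easy to try on a toy hourly series (a minimal sketch with synthetic data, not the earthquake DataFrame):

```python
import pandas as pd

# Twelve hourly observations, one event each
idx = pd.date_range('2014-12-01', periods=12, freq='H')
s = pd.Series(1, index=idx)

# Count events per 3-hour bin, then smooth with a rolling mean
counts = s.resample('3H').count()
print(list(counts))  # [3, 3, 3, 3]
smoothed = counts.rolling(2).mean()
```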
Update: Linked data and widgets

Another feature I've been playing with is automatic sharing of data across plots, which enables linked brushing and selection. Here's a first quick demo of what this can look like. The only thing we need to do when adding a linked Element such as a Table is to ensure it draws from the same DataFrame as the other Elements we want to link it with. Using the 'lasso_select' tool we can select a subregion of points and watch our selection get highlighted in the Table. In reverse, we can highlight rows in the Table and watch our selection appear in the plot; editing is even allowed.

In [13]:
%%opts Points.Earthquakes [tools=['lasso_select']] Overlay [width=800 height=400] Table [width=800]
(blue_marble * earthquakes + hv.Table(earthquakes.data, kdims=['Date', 'latitude', 'longitude'], vdims=['depth', 'mag'])).cols(1)
Out[13]:

Linking plots in this way is a very powerful means of exploring high-dimensional data. Here we'll add an Overlay, split into tabs, plotting the magnitude, RMS and depth values against each other. By linking it with the familiar map, we can easily explore how geographical location relates to these other values.
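The pairwise dimension pairs used below come from `itertools.combinations`, which yields each unordered pair exactly once:

```python
from itertools import combinations

dims = ['mag', 'depth', 'rms']
print(list(combinations(dims, 2)))
# [('mag', 'depth'), ('mag', 'rms'), ('depth', 'rms')]
```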

In [14]:
%%opts Points [height=250 width=400 tools=['lasso_select', 'box_select']] (unselected_color='indianred')
%%opts Overlay [width=500 height=300] Overlay.Combinations [tabs=True]
from itertools import combinations
dim_combos = combinations(['mag', 'depth', 'rms'], 2)
(blue_marble * earthquakes +
 hv.Overlay([hv.Points(earthquakes.data, kdims=[c1, c2], group='%s_%s' % (c1, c2))
            for c1, c2 in dim_combos], group='Combinations')).cols(2)
Out[14]:

That's it for this demo.