In this notebook, we plot stock prices to demonstrate how to do the following in Altair:

create compound charts
create selections, i.e., cross-filtering capabilities

The data was downloaded from Yahoo Finance using R.

import pandas as pd
import altair as alt

Load Data¶

We will load the data and print some of its rows.

df = pd.read_csv("data.csv", parse_dates=['date'])
df.head(5)

Print the column info and associated datatype

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18523 entries, 0 to 18522
Data columns (total 9 columns):
Unnamed: 0    18523 non-null int64
symbol        18523 non-null object
date          18523 non-null datetime64[ns]
open          18523 non-null float64
high          18523 non-null float64
low           18523 non-null float64
close         18523 non-null float64
volume        18523 non-null float64
adjusted      18523 non-null float64
dtypes: datetime64[ns](1), float64(6), int64(1), object(1)
memory usage: 1.3+ MB

Which ticker symbols are in the data?

df['symbol'].unique()

array(['AAPL', 'GOOG', 'AMZN', 'TSLA', 'BTC-USD'], dtype=object)

What is the date range?

print(f"Min Date: {df.date.min()}, Max Date: {df.date.max()}")

Min Date: 2001-01-02 00:00:00, Max Date: 2020-07-24 00:00:00

# We are only going to use 2020 data
df = df.loc[df.date > "2020-01-01", :]
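The comparison against a date string works because the date column was parsed as datetime64 when the file was loaded. As a minimal sketch on made-up rows (the values below are illustrative and not taken from data.csv), note that the strict > drops a row dated exactly 2020-01-01, while >= would keep it:

```python
import pandas as pd

# Illustrative rows only; not taken from data.csv.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2019-12-31", "2020-01-01", "2020-01-02"]),
    "adjusted": [1.0, 2.0, 3.0],
})

# Strict comparison: the string is interpreted as midnight on that day,
# so the boundary day itself is excluded.
after = toy.loc[toy["date"] > "2020-01-01"]

# Inclusive comparison keeps the boundary day as well.
on_or_after = toy.loc[toy["date"] >= "2020-01-01"]

print(len(after), len(on_or_after))
```

For this dataset the difference should not matter in practice, since markets are closed on New Year's Day.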

Print statistics of the numerical columns

# print stats
df.describe()

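def find_data(df, symbol):
    # Keep only the rows for the requested ticker symbol
    result = df.loc[df['symbol'] == symbol, :].copy()
    # data.csv has no 'pct_change' column; reindex keeps it as an all-NaN
    # placeholder instead of raising a KeyError on recent pandas versions
    return result.reindex(columns=['symbol', 'date', 'adjusted', 'pct_change', 'volume'])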

Actually, we are only going to use AAPL in this notebook.

aapl_df = find_data(df, 'AAPL')
aapl_df.head()
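The pct_change column is not among the columns listed by df.info(), so it only ever holds NaN here. If a daily percent change is wanted, one way to derive it is sketched below. This assumes the column is meant to be the change in adjusted price from the previous trading day within each symbol, which the notebook does not state; the rows are made up for illustration.

```python
import pandas as pd

# Illustrative rows only; not taken from data.csv.
toy = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "AAPL", "GOOG", "GOOG"],
    "date": pd.to_datetime(
        ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-02", "2020-01-03"]
    ),
    "adjusted": [100.0, 110.0, 99.0, 50.0, 55.0],
})

# Sort so that "previous row" means "previous trading day" for each symbol.
toy = toy.sort_values(["symbol", "date"])

# Assumed definition: fractional change versus the previous row of the same symbol.
# The first row of every symbol has no predecessor and stays NaN.
toy["pct_change"] = toy.groupby("symbol")["adjusted"].pct_change()

print(toy)
```

Grouping by symbol before calling pct_change keeps the first GOOG row from being compared with the last AAPL row.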

Plots¶

We will start by creating individual plots and then add the compound ones.

Daily Volume¶

We start by plotting the daily volume of Apple in 2020

aapl_bar = alt.Chart(aapl_df).mark_bar().encode(
    alt.X('date:T', title=""),
    alt.Y('volume', title="Volume")
).properties(    
    width=700,
    title="Apple: Daily Volume 2020"
)
aapl_bar

Similarly, we can plot the price (adjusted price) too.

aapl_line = alt.Chart(aapl_df).mark_line().encode(
    alt.X('date:T', title=""),
    alt.Y('adjusted', title="Price")
).properties(    
    width=700,
    height=200,
    title="Apple: Daily Price 2020"
)
aapl_line

Compound Chart¶

Let us combine them using layers. Here, we plot price and volume in the same chart, each with its own y-axis (y1, y2).

base = alt.Chart(aapl_df).mark_line().encode(
    alt.X('date:T', title=""),
)

bar = base.mark_bar().encode(
    alt.Y('volume', title="Volume"),    
    alt.Tooltip(['date', 'volume', 'adjusted'])    
)

line = base.mark_line(color='orange').encode(
    alt.Y('adjusted', title="Price"),
    alt.Tooltip(['date', 'volume', 'adjusted'])
)

alt.layer(bar, line).resolve_scale(
        y='independent').properties(
        title="Apple: Price and Volume Chart",
        width=600)

The panic sell-off in March is very evident in this plot.

Since we have data for several stocks, it is better to wrap the plotting code in reusable functions.

def create_base(stock_df):
    base = alt.Chart(stock_df[stock_df['date'] > '2020-01-01']).mark_line().encode(
        alt.X('date:T', title=""),
    )
    return base

def plot_line(stock_df, base, color='magenta', width=700, height=400, date_labels=None, 
         xlab="Date", y1_lab="Volume", y2_lab="Price", y1_domain=None, y2_domain=None, 
         label_angle=45): 

    if date_labels is None:
        date_labels = list(stock_df.date.unique())
    if y1_domain is None:
        y1_domain = (stock_df.volume.min(), stock_df.volume.max())
    
    if y2_domain is None:
        y2_domain = (stock_df.adjusted.min(), stock_df.adjusted.max())
        
    chart = base.mark_line(color=color).encode(
        alt.Y("adjusted:Q", title=y2_lab, scale=alt.Scale(domain=y2_domain, )),        
    ).properties(height=height, width=width)
    
    return chart


def plot_bar(stock_df, base, color='magenta', width=700, height=400, date_labels=None, 
         xlab="Date", y1_lab="Volume", y2_lab="Price", y1_domain=None, y2_domain=None, 
         label_angle=45): 

    if date_labels is None:
        date_labels = list(stock_df.date.unique())
    if y1_domain is None:
        y1_domain = (stock_df.volume.min(), stock_df.volume.max())
    
    if y2_domain is None:
        y2_domain = (stock_df.adjusted.min(), stock_df.adjusted.max())
        
    chart = base.mark_bar(opacity=0.7).encode(
        alt.Y('volume:Q', title=y1_lab, scale=alt.Scale(domain=y1_domain), axis=alt.Axis(format='s')),        
    ).properties(height=height, width=width)    
    
    return chart

Another Price + Volume Chart¶

Let us now create the more common price and volume layout, with the price chart on top and the volume chart below it. We concatenate the plots using vconcat and adjust the heights. The bottom chart is interactive, meaning you can use the mouse to zoom in and out.

aapl_base = create_base(aapl_df)
aapl_line = plot_line(aapl_df, aapl_base, color="orange", xlab="", label_angle=0, width=700, height=200)
aapl_bar = plot_bar(aapl_df, aapl_base, label_angle=0, width=700, height=80)
alt.vconcat(aapl_line, aapl_bar.interactive()).properties(
        title="APPLE: 2020 Price Volume Chart"        
    )

Zooming in and out is of little use in this plot, so let us focus on selections instead. Selections allow us to highlight a certain part of the plot.

Cross Filtering¶

We are going to add brush selection to the price chart and change the plot colors based on user selection.

💡 You can use your mouse to select a part of the line chart to see it in action.

brush = alt.selection_interval(encodings=['x'])
color = alt.condition(brush,
                      # adding date as color feels like a hack, and that also causes 
                      # the blue gradient on the bar chart
                      alt.Color('date:T', legend=None),
                      alt.value('lightgray'))


upper = aapl_line.add_selection(brush)
lower = aapl_bar.encode(alt.Y('volume:Q'), color=color)

alt.vconcat(upper, lower).properties(
        title="APPLE: 2020 Price Volume Chart"        
    )

More Datasources¶

We can now start to bring in other data sources, such as COVID-19 data (source: Our World in Data).

covid_cases = pd.read_csv("owid-covid-data.csv", parse_dates=['date'])
covid_cases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49668 entries, 0 to 49667
Data columns (total 41 columns):
iso_code                           49381 non-null object
continent                          49094 non-null object
location                           49668 non-null object
date                               49668 non-null datetime64[ns]
total_cases                        49032 non-null float64
new_cases                          48809 non-null float64
new_cases_smoothed                 48027 non-null float64
total_deaths                       49032 non-null float64
new_deaths                         48809 non-null float64
new_deaths_smoothed                48027 non-null float64
total_cases_per_million            48745 non-null float64
new_cases_per_million              48745 non-null float64
new_cases_smoothed_per_million     47962 non-null float64
total_deaths_per_million           48745 non-null float64
new_deaths_per_million             48745 non-null float64
new_deaths_smoothed_per_million    47962 non-null float64
new_tests                          18016 non-null float64
total_tests                        18435 non-null float64
total_tests_per_thousand           18435 non-null float64
new_tests_per_thousand             18016 non-null float64
new_tests_smoothed                 20379 non-null float64
new_tests_smoothed_per_thousand    20379 non-null float64
tests_per_case                     18772 non-null float64
positive_rate                      19232 non-null float64
tests_units                        21241 non-null object
stringency_index                   41070 non-null float64
population                         49381 non-null float64
population_density                 47109 non-null float64
median_age                         44260 non-null float64
aged_65_older                      43599 non-null float64
aged_70_older                      44030 non-null float64
gdp_per_capita                     43681 non-null float64
extreme_poverty                    29130 non-null float64
cardiovasc_death_rate              44247 non-null float64
diabetes_prevalence                45835 non-null float64
female_smokers                     34603 non-null float64
male_smokers                       34162 non-null float64
handwashing_facilities             20779 non-null float64
hospital_beds_per_thousand         39933 non-null float64
life_expectancy                    48754 non-null float64
human_development_index            42705 non-null float64
dtypes: datetime64[ns](1), float64(36), object(4)
memory usage: 15.5+ MB

covid_cases.head(1)

Covid Related Cases & Deaths¶

covid_deaths_plot = alt.Chart(covid_cases.groupby(['continent', 'date']).sum(numeric_only=True).reset_index()).mark_area(opacity=0.7, clip=True, color='red').encode(
    alt.X('date:T', scale=alt.Scale(domain=(aapl_df.date.min(), aapl_df.date.max()))),
    alt.Y('total_deaths:Q', title="Total Deaths", axis=alt.Axis(format='s')),
    color='continent:N'
).properties(
    title="Total deaths due to Covid-19",
    width=700
)
covid_deaths_plot

# Total deaths (not grouped by continent anymore)
# Note: rows with a missing continent are included in this sum, unlike in the per-continent chart above
covid_deaths_plot = alt.Chart(covid_cases.groupby(['date']).sum(numeric_only=True).reset_index()).mark_bar(opacity=0.7, clip=True, color='red').encode(
    alt.X('date:T', scale=alt.Scale(domain=(aapl_df.date.min(), aapl_df.date.max()))),
    alt.Y('total_deaths:Q', title="Total Deaths", axis=alt.Axis(format='s')),
    # no color encoding: 'continent' no longer exists after grouping by date only
).properties(
    title="Total deaths due to Covid-19",
    width=700
)

covid_cases_plot = alt.Chart(covid_cases.groupby(['continent', 'date']).sum(numeric_only=True).reset_index()).mark_bar(opacity=0.8, clip=True, color='red').encode(
    alt.X('date:T', scale=alt.Scale(domain=(aapl_df.date.min(), aapl_df.date.max())), title=""),
    alt.Y('new_cases:Q', title="New Cases", axis=alt.Axis(format='s'), stack='normalize'),
    color='continent:N'
).properties(
    width=700
)
covid_cases_plot.interactive()

# Daily new cases (not grouped by continent anymore)
# Note: rows with a missing continent are included in this sum, unlike in the per-continent chart above
covid_cases_plot = alt.Chart(covid_cases.groupby(['date']).sum(numeric_only=True).reset_index()).mark_bar(opacity=0.7, clip=True, color='red').encode(
    alt.X('date:T', scale=alt.Scale(domain=(aapl_df.date.min(), aapl_df.date.max()))),
    alt.Y('new_cases:Q', title="New Cases", axis=alt.Axis(format='s')),
    # no color encoding: 'continent' no longer exists after grouping by date only
).properties(
    width=700
)
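The info() output shows fewer non-null values for continent than for location, so some rows carry no continent. Because groupby drops rows whose key is missing, totals grouped by continent and date can differ from totals grouped by date alone. A minimal sketch on made-up rows (the "Aggregate" row is hypothetical) illustrates the effect:

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the "Aggregate" location is hypothetical.
toy = pd.DataFrame({
    "location": ["A", "B", "Aggregate"],
    "continent": ["Europe", "Europe", np.nan],
    "date": pd.to_datetime(["2020-03-01"] * 3),
    "new_cases": [10.0, 5.0, 15.0],
})

# Rows with a missing continent are dropped from this grouping.
by_continent = (
    toy.groupby(["continent", "date"]).sum(numeric_only=True).reset_index()
)

# Grouping by date alone keeps every row, including the one without a continent.
by_date = toy.groupby("date").sum(numeric_only=True).reset_index()

print(by_continent["new_cases"].sum(), by_date["new_cases"].sum())
```

If the rows without a continent are aggregates of the other rows, the by-date totals would count those cases twice, so it may be worth filtering them out first.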

Linking Plots¶

Now we can add the COVID-19 plots to the price and volume chart to see how the pandemic lines up with the stock price. The price took a beating in March 2020 and has been rising since.

# create selection
brush = alt.selection_interval(encodings=['x'])
color = alt.condition(brush,
                      # adding date as color feels like a hack, and that also causes 
                      # the blue gradient on the bar chart
                      alt.Color('date:T', legend=None),
                      alt.value('lightgray'))


upper = aapl_line.add_selection(brush)
lower = aapl_bar.encode(alt.Y('volume:Q', axis=alt.Axis(format='s')), color=color)
covid_deaths = covid_deaths_plot.properties(width=700, height=100).encode(
    alt.Y('total_deaths:Q', title="Covid Deaths", axis=alt.Axis(format='s')),
    color=color
)
covid_new_cases = covid_cases_plot.properties(width=700, height=100).encode(
    alt.Y('new_cases:Q', title="New Cases", axis=alt.Axis(format='s')),
    color=color
)

alt.vconcat(upper, lower, covid_deaths, covid_new_cases).properties(
        title="APPLE: 2020 Price Volume Chart"        
    )

You can now select a range on the price chart and see all the other charts highlighted accordingly.

Trellis Chart¶

Since we have data for other stocks too, we can create a trellis plot.

# same as the example here: https://altair-viz.github.io/gallery/trellis_area_sort_array.html

alt.Chart(df).transform_filter(
    alt.datum.symbol != 'BTC-USD'
).mark_area(opacity=0.7).encode(
    alt.X('date:T', title=""),
    alt.Y('adjusted:Q', title='Price'),
    alt.Tooltip(['date','adjusted', 'symbol']),    
    alt.Color('symbol:N',title=""),
    row=alt.Row('symbol:N', title=""),
).properties(title="2020: Adjusted Stock Price", height=100, width=700).interactive()

As you can see, AAPL and TSLA are barely visible, because all rows share one y-scale. Since we have added interactivity (i.e., .interactive()), you can zoom into the plots.

Question for you: "What else would you do to improve the trellis plots?"

Your Turn¶

Go ahead and make some plots.

Output of df.head(5):

	Unnamed: 0	symbol	date	open	high	low	close	volume	adjusted
0	1	AAPL	2001-01-02	0.265625	0.272321	0.260045	0.265625	452312000.0	0.229537
1	2	AAPL	2001-01-03	0.258929	0.297991	0.257813	0.292411	817073600.0	0.252684
2	3	AAPL	2001-01-04	0.323940	0.330357	0.300223	0.304688	739396000.0	0.263292
3	4	AAPL	2001-01-05	0.302455	0.310268	0.286830	0.292411	412356000.0	0.252684
4	5	AAPL	2001-01-08	0.302455	0.303292	0.284598	0.295759	373699200.0	0.255577

Output of df.describe() (2020 data only):

	Unnamed: 0	open	high	low	close	volume	adjusted
count	769.000000	769.000000	769.000000	769.000000	769.000000	7.690000e+02	769.000000
mean	12942.535761	2991.745747	3047.534263	2936.710797	2996.093676	8.831951e+09	2996.039726
std	5031.284751	3509.525773	3572.568545	3445.237602	3513.739949	1.581048e+10	3513.784838
min	4780.000000	57.020000	57.125000	53.152500	56.092499	9.295000e+05	55.840385
25%	8841.000000	140.199997	147.953995	136.854004	141.126007	4.121900e+06	141.126007
50%	13812.000000	1449.160034	1465.430054	1432.469971	1451.859985	8.900750e+07	1451.859985
75%	18331.000000	6245.624512	6504.515137	5920.085938	6242.193848	1.571397e+10	6242.193848
max	18523.000000	10323.960938	10457.626953	10202.387695	10326.054688	7.415677e+10	10326.054688

Output of aapl_df.head():

	symbol	date	adjusted	pct_change	volume
4779	AAPL	2020-01-02	74.573036	NaN	135480400.0
4780	AAPL	2020-01-03	73.848030	NaN	146322800.0
4781	AAPL	2020-01-06	74.436470	NaN	118387200.0
4782	AAPL	2020-01-07	74.086395	NaN	108872000.0
4783	AAPL	2020-01-08	75.278160	NaN	132079200.0

How to Create Compound Charts using Altair?

Example of the selection and compound chart