Event Study

While Alphalens is a tool designed to evaluate a cross-sectional signal which can be used to rank many securities each day, we can still make use of Alphalens returns analysis functions, a subset of Alphalens, to create a meaningful event study.

An event study is a statistical method to assess the impact of a particular event on the value of a stock. In this example we will evalute what happens to stocks whose price fall below 30$

Imports & Settings

[1]:
import warnings
warnings.filterwarnings('ignore')
[2]:
import alphalens
import pandas as pd
[3]:
%matplotlib inline

Load Data

Below is a simple mapping of tickers to sectors for a universe of 500 large cap stocks.

[4]:
tickers = ['ACN', 'ATVI', 'ADBE', 'AMD', 'AKAM', 'ADS', 'GOOGL', 'GOOG', 'APH', 'ADI', 'ANSS', 'AAPL',
           'AVGO', 'CA', 'CDNS', 'CSCO', 'CTXS', 'CTSH', 'GLW', 'CSRA', 'DXC', 'EBAY', 'EA', 'FFIV', 'FB',
           'FLIR', 'IT', 'GPN', 'HRS', 'HPE', 'HPQ', 'INTC', 'IBM', 'INTU', 'JNPR', 'KLAC', 'LRCX', 'MA', 'MCHP',
           'MSFT', 'MSI', 'NTAP', 'NFLX', 'NVDA', 'ORCL', 'PAYX', 'PYPL', 'QRVO', 'QCOM', 'RHT', 'CRM', 'STX',
           'AMG', 'AFL', 'ALL', 'AXP', 'AIG', 'AMP', 'AON', 'AJG', 'AIZ', 'BAC', 'BK', 'BBT', 'BRK.B', 'BLK', 'HRB',
           'BHF', 'COF', 'CBOE', 'SCHW', 'CB', 'CINF', 'C', 'CFG', 'CME', 'CMA', 'DFS', 'ETFC', 'RE', 'FITB', 'BEN',
           'GS', 'HIG', 'HBAN', 'ICE', 'IVZ', 'JPM', 'KEY', 'LUK', 'LNC', 'L', 'MTB', 'MMC', 'MET', 'MCO', 'MS',
           'NDAQ', 'NAVI', 'NTRS', 'PBCT', 'PNC', 'PFG', 'PGR', 'PRU', 'RJF', 'RF', 'SPGI', 'STT', 'STI', 'SYF', 'TROW',
           'ABT', 'ABBV', 'AET', 'A', 'ALXN', 'ALGN', 'AGN', 'ABC', 'AMGN', 'ANTM', 'BCR', 'BAX', 'BDX', 'BIIB', 'BSX',
           'BMY', 'CAH', 'CELG', 'CNC', 'CERN', 'CI', 'COO', 'DHR', 'DVA', 'XRAY', 'EW', 'EVHC', 'ESRX', 'GILD', 'HCA',
           'HSIC', 'HOLX', 'HUM', 'IDXX', 'ILMN', 'INCY', 'ISRG', 'IQV', 'JNJ', 'LH', 'LLY', 'MCK', 'MDT', 'MRK', 'MTD',
           'MYL', 'PDCO', 'PKI', 'PRGO', 'PFE', 'DGX', 'REGN', 'RMD', 'SYK', 'TMO', 'UNH', 'UHS', 'VAR', 'VRTX', 'WAT',
           'MMM', 'AYI', 'ALK', 'ALLE', 'AAL', 'AME', 'AOS', 'ARNC', 'BA', 'CHRW', 'CAT', 'CTAS', 'CSX', 'CMI', 'DE',
           'DAL', 'DOV', 'ETN', 'EMR', 'EFX', 'EXPD', 'FAST', 'FDX', 'FLS', 'FLR', 'FTV', 'FBHS', 'GD', 'GE', 'GWW',
           'HON', 'INFO', 'ITW', 'IR', 'JEC', 'JBHT', 'JCI', 'KSU', 'LLL', 'LMT', 'MAS', 'NLSN', 'NSC', 'NOC', 'PCAR',
           'PH', 'PNR', 'PWR', 'RTN', 'RSG', 'RHI', 'ROK', 'COL', 'ROP', 'LUV', 'SRCL', 'TXT', 'TDG', 'UNP', 'UAL',
           'AES', 'LNT', 'AEE', 'AEP', 'AWK', 'CNP', 'CMS', 'ED', 'D', 'DTE', 'DUK', 'EIX', 'ETR', 'ES', 'EXC']

YFinance Download

[5]:
import yfinance as yf
import pandas_datareader.data as web
yf.pdr_override()

df = web.get_data_yahoo(tickers, start='2015-06-01',  end='2017-01-01')
[*********************100%***********************]  247 of 247 completed

17 Failed downloads:
- BHF: Data doesn't exist for startDate = 1433134800, endDate = 1483250400
- ARNC: Data doesn't exist for startDate = 1433134800, endDate = 1483250400
- STI: No data found, symbol may be delisted
- RHT: No data found, symbol may be delisted
- HRS: No data found, symbol may be delisted
- JEC: No data found, symbol may be delisted
- BCR: No data found for this date range, symbol may be delisted
- BRK.B: No data found, symbol may be delisted
- IR: Data doesn't exist for startDate = 1433134800, endDate = 1483250400
- AGN: No data found, symbol may be delisted
- CELG: No data found, symbol may be delisted
- LLL: No data found, symbol may be delisted
- LUK: No data found for this date range, symbol may be delisted
- ETFC: No data found, symbol may be delisted
- MYL: No data found, symbol may be delisted
- RTN: No data found, symbol may be delisted
- BBT: No data found, symbol may be delisted

Data Formatting

[6]:
df = df.stack()
df.index.names = ['date', 'asset']
df = df.tz_localize('UTC', level='date')
[7]:
df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 91935 entries, (Timestamp('2015-06-01 00:00:00+0000', tz='UTC'), 'A') to (Timestamp('2016-12-30 00:00:00+0000', tz='UTC'), 'XRAY')
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Adj Close  91935 non-null  float64
 1   Close      91935 non-null  float64
 2   High       91935 non-null  float64
 3   Low        91935 non-null  float64
 4   Open       91935 non-null  float64
 5   Volume     91935 non-null  float64
dtypes: float64(6)
memory usage: 4.6+ MB

Factor Computation

Now it’s time to build the events DataFrame, the input we will pass to Alphalens.

Alphalens calculates statistics for those dates where the input DataFrame has values (not NaN). So to compute the performace analysis on specific dates and securities (like an event study) then we have to make sure the input DataFrame contains valid values only on those date/security combinations where the event happens. All the other values in the DataFrame must be NaN or not present.

Also, make sure the event values are positive (it doesn’t matter the value but they must be positive) if you intend to go long on the events and use negative values if you intent to go short. This impacts the cumulative returns plots.

Let’s create the event DataFrame where we “mark” (any value) each day a security price fall below 30$.

[8]:
today_price = df.loc[:, 'Open'].unstack('asset')
yesterday_price = today_price.shift(1)
events = today_price[(today_price < 30.0) & (yesterday_price >= 30)]
events = events.stack()
events = events.astype(float)
events
[8]:
date                       asset
2015-06-04 00:00:00+00:00  LNT      29.645000
                           PWR      29.850000
2015-06-12 00:00:00+00:00  CA       29.830000
2015-06-18 00:00:00+00:00  PWR      29.870001
2015-06-25 00:00:00+00:00  PWR      29.629999
                                      ...
2016-12-06 00:00:00+00:00  GE       29.990385
2016-12-07 00:00:00+00:00  PFE      29.724857
2016-12-09 00:00:00+00:00  CSCO     29.980000
2016-12-13 00:00:00+00:00  EW       29.476667
2016-12-14 00:00:00+00:00  EBAY     29.850000
Length: 161, dtype: float64

The pricing data passed to alphalens should contain the entry price for the assets so it must reflect the next available price after an event was observed at a given timestamp. Those prices must not be used in the calculation of the events for that time. Always double check to ensure you are not introducing lookahead bias to your study.

The pricing data must also contain the exit price for the assets, for period 1 the price at the next timestamp will be used, for period 2 the price after 2 timestats will be used and so on.

While Alphalens is time frequency agnostic, in our example we build ‘pricing’ DataFrame so that for each event timestamp it contains the assets open price for the next day afer the event is detected, this price will be used as the assets entry price. Also, we are not adding additional prices so the assets exit price will be the following days open prices (how many days depends on ‘periods’ argument).

[9]:
pricing = df.loc[:, 'Open'].iloc[1:].unstack('asset')
[10]:
pricing.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 402 entries, 2015-06-01 00:00:00+00:00 to 2016-12-30 00:00:00+00:00
Columns: 230 entries, AAL to FTV
dtypes: float64(230)
memory usage: 725.5 KB

Run Event Study

Configuration

Before running Alphalens beware of some important options:

[11]:
# we don't want any filtering to be done

filter_zscore = None
[12]:
# We want to have only one  bin/quantile. So we can either use quantiles=1 or bins=1

quantiles = None
bins = 1

# Beware that in pandas versions below 0.20.0 there were few bugs in panda.qcut and pandas.cut
# that resulted in ValueError exception to be thrown when identical values were present in the
# dataframe and 1 quantile/bin was selected.
# As a workaroung use the bins custom range option that include all your values. E.g.

quantiles = None
bins = [-1000000, 1000000]
[13]:
# You don't have to directly set 'long_short' option when running alphalens.tears.create_event_study_tear_sheet
# But in case you are making use of other Alphalens functions make sure to set 'long_short=False'
# if you set 'long_short=True' Alphalens will perform forward return demeaning and that makes sense only
# in a dollar neutral portfolio. With an event style signal you cannot usually create a dollar neutral
# long/short portfolio

long_short = False

Get Alphalens Input

[14]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(events,
                                                                   pricing,
                                                                   quantiles=None,
                                                                   bins=1,
                                                                   periods=(
                                                                       1, 2, 3, 4, 5, 6, 10),
                                                                   filter_zscore=None)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

Run Event Tearsheet

[15]:
alphalens.tears.create_event_study_tear_sheet(
    factor_data, pricing, avgretplot=(5, 10))
Quantiles Statistics
min max mean std count count %
factor_quantile
1 26.549999 29.990385 29.577682 0.523523 161 100.0
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_29_3.png
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_29_5.png
../_images/notebooks_event_study_29_6.png

Short Signal Analysis

If we wanted to analyze the performance of short signal, we only had to switch from positive to negative event values

[16]:
events = -events
[17]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(events,
                                                                   pricing,
                                                                   quantiles=None,
                                                                   bins=1,
                                                                   periods=(
                                                                       1, 2, 3, 4, 5, 6, 10),
                                                                   filter_zscore=None)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
[18]:
alphalens.tears.create_event_study_tear_sheet(
    factor_data, pricing, avgretplot=(5, 10))
Quantiles Statistics
min max mean std count count %
factor_quantile
1 -29.990385 -26.549999 -29.577682 0.523523 161 100.0
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_34_3.png
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_34_5.png
../_images/notebooks_event_study_34_6.png