Intraday Factor

In this notebook we use Alphalens to analyse the performance of an intraday factor: the factor is computed daily, but the stocks are bought at market open and sold at market close, with no overnight positions.

Imports & Settings

[2]:
import warnings
warnings.filterwarnings('ignore')
[3]:
import alphalens
import pandas as pd
[4]:
%matplotlib inline

Loading Data

Below is a simple mapping of tickers to sectors for a small universe of large cap stocks.
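As a quick illustration of how the two dictionaries fit together, the sketch below resolves a ticker to its sector label through the integer code and counts tickers per sector. The four-ticker subset and the helper name `sector_of` are illustrative assumptions, not part of the notebook.

```python
from collections import Counter

# Illustrative subset of the two mappings defined in the next cell.
sector_names = {0: "information_technology", 1: "financials", 4: "utilities"}
ticker_sector = {"AAPL": 0, "ADBE": 0, "BAC": 1, "DUK": 4}


def sector_of(ticker):
    """Hypothetical helper: ticker -> integer code -> sector label."""
    return sector_names[ticker_sector[ticker]]


# Number of tickers in each sector.
counts = Counter(sector_of(t) for t in ticker_sector)

print(sector_of("BAC"))
print(dict(counts))
```

Keeping integer codes in `ticker_sector` and labels in `sector_names` matches the `groupby` and `groupby_labels` arguments used later in the notebook.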

[5]:
sector_names = {
    0 : "information_technology",
    1 : "financials",
    2 : "health_care",
    3 : "industrials",
    4 : "utilities",
    5 : "real_estate",
    6 : "materials",
    7 : "telecommunication_services",
    8 : "consumer_staples",
    9 : "consumer_discretionary",
    10 : "energy"
}

ticker_sector = {
    "ACN" : 0, "ATVI" : 0, "ADBE" : 0, "AMD" : 0, "AKAM" : 0, "ADS" : 0, "GOOGL" : 0, "GOOG" : 0,
    "APH" : 0, "ADI" : 0, "ANSS" : 0, "AAPL" : 0, "AMAT" : 0, "ADSK" : 0, "ADP" : 0, "AVGO" : 0,
    "AMG" : 1, "AFL" : 1, "ALL" : 1, "AXP" : 1, "AIG" : 1, "AMP" : 1, "AON" : 1, "AJG" : 1, "AIZ" : 1, "BAC" : 1,
    "BK" : 1, "BBT" : 1, "BRK.B" : 1, "BLK" : 1, "HRB" : 1, "BHF" : 1, "COF" : 1, "CBOE" : 1, "SCHW" : 1, "CB" : 1,
    "ABT" : 2, "ABBV" : 2, "AET" : 2, "A" : 2, "ALXN" : 2, "ALGN" : 2, "AGN" : 2, "ABC" : 2, "AMGN" : 2, "ANTM" : 2,
    "BCR" : 2, "BAX" : 2, "BDX" : 2, "BIIB" : 2, "BSX" : 2, "BMY" : 2, "CAH" : 2, "CELG" : 2, "CNC" : 2, "CERN" : 2,
    "MMM" : 3, "AYI" : 3, "ALK" : 3, "ALLE" : 3, "AAL" : 3, "AME" : 3, "AOS" : 3, "ARNC" : 3, "BA" : 3, "CHRW" : 3,
    "CAT" : 3, "CTAS" : 3, "CSX" : 3, "CMI" : 3, "DE" : 3, "DAL" : 3, "DOV" : 3, "ETN" : 3, "EMR" : 3, "EFX" : 3,
    "AES" : 4, "LNT" : 4, "AEE" : 4, "AEP" : 4, "AWK" : 4, "CNP" : 4, "CMS" : 4, "ED" : 4, "D" : 4, "DTE" : 4,
    "DUK" : 4, "EIX" : 4, "ETR" : 4, "ES" : 4, "EXC" : 4, "FE" : 4, "NEE" : 4, "NI" : 4, "NRG" : 4, "PCG" : 4,
    "ARE" : 5, "AMT" : 5, "AIV" : 5, "AVB" : 5, "BXP" : 5, "CBG" : 5, "CCI" : 5, "DLR" : 5, "DRE" : 5,
    "EQIX" : 5, "EQR" : 5, "ESS" : 5, "EXR" : 5, "FRT" : 5, "GGP" : 5, "HCP" : 5, "HST" : 5, "IRM" : 5, "KIM" : 5,
    "APD" : 6, "ALB" : 6, "AVY" : 6, "BLL" : 6, "CF" : 6, "DWDP" : 6, "EMN" : 6, "ECL" : 6, "FMC" : 6, "FCX" : 6,
    "IP" : 6, "IFF" : 6, "LYB" : 6, "MLM" : 6, "MON" : 6, "MOS" : 6, "NEM" : 6, "NUE" : 6, "PKG" : 6, "PPG" : 6,
    "T" : 7, "CTL" : 7, "VZ" : 7,
    "MO" : 8, "ADM" : 8, "BF.B" : 8, "CPB" : 8, "CHD" : 8, "CLX" : 8, "KO" : 8, "CL" : 8, "CAG" : 8,
    "STZ" : 8, "COST" : 8, "COTY" : 8, "CVS" : 8, "DPS" : 8, "EL" : 8, "GIS" : 8, "HSY" : 8, "HRL" : 8,
    "AAP" : 9, "AMZN" : 9, "APTV" : 9, "AZO" : 9, "BBY" : 9, "BWA" : 9, "KMX" : 9, "CCL" : 9,
    "APC" : 10, "ANDV" : 10, "APA" : 10, "BHGE" : 10, "COG" : 10, "CHK" : 10, "CVX" : 10, "XEC" : 10, "CXO" : 10,
    "COP" : 10, "DVN" : 10, "EOG" : 10, "EQT" : 10, "XOM" : 10, "HAL" : 10, "HP" : 10, "HES" : 10, "KMI" : 10
}

YFinance Download

[6]:
import yfinance as yf

tickers = list(ticker_sector.keys())
# yf.download is the function that pdr_override() routed get_data_yahoo to;
# auto_adjust=False keeps the separate 'Adj Close' column shown below.
df = yf.download(tickers, start='2017-01-01', end='2017-06-01', auto_adjust=False)
df.index = pd.to_datetime(df.index, utc=True)
[*********************100%***********************]  182 of 182 completed

19 Failed downloads:
- CHK: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- BF.B: No data found for this date range, symbol may be delisted
- BRK.B: No data found, symbol may be delisted
- MON: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- CELG: No data found, symbol may be delisted
- APC: No data found, symbol may be delisted
- ARNC: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- CBG: No data found for this date range, symbol may be delisted
- CTL: No data found, symbol may be delisted
- BCR: No data found for this date range, symbol may be delisted
- DWDP: No data found, symbol may be delisted
- BHF: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- GGP: No data found for this date range, symbol may be delisted
- CXO: No data found, symbol may be delisted
- HCP: No data found, symbol may be delisted
- DPS: No data found for this date range, symbol may be delisted
- AGN: No data found, symbol may be delisted
- BBT: No data found, symbol may be delisted
- BHGE: No data found, symbol may be delisted
[7]:
df = df.stack()
df.index.names = ['date', 'asset']
df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 16789 entries, (Timestamp('2017-01-03 00:00:00+0000', tz='UTC'), 'A') to (Timestamp('2017-05-31 00:00:00+0000', tz='UTC'), 'XOM')
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Adj Close  16789 non-null  float64
 1   Close      16789 non-null  float64
 2   High       16789 non-null  float64
 3   Low        16789 non-null  float64
 4   Open       16789 non-null  float64
 5   Volume     16789 non-null  float64
dtypes: float64(6)
memory usage: 842.6+ KB

Factor Computation

Our example factor ranks the stocks based on their overnight price gap (yesterday's close to today's open). We'll see whether the factor has some alpha or is pure noise.
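To make the definition concrete, here is a minimal sketch of the overnight gap on toy data. The tickers `X` and `Y` and all prices are made up; the cross-sectional `rank` call is only one way to express "ranking", since Alphalens does its own quantile binning later.

```python
import pandas as pd

dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])

# Made-up open and close prices for two hypothetical assets.
open_ = pd.DataFrame({"X": [10.0, 10.5, 10.0], "Y": [20.0, 19.8, 20.4]}, index=dates)
close = pd.DataFrame({"X": [10.0, 10.2, 10.1], "Y": [20.0, 20.0, 20.2]}, index=dates)

# Overnight gap: today's open relative to yesterday's close.
prev_close = close.shift(1)
gap = (open_ - prev_close) / prev_close

# Rank the assets against each other on each day (1 = smallest gap).
ranks = gap.rank(axis=1)

print(gap)
print(ranks)
```

The first row is NaN because there is no previous close for the first day, which is why the factor in the notebook starts one day after the price data.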

[8]:
available_tickers = df.index.unique('asset')
ticker_sector = {k: v for k, v in ticker_sector.items() if k in available_tickers}
[9]:
today_open = df.loc[:, 'Open'].unstack('asset')
today_close = df.loc[:, 'Close'].unstack('asset')
yesterday_close = today_close.shift(1)
[10]:
factor = (today_open - yesterday_close) / yesterday_close

The pricing data passed to Alphalens should contain the entry price for the assets, so it must reflect the next available price after a factor value was observed at a given timestamp. Those prices must not be used in the calculation of the factor values for that time. Always double-check that you are not introducing lookahead bias into your study.

The pricing data must also contain the exit price for the assets: for period 1 the price at the next timestamp is used, for period 2 the price two timestamps later, and so on.

There are no restrictions or assumptions on the frequency at which a factor is computed, nor on the specific time at which it is traded (at the open, at the close, or intraday). It is only required that the factor and price DataFrames are properly aligned according to the rules above.
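One part of the alignment requirement can be checked mechanically: every factor timestamp should have a price row at exactly the same timestamp. The sketch below shows such a check on toy indices. The helper name `missing_entry_prices` is hypothetical, and the check covers timestamp alignment only; it does not detect lookahead bias in how the factor itself was computed.

```python
import pandas as pd


def missing_entry_prices(factor_times, pricing_index):
    """Hypothetical check: factor timestamps with no price row at the same time."""
    return pd.DatetimeIndex(factor_times).difference(pricing_index)


# Toy price index holding an open (09:30) and a close (16:00) row per day.
pricing_index = pd.to_datetime(
    ["2017-01-03 09:30", "2017-01-03 16:00", "2017-01-04 09:30", "2017-01-04 16:00"]
)

# Factor stamped at the open: every timestamp has an entry price.
aligned = pd.to_datetime(["2017-01-03 09:30", "2017-01-04 09:30"])
# Factor left at midnight: no timestamp matches a price row.
misaligned = pd.to_datetime(["2017-01-03", "2017-01-04"])

print(len(missing_entry_prices(aligned, pricing_index)))
print(len(missing_entry_prices(misaligned, pricing_index)))
```

A non-empty result suggests the factor index still needs a time adjustment before it is passed to Alphalens together with the prices.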

In our example, we want to buy the stocks at market open, so we need the open prices at exactly the same timestamps as the factor values. We want to sell the stocks at market close, so we add the close prices too; because they appear just after the factor timestamps, they are used to compute the period 1 forward returns. The returns computed by Alphalens will therefore be open-to-close returns.
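The effect of interleaving open and close prices can be sketched on toy data. This is not Alphalens's internal implementation, only an equivalent pandas calculation for a single hypothetical asset `X` with made-up prices.

```python
import pandas as pd

days = pd.to_datetime(["2017-01-03", "2017-01-04"])

# Made-up prices, stamped at the open (09:30) and at the close (16:00).
open_ = pd.DataFrame({"X": [100.0, 101.0]}, index=days + pd.Timedelta(hours=9, minutes=30))
close = pd.DataFrame({"X": [102.0, 104.0]}, index=days + pd.Timedelta(hours=16))

# One price table in which open and close rows alternate in time order.
pricing = pd.concat([open_, close]).sort_index()

# Period 1 forward return: price in the next row relative to the current row.
fwd1 = pricing.shift(-1) / pricing - 1

# Keep only the rows stamped at the open, where the factor values live.
open_to_close = fwd1.loc[open_.index]
print(open_to_close)
```

Because the row after each open is the same day's close, the period 1 return at an open timestamp is the open-to-close return for that day.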

If we had other prices, we could compute returns over other periods, for example one hour after market open, two hours after, and so on. We could have added those prices right after the open prices and instructed Alphalens to compute periods 1, 2, 3… as well, not only period 1 as in this example.
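As a sketch of that idea, assume we also had prices one and two hours after the open. The snapshot times and all prices below are made up, and the manual calculation only mimics what requesting periods 1, 2 and 3 would measure from the open timestamp.

```python
import pandas as pd

day = pd.Timestamp("2017-01-03")

# Made-up prices for one hypothetical asset "X" at four intraday snapshots.
snapshots = {
    pd.Timedelta(hours=9, minutes=30): 100.0,   # open (entry price)
    pd.Timedelta(hours=10, minutes=30): 101.0,  # one hour after the open
    pd.Timedelta(hours=11, minutes=30): 99.0,   # two hours after the open
    pd.Timedelta(hours=16): 102.0,              # close
}
pricing = pd.DataFrame(
    {"X": list(snapshots.values())},
    index=[day + offset for offset in snapshots],
).sort_index()

# Forward returns from the open for periods 1, 2 and 3 (rows ahead).
entry = pricing.index[0]
forward = {
    p: pricing["X"].shift(-p).loc[entry] / pricing["X"].loc[entry] - 1
    for p in (1, 2, 3)
}
print(forward)
```

Each period counts rows in the price table, not calendar days, so the meaning of a period depends entirely on which snapshots are placed after the entry price.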

Data Formatting

Time Adjustments

[11]:
# Fix time as Yahoo doesn't set it
today_open.index += pd.Timedelta('9h30m')
today_close.index += pd.Timedelta('16h')
# pricing will contain both open and close
pricing = pd.concat([today_open, today_close]).sort_index()
[12]:
pricing.head()
[12]:
asset A AAL AAP AAPL ABBV ABC ABT ACN ADBE ADI ... NUE PCG PKG PPG SCHW STZ T VZ XEC XOM
date
2017-01-03 09:30:00+00:00 45.930000 47.279999 170.779999 28.950001 62.919998 78.510002 38.630001 117.379997 103.430000 72.599998 ... 59.740002 60.810001 85.160004 95.430000 40.049999 155.009995 42.689999 53.959999 137.529999 90.940002
2017-01-03 16:00:00+00:00 46.490002 46.299999 170.600006 29.037500 62.410000 82.610001 39.049999 116.459999 103.480003 72.510002 ... 59.610001 60.369999 85.000000 95.250000 40.200001 154.750000 43.020000 54.580002 138.789993 90.889999
2017-01-04 09:30:00+00:00 46.930000 46.630001 170.369995 28.962500 62.639999 82.599998 39.060001 116.910004 103.739998 72.769997 ... 59.759998 60.610001 85.440002 95.709999 40.400002 157.149994 42.939999 54.549999 138.479996 91.120003
2017-01-04 16:00:00+00:00 47.099998 46.700001 172.000000 29.004999 63.290001 84.660004 39.360001 116.739998 104.139999 72.360001 ... 61.250000 60.590000 86.370003 97.269997 41.220001 157.990005 42.770000 54.520000 138.500000 89.889999
2017-01-05 09:30:00+00:00 47.049999 46.520000 170.869995 28.980000 63.380001 84.379997 39.240002 116.980003 104.129997 72.410004 ... 61.119999 60.660000 86.370003 96.459999 40.970001 150.550003 42.849998 54.779999 138.500000 90.190002

5 rows × 163 columns

Align Factor & Price

[13]:
# Align factor to open price
factor.index += pd.Timedelta('9h30m')
factor = factor.stack()
factor.index = factor.index.set_names(['date', 'asset'])
[14]:
factor.unstack().head()
[14]:
asset A AAL AAP AAPL ABBV ABC ABT ACN ADBE ADI ... NUE PCG PKG PPG SCHW STZ T VZ XEC XOM
date
2017-01-04 09:30:00+00:00 0.009464 0.007127 -0.001348 -0.002583 0.003685 -0.000121 0.000256 0.003864 0.002513 0.003586 ... 0.002516 0.003976 0.005176 0.004829 0.004975 0.015509 -0.001860 -0.000550 -0.002234 0.002531
2017-01-05 09:30:00+00:00 -0.001062 -0.003854 -0.006570 -0.000862 0.001422 -0.003307 -0.003049 0.002056 -0.000096 0.000691 ... -0.002122 0.001155 0.000000 -0.008327 -0.006065 -0.047092 0.001870 0.004769 0.000000 0.003337
2017-01-06 09:30:00+00:00 0.001934 -0.000872 -0.003258 0.001458 0.001725 -0.001793 0.000000 0.000000 0.000661 0.003646 ... -0.000328 -0.003304 0.000469 0.001569 0.008055 0.001635 -0.015709 -0.017753 0.001784 0.002710
2017-01-09 09:30:00+00:00 0.000417 -0.004328 0.002535 0.000339 0.000157 -0.002359 0.000245 -0.001376 -0.003139 0.000559 ... 0.009786 -0.000327 -0.003440 -0.005649 -0.005578 0.003747 -0.000726 -0.000751 -0.011285 -0.003164
2017-01-10 09:30:00+00:00 0.004155 -0.001699 -0.004896 -0.001849 -0.002492 -0.004446 0.001718 -0.000522 0.000000 -0.000973 ... 0.012743 0.000498 -0.000798 -0.002906 0.001702 -0.001797 -0.003431 0.000190 0.004664 0.001494

5 rows × 163 columns

Run Alphalens

Period 1 will show returns from market open to market close, while period 2 will show returns from today's open to tomorrow's open.
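The elapsed time behind each period can be verified from the timestamps alone, which also explains the `6h30m` and `1D` column headers in the tear sheet output below. The sketch uses two toy trading days and only reproduces the elapsed times; it does not call Alphalens's own labelling code.

```python
import pandas as pd

days = pd.to_datetime(["2017-01-03", "2017-01-04"])
opens = days + pd.Timedelta(hours=9, minutes=30)
closes = days + pd.Timedelta(hours=16)

# Same layout as the pricing index: open and close rows alternate.
timestamps = opens.append(closes).sort_values()

# Starting from the first open: one row ahead and two rows ahead.
period_1 = timestamps[1] - timestamps[0]
period_2 = timestamps[2] - timestamps[0]

print(period_1)
print(period_2)
```

Here period 1 spans six and a half hours (open to close) and period 2 spans one day (open to next open), assuming consecutive trading days.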

Get Alphalens Input

[15]:
non_predictive_factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor,
                                                                                  pricing,
                                                                                  periods=(1,2),
                                                                                  groupby=ticker_sector,
                                                                                  groupby_labels=sector_names)
Dropped 2.9% entries from factor data: 1.0% in forward returns computation and 2.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

Returns Tear Sheet

[16]:
alphalens.tears.create_returns_tear_sheet(non_predictive_factor_data)
Returns Analysis
                                                6h30m      1D
Ann. alpha                                      0.324  -0.046
beta                                            0.177   0.179
Mean Period Wise Return Top Quantile (bps)     -7.776  -2.096
Mean Period Wise Return Bottom Quantile (bps)  -0.445   0.697
Mean Period Wise Spread (bps)                  -7.331  -2.795
[Figure: returns tear sheet plots]
[17]:
alphalens.tears.create_event_returns_tear_sheet(non_predictive_factor_data, pricing);
[Figure: event returns tear sheet plots]