This post has been over a year and a half in the making. Back around June 2016, I came across a post in r/investing on Reddit. It discussed an intriguing investment strategy that was apparently originally developed by James O'Shaughnessy in his book, What Works on Wall Street.
This strategy attracted my attention for two reasons:
- First, it is built upon a value-centered, buy-and-hold concept; the kind of strategy I would hope my granddad and Warren Buffett would approve of.
- Second, it uses data to drive investing decisions. There's no gut instinct here, no testosterone, just cold, hard facts.
I was about six months into Alteryx and instinctively I knew "Alteryx can do this." The instinct was there (and correct, I might add), but I didn't yet have all the skills within Alteryx to make it happen. I was successful at the time in finding some of the metrics via free data sources like Google Finance, but I couldn't get all of them.
In early 2017, I signed up for an ambitious cross-functional project with my Marketing Department. Part of the appeal was their appetite for unconventional data sources. I built multiple workflows for them based on third-party APIs and fell in love with the art of compiling an efficient set of API calls, and then teasing apart whatever data came out the other side of my Download Tool. I built up skills like parsing JSON files, which is far more involved than just dropping a Parse JSON tool into an Alteryx canvas, and writing temporary files and then using a Dynamic Input Tool to read them back into a workflow.
All the while, when I came upon some spare time, I'd pick the project back up, sifting through the internet for any programmatic data sources that might yield the metrics I was hoping for. Nearly 17 months later, I came upon Intrinio. This time, I was ready.
Intrinio is awesome. They're a little company with a vast hoard of data, and these kind entrepreneurs recognize that there are folks like me out there: I have zero budget for a project like this, so I can't afford the insane subscription costs that most financial data firms like Thomson Reuters or Bloomberg might charge for the privilege of querying their data. Still, Intrinio only offers 500 free API calls per day against their data-point API function, and each unique metric per ticker counts as one call. I'm an S&P 500 guy, and I needed to pull 10 data points per ticker to calculate my six value metrics, so a complete data set meant roughly 500 × 10 = 5,000 API calls, or about ten days at the free limit. The six metrics:
- Price to Book
- Price to Revenue
- Price to Earnings
- Price to Cash Flow
- Enterprise Value to EBITDA
- Shareholder Yield
Allowing for some testing and distractions, it took me about three more weeks to get two workflows up and running. The first workflow just gathers the data. It inputs the authentication credentials, the tickers that haven't had data pulled yet, and the list of metrics to pull. It appends credentials to tickers to metrics, compiles the complete list of remaining API calls, and sends them off to Intrinio. The successful results get stored in a time-stamped output file after a minimum of processing: just enough to get the data parsed into clean columns and time-stamped.
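For anyone who thinks better in code than in canvases, here is a rough Python sketch of the "tickers times metrics" step. It is not the Alteryx workflow itself, the ticker and metric names are made-up placeholders rather than real symbols or Intrinio tags, and the credential and HTTP steps are left out entirely.

```python
from itertools import product
from math import ceil

DAILY_LIMIT = 500  # free data-point calls per day, as described in the post


def build_calls(tickers, metrics):
    """Cross-join tickers with metrics: one API call per (ticker, metric) pair."""
    return list(product(tickers, metrics))


def days_needed(n_calls, daily_limit=DAILY_LIMIT):
    """Number of days required to work through n_calls at the daily limit."""
    return ceil(n_calls / daily_limit)


# Placeholder inputs (hypothetical names, for illustration only)
tickers = [f"TICK{i:03d}" for i in range(500)]
metrics = [f"metric_{j}" for j in range(10)]

calls = build_calls(tickers, metrics)
todays_batch = calls[:DAILY_LIMIT]  # only this many can be sent today

print(len(calls), days_needed(len(calls)), len(todays_batch))
# prints: 5000 10 500
```

The cross join is the whole trick: every row of the resulting list is one call, so the daily limit is just a slice of that list.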
A second workflow brings in each day's files and reshapes the data from long and narrow to short and wide. It finds the latest data points for each ticker, collates them, cleans up the data types and the "not meaningful" data points, and then "sort-ranks" the list of tickers by each metric, giving each one a score out of the total number of tickers. At the end, the ranks for each metric are summed and voila - A Composite Value Index for the S&P 500.
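The second workflow boils down to "keep the latest value, pivot, rank, sum." Below is a minimal Python sketch of that idea under some assumptions of mine: rank 1 is the lowest (most attractive) value, tickers with no usable value all share the bottom rank, a lower composite is better, and the tickers and metric names ("pb", "ps") are invented for the example.

```python
from collections import defaultdict


def latest_wide(rows):
    """rows are (ticker, metric, iso_date, value) tuples in long format.

    Keep the most recent value per (ticker, metric) and pivot to
    {ticker: {metric: value}}, turning "nm" into None.
    """
    latest = {}
    for ticker, metric, date, value in rows:
        key = (ticker, metric)
        if key not in latest or date > latest[key][0]:
            latest[key] = (date, value)
    wide = defaultdict(dict)
    for (ticker, metric), (_, value) in latest.items():
        wide[ticker][metric] = None if value == "nm" else float(value)
    return dict(wide)


def rank_metric(wide, metric):
    """Rank 1 = lowest value; tickers without a usable value get the bottom rank."""
    have = sorted(
        (t for t in wide if wide[t].get(metric) is not None),
        key=lambda t: wide[t][metric],
    )
    ranks = {t: i + 1 for i, t in enumerate(have)}
    for t in wide:
        ranks.setdefault(t, len(wide))
    return ranks


def composite(wide, metrics):
    """Sum each ticker's rank across all metrics."""
    totals = {t: 0 for t in wide}
    for m in metrics:
        for t, r in rank_metric(wide, m).items():
            totals[t] += r
    return totals


rows = [
    ("AAA", "pb", "2018-01-02", "1.5"),
    ("AAA", "pb", "2018-01-05", "1.2"),  # newer value replaces the 01-02 one
    ("BBB", "pb", "2018-01-05", "3.0"),
    ("CCC", "pb", "2018-01-05", "nm"),
    ("AAA", "ps", "2018-01-05", "2.0"),
    ("BBB", "ps", "2018-01-05", "0.5"),
    ("CCC", "ps", "2018-01-05", "1.0"),
]
wide = latest_wide(rows)
scores = composite(wide, ["pb", "ps"])
print(scores)  # {'AAA': 4, 'BBB': 3, 'CCC': 5}
```

Ties are broken by input order here, which is arbitrary; a real implementation might prefer averaged ranks.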
I confess, I think there's still a little bit left to do:
For starters, I think I truncate a given ticker's metrics when I hit my 500 API calls for the day mid-ticker. My list of API calls to make isn't at the Ticker-Metric granularity that it needs to be to prevent this; it's at the ticker level. That'll be a fairly simple fix to make.
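The fix I have in mind looks roughly like this in Python: track what has been fetched as (ticker, metric) pairs instead of whole tickers, so a day that ends mid-ticker picks up exactly where it left off. The function and argument names are my own illustration, not anything from the actual workflow.

```python
from itertools import product


def remaining_calls(tickers, metrics, completed, daily_limit=500):
    """Return today's (ticker, metric) calls, skipping pairs already fetched.

    completed is a set of (ticker, metric) pairs with data on file.
    """
    pending = [pair for pair in product(tickers, metrics) if pair not in completed]
    return pending[:daily_limit]


# Yesterday's quota ran out after two of AAA's three metrics.
completed = {("AAA", "m1"), ("AAA", "m2")}
today = remaining_calls(["AAA", "BBB"], ["m1", "m2", "m3"], completed, daily_limit=2)
print(today)  # [('AAA', 'm3'), ('BBB', 'm1')]
```

With ticker-level tracking, AAA would be marked either "done" (losing m3) or "not done" (re-pulling m1 and m2 and wasting calls); pair-level tracking avoids both.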
An improvement: Replace P/E with E/P. Currently, all but one of the metrics are sort/ranked lowest-to-highest. For any metric with no data or "nm" I have to just stash those companies at the bottom of the sorted list in no particular order. This model would have better data if I used Earnings to Price and sort/ranked it highest-to-lowest, instead of Price-to-Earnings, since P/E stops being meaningful when earnings are negative and gets reported as "nm" (and it is a divide-by-zero when earnings are exactly zero). E/P stays well defined in both cases, so it would give me a more detailed data set for all the companies with negative earnings while still telling the same story. I'll look into that change in the future.
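To make the E/P argument concrete, here is a small Python sketch with four invented companies, two of them losing money. The prices and earnings per share are made up; the point is only that E/P gives the loss-makers an order, where P/E would leave them lumped together as "nm".

```python
def earnings_yield(price, eps):
    """E/P: defined for zero or negative earnings, as long as price is positive."""
    return eps / price


def rank_high_to_low(values):
    """Rank 1 = highest value (the most earnings per dollar of price)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {ticker: i + 1 for i, ticker in enumerate(ordered)}


# (price, earnings per share) for hypothetical tickers
quotes = {
    "AAA": (50.0, 5.0),
    "BBB": (40.0, 1.0),
    "CCC": (20.0, -1.0),  # small loss
    "DDD": (10.0, -4.0),  # large loss relative to price
}
ey = {t: earnings_yield(p, e) for t, (p, e) in quotes.items()}
ranks = rank_high_to_low(ey)
print(ranks)  # {'AAA': 1, 'BBB': 2, 'CCC': 3, 'DDD': 4}
```

CCC now ranks ahead of DDD because its loss is smaller relative to its price, which is information the "nm" bucket throws away.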
Lastly, and perhaps most vexing, the Price to Cash Flow metric has a fair number of negative data points, which happens when a company's cash flow is negative. On its face, negative cash flow is worse than positive cash flow, but a lowest-to-highest sort puts those tickers ahead of everything else. A near-zero positive P/CF is very good; a negative P/CF is definitely questionable, but only possibly unhealthy.
A thirst-quenching example of that is TAP (Molson Coors), whose super-high reinvestment is swallowing up its net profit after taxes. I believe that is due to their leveraged and aggressive consolidation strategy. Molson was tiny relative to Coors when it announced the buyout in 2015 and since then they've continued to buy microbrewery labels. I think what's going on in this particular situation is that their short-term reinvestment activity is larger than their cash flow for the same period, and that's not necessarily a bad thing. That's a very normal scenario for a startup climbing a production S-curve, like Tesla. Bottom line, if their eventual return on investment ends up making up for their current reinvestment, they're probably fine and maybe they're doing awesome!
Thus, it seems to me that a negative Price to Cash Flow in isolation is not sufficient to determine whether we're talking about a highly confident company piling money into a successful strategy to maximize its share of a growing industry, or a sinking ship with quarterly cash flows that are insufficient to cover its costs. I've decided that, for now at least, a negative P/CF will rank as good as it gets in my model, but I'm likely to revisit the issue once I look into the individual stocks. Given that we're in a bull market with low interest rates, and given that we're talking about the S&P 500, with nary a startup in sight, I think this isn't that big of a deal. At any rate, within the model I've still got five other metrics to buoy or sink a given ticker, so we're not making any decisions on P/CF in isolation.
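For the curious, here is how I'd sketch that P/CF rule in Python. It is one possible reading of "rank as good as it gets": every negative P/CF ties for rank 1, positive values follow in ascending order, and missing values share the bottom rank. The tickers and values are invented, and the tie handling is my assumption.

```python
def rank_pcf(pcf):
    """Rank P/CF with negatives tied for best, then positives low-to-high.

    pcf maps ticker -> P/CF value, or None when there is no usable value.
    """
    negatives = [t for t, v in pcf.items() if v is not None and v < 0]
    positives = sorted(
        (t for t, v in pcf.items() if v is not None and v >= 0),
        key=pcf.get,
    )
    missing = [t for t, v in pcf.items() if v is None]

    ranks = {t: 1 for t in negatives}  # all negatives share the top spot
    start = len(negatives) + 1
    for i, t in enumerate(positives):
        ranks[t] = start + i
    for t in missing:
        ranks[t] = len(pcf)  # no data: bottom of the list
    return ranks


pcf = {"AAA": 4.0, "BBB": -12.0, "CCC": 15.0, "DDD": None, "EEE": -3.0}
print(rank_pcf(pcf))
# {'BBB': 1, 'EEE': 1, 'AAA': 3, 'CCC': 4, 'DDD': 5}
```

Because the negatives are tied rather than sorted, a P/CF of -12 and a P/CF of -3 are treated the same, which matches the idea that the sign alone doesn't say whether the company is healthy.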