“How do you come to a rational conclusion as to what a company is worth?” A seemingly simple question with little-to-no clear answer.

For **John Alberg**, a background in computer science and a passion for machine learning led him to view the problem through the lens of data. “If it is true that you can use publicly available information to buy companies for less than their economic worth,” he thought, “then you should be able to see it in the data.”

And thus was born Euclidean, an investment firm that marries machine learning with a deep value mentality.

Our conversation spanned more than 2.5 hours and covered everything from the basics of machine learning, to the evolution of Euclidean’s approach over the last decade, to the implications of adversarial examples in neural networks.

This podcast, an abridged version of our conversation, picks up the thread mid-way through, where I have asked John to expand upon his experience with his startup, Employease, and how it influenced his value-based thinking at Euclidean.

I hope you enjoy.

### Transcript

**Corey Hoffstein **00:03

Hello and welcome, everyone. I’m Corey Hoffstein. And this is Flirting with Models, the podcast that pulls back the curtain to discover the human factor behind the quantitative strategy.

**Narrator **00:15

Corey Hoffstein is the co-founder and Chief Investment Officer of Newfound Research. Due to industry regulations, he will not discuss any of Newfound Research’s funds on this podcast. All opinions expressed by podcast participants are solely their own opinion and do not reflect the opinion of Newfound Research. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Newfound Research may maintain positions in securities discussed in this podcast. For more information, visit thinknewfound.com.

**Corey Hoffstein **00:46

How do you come to a rational conclusion as to what a company is worth? A seemingly simple question with little to no clear answer. For John Alberg, a background in computer science and a passion for machine learning led him to view the problem through the lens of data. “If it is true that you can use publicly available information to buy companies for less than their economic worth,” he thought, “then you should be able to see it in the data.” And thus was born Euclidean, an investment firm that marries machine learning with a deep value mentality. Our conversation spanned more than two and a half hours, and covered everything from the basics of machine learning, to the evolution of Euclidean’s process over the last decade, to the implications of adversarial examples in neural networks. This podcast, an abridged version of our conversation, picks up the thread midway through, where I have asked John to expand upon his experience with his startup, Employease, and how it influenced his value-based thinking at Euclidean. I hope you enjoy.

**John Alberg **01:58

Yes, so in 2008, my longtime business partner Michael Seckler and I founded Euclidean specifically on this idea of applying machine learning to long-term equity investing. Mike and I had built a software-as-a-service business, Employease, over the prior decade, which we sold to ADP (Automatic Data Processing) in 2006. We had made a little money off that sale, and so started to think about how to grow that wealth over the remainder of our lives. I sought out wisdom on this subject in the writings of people who had been great long-term investors, people that achieved high returns over very, very long periods of time, not folks that achieved a great result from a couple of trades. And if you ask this question, you inevitably end up learning a lot about Ben Graham, Walter Schloss, John Templeton, and of course, Buffett. The thing that I observed is that most of these folks are very open about how they invest, and essentially describe it this way: they look at the historical financials, the publicly available information on companies, perhaps going back decades, and try to get a sense of what the economic character or intrinsic value of the business is. And if that value is attractive relative to what you might be able to buy the company for, then it’s considered a good investment. In thinking through this, it occurred to me that they were only using publicly available information and price, and that, having a deep background in machine learning going back to the early ’90s, I might be able to imitate this success using machine learning. Note that I’m not saying our goal was to directly imitate what specific long-term fundamental investors do. It wasn’t, for example, to make a Graham model, so to speak. Rather, the idea was that if you give a machine learning algorithm all historical fundamentals and price, and ask it to distinguish between winners and losers in that data, it might be successful.
If it is possible to succeed on that data alone, a machine learning algorithm should be able to find in the data this relationship between the long-term fundamentals of companies, their prices, and future success as an investment, so long as that relationship actually does exist. So this is what we set out to do. Now, this may seem unusual; applying machine learning to long-term investing is uncommon, as it seems most people apply it to short-term investing. But I’d like to point out that if you look around at other fields where machine learning has had a lot of success, it’s in areas where we’re trying to improve upon human decision making. Take, for example, self-driving cars. In that case, you may build a perceptual system, a neural network, that is trained to do a better job of perceiving the environment than a human would behind the wheel of a car. In medicine, they’re taking medical images and attempting to identify whether there’s a malignant tumor, trying to improve upon the radiologist’s ability to identify whether there’s a malignancy or not. And in the stunning example of AlphaGo (of course, it’s a game), neural networks were trained to beat the world’s best players at the game of Go; obviously, what they’re doing there is building a machine, an algorithm, that does a better job than a human at that task. Well, in the world of capital markets, we have all these people who go around and evaluate companies based on their fundamentals, and try to understand if they’re going to be a good long-term investment. Essentially, what Euclidean set out to do is improve upon that process with machine learning. I think it’s important to point out also that most big machine learning successes are in the form of what’s called a classification problem and not a regression problem.
Regression is where you build a model to predict numeric values, like predicting the percent return of a stock. Classification is where you predict categorical values. So if listeners remember the Silicon Valley episode where Jian-Yang built the “hot dog / not hot dog” image classification app (to, I think, get them out of indentured servitude), that’s a good example of a classification app. Even language translation is a form of classification, where you are predicting which of a number of words in a language something should be translated to. Typically, when people look at equity forecasting, they’re doing regression. Instead, we structured the problem as a classification problem. That is, we’re predicting whether something is going to be a good investment or a bad investment. We did this because it’s our view that this is an easier, less noisy problem than trying to forecast, say, excess return.
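The classification framing John describes can be sketched in a few lines. This is a hypothetical illustration only: the one-year horizon, the market benchmark, and the binary label convention are assumptions for the example, not Euclidean’s actual specification.

```python
# Framing stock selection as binary classification rather than regression:
# instead of predicting a stock's percent return (regression), label each
# observation by whether it beat the market over the following year.

def classification_label(stock_return, market_return):
    """1 = 'good investment' (outperformed the market), 0 = 'bad investment'."""
    return 1 if stock_return > market_return else 0

# Hypothetical one-year forward returns: (stock return, market return).
observations = [
    (0.22, 0.10),   # stock up 22%, market up 10%  -> winner
    (-0.05, 0.08),  # stock down 5%, market up 8%  -> loser
    (0.03, 0.03),   # matched the market exactly   -> not labeled a winner
]

labels = [classification_label(s, m) for s, m in observations]
print(labels)  # [1, 0, 0]
```

A classifier trained on such labels only has to separate two categories, rather than hit an exact return number, which is the “easier, less noisy problem” John refers to.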

**Corey Hoffstein **06:48

Maybe before we dive deep down the rabbit hole, which I think we’re going to get to very quickly, we can start with the way you see the landscape of machine learning, because there’s a number not only of techniques, which I know we’re going to talk about, but also of purposes for those techniques. You’ve already, for example, mentioned regression versus classification. But we know that there’s been an evolution from things like support vector machines to deep neural networks over the last decade. Maybe we can start by discussing that landscape a little, and where you draw the line between machine learning and more advanced statistical techniques.

**John Alberg **07:33

So the question of the relationship between machine learning and statistics is an interesting one, and I think the answer is somewhat nuanced. Some forms of machine learning are clearly a subset of statistics, or at least intimately related to statistics, whereas other forms take on a very different character. For example, in supervised learning, which is the most popular and successful form, you’re trying to map a set of inputs to a set of outputs through a model. This process is very similar to what you’re doing in statistical linear regression, where you have, if you remember, an independent variable x and a dependent variable y, and you’re trying to relate them through a linear model. The difference is that in supervised learning it tends to be a more complicated model with many more parameters, and you’re employing techniques to prevent overfitting. But because both linear regression and supervised learning are essentially attempting to do the same thing, I think it would be hard to argue that supervised machine learning is not a form or extension of statistics. On the other hand, if you look at something like reinforcement learning, where the model is an agent in an environment, and the model can take actions whose outcomes are fed back to the model and used to reinforce its behavior, I think in that case it takes on a very different character than traditional statistics, and therefore is different. With respect to the landscape of machine learning, ensemble learning techniques have had a lot of success. There’s a result that shows that the average decision made by a group of models that have been trained to make uncorrelated decisions has a better chance of success than the best individual model within the group. It’s sort of like saying the sum of the parts is greater than the whole, or that there’s this wisdom of crowds.
At any rate, this result led to a vigorous effort to develop algorithms that train many models to make uncorrelated decisions, and it has been shown that these ensembles perform very well. If you look at the Kaggle competitions, worldwide competitions where datasets are put out there and researchers can submit their best models and be ranked according to their accuracy on out-of-sample data, the top results tend to be dominated by these ensemble techniques. And so even though ensembles don’t get the headlines like deep learning, which I’ll talk about in a second, I would argue that if you have an application where you’re interested in employing machine learning, the first foot I would put forward is ensemble learning, because I think on the average problem it has the highest chance of success. It’s also very easy to get off-the-shelf tools to rapidly build ensemble models, so you can find out if you’re going to be successful with it pretty quickly. But the other, maybe sexier, technique that does get a lot of the headlines is deep learning. So what is deep learning? Deep learning is learning in an artificial neural network, where input is transformed to output through many processing units and parameters that are typically organized into successive layers; hence the “deep,” in terms of layers, in the term “deep learning.” The transformation of input to output through all these nodes and connections can be very complex; there can be millions of connections/parameters in these models, and hence deep learning can model very challenging problems. Most of the problems it’s really good at are things that have historically been very hard to solve with traditional computing techniques: things like computer vision, language translation, and voice recognition. Again, though, I’d say ensemble learning is best for most problems.
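The “wisdom of crowds” result John cites can be checked with a little arithmetic. Assuming, hypothetically, an ensemble of independent classifiers that are each right 60% of the time, the accuracy of a majority vote follows the binomial distribution and exceeds any single model’s accuracy:

```python
import math

def majority_vote_accuracy(n_models, p_correct):
    """Probability that a strict majority of n independent models,
    each correct with probability p_correct, votes for the right answer."""
    majority = n_models // 2 + 1
    return sum(
        math.comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )

single = 0.60
ensemble = majority_vote_accuracy(15, single)
print(f"single model: {single:.2f}, 15-model vote: {ensemble:.2f}")
# The 15-model majority vote is correct roughly 79% of the time, versus
# 60% for any individual model -- but only if the models' errors are
# independent, which is why uncorrelated decisions matter so much.
```

This is the mechanical reason ensembles of decorrelated models dominate leaderboards: the math only works when the members’ mistakes don’t line up.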
But deep learning seems to be best at the most challenging ones. In finance, forecasting is, of course, of great interest. And it’s interesting that a lot of the companies that are investing heavily in machine learning for their products, such as Google, Facebook, and Amazon, are also very interested in the challenge of forecasting, as I think they believe it’s relevant to their business. Amazon is well served by being able to forecast product demand so it can effectively manage its inventory. You can see this in the academic papers that are coming out, where these folks have started to leverage their expertise in machine learning into challenging forecasting problems. For example, there’s a group at Amazon that published a series of papers on using deep learning to improve forecasting under uncertainty, and they showed that it does meaningfully better than traditional forecasting techniques. And what do I mean by uncertainty? Well, this, again, is very related to finance, where it’s very hard to forecast an exact number in the future, say that a stock is going to go up 10%, or that we’re going to sell 35,000 toothbrushes next month, but maybe you can get a sense of what the distribution of outcomes is going to be. And so Amazon built this tool, which is now available in the Amazon cloud. I think it’s called DeepAR, for deep autoregressive neural network: a recurrent neural network that you can use to build forecasting models where you’re not trying to forecast a value, but a distribution of outcomes. At any rate, it seems like there is very direct crossover potential of this technology to finance applications.

**Corey Hoffstein **13:31

The literature for machine learning goes back a really, really long way. Some of the earliest papers, on perceptrons, for example, I think go back to the ’50s and ’60s. But as we’ve been talking about, there’s been a massive acceleration in the last decade for many of the reasons you discussed: from the hardware perspective, the use of GPUs, as well as some of the algorithms that have been discovered to solve some of the problems we were facing. That has allowed for a massive acceleration of the techniques, and therefore the application and success of deep neural networks in the last decade. From a practitioner standpoint, consider someone who started using machine learning a decade ago, when things like support vector machines were state of the art; then suddenly it became ensemble methods, and now today it’s deep neural networks. How do you think about adapting your process as the field of machine learning changes so quickly?

**John Alberg **14:39

So as you mentioned, there’s been quite an amazing evolution of machine learning techniques over the last decade, and we feel pretty lucky to have started Euclidean when we did in 2008. Because companies like Google, Facebook, Amazon, and others have decided that machine learning is going to play a critical role in their product development, they have invested heavily in these areas. This investment comes both in the form of innovations that are published in peer-reviewed journals and in the form of open-source projects, where very complex pieces of software like Google’s TensorFlow are written and put out there for anybody to download and use. Since our inception, it has behooved us to take advantage of the fact that we’re living in this pretty incredible time. In 2008, we started with support vector machines as the model we used for investing, as those were state of the art at the time. Then in 2014-15 we migrated our technology to ensemble models, for some of the reasons I described earlier. But besides their accuracy, there’s another aspect of ensemble learning that makes it compelling for an investment advisor: ensembles tend to be built out of decision trees as their base model. This is appealing because decision trees are more transparent than, say, a support vector machine or a deep neural network, in the sense that you can follow the logic down the decision tree of how it’s deciding whether a company, for example, is going to outperform the market over the subsequent year or not. And you can do this at quite a granular level. Now, more recently, deep neural networks and deep learning have gotten a lot of attention because of spectacular successes in areas such as language translation, computer vision, and beating the world champion at the game of Go. And so we, of course, became interested in seeing how deep learning might benefit our processes.
And there are really three reasons why deep learning interests us. First, deep learning has created the opportunity to do less of the factor-engineering work that is so involved in typical quantitative investing, and instead rely on raw financial data to tell the story of a stock. As we discussed, the deepness in deep learning means that successive layers of a model are able to untangle important relationships, in a hierarchical way, from the data as it is found in the wild, like what’s in a balance sheet or an income statement. Therefore, there’s potentially much less preprocessing than we’ve had to do in the past. Really, the way to think about it is that there is potential to find measures or factors that are more meaningful than what we rely on today, and the process of finding them is less biased than if you construct them on your own. Second, recurrent neural networks, which are a form of deep learning, have an inherent time dimensionality to them. With respect to stocks, the future value of a company depends on the evolving state of a company’s cash flows, as they are reported from quarter to quarter. It is in this area of modeling sequences of data through time that deep learning is of interest to us. And last, some of the greatest progress in machine learning, really in deep learning, has been in the area of text processing, whether it be language translation, sentence completion, voice recognition, or voice synthesis. Clearly, there’s a lot of textual data out there on stocks that is not reflected in income statements or balance sheets, and it would be great to be able to use that information in the decision-making process insofar as it’s of value. As a cautionary note, however, there are some less appealing aspects of moving to deep learning. First and foremost is the issue of transparency.
As I described before, an ensemble of decision trees is much more transparent than a deep neural network, because the logic of what a deep neural network does is embedded in all the connections it is made up of. Second, from a more practical perspective, training deep neural networks takes a very long time, and therefore it’s hard to iterate through a lot of ideas as quickly as you can with support vector machines or ensembles of decision trees. But also, in the context of deep neural networks, there’s still this important issue of how you frame the question. Whether you’re using support vector machines, ensembles of decision trees, or deep neural networks, you still need to decide whether you’re going to frame the question as a classification problem or as a regression problem. As I mentioned earlier, most of our success has been in framing it as a classification problem, because, well, it turns out that that is an easier problem to solve. But maybe deep neural networks provide an opportunity to succeed at something harder.

**Corey Hoffstein **19:46

Think of the degrees of difficulty of the problem this way. Saying “I think stock A is going to outperform stock B” is orders of magnitude less complex than “I think stock A is going to outperform stock B by 5%,” which is orders of magnitude less complex than “I think stock A is going to return 22% and stock B is going to return 17%.” And your confidence in the former can be incredibly high. While they all might technically say the same thing, they could all be true at the same time, one of those is far, far easier, ultimately, to model. Or you would expect it to be far, far easier to model, and your confidence in that model would be much higher than in the latter, due to the accuracy required.

**John Alberg **20:41

So as I mentioned, we formulated the challenge of applying machine learning to long-term investing as a classification problem as opposed to a regression problem, meaning that we want the model to tell us, “hey, this looks like a good long-term investment,” instead of the model saying, “this stock will return 20% above the market over the next year.” The reason we took this approach is the same reason you’re mentioning: it just seems like a more tractable problem. And as it turns out, empirically, this is the case; you can do quite well if you pose the problem of long-term investing in this way. Now, that being said, with the advent of deep learning, and observing that it has been successful on some very challenging problems, where there’s lots of noise and the signal needs to be teased out of it, we started to reinvestigate whether deep learning could be successfully used to forecast excess returns. And what we found was that deep neural networks were essentially no better than linear regression at this problem. Now, that doesn’t mean they can’t forecast excess returns, as linear regression actually does have some predictive power in forecasting. That is essentially why something like the value effect or the momentum effect exists; there is a relationship between those factors and the excess return of a stock. It’s just that that relationship is best modeled with a linear model. There’s Occam’s razor: don’t use a model that’s any more complex than it needs to be. But in this process, we also started to ask the question: well, if we can’t forecast prices with a deep neural network, maybe we can forecast something else. Borrowing on the idea of sequence learning in language translation and other uses of recurrent neural networks, we explored whether you could use a sequence of historical financial statements to forecast future fundamentals.
And it is in this area that we’ve had a good degree of success with deep neural networks.

**Corey Hoffstein **22:53

So I read a paper this morning, and I’m forgetting the exact title, but it was something to the effect of “Deep Alpha.” It was this idea of using a deep neural network for the very reasons you mentioned: there doesn’t need to be any preemptive feature engineering; you can feed it raw data. One of the added benefits of using a deep neural network is the opportunity to identify nonlinear features. Very often, a lot of the features we pre-engineer are very linear in nature, so identifying nonlinear relationships in the data, and allowing that to flow through the deep neural network in the classification problem, was their way of trying to identify unique sources of alpha. But to a layperson like me, who is not incredibly sophisticated in the realm of machine learning, this sounds a whole lot like it’s just overfitting the data: that there is a massive risk of passing all of this information in and letting the model identify the information that was most predictive in the past, when perhaps it is just identifying nothingness and noise, a lot of spurious relationships. Talk me through, particularly with something like a deep neural network, where they are largely opaque and a lot of the reasoning within the model is hidden deep in the layers: how do you gain confidence that you’re not overfitting? And I want to say, I almost find it ironic, because you didn’t start using deep neural networks until the last couple of years, but you actually wrote a piece very early on at Euclidean about how you gain confidence in a machine learning approach. So maybe you can tie that in to the criteria you outlined. But how do you think about gaining confidence in an approach in which, by definition, there’s an extreme lack of transparency?

**John Alberg **24:56

It may sound like overfitting, but in this case, it’s not. I think it’s worthwhile to clarify the relationship between machine learning and overfitting. The fitting of data is in fact a spectrum, going from underfit to overfit, and somewhere in the middle there is a Goldilocks point where you’re not too underfit and you’re not too overfit. This is the point where a model is said to optimally generalize the relationship in the training data to out-of-sample data. The tools of machine learning, like regularization, cross-validation, and holdout testing, are designed to let you navigate that spectrum so you can find this Goldilocks point of generalization. So take computer vision as an example, where there’s been a lot of success with deep neural networks. There’s a tremendous amount of noise in images; we don’t notice how extreme it is because our brains are so good at converting a field of pixels into recognizable objects. But just imagine the diversity of pixels and pixel combinations that exist in a photo of something like an Arabian bazaar, then imagine how those combinations multiply as lighting changes throughout the day. So how is it possible that we were able to build perceptual systems that can visually navigate a self-driving car? Wouldn’t those perceptual systems just overfit the noise in all that video? The answer is regularization. Without it, we would have none of the successes we see with neural networks today. Now, there is this relationship between the complexity of a model (and, again, deep neural networks are very complex) and how much data we have for a given problem: the more complex the model, the more data you need. And I’ve heard this argument that in finance there just isn’t enough data to support the complexity of a deep neural network, as there is with computer vision. But in this particular problem, forecasting future fundamentals from historical fundamentals, there is actually plenty of data.
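To make the regularization point concrete, here is a minimal sketch of L2 (ridge) regularization on a one-variable linear fit with no intercept. The data is invented for illustration; the point is the mechanism: the penalty term shrinks the fitted coefficient toward zero, which is what keeps a flexible model from chasing noise.

```python
# One-dimensional ridge regression: minimize sum((y - w*x)^2) + lam * w^2.
# The closed-form solution shows how the penalty lam shrinks the weight.

def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for y ≈ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]  # roughly y = x, plus noise

for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_weight(xs, ys, lam))
# As lam grows, the fitted weight shrinks toward zero: the model is pulled
# away from fitting every wiggle in the noisy training data.
```

In a deep network, the same idea applies simultaneously to millions of weights (along with dropout, early stopping, and similar tools), but the shrinkage mechanism is the same as in this one-parameter case.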
So we can do the math. If we’re interested in five-year time series, then each five-year time series of company fundamentals is a data point. How many are there? If we use a sample historical period of 35 years to learn from, assume we have approximately 2,500 companies at any given time, and look at monthly data, then multiplying those three numbers together gives close to a million unique time series to learn from, which is plenty for the purposes of deep learning. Now, there are aspects of financial data that can make learning, and any kind of inference for that matter, challenging. Like any problem, we have this issue when learning of trying to infer a distribution from only a sample of data, and we’re successful if the relationship we infer holds on out-of-sample data. But because of the time dimensionality of financial data, the underlying true distribution that we’re trying to learn can change. That is, this distribution can be what’s called non-stationary, so that whatever you learn in one time period may not be true in the next.
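John’s back-of-the-envelope count works out as he states it (the 35 years, 2,500 companies, and monthly sampling are his stated assumptions):

```python
# Rough count of five-year fundamental time series available for training,
# using the figures from the conversation: one series per company per month.
years_of_history = 35       # sample historical period to learn from
companies_per_month = 2500  # approximate universe size at any given time
months_per_year = 12        # monthly snapshots of each company

n_series = years_of_history * months_per_year * companies_per_month
print(n_series)  # 1050000 -- "close to a million" training examples
```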

**Corey Hoffstein **28:08

You absolutely read my mind, because this is exactly where I wanted to go next. This discussion of machine learning seems incredibly well suited for datasets that are implicitly stationary, as you mentioned, and it seems to have made huge advancements in areas where there are well-defined rules, or games where there are ultimately defined boundaries. Go is an incredibly complex game, but there are ultimately governing rules. If you subscribe to something like Andrew Lo’s adaptive market hypothesis, the degree of competition within the market is ultimately, in some way, changing the rules. And in many ways the rules have changed over time; the way, for example, that CFOs may report financials has changed over time. How do you tackle this issue of non-stationarity in the data? And maybe, just more generally, is machine learning applicable in finance at all?

**John Alberg **29:11

Sure, but this is not so much an issue with machine learning as a question of when and where inference can be done at all. If you have extremely non-stationary data, where the distribution is constantly changing from one period to the next, then any statistical description of the data in prior time periods has no real value in the next. Now, there are tools that can be employed if you have non-stationary data that is slowly changing through time, and that is to iteratively build models, be it a linear regression or a neural network, that use data from a fixed trailing window of time to forecast into the current or next time period. So let’s say you believe that your distribution is relatively stable over a four-month period. If you want to make forecasts for April, you build a model on data from January through March and use that model to make your April forecasts. Then you repeat this process for forecasts in May by building a model on data from February through April, and you can see how you can iterate this approach out forever. We do a form of this at Euclidean when we’re forecasting fundamentals, but our time window is closer to 20 years than four months. And this gets to the point of why at Euclidean we focus on long-term value investing, or fundamental investing: it’s a more stationary problem than other types of investing. I mean, it’s pretty well understood that the value of an asset is its future cash flows discounted to present value. That was true in 1925, it was true in 2007, it’s true in 2018, and it will be true in 2050. Therefore, if we can forecast cash flows in the future with any degree of accuracy, we should have an investment approach that works well over the long run; whatever the forecasting model is, it should be valuable regardless of what decade it’s being used in.
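The rolling-window procedure John describes can be sketched as a walk-forward loop. This toy version uses a trailing mean as the “model” purely to show the mechanics; any estimator (a linear regression, a neural network) would slot into `fit_and_forecast`, and the three-month window matches his hypothetical, not Euclidean’s roughly 20-year window.

```python
# Walk-forward forecasting: refit on a fixed trailing window, predict one
# step ahead, slide the window forward by one period, and repeat.

def fit_and_forecast(window):
    """Stand-in 'model': forecast the next value as the trailing mean."""
    return sum(window) / len(window)

def walk_forward(series, window_size):
    forecasts = []
    for t in range(window_size, len(series)):
        trailing = series[t - window_size : t]       # e.g. Jan-Mar data ...
        forecasts.append(fit_and_forecast(trailing))  # ... forecasts April
    return forecasts

monthly_values = [10.0, 12.0, 11.0, 13.0, 14.0]
print(walk_forward(monthly_values, window_size=3))  # [11.0, 12.0]
# The first forecast is fit on months 1-3, the next on months 2-4, and so
# on -- so a slowly drifting distribution never gets too far from the model.
```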

**Corey Hoffstein **31:20

So one of the things we’ve touched upon here, one of the potential benefits of a deep neural network, is that you don’t have to pre-engineer your factors. But in many ways, it strikes me that what makes someone a quant value investor is that pre-engineering, in a certain way: the factors that they are, by definition, looking at have to do with valuation. So they might be looking at price-to-book, or enterprise value to EBITDA. It strikes me that when you just provide all the raw input to the deep neural network, maybe it’s how you ask the question of the deep neural network that ultimately makes you a value investor. So I guess the question I would pose to you is: if you’re passing in all this raw data, what is it that connects you to the history of value investing, that would make you say that you’re applying machine learning to value investing?

**John Alberg **32:19

Well, what we really are is long-term fundamental investors, meaning that at any point in time, all I want to know is the publicly available information about a company that can be found in its historical financials, or in other documents that are filed with the SEC or published online. And all I want to assess is whether this company is going to be a good long-term investment or not. Now, the reality is that when you train any kind of model with that setup, it’ll inevitably come up with a model that is characteristically value in nature. And that is just because of the strength of the value effect in the data. It’s embedded in the combination of basic economic principles of mean reversion and human behavioral biases, which create the effect; therefore, it has existed for a very, very long time, and will continue to exist. Now, within a value model there are, of course, good value investments, and there are also value traps. And I think that is where machine learning really shows its power: its ability to use non-linearity to discriminate, within the universe of value, between companies that are true value opportunities and companies that are just deservedly cheap.

**Corey Hoffstein **33:32

So last year, you wrote this paper titled “Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals,” which took, I think, a pretty different approach to applying machine learning to the investment landscape than a lot of other approaches had tried before. Instead of trying to forecast returns, you took this approach of trying to forecast company fundamentals, and then used that information in what I believe was sort of your classification algorithm. Can you talk me through that paper a bit and some of the results that you came upon?

**John Alberg **34:10

This is the result where we found that, using a deep recurrent neural network, we could not forecast prices or excess returns any better than we could with a linear model. But we could forecast future fundamentals from historical fundamentals much better than we could with a linear model, or with a random walk. Now, you need to ask: is there any value to forecasting future fundamentals? Don’t we really want to know how an investment is going to do, not how the fundamentals are going to unfold? So to empirically answer this question, what we did was imagine that at each point in time we had clairvoyant access to future fundamentals, as if we could see the future. So say it’s December 1983, and we give our hypothetical selves access to 1984 year-end earnings for all companies. Then we take these future earnings and construct a factor by dividing the future earnings by the current, December 1983, enterprise value, and construct portfolios of stocks by this factor. And through simulation, we show that using this hypothetical clairvoyant factor, you would have generated just fantastic returns, in excess of 40%. So in our minds, this result motivates the desire to forecast fundamentals as accurately as possible. And I think this is just the tip of the iceberg on this idea of forecasting fundamentals. There are a lot of interesting directions we can take this research, and that includes using unstructured data, since we’re using deep neural networks, to help improve the forecast. And then there’s this idea of forecasting under uncertainty. For example, in the real world, two investments that look extremely similar can have different outcomes, both in price, but also in how their fundamentals evolve.
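The clairvoyant-factor experiment John describes can be sketched roughly as follows. This is a toy illustration on randomly generated numbers, not the actual study, which used real historical fundamentals and enterprise values:

```python
# Toy sketch of the "clairvoyant factor" experiment: rank stocks by
# next year's (hypothetically known) earnings divided by today's
# enterprise value. All data below is synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_stocks = 1000

# Hypothetical cross-section at "December 1983":
current_ev = rng.uniform(100, 10_000, n_stocks)                   # current enterprise value
future_earnings = current_ev * rng.normal(0.08, 0.05, n_stocks)   # 1984 year-end earnings (clairvoyant)

# The clairvoyant factor: future earnings / current enterprise value
factor = future_earnings / current_ev

# Form an equal-weight portfolio from the top decile by this factor,
# i.e. the stocks that are cheapest relative to their *future* earnings
portfolio = np.argsort(factor)[-n_stocks // 10:]
print(f"Selected {len(portfolio)} stocks; "
      f"average clairvoyant earnings yield {factor[portfolio].mean():.1%}")
```

In the paper’s simulation, the return of this future-earnings-to-EV sort is what motivated forecasting fundamentals in the first place; the sketch above only shows the portfolio-construction mechanics.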
And this is just a product of the natural uncertainty involved in companies and investing. Earlier, you know, I mentioned these guys at Amazon that have been doing a lot of work on this problem of forecasting, not just the expected value of something in the future, but forecasting the entire distribution of outcomes. And if you knew what the distribution of outcomes was, and not just the expected value, you could construct portfolios in a much more rigorous and interesting way. And so I think, you know, using techniques like deep autoregressive recurrent neural networks is a very interesting and promising area.

**Corey Hoffstein **37:01

This ties into a topic I want to get into a little bit, which is that today, most researchers and practitioners would agree that we probably have a p-hacking problem in this industry, and in a lot of other industries as well, where, just when you have so many people evaluating the same data, you’re bound to end up with a lot of false positives. So we’ve discussed this fact that with deep neural networks, by definition, we don’t have to pre-engineer the factors, and we can just sort of allow the machine to programmatically search for methods of selection, weight the importance, transform the data, which, to a very naive ear, mine included, sounds a lot like data mining. But you have made the argument in the past, and I’ve read this on your blog, that many of the techniques in machine learning, when done correctly, can actually be more robust to the problems found in traditional factor research. I was hoping you could expand on that for me a little bit, and maybe on some of the lessons that machine learning can offer to traditional factor research.

**John Alberg **38:23

So I think it’s worthwhile to explain a little bit what p-hacking is. In the social sciences and natural sciences, the way that you show statistical significance for a number, let’s say the alpha in a backtest, is you construct a 99% or 95% confidence interval around that number. And the way that you interpret that interval is that if you were to run the experiment 100 times, constructing a confidence interval on each try, you should expect the true value, that is, the true alpha in the backtest example, to be in 95 of those 100 confidence intervals, and in five of those experiments, the true value will not be in the interval you construct. Now notice the nature of this: the more times you repeat the experiment, the more likely it becomes that in at least one of them the true value, the true alpha, is not in the constructed interval. And so if you think about this in the context of repeatedly testing for new factors, the more attempts you make, the higher the likelihood that you will get a spurious result. Now, in machine learning, there’s a long history of using techniques like out-of-sample testing and cross-validation to help avoid being misled in this way by the data. Instead, when you build a model, you don’t validate whether you’re confident in the model on the same data that you built it on; you validate its success on out-of-sample data. To be fair, I think there have been, you know, some complaints from the finance community that out-of-sample testing and cross-validation are not well suited for time series data, but that has really changed a lot in the last several years, and there are now fairly powerful techniques that have been adapted to time series, and finance in general. In particular, there’s a paper called “The Probability of Backtest Overfitting,” which introduces a tool that you can download freely to run your backtests through.

And this algorithm, the probability of backtest overfitting, which was inspired by the machine learning technique called cross-validation, allows you to validate whether your series of backtests is likely to be overfit or not. My point is, there is a wealth of research in the machine learning community on how to prevent getting fooled by your data and analysis, and I think the finance community could benefit from better understanding this work.
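The multiple-testing problem John describes can be made concrete with a small simulation. The sketch below generates many candidate "factors" that are pure noise, picks the best one in-sample, and then checks it on held-out data; all numbers are synthetic and the setup is illustrative, not the PBO algorithm itself:

```python
# Illustration of p-hacking via multiple testing: search 200 random
# "factor" return streams for the best in-sample performer, then check
# that same factor on out-of-sample data. Every factor has zero true
# alpha, so the in-sample winner's edge is spurious by construction.
import numpy as np

rng = np.random.default_rng(42)
n_periods, n_factors = 120, 200

# Monthly returns for each candidate factor; true mean is exactly zero
in_sample = rng.normal(0.0, 0.02, (n_factors, n_periods))
out_sample = rng.normal(0.0, 0.02, (n_factors, n_periods))

# The "p-hacked" winner: best in-sample average return
best = in_sample.mean(axis=1).argmax()
print(f"best factor in-sample mean return:   {in_sample[best].mean():+.4f}")
print(f"same factor out-of-sample mean:      {out_sample[best].mean():+.4f}")
# The winner looks attractive in-sample only because it was selected
# from 200 draws of noise; out of sample it reverts toward zero.
```

This is the intuition behind validating on data you did not select your model on: selection bias inflates in-sample results, and only held-out data can reveal it.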

**Corey Hoffstein **40:46

One of the interesting rabbit holes I went down in preparation for talking with you today was this idea of adversarial examples. It seems to come up a lot in image classification, where a neural network is trained to classify an image as an object, and by changing one little tiny detail about the image, often imperceptible to the human eye, the neural network becomes completely confident the image is not what it actually is. Or, in the opposite direction, an image that to the human eye looks like complete noise is classified with 100% confidence as being a particular object. And this idea that by exploiting, I guess they’re called activation pathways of these neural networks, these heavily used activation pathways, by exploiting the structure after the neural network has been put in place, you can come up with these examples that show really how fragile the neural network may be. And obviously, maybe there isn’t a direct correlation to investing necessarily, but it strikes me again that this combination of the threat of opaqueness and fragility of a neural network would almost require a higher degree of confidence to use it, whereas for me, something like price-to-book or price-to-earnings is a very transparent methodology. I want to go back to this idea of developing confidence in applying a very, very new concept, something that’s really only gotten very popular in the last couple of years and maybe isn’t incredibly well understood yet. Talk to me again about how you go about developing your confidence in applying that approach.

**John Alberg **42:39

Yeah, so the adversarial example is really a kind of creepy thing, and it’s related to this whole issue of transparency and model explainability. As I said before, it’s very comforting to use an ensemble of decision trees, because you can follow the logic of how the model is making a decision about a particular investment, whereas with a deep neural network, it’s not so straightforward. But I would caution here that these issues, adversarial examples and model explainability, are not insurmountable problems. I mean, going back again to self-driving cars: we’re not going to accept a car that crashes more just because we can explain why it made the decision it made. We’re going to use the best models there are, and these models are going to be held to a higher standard than even a human in terms of explainability, because it’s just not acceptable for a car to kill someone and not be able to explain why. And so essentially, the incentives are so big, and the people who are working on it are super smart, so this is going to get figured out. And, you know, not surprisingly, there’s a lot of great research going on in this area, basically in how to take the actions of deep neural networks and convert any specific action into an explanation, and further, how to make deep neural networks safe from adversarial-type examples.
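The mechanism behind many adversarial examples, the fast gradient sign method of Goodfellow et al., can be sketched on a toy model. The version below uses a simple logistic classifier rather than a deep network, and all the numbers are synthetic, but the core idea is the same: step each input coordinate slightly in the direction that most increases the loss, and let the effect accumulate across many dimensions:

```python
# Minimal sketch of the fast gradient sign method (FGSM) on a toy
# logistic classifier. Real adversarial examples target deep networks,
# but the mechanism (a small per-coordinate step along the sign of the
# input gradient) is the same. All values here are synthetic.
import numpy as np

rng = np.random.default_rng(7)
d = 100
w = rng.normal(size=d)            # weights of a toy "trained" classifier

def predict(x):
    """Sigmoid probability that x belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

x = w * (3.0 / (w @ w))           # scaled so the clean logit w @ x is exactly +3
p_before = predict(x)             # confidently class 1 (sigmoid(3) ~ 0.95)

# Gradient of the logistic loss w.r.t. the input is (p - y) * w; for the
# model's own label y = 1 that points along -w, so FGSM steps along -sign(w).
eps = 0.1
x_adv = x - eps * np.sign(w)

p_after = predict(x_adv)
print(f"P(class 1) before: {p_before:.3f}, after: {p_after:.3f}")
# Each coordinate moved by only 0.1, but the effect accumulates across
# all 100 inputs, so the classification flips.
```

The dimensionality argument in the comments is why such attacks are so effective on image models, where inputs have thousands of coordinates that can each be nudged imperceptibly.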

**Corey Hoffstein **43:57

There are sort of three big categories that people point to when it comes to opportunities to outperform the market. The first is an informational edge: that you’ve got better information than the rest of the market. The second is what I would call an analytical edge: everyone has the same information, but you’re able to interpret it with more accuracy. And the final one, I would argue, is an emotional edge: everyone’s got the same information, they can all interpret it with the same accuracy, but you’ve got better fortitude to hold through pain when other investors fold and pass you the alpha. Machine learning has been this really interesting space in that it’s really been driven as an open-source endeavor: a lot of companies like Facebook and Google are publishing all of these tools that lower the barrier to entry for people who want to start exploring datasets with machine learning. And it strikes me that if machine learning is really an analytical edge, in that second category, then there’s going to be, to a large degree, an arms race that leads to diminishing returns in the applicability of machine learning as an investment edge. Do you think that the edge in applying machine learning is analytical, and that over time that edge may degrade? Or do you think there’s a sustainable edge here for the application of machine learning in the investment landscape?

**John Alberg **45:27

So again, this is not something that’s unique to machine learning. I mean, any strategy that’s out there that’s simple and makes money, and especially when it makes money in a short amount of time, if it’s known, is going to be susceptible to being arbitraged away. This is obviously maybe more true with higher-frequency trading. But even in factor models, you’re starting to see people wondering: wow, does book-to-market really work anymore? And some argue that it doesn’t work because, you know, maybe the balance sheet doesn’t reflect the true value of assets anymore, especially intangibles. But others make the argument that it’s just an overcrowded factor. I think the verdict is still out on that. It is interesting that with machine learning, there is a sort of flip side to the lack of transparency: when we don’t engage in factor engineering, and instead use raw financial information as input to, say, a deep neural network, the model remains a sort of secret sauce embedded within this deep neural network. And the model cannot really become well known, like a factor that someone can replicate and arbitrage away.

**Corey Hoffstein **46:33

Machine learning is an area of really rapid growth right now, with lots of acceleration in developments and access to software. If someone wants to enter this field, what’s the best way? What’s sort of the foundational knowledge they need to develop? Let’s say someone’s in college, for example: what is sort of the core curriculum they need to think about taking? And what’s the best way for someone to stay abreast of the developments that are happening?

**John Alberg **47:01

So I think it’s really an amazing time to be learning machine learning. And what I would recommend to anybody who’s interested in it: make sure you have a solid foundation in math, statistics, and linear algebra, and then jump in and take Andrew Ng’s course, his Stanford MOOC on introduction to machine learning. It’s just amazing. And then, as I mentioned earlier, there are all these companies that are investing heavily in tools and putting them out there so people can use them: Google’s TensorFlow, Facebook has tools, Amazon has a huge number of tools within AWS. So there are great resources out there for people if they want to learn machine learning.

**Corey Hoffstein **47:44

John, last question for you. And this is the question I have been asking everyone at the end of the podcast. It is: if you were to describe yourself as an investment strategy, what strategy would that be? What is the investment strategy that best encapsulates your personality? It can be anything from vanilla market beta to complex option strategies, and everything in between. What would it be?

**John Alberg **48:14

I would describe my investment approach as machine learning applied to long-term fundamental investing.

**Corey Hoffstein **48:21

Why do you feel that that investment approach describes your personality?

**John Alberg **48:27

Well, I think it largely has to do with the fact that I built a business over more than a decade, and that experience really embedded in me a sense that value is created in companies, not in markets. Combined with my long-standing expertise in and enthusiasm for machine learning, I think those two things really drove my desire to approach investing in this way.

**Corey Hoffstein **48:57

John, it’s been a real pleasure chatting today. Thank you for taking the time. I hope you enjoyed my conversation with John Alberg. You can find more of John on Twitter under the handle John Alberg, and learn more about Euclidean on their website, euclidean.com. For show notes, please see flirtingwithmodels.com/podcast. And as always, if you enjoyed the show, we’d urge you to share it with others via email or social media, and don’t forget to leave us a review on iTunes.