In this episode I speak with Bin Ren, founder of SigTech, a financial technology platform providing quantitative researchers with access to a state-of-the-art analysis engine.

This conversation is really broken into two parts. In the first half, we discuss Bin’s views on designing and developing a state-of-the-art backtesting engine. This includes concepts around monolithic versus modular design, how tightly coupled the engine and data should be, and the blurred line between where a strategy definition ends and the backtest engine begins.

In the second half of the conversation we discuss the significant pivot SigTech has undergone this year to incorporate large language models into its process. Or, perhaps more accurately, allow large language models to be a client to its data and services. Here Bin shares his thoughts on both the technical ramifications of integrating with LLMs as well as his philosophical views as to how the role of a quant researcher will change over time as AI becomes more prevalent.

I hope you enjoy my conversation with Bin Ren.

Transcript

Corey Hoffstein  00:00

Bin, you ready? Ready? All right, 3-2-1, let's jam.

Corey Hoffstein  00:10

Hello and welcome, everyone. I'm Corey Hoffstein. And this is Flirting with Models, the podcast that pulls back the curtain to discover the human factor behind the quantitative strategy.

Narrator  00:22

Corey Hoffstein is the co-founder and chief investment officer of Newfound Research. Due to industry regulations, he will not discuss any of Newfound Research's funds on this podcast. All opinions expressed by podcast participants are solely their own opinion and do not reflect the opinion of Newfound Research. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Newfound Research may maintain positions in securities discussed in this podcast. For more information, visit thinknewfound.com.

Corey Hoffstein  00:53

In this episode, I speak with Bin Ren, founder of SigTech, a financial technology platform providing quantitative researchers with access to a state-of-the-art analysis engine. This conversation is really broken into two parts. In the first half, we discuss Bin's views on designing and developing a state-of-the-art backtesting engine. This includes concepts around monolithic versus modular design, how tightly coupled the engine and data should be, and the blurred line between where a strategy definition ends and the backtesting engine begins. In the second half of the conversation, we discuss the significant pivot SigTech has undergone this year to incorporate large language models into its process, or perhaps more accurately, to allow large language models to be a client to its data and services. Here Bin shares his thoughts on both the technical ramifications of integrating with LLMs as well as his philosophical views as to how the role of a quant researcher will change over time as AI becomes more prevalent. I hope you enjoy my conversation with Bin Ren.

Corey Hoffstein  01:59

Bin Ren, thank you for joining me today. This will likely go down as one of the more unique episodes I've done. I am really excited to dive into the world of AI. I know I've done a bunch of episodes on machine learning before, but this is probably the first that I can really say is more AI focused, and maybe we will get into a little bit of the difference between machine learning and AI. I'm really excited to have you here. I know you're on the cutting edge of what's been going on this year with things like ChatGPT and large language models, and I'm really excited to learn about how you are introducing them into your business. So thank you for joining me.

Bin Ren

Corey, it's such a pleasure to join you today.

Corey Hoffstein

So I really want to start with your background. You actually have a somewhat non-traditional background for entering the systematic investment space. You started your career in the world of cloud computing. Now today, cloud computing is ubiquitous; every quant developer probably touches it in some capacity. But you were actually one of the earliest developers on the Xen virtualization technology, and you were an early engineer for Amazon's AWS platform back in 2004 and 2005. And I just wanted to get a sense from you, just for history's sake: what was it like in those early days of building what we now call the cloud?

Bin Ren  03:26

It's kind of interesting to think that it's almost 20 years ago; actually, exactly 20 years ago. In October 2003, I started as a PhD student in computer science at Cambridge University in England. I was a member of the research group that designed and built the Xen virtual machine monitor. Cloud computing at the time... I don't think people even had the right name for it; I don't think we even had the phrase "cloud." 2003 was a very chaotic period. What happened is, Intel was the biggest semiconductor manufacturer in the world at the time, and they had been riding this wave of Moore's Law for decades. But they finally hit a ceiling in terms of single-core CPU performance. As a result, for them to continue churning out CPUs that doubled performance every 18 months, they really didn't have a choice but to go down the path of introducing more CPU cores on the same chip. So 2003 was the watershed moment when they started to introduce dual-core, quad-core, and later on eight-core and 16-core CPUs to the market. That was the backdrop of the semiconductor industry at the time. What that meant was that we suddenly had this supply of a new kind of CPU, and they were looking for a killer application.

Bin Ren  04:59

One natural way to utilize this new computing power was to run so-called multiprocessing or multithreading applications. But the issue with that was that it's very difficult to write multiprocessing and multithreading applications. It's not a simple switch; almost all programmers are used to, and are trained to, write single-thread, single-process applications. I would say it's still the same situation even 20 years later today. So that really didn't quite work. So people were looking for killer applications and solutions where, instead of forcing developers to change the way they work, we could take the multicore chips and chop them up into multiple single-core virtual machines, and then run full environments on the same physical environment. And that's essentially how the cloud was invented. The piece of software responsible for slicing up physical resources like CPU, memory, and hard drives, encapsulating each slice in a very secure and isolated environment, and then presenting it to developers as if it were a real physical computing environment, was the virtual machine hypervisor. At the time, there was only one company offering something like this, which was VMware; it was very expensive, and it was very slow. So my research group was building something called Xen, and we took a drastically different approach that turned out to be 100 times faster, and it was open source. So we put the entire source code on Git. Funny enough, Git was created by Linus Torvalds, the creator of Linux, in 2005, so it's from the same period. Let me just digress a bit to talk about why he literally took a year off from the Linux kernel to build Git. Git was a more decentralized version control system. Because the Linux ecosystem was growing at such a breakneck pace, there were just too many developers around the world trying to contribute to the Linux kernel source code. Linus realized the bottleneck was the version control system they were using, so he actually had to build a more decentralized version control system from scratch. That was called Git, and Git then led to GitHub in 2008. So we put our source code in Git and made it easily approachable and usable by pretty much everybody in the world. And I remember that the first external developer who found our repository of source code, started to use it, and reported some errors was someone from a developer team at Amazon, based in Cape Town. So we were in the same time zone. Obviously, we didn't know at the time that that team would move into the AWS cloud computing business at Amazon, but that team was one of the earliest, if not the earliest, external contributors to the Xen project. Years later, I think five or six years later, they launched AWS, around 2007 or '08, and the entire AWS cloud environment was based on the multi-core CPUs by Intel and the Xen virtual machine hypervisor. That was a really fun time.

Corey Hoffstein  08:37

I hope you at least got some Amazon stock out of it for all your contributions. I'll say, on behalf of all quants, thank you, because I know cloud computing revolutionized the way in which data analysis is performed. It's been instrumental in so much of the work that we do, as well as quants everywhere, so hugely impactful on the industry. And I'm sure we could spend a full hour going into the tech side; my background being computer science, I absolutely love this stuff. But I want to keep this episode on track, because I want to make sure we get to the really juicy stuff that's happening today, and not just talk about the past. So I'm going to jump forward a decade, just to keep the conversation going, to your time at Brevan Howard, where you helped build the systematic investment group. And the question I have for you here is that Brevan Howard is well known as a macro hedge fund, a discretionary macro hedge fund in many ways, and I'm curious what sort of lessons you learned trying to run a systematic trading team inside of that macro process.

Bin Ren  09:42

I started my career as an equity exotics trader, so my initial background was in fairly exotic, fairly complicated derivatives. I then did a lot of multi-asset structuring and derivatives, and then systematic strategies. So at Brevan, there were multiple lessons I learned which had a huge impact on both me as a person and my career. The first thing I want to talk about is that a discretionary macro hedge fund is extremely derivatives heavy, right? I think the bread and butter of a macro hedge fund is that you have these traders coming up with ideas, and then the secret sauce, I will say, is mostly about how to structure and size the trade. For people from a more traditional, pure systematic trading background, we tend to see our job as more of a prediction problem. We think about financial assets, and we think that if the market is efficient most of the time, then financial assets are fairly priced, which means the returns are more like a random walk. Therefore we seem to impose on ourselves a prediction problem, with a benchmark of trying to beat the market in terms of predicting whether certain assets are going to go up or down tomorrow, and trying to do better than just 50/50. Now, in the environment of a discretionary macro hedge fund, I realized that one of the implicit assumptions we had when we thought about this beating-50/50 prediction problem was that we were assuming the assets are linear. The assets we had in mind offer a linear payoff, like futures or ETFs or single stocks; they have very little, if any, convexity. Whereas in the derivatives world, when payoffs are highly nonlinear, 50/50 is not really the right benchmark, because you could be betting on something that only ends up in the money 20% of the time, but the payoff could be five or six times what you risked. So the payoff and the prediction become two sides of the same coin. And I suddenly realized that, depending on how we frame the problem, sometimes it's easier to crack it as a prediction problem, and sometimes it's easier to just assume we don't know better in terms of probability, but we might be able to do something better in terms of finding the right trade structure, or find some instrument that is temporarily mispriced, and think of it that way. So I think the first lesson was that those two angles simply give us a second problem space in which to think about what we work on. The second lesson I learned during my time there is that Brevan Howard was a discretionary macro hedge fund, so they didn't actually have any legacy technology stack or tech debt devoted to systematic trading. The entire project turned out to be a greenfield project. That allowed my team to adopt what we thought at the time were the most relevant, most modern, state-of-the-art technologies and libraries, such as Python and pandas, and different system designs, and we certainly designed the system with a view to running it in the cloud, and so on. That, I felt, was a big blessing, because most systematic funds go back decades and have years of tech debt. And I personally think that in a financial institution, technology development tends to be a means to an end; it simply acts as a tool for the front office, whether that's traders or portfolio managers. And because in-house technology is not a product offered to the entire market, there's no competitive pressure on the technology team or on technology designs to be very competitive, as long as it works.
It's okay. So being able to do a massive greenfield project from scratch at Brevan Howard had massive benefits. The other thing I learned at Brevan Howard is what I would call brutal intellectual honesty. Discretionary traders have this culture because, for every bet they make, the payoff in the end combines skill with luck. So it's very important for the traders and the risk managers to understand why they make money. They try to explain their profits and losses so that they know when they're getting lucky or unlucky, or when the trade simply doesn't work. For systematic trading, sometimes we can be a bit too data driven, too data focused, and sometimes we even build models that we don't really fully understand or can't interpret ourselves. This discussion is not new and can even get a bit philosophical. But at Brevan it was quite important that for every single strategy we built, we had to have some proper explanation of, and confidence in, why the thing works and under what assumptions it works, so that when it seemingly stops working, we know whether it's because of luck or because some fundamental assumptions are no longer valid. I think that was quite important. I would say those are the three top lessons I learned during my time at Brevan.
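
To make the convexity point concrete, here is a small, purely illustrative calculation (the numbers are hypothetical and not from the episode): a trade that pays off only 20% of the time can still carry positive expected value if the payoff is asymmetric enough, which is why a 50/50 hit rate is the wrong benchmark for convex instruments.

```python
# Hypothetical illustration of the prediction-versus-payoff point.

# Linear bet: 50% chance of +1 unit, 50% chance of -1 unit.
linear_ev = 0.5 * 1.0 + 0.5 * (-1.0)   # = 0.0

# Convex trade (e.g., a cheap option): in the money only 20% of the time,
# but paying 5x the premium when it works, losing the premium otherwise.
convex_ev = 0.2 * 5.0 + 0.8 * (-1.0)   # = +0.2 per unit of premium at risk

print(f"linear EV: {linear_ev:+.2f}, convex EV: {convex_ev:+.2f}")
```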

Corey Hoffstein  15:24

You initially spun your firm SigTech out with what you've described to me as a, quote, state-of-the-art backtesting engine. You talked a little bit about having the opportunity to build something from scratch internally at Brevan, and this is what you eventually lifted out. I'm curious, in your opinion, what does that, quote, state-of-the-art backtesting engine actually look like today?

Bin Ren  15:52

I think even today there's no clear and standard definition of what a backtest engine is. I think many people think it's quite a simple thing, which is just buying and selling, weights and returns, and you do the linear algebra and you get your results. So it makes sense that there's no standard definition for it, because there's no off-the-shelf backtesting engine, no industry standard; you can't just buy something and plug it into your system. Everything is bespoke. So when I talk about our backtest engine, there are certain features and design choices we made very early on that I think made it a bit different from most of the efforts I've seen. The first one is being truly multi-asset. Again, that's because of the markets we were trading; we were trading pretty much everything, so we had to support different types of assets. Not just different asset classes, but also not just exchange-traded assets but over-the-counter ones as well, derivatives especially. And then we had to support both linear and convex instruments. So we had to do spot contracts, FX forwards, swaps, but then we also had to do swaptions, equity index options, and options on Treasury futures. So a very heavy part of the whole system is being able to do proper pricing. And then we designed very flexible abstractions. We wanted to support three different levels of abstraction in our system: instrument, strategy, and portfolio. An instrument is just any type of financial instrument. A strategy combines a certain set of signals to buy and sell different kinds of instruments. And a portfolio is just a basket of strategies that you can allocate your capital to, whether based on dollar amount, or based on risk limit, or even based on performance, and so on. But then we decided that we also wanted the system to be quite easily composable. Meaning: what happens if we want to put some strategies in a portfolio but also add some instruments directly? Should that just work? And what if we have a long-only portfolio but we want to turn it into a long/short portfolio? Can we simply add a short leg to it? This kind of composability was quite important, because we didn't want to handcraft each one of these possible combinations. So we made a conscious decision to have a kind of unified abstraction that allows us to combine different types of instruments, strategies, and even portfolios quite flexibly. And then we had to decide, like with any computer system, what kind of compromise we wanted to make between speed and exact detail. We designed the whole system so that at the core of it, let's just call it the engine, all it does is take a list of trading instructions: buy or sell this instrument at this time. It says, okay, I don't even need to know what kind of instrument this is; all I need is someone to tell me the price and someone to tell me how I should charge the execution costs, and I will do it. To borrow an analogy, this bit is almost like running assembly or machine code; it can be very fast. The abstraction is quite low level, so that bit is highly optimized: it's entirely written in Cython, it's compiled, and it's very fast. One thing we certainly gave a lot of thought to was how to model execution costs, because for some instruments, if you think about it, the transaction cost is charged per unit.
For example, when you trade single stocks in the US, you're paying a fixed amount per share; it doesn't matter whether the shares are trading at $1 or at $300. So that makes a big difference. It reminds me of the reason why Warren Buffett refused to split his stock for so many years: to minimize the transaction cost on a percentage basis. But then there are lots of instruments whose transaction cost accrues as a percentage, especially in the derivatives world; when we do FX forwards or options, the execution costs are always quoted in basis points. Being able to deal with that properly is quite important. That's where the idea that we have to really understand, and be very detailed about, the data models becomes a very important decision. Because imagine your normal, simpler backtest engine operating simply at the level of returns and weights: it becomes essentially incapable of applying execution costs that are charged per unit. You can only do them as a percentage, so you won't be able to model transaction costs or market impact on a per-unit basis. So it's quite a complicated system to build, and it's hard to define the boundaries of where the backtest engine starts and where the world of backtesting stops. Over the years, after multiple iterations, I think the entire code base is maybe just under 1 million lines of Python, but over the last eight years we must have written and deleted three or four million lines of code. So it has effectively been rewritten three or four times. I think we've made a bunch of very important decisions, and so far I think we've gotten most of them right.
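
To ground the instrument/strategy/portfolio idea, here is a minimal, hypothetical Python sketch of what such composable abstractions and the two kinds of execution-cost models might look like. All of the class and method names are invented for illustration; this is not SigTech's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


class Tradable(Protocol):
    """Anything the engine can price: an instrument, a strategy, or a portfolio."""
    def price(self, ts) -> float: ...


@dataclass
class Instrument:
    ticker: str
    prices: dict  # timestamp -> observed price

    def price(self, ts) -> float:
        return self.prices[ts]


@dataclass
class Strategy:
    legs: list  # list of (Tradable, weight) pairs; legs can themselves be strategies

    def price(self, ts) -> float:
        # Mark the strategy as the weighted sum of whatever its legs are.
        return sum(weight * leg.price(ts) for leg, weight in self.legs)


@dataclass
class PerUnitCost:
    cents_per_share: float

    def charge(self, units: float, notional: float) -> float:
        # e.g., US single stocks: a fixed amount per share traded.
        return abs(units) * self.cents_per_share / 100.0


@dataclass
class BasisPointCost:
    bps: float

    def charge(self, units: float, notional: float) -> float:
        # e.g., FX forwards or options: cost quoted as basis points of notional.
        return abs(notional) * self.bps / 1e4
```

Because a portfolio here is just another Strategy whose legs happen to be strategies (or a mix of strategies and raw instruments), the composability described above falls out of a shared interface rather than being handcrafted case by case.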

Corey Hoffstein  21:56

I'm sure every programmer can sympathize with the idea of deleting three or four lines of code for every one that actually stays; I think that's the typical process. But I want to drill into a point you made both at the beginning and at the end; you sort of came full circle there. You joked at the beginning of that last answer that you don't just abstract every instrument away into a time series and do a simple linear algebra approach, which I think is the naive first cut that most people would make in building a backtesting engine. But your engine is actually very tightly coupled to the data models themselves. One example you gave was about transaction costs being different for different instruments. I was wondering if there were other reasons why coupling the engine so tightly with the data models is really important for building a robust backtesting engine.

Bin Ren  22:51

Yeah, I think the other major reason is that, for example, when we do the data models for different instruments, we can actually control the timing of execution. For a listed instrument, let's say a single stock listed on a certain exchange in a certain venue, we know through the data models what time the closing auction is and when the exchange holidays are. So we can model that, and that's quite important, because sometimes trading ends at a half day, right? You think that's a one-day return, but it's more like a half-day return, so that has an impact. And then there are certain global markets, especially FX. There's no closing auction for FX; it just trades all the time. So you have to take snapshots of those instruments intraday to sync up. And the timing also applies to ETFs: you can get ETFs with a global basket as the underlying, so some of the assets are trading and some are not. How do you model the ETF versus the NAV? How do you do the approximation? So that gave us a lot of flexibility. The other thing I would say is that the data models also allow us to support intraday. Since we are modeling the execution timing so explicitly, we can actually do intraday as well. For example, we can run strategies that have signals or execute at arbitrary timestamps during the day. In terms of granularity of frequency, we support up to one-minute bars, so we can execute or have a signal at any of these bars. And with that timing model, you don't need the time steps of each series to be evenly spaced. You don't have to worry about whether these are hourly returns or daily returns; you can say this is the price change over 15 minutes, and the next interval might be three minutes. So that gave us a lot of flexibility. Our backtesting engine and the whole system around it can support the bread-and-butter daily-frequency and medium-frequency strategies, but we can also support live-streaming, more sporadic, intraday trading.
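
As a small illustration of the unevenly spaced bars being described (a toy example with made-up prices, not SigTech's data model), returns can be computed directly over irregular timestamps rather than assuming a fixed bar size:

```python
import pandas as pd

# Toy, made-up intraday marks at irregular timestamps.
marks = pd.Series(
    [100.0, 100.4, 100.1, 100.9],
    index=pd.to_datetime([
        "2023-11-15 09:30", "2023-11-15 09:45",   # 15-minute gap
        "2023-11-15 09:48", "2023-11-15 16:00",   # 3-minute gap, then the rest of the day
    ]),
)

returns = marks.pct_change().dropna()              # per-interval returns
elapsed = marks.index.to_series().diff().dropna()  # uneven time steps between marks
print(pd.DataFrame({"return": returns, "elapsed": elapsed}))
```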

Corey Hoffstein  25:22

The typical design I see with backtesting engines across the industry is this pipeline approach, where you have data ingestion that gets fed into some sort of signal processing, that gets fed into a strategy, that gets turned into an allocation, that gets fed into the backtesting engine. And your approach philosophically strikes me as a much more monolithic design, where you're somewhat blurring the lines between the strategy and the backtesting engine itself. Why do you think this is an important design decision for you to make?

Bin Ren  26:02

Our philosophy is not vastly different from what you just described, because the typical workflow of our users is still data ingestion, but that's taken care of by us at SigTech, into our data lake. So that's taken care of, but it's still data ingestion into signals: a portfolio manager uses the data to generate signals, independent of how he ends up expressing that idea or signal as trades. I think the biggest difference is where he expresses his signals or trade ideas as strategies. We just offer a lot more control and a lot more detail to allow the portfolio manager to express exactly what he wants to implement. This really goes back to what I said earlier about the lesson we learned at Brevan Howard: a lot of the secret sauce is in sizing trades and structuring trades, especially the ones that offer you an asymmetric payoff, some convexity in the payoff, so being able to sufficiently express all these different kinds of sizing and structuring complexities matters. We've really put a lot of thought into the strategy bit. And once you have that, it again becomes quite similar to the normal approach: when you build a portfolio, you can allocate capital in terms of dollars or risk to these different strategies. So I think the key difference here is not so much the workflow itself, but the fact that we put a lot more functionality and a lot of different design decisions into the strategy bit to allow people to express sizing and structuring.

Corey Hoffstein  27:47

Can you talk a little bit about how you incorporate alternative data, or maybe even more specifically unstructured data, into this design?

Bin Ren  27:56

I think about alternative data in two broad categories. The first one is alternative data that is still in time series form. For these kinds of alternative data it's reasonably straightforward, because you're still dealing with time series; the challenge is really quality control, because validating and cleaning an alternative data time series is very different from validating and cleaning financial asset time series, like return series, right? They have different distributions. So that can be a bit tricky. The other challenging bit is to successfully map the alternative time series to the right entity in the financial markets; it could be mapped to a company, or to an instrument. So that's the first category, and it's still reasonably straightforward. When it comes down to the second one, which is unstructured alternative data, we're talking about textual data, or maybe even images. There are lots more modalities to these kinds of datasets, and there's a lot of extra complexity in building an essentially completely different data ingestion pipeline and an entirely different data infrastructure to deal with it; it has nothing to do with time series. For a long time, people tried to apply machine learning techniques to unstructured data, such as natural language processing or image processing, with the goal of turning it into useful time series, I would say with limited success, because this transformation is essentially part of the process of looking for signals. It's quite complicated, because the truth is that when you apply NLP or other machine learning techniques to turn unstructured data into time series, you can't exactly measure the quality of those time series by themselves. You basically have to take the time series, try to turn them into trading signals, and simulate a strategy before you can see whether it works or not. That feedback loop is really long, and the feedback is therefore of very low quality, because when it arrives, you have jumped through so many different steps that it's hard to ascertain which stage is responsible for the poor quality. Is it because the natural language processing technique wasn't good enough? Or is it because what I did with the time series wasn't good enough? So it's just very challenging. Not until today, with the arrival of large language models, which are essentially built and trained from scratch to deal with textual data, and now with multiple modalities, do we have, I believe, really promising solutions for bridging this huge gap between unstructured data and structured time series.
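
As a toy illustration of the text-to-time-series step being described (everything here, from the keyword scoring to the entity map, is a deliberately naive placeholder rather than anyone's production pipeline), unstructured headlines can be mapped to tickers and aggregated into a daily series that a signal-research step could then consume:

```python
import pandas as pd

# Naive entity map and "sentiment model"; both are placeholders.
name_to_ticker = {"Apple": "AAPL", "Tesla": "TSLA"}
positive, negative = {"beats", "surges"}, {"misses", "falls"}

headlines = pd.DataFrame({
    "date": pd.to_datetime(["2023-11-14", "2023-11-14", "2023-11-15"]),
    "text": ["Apple beats earnings estimates",
             "Tesla misses delivery targets",
             "Apple surges on upgrade"],
})

def score(text: str) -> int:
    words = set(text.lower().split())
    return sum(w in words for w in positive) - sum(w in words for w in negative)

def map_entity(text: str):
    # Entity mapping step: attach each headline to a known instrument.
    return next((t for name, t in name_to_ticker.items() if name in text), None)

headlines["ticker"] = headlines["text"].map(map_entity)
headlines["sentiment"] = headlines["text"].map(score)

# Aggregate into the kind of per-ticker daily time series a signal step could use.
daily_sentiment = headlines.groupby(["date", "ticker"])["sentiment"].mean().unstack()
print(daily_sentiment)
```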

Corey Hoffstein  31:15

Well, that's a great segue, because you made a really significant shift in your business this year to integrate this backtesting engine technology with large language models. I'm curious, what was the catalyst for that decision, and how does it ultimately change the user experience of your technology?

Bin Ren  31:37

So we started SigTech four and a half years ago, and the biggest hurdles our users encounter in their daily workflow are: A, they need access to high-quality data; B, they need to know, rather proficiently, how to write Python code to use our platform and analytics framework; and C, obviously, the user has to have very good market knowledge, and they all spend a tremendous amount of time every day just staying up to date on the market and the world in general. So when I think about how we can help users be more productive in their workflow, the key thing is that it's very hard, very rare, to find someone who is very proficient in programming, has very professional market knowledge, and is also very proficient in dealing with data. We had users who told us they love using our product, but the issue was that one of them went on holiday for two weeks, came back, and realized he had forgotten how to write some of the code. He had to reread the tutorials and the API references; it was a lot of work. So we spent a huge amount of effort over the years trying to make things easier for our users and prospective users, but I would say all those efforts were incremental. Until, at the beginning of this year, I seriously started using ChatGPT. ChatGPT was released, I think, less than a year ago, at the end of November 2022. It came out with GPT-3, which was impressive in the sense that it felt like a 14- or 15-year-old teenager, but you could have a decent conversation with it. And then in March this year they announced GPT-4. For me, that was the moment I realized this is a big deal, because GPT-4 is, I think, about 10 times better in terms of reasoning capabilities. It can write code. It scored, I think, in the top 10 or 20 percentiles in almost all the different domains of professional and university entrance exams. I distinctly remember that GPT-4 passed the bar exam in the state of New York in the top 10 percentile. So suddenly, from March this year, we have access to, for the first time, essentially intelligence as a service. We have maybe the number one polymath in the world, a top 10 or 20 percentile expert in multiple domains, at our fingertips, and we just have to pay $20 a month to access it. And we all know that ChatGPT turned out to be one of the most successful launches of any computer software in human history; it had hundreds of millions of users within a few months. So then I realized that the large language model is so good at writing code, and especially after I tested it with certain financial market knowledge, like, when inflation goes up, how do you think the yield curve should move? It had all these concepts of steepening and flattening, this idea of the impact of inflation on asset pricing. Undoubtedly, a lot of the economic literature was part of the training datasets. And then I realized: why not make the leap of considering large language models as our potential users? Why do we have to constrain our business to providing data and tools to human users? Why don't we also do the same thing for large language models, treating them as potential customers?
And the implication of that is, if we can do this, then people who may not have the necessary expertise in programming, in data processing, or in markets can actually interact with the large language models through natural language and leverage AI to solve the problems for them; maybe not at the level of a professional with decades of experience, but certainly magnitudes better than what they can do today. So by making this jump, this pivot, not only can we expand our total addressable market to include AI models, but we're also able to reach a tremendous number of human customers, just by introducing AI as the intermediary. It was one of the most important decisions I've made at SigTech. I'm very happy the whole company got behind it very quickly, and it only took us about four and a half months from the email I sent to everyone to actually getting the whole infrastructure architected and built and launching our first AI product.

Corey Hoffstein  37:00

For other firms that are evaluating OpenAI and ChatGPT and considering some form of integration, can you talk a little bit about your diligence process and the path you took to get comfortable enough with the technology that you were willing to say this is a meaningful pivot we need to make today?

Bin Ren  37:12

A major advantage for us is that SigTech is not a financial institution; we are small, and we are not regulated. We just focus on technology and we make tools, so our considerations are understandably a lot less complex and nuanced than those of financial institutions. But when we speak to some of the larger financial institutions and discuss this, the number one question raised is data privacy. They always ask: okay, let's say our employees use ChatGPT to do their job. What happens to those queries? Is OpenAI going to use the queries to train the models? Are they collecting our queries and therefore essentially seeing a real-time stream of the inner workings of our business? And if you are an investor or a trader and you ask those queries, that is your unique alpha. So there are a lot of questions, essentially, around data privacy. That's the number one question people ask. The other one comes more from skeptics, who ask: what's the big deal? And the truth is, large language models didn't just happen overnight. OpenAI has been around for close to 10 years, making steady progress, and then really reached an inflection point, but the whole AI revolution goes back decades. Before large language models there was deep learning; neural networks were invented back in the 70s, and people were thinking, oh my god, this is the best thing, a major breakthrough, modeling our computer systems like our own brains, so generic, so trainable. Then the industry went through what we call an AI winter, when people realized that these generic neural networks couldn't compete in terms of performance with handcrafted algorithms. So they were abandoned, and people went back to crafting domain-specific, narrow machine learning systems for things like image recognition or speech recognition. And then ImageNet came along in the late 2000s: Fei-Fei Li, a professor at Stanford, and her team spent multiple years collecting millions of images, creating ImageNet and its classifications, and suddenly a deep neural network with sufficient data and computation outperformed all the handcrafted algorithms. So what I'm trying to say is that the AI industry has been through multiple decades of ups and downs, summers and winters, all the four seasons coming and going. It's natural for people to be very skeptical: what's different this time? That's the second most commonly asked question. My answer to that is, well, the proof is in the pudding. The technology we have today is generic, and everybody can test it. If you are an expert in finance, just have a chat about finance; if you're an expert in legal affairs, just have a chat about legal matters, and you'll realize it's really a game changer. On the data privacy side, to circle back, we got comfortable with it because that was a very common piece of feedback and a question asked by everybody, and OpenAI came out very quickly to address it. They introduced data privacy controls, they have data privacy policies, and on a per-user basis you can turn off sharing your queries and conversations so that they are not used to improve their models.
And they are right in the middle of rolling out enterprise versions of ChatGPT, where data privacy for enterprise customers is absolutely a given. So that's being resolved. And if you don't want to use the API products, there's a whole ecosystem of open-source large language models, such as Llama 1 and Llama 2, and people have been fine-tuning these foundational models for different domains. The beautiful thing about using open-source models is that you can host them yourself, in your own environment, and deploy them in a secure private cloud environment. Think of it as generative AI in a box that satisfies all the normal enterprise software requirements. So it's kind of hard to imagine that we have only had ChatGPT for less than a year, but the speed at which the entire ecosystem around generative AI is moving is really tremendous. I think in the coming weeks and months, a lot of these questions either will be answered or will become obsolete. That's one of the exciting aspects of being part of this revolution.

Corey Hoffstein  42:07

You mentioned on our pre-call that there are different design philosophies necessary when designing APIs for large language models as your clients versus humans as your clients. Can you expand on what you meant by that?

Bin Ren  42:23

Before I talk about designing APIs for large language models, let me just highlight two major limitations of large language models. The first one is the lack of access to up-to-date data, because, by design, an AI model is trained on a vast corpus of data, and the moment the training is finished, the AI model's knowledge base stops being updated. So there's always a knowledge gap between the AI model and the real world we live in, in real time. By construction, the AI model's knowledge base is always out of date. The other limitation is that large language models today are reasoning engines, incredible reasoning engines, but they need tools to complete tasks in certain domains. It's similar to us humans: we have all learned math at school, so we know the concepts, we know how the formulas work, we know how to do square roots and calculus, but we still need tools such as a calculator to actually crunch the numbers. So being able to use tools is so important; people say tool usage is one of the major differences between humans and other animals. Tool use, and access to up-to-date data: those are the two challenges that have to be addressed. If we think about APIs, APIs are currently the most common way to provide tools to large language models. What that means is, let's say I ask the question: tell me how the S&P has performed this year. It immediately runs into two issues. A, the knowledge cutoff for GPT-4 was September 2021, so by definition it has no idea what happened in 2023. Number two, even if it had access to the S&P time series, unless there's some article in textual form that talks about the exact returns from the first of January to today, the 15th of November, there's no way for the large language model to just easily work out those calculations. But the large language model knows: maybe there's some tool or API out there that allows me to deal with this. So what happens is, you can provide APIs with a function signature, but also with a very detailed description, and you put all of this together in something called an OpenAPI spec. You give this to the large language model, and the large language model will go through this pretty detailed spec and understand: okay, there are 10 APIs; each API does certain things; each API accepts certain parameters, some required, some optional, each with a certain type; and each API has a certain response, which comes in a certain format, most likely structured JSON, and the JSON should follow this schema. It allows you to express the APIs in a spec, which is essentially textual form, so the large language model is actually able to use it. So when I ask the question about the S&P return year to date, the large language model is able to generate the function calls it needs, write the code on the fly, call the relevant APIs to do the calculations, and get the results back. That's how tool usage for large language models works today. So Corey, going back to your question about the major design decisions when designing APIs for large language models versus humans: when we design APIs for humans, we tend to follow a very modular approach, where we try to constrain each API to do one very specific thing, a very specific task, narrowly defined and very well understood, and the signature can be simple.
We rely on the humans to figure out how to compose these APIs in the right way. If they make mistakes, they'll use debuggers and fix them. Happy days. Now, with large language models, it's different. If we present a large language model with dozens or hundreds of APIs, think about what happens in the model when we make a query. First, there's the question of tool selection: given a query, the large language model has to figure out which tools, among the hundred tools it has been given, are actually relevant. That's tool selection. Let's say it makes the right choice. Then there's the second stage, which is tool coordination: okay, now I have these 10 tools I have to use, but in what order? Do I use this API first and use the response as the input to the next one? How do I chain them in the right way? You can see the complexity here is combinatorial, so it very quickly gets out of control. The performance of tool usage by large language models goes down rapidly if they are presented with too many choices. So that becomes a huge consideration: do we expose fewer APIs with more generic functionality? And that actually is the case. I want to give an example: one of the most popular plugins for ChatGPT is from Wolfram Alpha. If you look at their API, there's only one, a Wolfram Alpha query, and all it takes is a natural-language-style input string; they do the entire computation of figuring out what it means on their side. That's a very extreme example, but it works because Wolfram Alpha spent decades essentially building a natural-language-powered infrastructure and system on their side, so when they expose an API to large language models, all they do is expose one API: send me some natural language, we will sort it out. For most systems, though, including the way we expose our functionality at SigTech to large language models, we try to keep the APIs to fewer than a dozen, probably low single digits, because the more we expose, the more corner cases there are and the more easily confused the large language models get. The other major consideration is latency. When you have a large number of APIs exposed to the large language model, let's say a query requires 10 steps. The large language model will have to make 10 API calls, in the right order, with the right parameters, to make it work. That means 10 round trips between the large language model and the API endpoints, 10 round trips over the internet. And it's sequential: there's no way to parallelize it, no way to speed it up, and if anything breaks in the middle, you pretty much have to start from scratch. So to minimize this latency and these multiple round trips, there's pressure to shift more of the logic, more of the internal API composition, onto our end, so that instead of exposing a hundred APIs, we expose three, and there are only three round trips at most. That has also become a hugely important consideration. And I think this is one of the reasons Microsoft's Azure cloud service has a distinct advantage: they host the OpenAI models in the Microsoft Azure environment.
You can also run your infrastructure, like API endpoints and services, in the same data center, so the latency is naturally lower just by doing everything in the same data center, versus running your hosted models in an OpenAI data center, which is essentially a Microsoft data center, and then running your services elsewhere on Amazon or Google Cloud, where you have to incur this cross-internet latency. I think that's a major consideration as well. And the last one is, again, to make the large language models use the APIs correctly, there is what we call fine-tuning, so that the models more often generate correct and reliable structured output. Because the API calls have to be valid JSON objects that follow certain schemas, it takes some fine-tuning of the models to generate them correctly. In the latest announcement from OpenAI, at their dev day last week, they actually offered developers a flag to say, I want the output to be JSON, and they are building native support for JSON output into the model decoders, to make the model literally incapable of generating invalid JSON output. I think that's a huge step in helping developers use large language models to write code and make API calls.
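
To make the tool-spec idea concrete, here is a minimal, hypothetical tool definition in the JSON-schema style that current function-calling LLM APIs consume. The endpoint name, description, and parameters are invented for this illustration and are not SigTech's actual spec.

```python
# A hypothetical tool description in the JSON-schema style used by
# function-calling LLM APIs. The model reads the name, description,
# parameter types, and required fields, and emits a JSON object of
# arguments when it decides this tool is relevant to a query.
get_performance_tool = {
    "name": "get_instrument_performance",
    "description": (
        "Fetch price history for a financial instrument and compute its "
        "performance (total return, Sharpe ratio, max drawdown) over a date range."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "ticker": {"type": "string", "description": "Instrument identifier"},
            "start_date": {"type": "string", "description": "ISO date, e.g. 2023-01-01"},
            "end_date": {"type": "string", "description": "ISO date, e.g. 2023-11-15"},
            "return_type": {
                "type": "string",
                "enum": ["price", "total"],
                "description": "Price return or total return including dividends",
            },
        },
        "required": ["ticker", "start_date", "end_date"],
    },
}
```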

Corey Hoffstein  52:22

So I want to make this all a little more concrete with an example. Let's say I fed a query into your interface that says something like: plot the returns of going long the Magnificent Seven and short the NASDAQ. Can you walk me through how that process works in the background and how ChatGPT figures out the right API calls?

Bin Ren  52:50

This is a super interesting example. So pretend I'm GPT-4; not Bin, but natively GPT-4. I receive this query from Corey. First, I notice: this sentence contains the word "returns." "Returns" could mean anything, possibly Amazon package returns, but in the context of the Magnificent Seven, and particularly shorting the NASDAQ, it means financial returns. So this query is about financial returns, okay. Then I remember that I have a bunch of tools exposed to me by a company called SigTech, and one of the APIs says: this API allows you to fetch the data for financial instruments and calculate their performance. Now I realize this query requires me to have the performance metrics of two things. The first is the Magnificent Seven, and the second is the NASDAQ. I need to make two calls to this API to figure them out. But first, let me figure out what the Magnificent Seven is. This is where I dive into my own reasoning capabilities and work out that the Magnificent Seven, in this context, means the seven most popular, largest stocks in the tech sector. And it's actually part of my knowledge base, because I was trained on it, so I know the names of the companies. Okay. Then I realize: now that I know the names of the companies, there's an API exposed by SigTech that says, when you want to look up the ticker, or the unique ID, of a financial instrument by passing its name and some context, call this. So I call this API and say: hey, these are the seven companies I want to work with in terms of financial calculations; tell me the tickers, tell me what to do. The API returns: okay, these are the seven tickers, and what you need to do is construct a very simple total return strategy on each of them, because some of these companies pay dividends and you need to receive the dividends; not just price return, but total return. And this is the API to call to construct that. So I proceed to call that API to construct seven total return strategies for the seven stocks. When I have those back, I realize I need to turn them into a portfolio, and there's an API for that, so I call the API to construct a basket of the seven stocks. Because I don't have any extra context, I just assume, again using my reasoning capabilities, that it's going to be equally weighted and rebalanced maybe quarterly, and those are the parameters I use to construct it. And voila, I get it back; I now have a long-only single-stock portfolio at my disposal. The next step is the NASDAQ. I call the API from SigTech asking: NASDAQ, what's the right ticker, which instrument are we talking about? SigTech replies: NASDAQ; well, you want to short it, so it has to be tradable, so most likely you have to use a rolling futures strategy on NASDAQ 100 futures, or you can use an ETF. I choose to use the futures this time. So I say: okay, construct this rolling futures strategy on the NASDAQ 100 index and tell me when you're done. I get that back. Now I have two legs, the long leg and the short leg. I call the basket API again and say: I want to go long the Magnificent Seven basket portfolio and short the rolling futures strategy on the NASDAQ; tell me the returns and plot the graph.
Finally, I get everything back, synthesize it, and generate a coherent description of the entire output, including the numbers and the performance metrics such as Sharpe ratio, drawdowns, returns, and volatility, and I also render the chart. And that's what I return to you, Corey, as a user in the chat window. It's very complicated, but the whole thing takes about 40 seconds.
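
For readers who think in code, here is the same chain of tool calls written as Python-style pseudocode. Every function name and parameter below is an invented placeholder for illustration, not SigTech's actual API; the point is the orchestration the model has to assemble on the fly.

```python
# Hypothetical orchestration for: "plot the returns of going long the
# Magnificent Seven and short the NASDAQ". All helpers are placeholders.

mag7 = ["Apple", "Microsoft", "Alphabet", "Amazon",
        "Nvidia", "Meta", "Tesla"]                         # inferred by the model

tickers = lookup_instruments(names=mag7)                   # 1. resolve names to tickers
legs = [total_return_strategy(ticker=t) for t in tickers]  # 2. total return per stock
long_basket = build_basket(legs, weighting="equal",
                           rebalance="quarterly")          # 3. assumed defaults

short_leg = rolling_futures_strategy(index="NASDAQ 100")   # 4. tradable short leg

long_short = build_basket([(long_basket, +1.0),
                           (short_leg, -1.0)])             # 5. combine long and short
report = compute_performance(long_short)                   # Sharpe, drawdown, volatility
plot(report)                                               # 6. render the chart
```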

Corey Hoffstein  57:03

One of the most amazing things to me about the example you just gave is the large language model's ability to implicitly infer context. There are a lot of decisions, default decisions, that were made in that example based on context. But when we're talking about quant strategies, nuanced details matter, and any sort of incorrect inference could mean the difference between success and failure for the strategy. So how do you expose these inferences to researchers without the whole process just unwinding back into low-level coding?

Bin Ren  57:41

Yeah, so let me say something about this ability of large language models to make assumptions, quite often in the right way. That's actually built in, because fundamentally the large language model is trained to do one seemingly very simple thing: given a string of words, predict the most likely next word, and then keep going. That's all it does. And this seemingly simple ability turned out to be remarkably effective at capturing the structure of human language. And because human language is nothing but a concrete expression of human thoughts, large language models also happen to capture the structure of human thoughts. That's where all these inference capabilities come from. So if you think about its ability to come up with the right context, inferred from such a short sentence of Corey's, it's because all that extra information is the most likely set of words you would have said if you had spelled it out. That's how the language model kind of accompanies your sentence, almost making it longer to provide the context by itself. Think of it like this: take a query, and the large language model appends another 200 words to it, because those are the most likely 200 words to follow if you were to clarify what you actually mean, and that's what drives all the actions afterwards. Going back to your question about nuanced details, I don't think there's a free lunch. If you don't express your nuanced preferences, or certain very detailed decisions, to the large language model, there's no way it can recover them, especially when these are out-of-the-ordinary decisions, meaning they are not expected, not predicted by default, because they're on the left or right tail of this distribution of words. Just like with any human: if you don't tell them, they are not going to guess it correctly. So there's no free lunch here. However, I want to say the large language model is highly malleable; the more you tell it, the more you get out of it. So I think the key here is: if you're the kind of user who doesn't really have very nuanced opinions, it's okay to just have a not-very-nuanced discussion, and that's what you get. But if you are a quant trader and you're very nuanced, you can go deep, have a very deep conversation; you can zoom in and zoom out, customize specific details, and have a super long conversation to get to exactly what you want. What's amazing about large language models is that they completely topple people's traditional idea of what computer programming is. We grew up thinking of a computer program as a pretty dumb, essentially deterministic piece of software: we run it one way, we get one result back, it's very predictable. Whereas large language models are totally malleable. So we have this whole new area called prompt engineering. The most sophisticated users can not only have a very nuanced, super nuanced conversation with the large language model, but they can actually ask the large language model to generate the code, then take that code and use it as the starting point to further customize and control all aspects at all levels.
So in general, I think people should take a more open attitude towards large language models as a tool, instead of thinking of them as very narrow applications that only produce deterministic and predictable output.

Corey Hoffstein  1:01:43

One of the interesting ideas here is that in the future, strategies may no longer have to be shipped as code but could, in theory, be shipped as text. And I want to contrast that with some of the subtleties of language. There's this idea called contrastive stress, which highlights that the same sentence can have a completely different meaning depending upon the emphasis we place on each word. So, for example, the sentence "I only gave her flowers" means something different if I say "I ONLY gave her flowers" versus "I only gave her FLOWERS." So one of my questions back to you would be: if we move from this world of code to text, do we risk creating too much room for subtlety and misunderstanding?

Bin Ren  1:02:31

I actually think this is more of a feature than a bug. Again, there's no free lunch. If the sentence is very short and leaves a lot of room for interpretation, then whether it's a large language model or a human on the receiving end, they will have to do a lot of guessing, unless the guessing is based on the accumulated shared context between the sender and the recipient. If the human on the receiving end knows you very well, he may be able to infer a lot of the context around it. If it's a large language model that you have been using, conversing with for problem solving and intellectual discussions over a long period of time, the model could potentially already have a good understanding, or memory, of your preferences, and it may also infer the context correctly. But if the recipient is a stranger, or a large language model that has no idea about the context, then I think it's fair game. What I mean by a feature in this case is that if I receive a text, a sentence, which represents the essence of an idea from you, then instead of seeing the room for interpretation as room for confusion, I may see it as room for creativity. I may build upon the essence of the idea you have shared with me and apply my preferences and my ideas on top of it. In some sense, we may be looking at a very interesting application of large language models as almost a very lossy form of compression. Beyond the application to quant strategies, is there a way, for example, when we go to a website, rather than having one picture of a cat that everybody sees, for the website to be built with a piece of text that says: a cat sitting near a window with a sunset in the background? Then when you or I visit the website, the image is actually generated on the fly by our browsers, using models that have already learned certain preferences of ours. So I will actually see a different cat from yours, a different window from yours, and a different sunset. I think that opens up huge room for being creative. And also, by the nature of the compression, the size of the sentence describing the essence of the picture, or of the strategy, is far, far smaller than the full picture or the strategy itself. That can have certain implications in terms of internet bandwidth and latency, and the creativity part of it maybe opens up a new possibility for customized advertising on the user end. So there are pros and cons, but I tend to focus on the more interesting, productive side of it.

Corey Hoffstein  1:05:50

When working with code and different packages, at this point there's a well-defined process for changing APIs over time so that developers are aware of breaking changes. How would you expect this to work with something like an LLM? For example, if I tell GPT-4, for the example we used before, to go long the Magnificent Seven and short the NASDAQ, is it possible that GPT-5 would give me an entirely different solution?

Bin Ren  1:06:19

I think the way humans deal with changes in APIs is entirely manual, right? It's very painful when someone introduces a breaking change, meaning there is some change in the signature, or in the inputs and outputs. The only way to deal with it is to manually find every single occurrence where the old API has been used and replace it with the new one; it's very painful. But the thing about LLMs is that they write code on the fly, right? So when you ask the same query of GPT-5, and GPT-5 is provided with a different set of APIs to solve the problem, it will actually ingest, synthesize, and utilize the signatures of these new APIs on the fly, and write the code on the fly. Because everything is written from scratch, there's no manual migration of old code to new code. I think the code-generation ability, especially across multiple programming languages such as Python, JavaScript, and SQL, is a huge breakthrough in terms of the applications of LLMs to our daily lives.

What do you think the future looks like for quant researchers now that LLMs are here and going to stay? I think for quant researchers, currently we spend so much time on operations, such as data-related operations, and so much time on implementing our ideas, but not enough time focusing on essentially having new ideas and staying up to date with the market, especially when the macro environment is changing so fast. I think as we see more advancements in artificial intelligence, in generative AI, what's going to happen is that we are going to see a gradual, maybe even accelerating, shift in the nature of knowledge work, from finding the right answers and implementing the right solutions to actually just asking the right questions. So think about how a research institute works. If you are the research director of the institute, and you are leading a team of a dozen researchers, how does the division of labor work? The researchers are responsible for finding the answers; they are responsible for getting things done. They do the work, they see the results, they give you the feedback. But as the director, your job is to make sure that you are asking the right questions, that you are spending the resources, both in terms of people and in terms of time, on the worthiest questions, and that you are traveling down the right path. If you are going down the wrong path, you want to cut it short, turn back, and go down a different direction. So in this sense, as we see the applications of large language models, especially when we augment the reasoning engines with data and with select APIs, we are essentially able to create what we call autonomous AI agents. These AI agents are like AI researchers, and each human researcher today is essentially elevated into the role of a traditional research director. That's how I see knowledge workers' jobs being elevated: each individual will be amplified, which should lead to a huge boost in productivity, not least in finance. When this happens, there will be more participants in the financial market, because more people will have better access to the data, the knowledge, and the analytical firepower that today only the most advanced institutions and people have access to. This democratization will lead to more participants in the market, and hopefully will ultimately make the market more efficient. At the end of the day, the financial market is the cornerstone of a working and functioning capitalist society.
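
As a rough illustration of the point about breaking API changes, the sketch below embeds the current API surface in every prompt so that code is regenerated from scratch against whatever interface exists today. The signature lists and the ask_llm placeholder are assumptions made up for this example; they are not SigTech's or any vendor's actual API.

```python
# Minimal sketch, under assumed names, of regenerating code instead of
# migrating it: the current API signatures are embedded in every prompt,
# so the model writes fresh code against whatever interface exists today.
# Both signature lists and ask_llm are hypothetical placeholders.

API_SIGNATURES_V1 = [
    "get_prices(tickers: list[str], start: str, end: str) -> DataFrame",
    "build_basket(weights: dict[str, float]) -> Strategy",
]

API_SIGNATURES_V2 = [  # a later release with breaking changes
    "load_prices(universe: list[str], window: tuple[str, str]) -> DataFrame",
    "Basket(weights: dict[str, float], rebalance: str) -> Strategy",
]


def build_prompt(request: str, signatures: list[str]) -> str:
    """Embed the current API surface in the prompt so generated code always
    targets the signatures that exist today, not last year's."""
    api_block = "\n".join(signatures)
    return (
        "You may call only these functions:\n"
        f"{api_block}\n\n"
        f"Write Python code to: {request}"
    )


def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to whichever model you use."""
    raise NotImplementedError("wire this up to your own LLM endpoint")


request = "go long the Magnificent Seven and short the NASDAQ"

# The same natural-language request is paired with whichever API set is
# current at the time of the call; old generated code is simply discarded.
prompt_today = build_prompt(request, API_SIGNATURES_V1)
prompt_after_upgrade = build_prompt(request, API_SIGNATURES_V2)
print(prompt_after_upgrade)
```

The design choice here is simply that no generated code is ever migrated by hand; it is discarded and rewritten whenever the API set changes.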

Corey Hoffstein  1:10:46

Well, Bin, we find ourselves at the end of the episode here, but I do have one more question for you. It's the question I'm wrapping up every episode this season with, and since I know you just listened to some of the old season, this isn't going to be a new question for you, so I'm not catching you off guard. I suspect I might know what your answer is here. But the question I'm asking this season is: what are you obsessed with or obsessed about today?

Bin Ren  1:11:15

Currently, I'm quite obsessed with the different types of reactions to AI, especially from people from different cultural backgrounds. So, for example, I'm Chinese, I grew up with Asian culture, and in Asia people are far more receptive to the ongoing revolution of artificial intelligence; people tend to be more positive about the change that AI will usher in over the next few years. Whereas in the West, in the UK for example, based on my personal experience, whenever I talk about AI with my friends and my neighbors, they tend to have a very negative or pessimistic attitude toward it. They tend to think, oh my god, AI is going to replace everyone's jobs, it's the Terminator coming true, or the Matrix coming true. It's the polar opposite of people from an Asian cultural background, like Chinese or Japanese. I find it quite fascinating, because now I realize that Japanese society, which is traditionally extremely conservative, even a bit xenophobic, embraces things like AI and robots so enthusiastically. If you watch Japanese TV or Japanese manga, the robots in the media are always the good guys. They are friends of mankind; they help mankind conquer evil, solve problems, and make the world a better place. Whereas in the West, look at the sci-fi films, and robots are always the bad guys: you have the Terminator, you have the Matrix, where the machines, once smart enough, enslave mankind. I find this quite fascinating, and I realize that maybe one explanation for this contrast is that in Asian culture there are much stronger elements of collectivism. We grew up used to the idea that the collective, like the organization or the institution, is more important than any individual; we've sort of just accepted it as a way of life. Institutions are more robust, they last longer, and individuals simply play a certain role in this big machine. Whereas in the West, I feel the culture is more about individuals, more individualistic; there is a strong preference toward individualism. Therefore, the idea of this all-singing, all-dancing AI machine, which is ultimately the aggregation of the knowledge and intelligence of everyone in the world, and the idea that this supreme, ultimate form of the collective is superior to every single individual and may even render individuals obsolete, is terrifying. So serious cultural differences are directly leading to these contrasting attitudes toward arguably the most important development in human history. And I feel this attitude has a huge impact on the reception of and support for AI technologies. Does this mean that Asia will win the race, in some sense, by embracing AI more, whereas the Western world will lag a bit behind because it is more skeptical, more pessimistic? This is not just an intellectual or philosophical musing; I think it will have a huge impact on the world in the next few years, given the pace at which things are developing, and given the fact that, for the race to AGI and the application of AGI, we have been shrinking our estimates from, you know, 100 years to 10 years to now three years or even 18 months. I'm just paying a lot of attention, and being extra sensitive to how adoption differs due to cultural differences, to see how that plays out in a world where we are already seeing more frequent eruptions of geopolitical uncertainty. So that's something I've been obsessed with. I hope I've given a good answer.

Corey Hoffstein  1:15:50

Not the one I expected, but one I certainly enjoyed, and one that will make me pause and think more as well. Bin, this has been phenomenal. I can't thank you enough for joining me.

Bin Ren  1:16:02

Corey, such a pleasure to be here, and best of luck with the new season of the podcast. Thank you.