Alternative data has quickly become an overcrowded technology niche for financial services firms and large institutional investors looking for non-traditional sources of information to gain a competitive advantage.
Powered by multimodal AI technology, alternative data providers are scraping information from social media, satellite imagery, customer audio logs and structured text from documents and streaming it to customers in terabytes.
Using alternative data has many benefits, but it comes with some significant problems. These include ensuring data quality and securing usage rights for videos and photos that contain people, trademarked products or other proprietary information.
Despite this, the alternative data market is expected to grow to $3.2 billion this year and reach $13.9 billion by 2026, at a compound annual growth rate of 44%, according to Research and Markets.
Top alternative data providers include 1010data, Advan Research, Eagle Alpha, Preqin, RavenPack, Earnest Research, Thinknum, UBS Evidence Lab, YipitData, Dataminr, M Science, 7Park Data, Convergence, Geotab, JWN Energy and TalkingData, according to Research and Markets.
In this Q&A, Julia Valentine, a fintech expert and managing partner of AlphaMille, a New York City-based strategy and consulting firm specializing in alternative data, multimodal AI and conversational AI, explains how alternative data and multimodal AI work and why they are so popular.
Who uses alternative data?
Julia Valentine: That’s probably the easiest question to answer, because most companies do – certainly financial services firms and investment management firms. If you’re investing and you have an investment thesis, then this data will first help you formulate that thesis. And second, once you have an investment, you get a preview, or you can almost use it as a leading indicator.
In other words, before a company reports financial results, you can [use alternative data to] get a very good idea of their sales, how much they sell or anything else about the company. You can learn what is going on with this company or what its customers think about it.
How do you determine the trustworthiness of alternative data?
Valentine: You determine it through analysis. Once you start using this data, you need to build your own analyses, feeding the data into some sort of analytical model that you create. If it’s forward-looking, if it gives you actual insight, you can see that there’s a correlation between what you’re seeing through the satellites and the price of what the company is selling. Then, when you see that there is value in it, do a statistical analysis. If it generates a valid prediction, keep using it.
There are also modeling risks – models need to be thoroughly trained and tested – and regulatory issues. Users of alternative data must ensure that the models are unbiased and non-discriminatory.
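The correlation check Valentine describes can be sketched in a few lines of Python. This is only an illustration, not AlphaMille's method: the quarterly figures, the variable names and the choice of a plain Pearson coefficient are all assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: cars counted in satellite images of a retailer's
# parking lots each quarter, and the sales (in $ millions) it later reported.
car_counts = [120, 135, 150, 160, 180, 210]
reported_sales = [10.1, 11.0, 12.2, 12.9, 14.8, 17.0]

r = pearson(car_counts, reported_sales)
print(f"correlation between car counts and sales: {r:.3f}")
```

A high coefficient on a handful of points only suggests the signal is worth a closer look. The statistical analysis she mentions would then test whether the signal still predicts results on quarters that were not used to find it.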
As a data scientist or citizen data scientist in a company, can you create your own alternative data stream?
Valentine: You can. And the tool that allows you to create it is multimodal AI. With multimodal AI you are entering something very sophisticated, because now you are not only supplementing the financial data that everyone has with alternative data that not everyone may have but could potentially buy, you are also investing in and creating your own. You collect your own streams of data. It’s very powerful. It is used by sophisticated financial institutions as well as mainstream companies that want to better understand their customers and do it in real time.
Multimodal AI doesn’t care whether it’s processing text, video, computer vision, photos, or audio. The world is not just text.
For example, you can transcribe audio when you look at a company’s complex logistics chains and parts of the chains are in different geographic locations, with different languages and companies involved in production. Sometimes your information relates to a chain on different continents, and companies working on part of the value chain are from abroad and could report in Chinese, Spanish or other languages. Multimodal AI has built-in multi-language recognition.
How long has multimodal AI been around and to what extent is it part of alternative data?
Valentine: It probably really took off in the last five years. Multimodal AI exists on its own. On the data market, you buy data from a legacy data provider, whereas using multimodal AI essentially means you are the creator of that data.
From the end-user perspective, if you buy alternative data, your provider most likely used multimodal AI somewhere in the process. If the alternative data provider is selling you credit card data, chances are they just bought it from credit card companies. But if they wanted to go to social media and supplement that data in some way, they could use a number of tools to do so. And one of those tools is multimodal AI.
How are AI and machine learning actually being used to provide alternative data?
Valentine: Anytime you work with data, you can use machine learning. Using machine learning and data, you can create an ontology. Say there’s a lot of data and you don’t know what all the different categories for that data might be. Instead of just saying, “Here’s credit card data,” you could use demographic data and sort the records by male, female or age group. This is an ontology – a way of grouping things. Or you can feed the data into a machine learning model and let it suggest its own ontology.
We can use the model to look at any pattern that we can’t see ourselves because it’s overwhelming. It’s millions of records. Suddenly, [the ML model] can come up with something really insightful and cool because it recognizes the patterns and builds the ontology that we might not even have thought of.
Here is an example. Banks have call centers. Customers keep calling the call centers. Some of them have a quick problem to solve and that’s it. Others call with all sorts of complaints. Everything runs on a recorded line because the banks want to learn from it.
For our banking customers, we use multimodal AI to listen to all of these anonymized conversations. Then we build an ontology for all these phone calls. If you listen to enough of them, eventually you’ll end up with something really cool. First of all, you will end up with a list of the biggest issues that annoy the customers of this bank to death. And secondly, here is your list of the most urgent product improvements. This gives you the opportunity to reduce customer churn.
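The idea of letting software propose groupings for call transcripts and then ranking the biggest issues can be sketched as follows. This is a toy stand-in, not the system Valentine describes: it assumes the calls are already transcribed to text, and it swaps in a simple greedy grouping by word overlap (Jaccard similarity) where a production system would use a trained model. The transcripts, the stopword list and the threshold are all made up for illustration.

```python
# Small, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"i", "my", "the", "a", "an", "to", "is", "was", "on", "again"}

def tokens(text):
    """Lowercase a transcript and return its non-stopword words as a set."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_calls(transcripts, threshold=0.25):
    """Greedy grouping: each call joins the first group whose first call
    it overlaps with enough, otherwise it starts a new group."""
    groups = []  # each group is a list of transcripts; group[0] is its seed
    for text in transcripts:
        for group in groups:
            if jaccard(tokens(text), tokens(group[0])) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    # Largest groups first: these are the most frequent issues.
    return sorted(groups, key=len, reverse=True)

# Hypothetical, already-anonymized transcripts.
calls = [
    "The mobile app login failed again",
    "Mobile app login keeps failing",
    "I was charged an overdraft fee twice",
    "App login failed on my phone",
    "Overdraft fee charged twice this month",
    "Please update my mailing address",
]

for group in group_calls(calls):
    print(len(group), "calls like:", group[0])
```

No categories are defined in advance here. The groups emerge from the data, and sorting them by size yields the ranked list of issues that Valentine says can guide product improvements.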
Editor’s note: This interview has been edited for clarity and conciseness.