Interest in artificial intelligence (AI) is rocketing. And, as business editors at the New York Times pointed out recently, it’s not just tech companies that are talking about AI. Large language models (LLMs) fed with industry-specific data provide incredible search powers for companies looking to move ahead of their competitors – including new smart summary features.
Imagine a vast library where you can automatically retrieve not just the book you’re looking for, but the exact phrase, together with other supporting facts and figures. And that’s just the tip of the enterprise AI iceberg. Generative AI tools give companies the edge by digesting mind-blowing amounts of data and distilling all of that market intelligence into a smart summary that’s both insightful and time-saving.
Information is gold in investment circles, and a rising star in providing market analysis is AlphaSense. The US-headquartered firm, which has offices in London, Germany, Finland, and India, delivers insights from what it describes as ‘an extensive universe of public and private content – including company filings, event transcripts, news, trade journals, and equity research’.
For example, by analyzing data from more than 9,000 publicly listed firms that regularly host investor calls, AlphaSense determined that AI was mentioned twice as frequently in the first quarter of 2023 as in the last quarter of 2022. And its enterprise AI tooling is helping the market intelligence provider go head-to-head with business analysis heavyweights such as Bloomberg.
In fact, it’s telling that Bloomberg has just announced BloombergGPT – a custom LLM trained on a corpus of around 700 billion tokens, roughly half of it curated financial data. The training data is equivalent to hundreds of millions of pages of text, and Google’s Bard notes that a dataset of 700 billion tokens would be ‘a very valuable dataset for training LLMs’.
The financial portion of BloombergGPT’s training data – dubbed FinPile – consists of a range of English financial documents including news, filings, press releases, web-scraped financial documents, and social media drawn from the Bloomberg archives.
Company filings – data that AlphaSense and other analysis providers also mine for market insight – represent 14 billion tokens in BloombergGPT’s training data (roughly 10 billion words, assuming the common rule of thumb that one token corresponds to about three-quarters of an English word). It’s also worth noting that financial statements prepared by public companies, such as annual 10-K filings or quarterly 10-Q reports, are long PDFs that provide rich pickings for smart summary generators, as we’ll highlight shortly.
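The word count implied by a token count depends heavily on the ratio you assume between tokens and words. Here is a minimal sketch in Python of that back-of-the-envelope arithmetic. Both ratios are assumptions for illustration rather than figures published by Bloomberg: 3.5 tokens per word, and the commonly cited rule of thumb of roughly 0.75 English words per token.

```python
def estimate_words(tokens: float, words_per_token: float) -> float:
    """Convert a token count into an approximate word count."""
    if words_per_token <= 0:
        raise ValueError("words_per_token must be positive")
    return tokens * words_per_token


FILINGS_TOKENS = 14e9  # company filings figure quoted in the article

# Assumption A: 3.5 tokens are needed to represent each word.
words_a = estimate_words(FILINGS_TOKENS, 1 / 3.5)

# Assumption B: about 0.75 words per token (a common rule of thumb for English).
words_b = estimate_words(FILINGS_TOKENS, 0.75)

print(f"Assumption A: {words_a / 1e9:.1f} billion words")
print(f"Assumption B: {words_b / 1e9:.1f} billion words")
```

The two assumptions give about 4 billion and about 10.5 billion words respectively, so any word-count estimate is only as good as the ratio behind it. Actual ratios vary with the tokenizer and the type of text.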
General LLMs – for example, OpenAI’s GPT-4, Google’s PaLM, and the open-source Falcon-40B – are trained on data scraped from the web. And while they do include technical content from scientific research repositories and the US Patent and Trademark Office (USPTO), they haven’t been built to be domain-specific.
Falcon’s LLM team, based at the Technology Innovation Institute in the UAE, reports that filtering and deduplicating web data at very large scale – a pipeline that it dubs MacroData Refinement – can produce LLMs capable of outperforming versions trained on curated corpora. But the power of having an LLM trained using domain-specific data can be seen by viewing the test results of BloombergGPT.
In four out of five tests, the domain-specific LLM came out on top. And on the one occasion when it was ranked second, the performance advantage of the winning LLM (the open-source GPT-NeoX, developed by EleutherAI) was slight. Training generative AI models on a refined diet of industry-specific data opens the door to superior smart summary performance.
In June, AlphaSense introduced AI-generated summarizations of key events in earnings calls to dramatically speed up the workflow for fund managers and other analysts keeping an eye on company performance.
The finance sector has long used AI and machine learning to try to spot patterns that would otherwise remain hidden. Natural language processing has been deployed for years to perform sentiment analysis on CEO statements and other company reports to determine how optimistic firms are about their future.
But generative AI straps a huge word engine onto the process and races ahead of what’s been done before. For example, tools don’t just put a number on the positives and the negatives; they can bullet the smart summary into columns depending on whether each point represents good news or bad.
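To make the idea of sorting points into good-news and bad-news columns concrete, here is a deliberately simple Python sketch. It uses a tiny hand-made word list rather than a generative model, so it illustrates the older, score-based style of sentiment analysis, not how AlphaSense or any other vendor works. The word lists, function names, and sample text are all hypothetical.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"growth", "record", "strong", "beat", "improved"}
NEGATIVE = {"decline", "loss", "weak", "missed", "risk"}


def score(sentence: str) -> int:
    """Return positive-word count minus negative-word count."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def split_by_sentiment(text: str) -> dict[str, list[str]]:
    """Split text into sentences and sort them into three columns."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    columns: dict[str, list[str]] = {"good": [], "bad": [], "neutral": []}
    for sentence in sentences:
        value = score(sentence)
        key = "good" if value > 0 else "bad" if value < 0 else "neutral"
        columns[key].append(sentence)
    return columns


sample = (
    "Revenue hit a record high on strong cloud demand. "
    "Margins saw a decline due to supply risk. "
    "The next call is scheduled for October."
)

columns = split_by_sentiment(sample)
for name, items in columns.items():
    print(name.upper())
    for item in items:
        print(f"  - {item}")
```

A word list like this misses negation, sarcasm, and context ("no decline" would count as negative), which is part of the gap that LLM-based summarizers aim to close.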
Putting smart summaries to the test using Google Bard
Although generative AI models trained on domain-specific market data have the edge over general LLMs, readily available tools are still impressive.
Asking Google Bard for its thoughts on the sentiment of the article that you are reading now (including a request to add emojis for extra sparkle – see image below) produces the following output:
- Overall, the sentiment of the article is positive. The author is excited about the potential of LLMs to revolutionize the way financial data is analyzed.
- There are a few instances of negative sentiment, such as when the author mentions that general LLMs are not as good as domain-specific LLMs at generating smart summaries.
Security guardrails and air-gapped LLMs
However, before you rush to ask LLMs for a smart summary, make sure you understand where the data you submit goes. Many companies have restricted employees from using generative AI tools, such as OpenAI’s publicly available ChatGPT service, to ring-fence their business data.
Recognizing that companies may want to guard their business operations closely, developers such as Yurts.ai – based in San Francisco, US – are offering air-gapped LLMs to provide clients with maximum security.
“We have seen an explosion of interest in generative AI for enterprise use, but most C-suites have genuine and rightful concerns about security and privacy,” said Ben Van Roo, CEO and Co-founder of Yurts.ai. “Our platform can be embedded within an enterprise and give companies private and secure access to generative AI-based assistants for writing, chat, and search.”
There are other options too. For example, on TechHQ we’ve written about how machine learning can work on data sets in a highly secure sandbox thanks to solutions such as BlindAI cloud.
Benefits beyond finance
The ability to generate a smart summary of vast amounts of data, automatically, in seconds, benefits not just the financial sector, but organizations of all kinds. Governments, for example, are taking a keen interest in measuring happiness to better allocate funding, rather than relying solely on conventional indicators that may not tell the whole story.
Back in 2020, before the current boom in LLMs, researchers showed that AI could be useful in understanding what makes us angry, happy, or sad – as reported by the World Economic Forum. And this is just one example of how valuable smart summaries could turn out to be, not just to firms, but more broadly.