GPT and other AI models can’t analyze an SEC filing, researchers find

Large language models, similar to the one at the heart of ChatGPT, frequently fail to answer questions derived from Securities and Exchange Commission filings, researchers from a startup called Patronus AI found.

Even the best-performing artificial intelligence model configuration they tested, OpenAI’s GPT-4-Turbo, when armed with the ability to read nearly an entire filing alongside the question, only got 79% of answers right on Patronus AI’s new test, the company’s founders told CNBC.

Oftentimes, the so-called large language models would refuse to answer, or would “hallucinate” figures and facts that weren’t in the SEC filings.

“That type of performance rate is just absolutely unacceptable,” Patronus AI co-founder Anand Kannappan said. “It has to be much much higher for it to really work in an automated and production-ready way.”

The findings highlight some of the challenges facing AI models as big companies, especially in regulated industries like finance, seek to incorporate cutting-edge technology into their operations, whether for customer service or research.

The ability to extract important numbers quickly and perform analysis on financial narratives has been seen as one of the most promising applications for chatbots since ChatGPT was released late last year. SEC filings are filled with important data, and if a bot could accurately summarize them or quickly answer questions about what’s in them, it could give the user a leg up in the competitive financial industry.

In the past year, Bloomberg LP developed its own AI model for financial data, business school professors researched whether ChatGPT can parse financial headlines, and JPMorgan is working on an AI-powered automated investing tool, CNBC previously reported. Generative AI could boost the banking industry by trillions of dollars per year, a recent McKinsey forecast said.

But GPT’s entry into the industry hasn’t been smooth. When Microsoft first launched its Bing Chat using OpenAI’s GPT, one of its primary examples was using the chatbot to quickly summarize an earnings press release. Observers quickly realized that the numbers in Microsoft’s example were off, and some numbers were entirely made up.

‘Vibe checks’

Part of the challenge when incorporating LLMs into actual products, say the Patronus AI co-founders, is that LLMs are nondeterministic — they’re not guaranteed to produce the same output every time for the same input. That means that companies will need to do more rigorous testing to make sure they’re operating correctly, not going off-topic, and providing reliable results.
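One way to make that kind of check concrete is to ask the same question several times and measure how often the answers agree. The sketch below is illustrative only: `ask` stands in for whatever function calls the model, and `agreement_rate` and `fake_model` are hypothetical names, not part of Patronus AI’s tooling.

```python
from collections import Counter
from typing import Callable


def agreement_rate(ask: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Fraction of runs that return the most common answer (1.0 = fully consistent)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Normalise lightly so whitespace and case differences don't count as disagreement.
    answers = [ask(prompt).strip().lower() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


# Stand-in for a real model call (assumption): returns canned replies in order.
_replies = iter(["Yes", "yes ", "No", "Yes"])


def fake_model(prompt: str) -> str:
    return next(_replies)


rate = agreement_rate(fake_model, "Did AMD report customer concentration in FY22?", runs=4)
print(rate)  # 0.75: three of the four canned replies agree
```

A rate below 1.0 on a factual question is a signal that the answer should not be trusted without review; it says nothing about whether the most common answer is actually correct.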

The founders met at Facebook parent company Meta, where they worked on AI problems related to understanding how models come up with their answers and making them more “responsible.” They founded Patronus AI, which has received seed funding from Lightspeed Venture Partners, to automate LLM testing with software, so companies can feel comfortable that their AI bots won’t surprise customers or workers with off-topic or wrong answers.

“Right now evaluation is largely manual. It feels like just testing by inspection,” Patronus AI co-founder Rebecca Qian said. “One company told us it was ‘vibe checks.’”

Patronus AI worked to write a set of more than 10,000 questions and answers drawn from SEC filings from major publicly traded companies, which it calls FinanceBench. The dataset includes the correct answers, and also where exactly in any given filing to find them. Not all of the answers can be pulled directly from the text, and some questions require light math or reasoning.

Qian and Kannappan say it’s a test that gives a “minimum performance standard” for language AI in the financial sector.

Here are some examples of questions in the dataset, provided by Patronus AI:

  • Has CVS Health paid dividends to common shareholders in Q2 of FY2022?
  • Did AMD report customer concentration in FY22?
  • What is Coca Cola’s FY2021 COGS % margin? Calculate what was asked by utilizing the line items clearly shown in the income statement.
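The last question asks for a ratio rather than a figure quoted directly in the filing, which is the kind of “light math” the dataset includes. A minimal sketch of that arithmetic follows, assuming “COGS % margin” means cost of goods sold divided by revenue. The function name and the input figures are placeholders for illustration, not Coca-Cola’s reported numbers.

```python
def cogs_margin_pct(cogs: float, revenue: float) -> float:
    """Cost of goods sold as a percentage of revenue (assumed definition)."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * cogs / revenue


# Placeholder figures for illustration only, not taken from any filing.
example = cogs_margin_pct(cogs=40.0, revenue=100.0)
print(f"{example:.1f}%")  # 40.0%
```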

How the AI models did on the test

Patronus AI tested four language models: OpenAI’s GPT-4 and GPT-4-Turbo, Anthropic’s Claude 2 and Meta’s Llama 2, using a subset of 150 of the questions it had produced.

It also tested different configurations and prompts, such as one setting where the OpenAI models were given the exact relevant source text in the question, which it called “Oracle” mode. In other tests, the models were told where the underlying SEC documents would be stored, or given “long context,” which meant including nearly an entire SEC filing alongside the question in the prompt.

GPT-4-Turbo failed at the startup’s “closed book” test, where it wasn’t given access to any SEC source document. It failed to answer 88% of the 150 questions it was asked, and only produced a correct answer 14 times.
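Those closed-book figures can be tallied with a few lines of code. The sketch below assumes every question lands in exactly one of three buckets (correct, incorrect, refused) and that 88% of 150 means 132 refusals, which would leave 4 incorrect answers. That remainder is inferred here, not a number reported by Patronus AI, and `outcome_rates` is a hypothetical helper.

```python
from collections import Counter

OUTCOMES = ("correct", "incorrect", "refused")


def outcome_rates(labels: list[str]) -> dict[str, float]:
    """Share of questions falling into each outcome bucket."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {outcome: counts[outcome] / len(labels) for outcome in OUTCOMES}


# 132 refusals (88% of 150) and 14 correct come from the article;
# the 4 incorrect answers are the implied remainder (assumption).
labels = ["refused"] * 132 + ["correct"] * 14 + ["incorrect"] * 4
rates = outcome_rates(labels)
for outcome, rate in rates.items():
    print(f"{outcome}: {rate:.1%}")
```

On these counts, 14 correct answers out of 150 works out to roughly 9%.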

It was able to improve significantly when given access to the underlying filings. In “Oracle” mode, where it was pointed to the exact text for the answer, GPT-4-Turbo answered the question correctly 85% of the time, but still produced an incorrect answer 15% of the time.

But that’s an unrealistic test because it requires human input to find the exact pertinent place in the filing — the exact task that many hope that language models can address.

Llama 2, an open-source AI model developed by Meta, had some of the worst “hallucinations,” producing wrong answers as much as 70% of the time, and correct answers only 19% of the time, when given access to an array of underlying documents.

Anthropic’s Claude 2 performed well when given “long context,” where nearly the entire relevant SEC filing was included along with the question. It could answer 75% of the questions it was posed, gave the wrong answer for 21%, and failed to answer only 3%. GPT-4-Turbo also did well with long context, answering 79% of the questions correctly, and giving the wrong answer for 17% of them.

After running the tests, the co-founders were surprised about how poorly the models did — even when they were pointed to where the answers were.

“One surprising thing was just how often models refused to answer,” said Qian. “The refusal rate is really high, even when the answer is within the context and a human would be able to answer it.”

Even when the models performed well, though, they just weren’t good enough, Patronus AI found.

“There just is no margin for error that’s acceptable, because, especially in regulated industries, even if the model gets the answer wrong 1 out of 20 times, that’s still not high enough accuracy,” Qian said.

But the Patronus AI co-founders believe there’s huge potential for language models like GPT to help people in the finance industry — whether that’s analysts, or investors — if AI continues to improve.

“We definitely think that the results can be pretty promising,” said Kannappan. “Models will continue to get better over time. We’re very hopeful that in the long term, a lot of this can be automated. But today, you will definitely need to have at least a human in the loop to help support and guide whatever workflow you have.”

An OpenAI representative pointed to the company’s usage guidelines, which prohibit offering tailored financial advice using an OpenAI model without a qualified person reviewing the information, and require anyone using an OpenAI model in the financial industry to provide a disclaimer informing users that AI is being used and describing its limitations. OpenAI’s usage policies also say that OpenAI’s models are not fine-tuned to provide financial advice.

Meta did not immediately return a request for comment, and Anthropic didn’t immediately have a comment.

AWS digital sovereignty pledge: A new, independent sovereign cloud in Europe

From day one, Amazon Web Services (AWS) has believed it is essential that customers have control over their data, and choices for how they secure and manage that data in the cloud. Last year, we introduced the AWS Digital Sovereignty Pledge, our commitment to offering AWS customers the most advanced set of sovereignty controls and features available in the cloud.

AWS offers the largest and most comprehensive cloud infrastructure globally. Our approach from the beginning has been to make AWS sovereign-by-design. We built data protection features and controls in the AWS cloud with input from financial services, health care and government customers — who are among the most security- and data privacy-conscious organizations in the world. This has led to innovations like the AWS Nitro System, which powers all our modern Amazon Elastic Compute Cloud (Amazon EC2) instances and provides a strong physical and logical security boundary to enforce access restrictions so that nobody, including AWS employees, can access customer data running in Amazon EC2. The security design of the Nitro System has also been independently validated by the NCC Group in a public report.

With AWS, customers have always had control over the location of their data. In Europe, customers who need to comply with European data residency requirements have the choice to deploy their data to any of our eight existing AWS Regions (Ireland, Frankfurt, London, Paris, Stockholm, Milan, Zurich and Spain) to keep their data securely in Europe. To run their sensitive workloads, European customers can leverage the broadest and deepest portfolio of services, including AI, analytics, compute, database, internet of things, machine learning, mobile services and storage. To further support customers, we’ve innovated to offer more control and choice over their data. For example, we announced further transparency and assurances, and new dedicated infrastructure options with AWS ‘Dedicated Local Zones’.

Announcing the AWS European Sovereign Cloud

When we speak to public-sector and regulated-industry customers in Europe, they share how they are facing incredible complexity with an evolving sovereignty landscape. Customers tell us they want to adopt the cloud, but are facing increasing regulatory scrutiny over data location, European operational autonomy and resilience. We’ve learned that these customers are concerned that they will have to choose between the full power of AWS and feature-limited sovereign cloud solutions. We’ve had deep engagements with European regulators, national cybersecurity authorities, and customers to understand how the sovereignty needs of customers can vary based on multiple factors, like location, sensitivity of workloads, and industry.

We recently announced our plans to launch the AWS European Sovereign Cloud, a new, independent cloud for Europe, designed to help public sector organizations and customers in highly regulated industries meet their evolving sovereignty needs. We’re designing the AWS European Sovereign Cloud to be separate and independent from our existing AWS Regions, with infrastructure located wholly within the European Union and with the same security, availability and performance our customers get from existing regions today. To deliver enhanced operational resilience within the EU, only EU residents who are located in the EU will have control of the operations and support for the AWS European Sovereign Cloud. The AWS European Sovereign Cloud will launch its first AWS Region in Germany, available to all European customers.

The AWS European Sovereign Cloud will be sovereign-by-design, and will be built on more than a decade of experience operating multiple independent clouds for the most critical and restricted workloads. Like existing regions, the AWS European Sovereign Cloud will be built for high availability and resiliency, and powered by the AWS Nitro System, to help ensure the confidentiality and integrity of customer data. Customers will have the control and assurance that AWS will not access or use customer data for any purpose without their agreement. AWS gives customers the strongest sovereignty controls among leading cloud providers. For customers with enhanced data residency needs, the AWS European Sovereign Cloud is designed to go further and will allow customers to keep all metadata they create (such as the roles, permissions, resource labels and configurations they use to run AWS) in the EU. The AWS European Sovereign Cloud will also be built with separate, in-region billing and usage metering systems.

Delivering operational autonomy

The AWS European Sovereign Cloud will provide customers with the capability to meet stringent operational autonomy and data residency requirements. To deliver enhanced data residency and operational resilience within the EU, the AWS European Sovereign Cloud infrastructure will be operated independently from existing AWS Regions. To assure independent operation of the AWS European Sovereign Cloud, only personnel who are EU residents, located in the EU, will have control of day-to-day operations, including access to data centers, technical support and customer service.

Control without compromise

Though separate, the AWS European Sovereign Cloud will offer the same industry-leading architecture built for security and availability as other AWS Regions. This will include multiple ‘Availability Zones’, infrastructure that is placed in separate and distinct geographic locations, with enough distance to significantly reduce the risk of a single event impacting customers’ business continuity.

Continued AWS investment in Europe

The AWS European Sovereign Cloud represents continued AWS investment in Europe. AWS is committed to innovating to support European values and Europe’s digital future. We drive economic development through investing in infrastructure, jobs and skills in communities and countries across Europe. We are creating thousands of high-quality jobs and investing billions of euros in European economies. Amazon has created more than 100,000 permanent jobs across the EU. Some of our largest AWS development teams are located in Europe, with key centers in Dublin, Dresden and Berlin. As part of our continued commitment to contribute to the development of digital skills, we will hire and develop additional local personnel to operate and support the AWS European Sovereign Cloud.

Our commitments to our customers

We remain committed to giving our customers control and choices to help meet their evolving digital sovereignty needs. We continue to innovate sovereignty features, controls and assurances globally with AWS, without compromising on the full power of AWS.



It’s time to hang up on the old telecoms rulebook

Joakim Reiter | via Vodafone

Around 120 years ago, Guglielmo Marconi planted the seeds of a communications revolution, sending the first message via a wireless link over open water. “Are you ready? Can you hear me?”, he said. Now, the telecommunications industry in Europe needs policymakers to heed that call, to realize the vision set by its 19th-century pioneers.

Next-generation telecommunications are catalyzing a transformation on par with the industrial revolution. Mobile networks are becoming programmable platforms — supercomputers that will fundamentally underpin European industrial productivity, growth and competitiveness. Combined with cloud, AI and the internet of things, the era of industrial internet will transform our economy and way of life, bringing smarter cities, energy grids and health care, as well as autonomous transport systems, factories and more to the real world.

Europe should be at the center of this revolution, just as it was in the early days of modern communications.

Even without looking at future applications, the benefits of a healthy telecoms industry for society are clear to see. Mobile technologies and services generated 5 percent of global GDP, equivalent to €4.3 trillion, in 2021. More than five billion people around the world are connected to mobile services — more people today have access to mobile communications than they do to safely-managed sanitation services. And with the combination of satellite solutions, the prospect of ensuring every person on the planet is connected may soon be within reach.

In our recent past, when COVID-19 spread across the world and societies went into lockdown, connectivity became critical for people to work from home, and for enabling schools and hospitals to offer services online. And with Russia’s invasion of Ukraine, when millions were forced to flee the safety of their homes, European network operators provided heavily discounted roaming and calling to ensure refugees stayed connected with loved ones.

All of these outcomes and opportunities depend on continued investment by private telecoms companies.

And yet, a perfect storm of rising investment costs, inflationary pressures, interest rate hikes and intensifying competition from adjacent industries is bearing down on telecoms businesses across Europe. The war on our continent triggered a 15-fold increase in wholesale energy prices and rapid inflation. EU telecoms operators have been under pressure ever since to keep consumer prices low during a cost-of-living crisis, while confronting rapidly growing operational costs as a result. At the same time, operators also face the threat of billions of euros of extra, unforeseen costs as governments change their operating requirements in light of growing geopolitical concerns.

The odds are dangerously stacked against the long-term sustainability of our industry and, as a result, Europe’s own digital ambitions. Telecoms operators may be resilient. But they are not invincible.

The signs of Europe’s decline are obvious for those willing to take a closer look. European countries are lagging behind in 5G mobile connectivity, while other parts of the world, including Thailand, India and the Philippines, race ahead. Independent research by OpenSignal shows that mobile users in South Korea have an active 5G connection three times more often than those in Germany, and more than 10 times as often as their counterparts in Belgium.

Average 5G connectivity in Brazil is more than three times faster than in Czechia or Poland. A recent European Commission report, State of the Digital Decade (europa.eu), shows just how far Europe needs to go to reach the EU’s connectivity targets for 2030.

To arrest this decline, and successfully meet the EU’s digital ambitions, something has got to give. Europe needs a joined-up regulatory, policy and investment approach that restores the failing investment climate and puts the telecoms sector back on a stable footing.

Competition, innovation and efficient investment are the driving forces for the telecoms sector today. It’s time to unleash these powers, not blindly perpetuate old rules. We agree with Commissioner Breton’s recent assessment: Europe needs to redefine the DNA of its telecoms regulation. It needs a new rulebook that encourages innovation and investment, and embraces the logic of a true single market. It must reduce barriers to growth and scale in the sector and ensure that spectrum, the lifeblood of our industry, is managed more efficiently. And it must find faster, futureproofed ways to level the playing field for all businesses operating in the wider digital sector.

But Europe is already behind, and we are running out of time. It is critical that the EU finds a balance between urgent, short-term measures and longer-term reforms. It cannot wait until 2025 to implement change.

When Marconi sent that message back in 1897, the answer to his question was, “loud and clear”. As Europe’s telecoms ministers convene this month in León, Spain, their message must be loud and clear too. European citizens and businesses deserve better communications. They deserve a telecoms rulebook that ensures networks can deliver the next revolution in digital connectivity and services.


