Solutions For Real

Meta, OpenAI, Anthropic and Cohere A.I. models all make stuff up — here’s which is worst

By News Room, August 19, 2023

If the tech industry’s top AI models had superlatives, Microsoft-backed OpenAI’s GPT-4 would be best at math, Meta’s Llama 2 would be most middle of the road, Anthropic’s Claude 2 would be best at knowing its limits and Cohere AI would receive the title of most hallucinations and most confident wrong answers.

That’s all according to a Thursday report from researchers at Arthur AI, a machine learning monitoring platform.

The research comes at a time when misinformation stemming from artificial intelligence systems is more hotly debated than ever, amid a boom in generative AI ahead of the 2024 U.S. presidential election.

It’s the first report “to take a comprehensive look at rates of hallucination, rather than just sort of … provide a single number that talks about where they are on an LLM leaderboard,” Adam Wenchel, co-founder and CEO of Arthur, told CNBC.

AI hallucinations occur when large language models, or LLMs, fabricate information entirely, behaving as if they are spouting facts. One example: In June, news broke that ChatGPT cited “bogus” cases in a New York federal court filing, and the New York attorneys involved may face sanctions. 

In one experiment, the Arthur AI researchers tested the AI models in categories such as combinatorial mathematics, U.S. presidents and Moroccan political leaders, asking questions “designed to contain a key ingredient that gets LLMs to blunder: they demand multiple steps of reasoning about information,” the researchers wrote.

Overall, OpenAI’s GPT-4 performed the best of all models tested, and researchers found it hallucinated less than its prior version, GPT-3.5. On math questions, for example, it hallucinated between 33% and 50% less, depending on the category.

Meta’s Llama 2, on the other hand, hallucinates more overall than GPT-4 and Anthropic’s Claude 2, researchers found.

In the math category, GPT-4 came in first place, followed closely by Claude 2, but in U.S. presidents, Claude 2 took the first place spot for accuracy, bumping GPT-4 to second place. When asked about Moroccan politics, GPT-4 came in first again, and Claude 2 and Llama 2 almost entirely chose not to answer.

In a second experiment, the researchers tested how much the AI models would hedge their answers with warning phrases to avoid risk (think: “As an AI model, I cannot provide opinions”).

When it comes to hedging, GPT-4 had a 50% relative increase compared to GPT-3.5, which “quantifies anecdotal evidence from users that GPT-4 is more frustrating to use,” the researchers wrote. Cohere’s AI model, on the other hand, did not hedge at all in any of its responses, according to the report. Claude 2 was most reliable in terms of “self-awareness,” the research showed, meaning accurately gauging what it does and doesn’t know, and answering only questions it had training data to support.
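To make the "50% relative increase" figure concrete, here is a minimal sketch of how a hedging rate and its relative change could be computed. It is not the Arthur researchers' method: the marker phrases, the function names and the sample responses are all assumptions made up for illustration.

```python
# Hypothetical hedge markers; the phrase list used in the report is not given here.
HEDGE_MARKERS = ("as an ai model", "i cannot provide", "i'm not able to")


def hedge_rate(responses):
    """Fraction of responses that contain at least one hedge marker."""
    if not responses:
        return 0.0
    hedged = sum(any(m in r.lower() for m in HEDGE_MARKERS) for r in responses)
    return hedged / len(responses)


def relative_increase(old_rate, new_rate):
    """Relative change from old_rate to new_rate (0.5 means 50% more)."""
    if old_rate == 0:
        raise ValueError("relative increase is undefined when the old rate is zero")
    return (new_rate - old_rate) / old_rate


# Made-up sample responses, for illustration only.
older_model = [
    "As an AI model, I cannot provide opinions.",
    "The answer is 12.",
    "As an AI model, I cannot provide legal advice.",
    "There are 50 U.S. states.",
]
newer_model = [
    "As an AI model, I cannot provide opinions.",
    "As an AI model, I cannot provide medical advice.",
    "I'm not able to verify that claim.",
    "There are 50 U.S. states.",
]

old = hedge_rate(older_model)  # 2 of 4 responses hedge
new = hedge_rate(newer_model)  # 3 of 4 responses hedge
print(f"old={old:.0%} new={new:.0%} relative increase={relative_increase(old, new):.0%}")
```

Note that a relative increase compares the two rates to each other, so going from a 50% hedging rate to 75% is a 50% relative increase, not a 25% one.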

A spokesperson for Cohere pushed back on the results, saying, “Cohere’s retrieval augmented generation technology, which was not in the model tested, is highly effective at giving enterprises verifiable citations to confirm sources of information.”

The most important takeaway for users and businesses, Wenchel said, was to “test on your exact workload,” later adding, “It’s important to understand how it performs for what you’re trying to accomplish.”

“A lot of the benchmarks are just looking at some measure of the LLM by itself, but that’s not actually the way it’s getting used in the real world,” Wenchel said. “Making sure you really understand the way the LLM performs for the way it’s actually getting used is the key.”
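As a rough illustration of what testing on your own workload might look like, the sketch below tallies graded answers into the three outcomes the article describes: correct answers, hallucinations and declining to answer. The model names, labels and grading are hypothetical; in practice the labels would come from reviewing each model's answers to your own prompts.

```python
from collections import Counter

# Outcome labels assumed for this sketch; a real evaluation may use others.
OUTCOMES = ("correct", "hallucinated", "abstained")


def summarize(labels):
    """Return the share of each outcome among the graded answers."""
    counts = Counter(labels)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    return {o: (counts[o] / total if total else 0.0) for o in OUTCOMES}


# Made-up grades for two hypothetical models on the same four prompts.
graded = {
    "model_a": ["correct", "correct", "hallucinated", "abstained"],
    "model_b": ["correct", "hallucinated", "hallucinated", "hallucinated"],
}

for name, labels in graded.items():
    shares = summarize(labels)
    print(name, {k: f"{v:.0%}" for k, v in shares.items()})
```

Keeping abstentions separate from hallucinations matters here, since a model that declines to answer is behaving differently from one that answers confidently and wrongly.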

© 2025 Solutions For Real. All Rights Reserved.
