
New Study Reveals ChatGPT Is Getting Dumber

Updated by Ali Martinez

In Brief

  • Stanford and UC Berkeley research reveals erratic performance in ChatGPT versions GPT-3.5 and GPT-4 over just three months.
  • While GPT-4's accuracy in identifying prime numbers dramatically decreased, GPT-3.5 showed significant improvement.
  • Despite rapid updates, newer versions of an AI model do not always outperform older ones, underscoring the need for continuous monitoring.

Recent research has sparked an intriguing discussion about the proficiency of ChatGPT, especially its GPT-3.5 and GPT-4 versions. These two models have dominated the market as large language model (LLM) services.

However, with a perplexing mix of performance highs and lows between March and June 2023, some wonder, “Is ChatGPT getting dumber?”

ChatGPT Updates Don’t Outpace Older Versions

Researchers from Stanford University and the University of California, Berkeley, evaluated ChatGPT on a range of tasks. The central finding of their evaluation was how dramatically its performance varied over a span of three months.

The inconsistency does more than raise eyebrows; it shows how changeable the technology is and why its quality needs to be monitored continuously.

“Our findings show that the behavior of the ‘same’ LLM [large language model] service can change substantially in a relatively short amount of time,” the report reads.

ChatGPT-4 vs ChatGPT-3.5 Performance. Source: arXiv

Diving into the specifics, GPT-4 showed a shocking drop in its ability to identify prime numbers, one of the tests of its mathematical problem-solving.

Indeed, accuracy rates plummeted from a commendable 97.6% in March to an alarming 2.4% in June. In contrast, its predecessor, GPT-3.5, showcased a substantial improvement in the same timeframe, surging from 7.4% to 86.8%.
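Accuracy here is simply the share of questions answered correctly. The minimal Python sketch below shows how such a score could be computed, assuming a hypothetical format in which each test item is a number paired with the model’s yes/no answer, with ground truth from trial division. The function names and sample answers are illustrative assumptions, not the researchers’ actual evaluation code or data.

```python
import math


def is_prime(n: int) -> bool:
    """Ground truth via trial division up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def prime_accuracy(results):
    """Percentage of correct answers.

    results: list of (number, answer) pairs, where answer is "yes" or "no"
    (a hypothetical format chosen for this sketch).
    """
    if not results:
        return 0.0
    correct = 0
    for n, answer in results:
        predicted_prime = answer.strip().lower() == "yes"
        if predicted_prime == is_prime(n):
            correct += 1
    return 100.0 * correct / len(results)


# Hypothetical answers from two snapshots of the same model on the same questions.
march = [(13, "yes"), (7, "yes"), (21, "no"), (97, "yes")]
june = [(13, "no"), (7, "no"), (21, "no"), (97, "no")]
print(prime_accuracy(march), prime_accuracy(june))
```

On these made-up answers the first snapshot scores 100% and the second 25%; a real evaluation would use hundreds of questions rather than four.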

The stark contrast is puzzling, since newer versions would normally be expected to outperform their predecessors. It raises questions about how much “updates” and “improvements” actually affect the AI’s capabilities.

Lack of Detailed Explanation and Code Generation

Tests with sensitive questions revealed another intriguing angle. From March to June, GPT-4 became significantly less willing to answer sensitive queries directly, which suggests a strengthened safety layer.

However, when it declined to answer, its explanations became noticeably shorter. This has prompted speculation about whether the model is erring on the side of caution at the expense of user engagement and clarity.

ChatGPT-4 vs ChatGPT-3.5 Verbosity. Source: arXiv

Yet it was not all gloom. The study identified one area where GPT-4, and to some extent GPT-3.5, improved marginally: visual reasoning. Although overall success rates remained relatively low, both models performed somewhat better over the period studied.

What truly stands out is how unpredictable this technology can be. GPT-4 became less likely to produce directly executable code. This raises red flags for industries that rely on these models, since such inconsistencies can wreak havoc in larger software ecosystems.
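One rough way to track whether generated code is “directly executable” is to check that it at least parses as-is. The Python sketch below assumes the model’s output is meant to be Python and uses the built-in compile(), which only parses the text and never runs it. The function names and sample outputs are hypothetical, and the study’s own criterion may well differ.

```python
def is_directly_parseable(source: str) -> bool:
    """True if the text parses as Python exactly as generated.

    This is a syntax check only: compile() builds a code object but
    nothing is executed, so runtime errors are not detected.
    """
    try:
        compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):  # ValueError covers e.g. null bytes
        return False
    return True


def parseable_rate(samples):
    """Percentage of generated samples that pass the syntax check."""
    if not samples:
        return 0.0
    ok = sum(is_directly_parseable(s) for s in samples)
    return 100.0 * ok / len(samples)


# Hypothetical model outputs: the second wraps valid code in extra prose.
samples = [
    "def add(a, b):\n    return a + b\n",
    "Here is the code:\ndef add(a, b):\n    return a + b\n",
]
print(parseable_rate(samples))
```

Because compile() only parses, this is a first-pass filter: code that passes can still fail at run time, so a fuller check would also execute it in a sandbox against tests.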

Complacency Cannot Be Afforded

The key takeaway from this in-depth analysis is not the individual swings in GPT-4’s and GPT-3.5’s performance but the broader lesson that an AI model’s performance is not fixed.

With rapid technological advances, there is an implicit assumption that newer models will surpass their predecessors. This study challenges that very notion.

The message for businesses and developers heavily invested in ChatGPT is to monitor and evaluate these models regularly. As AI technology continues its march forward, the study is a stark reminder that progress is not linear.
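In practice, regular monitoring can be as simple as rerunning a fixed evaluation set on each new model snapshot and flagging any task whose score falls by more than a chosen margin. The minimal Python sketch below assumes scores are stored as percentages keyed by task name; the function name, dictionary format, and 5-point tolerance are illustrative assumptions, while the sample figures are the prime-number accuracies reported in the study.

```python
def find_regressions(baseline, current, tolerance=5.0):
    """Return tasks whose accuracy dropped by more than `tolerance` points.

    baseline, current: dicts mapping task name -> accuracy (%), produced by
    running the same fixed evaluation set on two model snapshots.
    Tasks missing from `current` are skipped.
    """
    regressions = {}
    for task, old_score in baseline.items():
        new_score = current.get(task)
        if new_score is None:
            continue
        drop = old_score - new_score
        if drop > tolerance:
            regressions[task] = round(drop, 1)
    return regressions


# Prime-number accuracies reported in the study (March vs. June 2023).
march = {"GPT-4 primes": 97.6, "GPT-3.5 primes": 7.4}
june = {"GPT-4 primes": 2.4, "GPT-3.5 primes": 86.8}
print(find_regressions(march, june))
```

With these figures only GPT-4’s drop of roughly 95 points is flagged, while GPT-3.5’s gain is not; tracking improvements too would just mean also recording positive differences.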

Companies Worldwide Using ChatGPT. Source: Statista

The assumption that newer is invariably better may be an oversimplification, and one the tech community needs to address head-on. The erratic behavior of GPT-4 and GPT-3.5 within a matter of months makes it all the more urgent to stay vigilant, assess, and recalibrate, so that the technology consistently serves its intended purpose.

Disclaimer

In adherence to the Trust Project guidelines, BeInCrypto is committed to unbiased, transparent reporting. This news article aims to provide accurate, timely information. However, readers are advised to verify facts independently and consult with a professional before making any decisions based on this content. Please note that our Terms and Conditions, Privacy Policy, and Disclaimers have been updated.

Bary Rahma
Bary Rahma is a senior journalist at BeInCrypto, where she covers a broad spectrum of topics including crypto exchange-traded funds (ETFs), artificial intelligence (AI), tokenization of real-world assets (RWA), and the altcoin market. Prior to this, she was a content writer for Binance, producing in-depth research reports on cryptocurrency trends, market analysis, decentralized finance (DeFi), digital asset regulations, blockchain, initial coin offerings (ICOs), and tokenomics. Bary also...