Share

New Study Reveals ChatGPT Is Getting Dumber

Prefer us on Google

Written by

Bary Rahma

Edited by

Ali Martinez

20 July 2023, 16:30 UTC

•

Updated 21 July 2023, 09:22 UTC

Stanford and UC Berkeley research reveals erratic performance in ChatGPT versions GPT-3.5 and GPT-4 over just three months.
While GPT-4's accuracy in identifying prime numbers dramatically decreased, GPT-3.5 showed significant improvement.
Despite rapid updates, newer AI models do not always outperform older models, emphasizing the need for continuous monitoring.

#AI News

Recent research has sparked an intriguing discussion about the proficiency of ChatGPT, especially versions GPT-3.5 and GPT-4. These two iterations have dominated the market as large language model services.

However, with a perplexing mix of performance highs and lows between March and June 2023, some wonder, “Is ChatGPT getting dumber?”

Sponsored

ChatGPT Updates Don’t Outpace Older Versions

Esteemed scholars from Stanford University and the University of California, Berkeley, scrutinized ChatGPT’s proficiency in various tasks. The focal point of this comprehensive evaluation was the dramatic inconsistency observed in its performance over a span of three months.

The incongruity does more than raise eyebrows; it underscores the nature of the technology and the imperative to monitor its quality consistently.

“Our findings show that the behavior of the “same” [large language model] LLM service can change substantially in a relatively short amount of time,” reads the report.

ChatGPT-4 vs ChatGPT-3.5 Performance. Source: arXiv

Diving into the specifics, GPT-4’s mathematical problem-solving skills presented a shocking drop in proficiency when identifying prime numbers.

Indeed, accuracy rates plummeted from a commendable 97.6% in March to an alarming 2.4% in June. In contrast, its predecessor, GPT-3.5, showcased a substantial improvement in the same timeframe, surging from 7.4% to 86.8%.

Sponsored

The stark contrasts confuse industry experts, as one would anticipate newer versions to outpace their predecessors. This raises concerns about how “updates” and “improvements” truly impact the AI’s capability.

Lack of Detailed Explanation and Code Generation

When probed on sensitive questions, the research depicted another intriguing angle. GPT-4 demonstrated a significant reduction in directly answering sensitive queries from March to June. This is indicative of a bolstered safety layer.

However, there was a noticeable truncation in its generated explanations when declining to answer. This prompted speculations about whether the model is erring on the side of caution to the detriment of user engagement and clarity.

ChatGPT-4 vs ChatGPT-3.5 Verbosity. Source: arXiv

Yet, it was not all gloom. The study pinpointed a crucial area where GPT-4, and to an extent GPT-3.5, manifested marginal improvements: visual reasoning. Although the overall success rates remained relatively low, there was evidence of evolution in their performance.

What truly stands out is the unpredictability of this technology. GPT-4’s code generation proficiency exhibited a decline in producing directly executable code. This raises red flags for industries relying on these models, as inconsistencies can wreak havoc in larger software ecosystems.

Complacency Cannot Be Afforded

The key takeaway from this in-depth analysis is not the fluctuations in GPT-4 and GPT-3.5’s performance but the overarching lesson on the impermanence of AI efficiency.

With rapid technological advances, there is an implicit assumption that newer models will surpass their predecessors. This study challenges that very notion.

The message for businesses and developers heavily vested in ChatGPT is to monitor and evaluate these models regularly. As AI technology continues its march forward, the study is a stark reminder that advancements are not linear.

Companies Worldwide Using ChatGPT. Source: Statista

The assumption that newer is invariably better might be an oversimplification, a notion that the tech community needs to address head-on. The erratic behavior of GPT-4 and GPT-3.5 within a matter of months magnifies the urgency to stay vigilant, assess, and recalibrate, ensuring that the technology serves its intended purpose with consistent proficiency.

To read the latest cryptocurrency market analysis from BeInCrypto, click here .

Disclaimer

BeInCrypto is committed to unbiased, transparent reporting. This news article aims to provide accurate, timely information. However, readers are advised to verify facts independently and consult with a professional before making any decisions based on this content. Please note that our Terms and Conditions, Privacy Policy, and Disclaimers have been updated.

Sponsored