ChatGPT, the game-changing AI chatbot that has been helping millions of students worldwide cheat on their assignments, seems to be getting dumber with time – or at least that’s what the researchers from top universities in California believe.
Just months after The Washington Post noted a significant drop in the chatbot’s user base, a new research paper about GPT-4, which powers ChatGPT Plus, has made a rather interesting revelation: The AI chatbot is gradually losing its smarts.
In the yet-to-be-peer-reviewed study titled “How Is ChatGPT’s Behavior Changing over Time?” the researchers from Stanford University and the University of California, Berkeley analyzed the responses of GPT-3.5 and GPT-4, the models behind ChatGPT, and observed a decline in the performance of their latest versions on some tasks.
“We find that the performance and behavior of both GPT-3.5 and GPT-4 vary significantly across these two releases and that their performance on some tasks has gotten substantially worse over time,” the researchers stated.
It is also worth mentioning that ChatGPT Plus costs $20 per month.
How often does OpenAI update ChatGPT?
For those unaware, OpenAI – the company behind the widely used AI chatbot – continually works on improving its AI models and systems. The improvement typically involves refining the underlying algorithms, training on more recent and extensive datasets, and addressing various issues and limitations.
However, as the study pointed out, large language models (LLMs) such as GPT-3.5 and GPT-4 can be updated based on user data and feedback as well as design changes, but it’s unclear when these models are updated and how the updates affect their performance.
“These unknowns make it challenging to stably integrate LLMs into larger workflows: if LLM’s response to a prompt (e.g. its accuracy or formatting) suddenly changes, this might break the downstream pipeline. It also makes it challenging, if not impossible, to reproduce results from the ‘same’ LLM,” the researchers added.
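The pipeline risk the researchers describe can be reduced by validating a model's reply before anything downstream consumes it. The following is a minimal sketch, not something taken from the study: it assumes a hypothetical workflow in which the model is asked to reply with a JSON object, and the key names in `REQUIRED_KEYS` are made up for illustration.

```python
import json

# Hypothetical schema for this sketch; a real pipeline would define its own keys.
REQUIRED_KEYS = {"answer", "confidence"}


def parse_llm_response(raw: str) -> dict:
    """Validate a raw LLM reply before handing it to downstream code.

    Raises ValueError if the reply is not a JSON object with the expected
    keys, so a silent formatting change surfaces as an explicit error.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    return data
```

A guard like this does not stop the model from changing, but it turns a quiet change in formatting into a visible failure at the boundary instead of a bug further down the line.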
Taking a closer look at the performance decline
For the past few weeks, the AI community has been pretty vocal in sharing its dissatisfaction with the responses generated by the popular chatbot. Many of these complaints were made following the May 2023 update by OpenAI.
“The current GPT4 is disappointing, it’s like driving a Ferrari for a month then suddenly it turns into a beaten-up old pickup. I’m not sure I want to pay for it over GPT 3.5,” wrote a developer on OpenAI’s online forum. Meanwhile, another pointed out that “Over the past few days, it seems like GPT-4 is struggling to do things it did well previously.”
That’s not all.
“I have also noticed a significant downgrade in the logic capabilities of the most recent GPT4 version when discussing/evaluating complex inverse problems, differential rates or patterns of change, and spatial-temporal variability,” complained another user. “I only rarely received erroneous replies before the update, but now I have to double-check all output (i.e., now double-negative conditions sometimes don’t get appropriately translated into a positive condition). I see these errors as more GPT3.5-like than prior GPT4 levels of reasoning. Very frustrating.”
What did the researchers find out?
GPT-3.5 and GPT-4 are two of the most widely used large language models. Yet when and how these models receive updates has remained unclear.
To measure how their behavior changed, the researchers evaluated the March 2023 and June 2023 versions of the two models on the following tasks:
- Math problems
- Sensitive or dangerous questions
- Opinion surveys
- Multi-hop knowledge-intensive questions
- Generating code
- US Medical License Exam questions
- Visual reasoning
The results validated what the users had been saying for quite some time.
“GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy). This is partly explained by a drop in GPT-4’s amenity to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in June than in March in this task,” the researchers explained.
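The accuracy figures in this quote come down to simple arithmetic: the share of numbers for which the model's yes/no verdict matches the truth. The sketch below shows one way such a score could be computed. It is an illustration only, not the researchers' evaluation code; the function names and the rule of taking the last "yes" or "no" in a reply as the model's final answer are assumptions of this example.

```python
import re
from typing import List, Optional


def is_prime(n: int) -> bool:
    """Ground truth via trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def extract_yes_no(reply: str) -> Optional[bool]:
    """Return True for 'yes', False for 'no', None if neither word appears.

    Assumption of this sketch: the last yes/no in the reply is the final answer.
    """
    matches = re.findall(r"\b(yes|no)\b", reply.lower())
    if not matches:
        return None
    return matches[-1] == "yes"


def accuracy(numbers: List[int], replies: List[str]) -> float:
    """Fraction of replies whose verdict matches is_prime; unparseable replies count as wrong."""
    correct = sum(
        extract_yes_no(reply) == is_prime(n)
        for n, reply in zip(numbers, replies)
    )
    return correct / len(numbers)
```

For example, with the numbers 7, 9, 11 and 15 and the made-up replies "[Yes]", "No, 9 = 3 x 3", "[No]" and "Yes", two of the four verdicts are right, so `accuracy` returns 0.5.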
The study also revealed that GPT-4 “became less willing to answer sensitive questions and opinion survey questions in June than in March.” However, it performed better at multi-hop questions in June. Meanwhile, the performance of GPT-3.5 saw a substantial drop on multi-hop questions.
“Both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March,” the study continued. “Overall, our findings show that the behavior of the ‘same’ LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLMs.”
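One simple way to monitor for formatting mistakes in generated code is to check whether the model's output can be run as-is. The sketch below assumes the generated code is Python and treats any output that fails to parse as a formatting mistake. Both the function names and that definition are assumptions of this example, not the study's methodology.

```python
import ast
from typing import List


def is_directly_executable(generated: str) -> bool:
    """Return True if the text parses as Python source without any cleanup."""
    try:
        ast.parse(generated)
    except (SyntaxError, ValueError):
        return False
    return True


def executable_rate(outputs: List[str]) -> float:
    """Fraction of model outputs that parse as-is; track this per model snapshot."""
    if not outputs:
        return 0.0
    return sum(is_directly_executable(o) for o in outputs) / len(outputs)


# Example: the same snippet, once bare and once wrapped in a Markdown code fence.
fence = "`" * 3
bare = "print('hello')"
wrapped = fence + "python\n" + bare + "\n" + fence

print(is_directly_executable(bare))     # the bare snippet parses
print(is_directly_executable(wrapped))  # the fence markers are not valid Python
```

Running the same prompts on a schedule and comparing `executable_rate` across dates is one lightweight form of the continuous monitoring the researchers call for.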
How did OpenAI respond?
OpenAI VP for Product Peter Welinder took to Twitter to address the dip in the AI chatbot’s popularity and the allegations about its performance degradation.
“No, we haven’t made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one,” he tweeted. “Current hypothesis: When you use it more heavily, you start noticing issues you didn’t see before.”
Wrapping it up
Researchers from Stanford University and UC Berkeley delved into the world of ChatGPT and discovered that the AI chatbot may be losing its edge. The research paper also validated the user complaints about inaccurate responses. OpenAI, however, insists it is aiming for smarter versions of the LLM and has put forward a theory that heavier usage could be making users notice these hiccups more.
This highlights the rapidly changing landscape of artificial intelligence and underscores the importance of keeping a watchful eye on AI chatbots such as ChatGPT as they continue to evolve in unpredictable ways.
So, what do you think: is ChatGPT really getting dumber with time? Let us know your thoughts by commenting below.
And if you are interested in learning how ChatGPT and OpenAI handle user data, or want to know how hackers can use artificial intelligence to steal your information, stay connected to the PureVPN Blog.