Overall, ChatGPT was fairly good at solving problems across the different coding languages, and it was especially successful on coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively.
“However, when it comes to the algorithm problems after 2021, ChatGPT’s ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems,” Tang notes.
For example, ChatGPT’s ability to produce functional code for “easy” coding problems dropped from 89 percent to 52 percent for problems posted after 2021, and its ability to generate functional code for “hard” problems dropped from 40 percent to 0.66 percent over the same period.
“A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset,” Tang says.
Essentially, as coding evolves, ChatGPT has not yet been exposed to the newer problems and their solutions. It lacks the critical thinking skills of a human and can only address problems it has previously encountered, which could explain why it is so much better at addressing older coding problems than newer ones.
. . .
While ChatGPT was good at fixing compiling errors, it generally was not good at correcting its own mistakes.
“ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems; thus, this simple error feedback information is not enough,” Tang explains.
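To make concrete what such a “simple error feedback” loop might look like, here is a minimal, hypothetical Python sketch. It is not the researchers’ actual test harness: the ask_model callable stands in for whatever interface queries ChatGPT, and the prompts and test snippet are illustrative placeholders.

    # Hypothetical sketch of an error-feedback repair loop (not the study's tooling).
    import subprocess
    import sys
    from typing import Callable

    def run_candidate(code: str, test_snippet: str) -> str:
        """Run the candidate solution plus a small test; return '' on success,
        otherwise the interpreter's error text."""
        result = subprocess.run(
            [sys.executable, "-c", code + "\n" + test_snippet],
            capture_output=True, text=True, timeout=10,
        )
        return "" if result.returncode == 0 else result.stderr

    def generate_with_feedback(problem: str, test_snippet: str,
                               ask_model: Callable[[str], str],
                               max_rounds: int = 3) -> str:
        """Ask the model for a solution, then repeatedly feed back only the raw
        error message, mirroring the 'simple error feedback' described above."""
        code = ask_model("Solve this problem in Python:\n" + problem)
        for _ in range(max_rounds):
            error = run_candidate(code, test_snippet)
            if not error:
                return code  # the candidate passed the test
            # Only the error text goes back to the model; the problem itself
            # is not re-explained, which is why this feedback can fall short.
            code = ask_model("Your code failed with this error:\n" + error +
                             "\nPlease return a corrected version:\n" + code)
        return code

The sketch highlights the limitation Tang describes: if the model misunderstood the problem in the first place, an error message alone gives it little to reason with.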