Generative AI and large language model applications like ChatGPT have entered the toolbox of a growing segment of U.S. students.
Expecting AI use to become only more prevalent in schools over time, researchers at the College of William & Mary recently studied its impact on learning outcomes in early computer science courses.
“I feel like this is … not just a research topic,” said Janice Zhang, who led the research team. “It’s a very social topic. It has a social impact behind it.”
A Pew Research Center study from 2023 found about one in five U.S. teens who had heard of ChatGPT used it on schoolwork. A different survey from February 2024 found that more than a third of college students did so.
Zhang and her team published their study in July, the first on the topic to track changes in subjects over a period of time.
The research centered on CodeTutor, an AI assistant the researchers developed and introduced to a group of 50 lower-division computer science students. This experimental group could access the tool throughout the semester for help with coding homework and for answering questions about the course content.
The data collected from this group was compared both within the group over the semester and against a control group of students who did not have access to CodeTutor.
The group with access to the tool, “the experimental group” as graduate student researcher Wenhan Lyu called them, “actually improved when compared to the control group without it.”
In addition to the higher grades among students who could access CodeTutor, Zhang said students in the experimental group with no prior experience using ChatGPT-style applications performed even better.
“We don’t have a conclusive response towards why we see that,” Zhang said. “That will really require a follow-up … study. I think that’s something we will plan to do in the upcoming fall.”
While those with access to CodeTutor achieved higher grades on average, researchers said students felt the tool became less effective the further they got into the course. Students increasingly turned to human teaching assistants with their questions instead as the material moved into units that required more critical thinking.
The study has led to future questions for researchers to consider, such as what role the quality of prompts entered into CodeTutor or ChatGPT plays in learning outcomes.
“Maybe the prompts [better performers] ask are better,” Zhang said.
Zhang said the “garbage in, garbage out” idiom rings true for the current iteration of applications like ChatGPT. She hypothesized that students who better understood the “art of asking good questions” typically received better responses from CodeTutor and performed better in the course.
This is backed up by the finding that nearly two-thirds of the prompts students entered were rated “unsatisfactory” by the research team, lacking the context necessary to produce relevant outputs.
Questions also remain about the use of generative AI in courses across different fields. Researchers expect tools like CodeTutor would see varying degrees of success depending on the subject.
“We need to design in which class, or in which scenario, this generative AI can be a good virtual assistant,” said Yimeng Wang, another graduate student researcher on the team.
Zhang hopes instructors can keep an open mind about its future in the classroom. Generative AI and large language model tools aren’t going away any time soon, she said.
“Decades ago, we [didn’t] even have Google,” Zhang said. “Now we have Google. Well, do we see a significant change in terms of critical thinking?”
“It’s not really about the tools, but more about how can we foster this critical thinking from the very beginning of education?”