Written by The Conversation
Researchers created a chatbot to help teach a university law class – but the AI kept messing up

“AI tutors” have been hyped as a way to revolutionise education.

The idea is generative artificial intelligence tools (such as ChatGPT) could adapt to any teaching style set by a teacher. The AI could guide students step-by-step through problems and offer hints without giving away answers. It could then deliver precise, immediate feedback tailored to the student’s individual learning gaps.

Despite the enthusiasm, there is limited research testing how well AI performs in teaching environments, especially within structured university courses.

In our new study, we developed our own AI tool for a university law class. We wanted to know: can it genuinely support personalised learning, or are we expecting too much?

Our study

In 2022, we developed SmartTest, a customisable educational chatbot, as part of a broader project to democratise access to AI tools in education.

Unlike generic chatbots, SmartTest is purpose-built for educators, allowing them to embed questions, model answers and prompts. This means the chatbot can ask relevant questions, deliver accurate and consistent feedback and minimise hallucinations (or mistakes). SmartTest is also instructed to use the Socratic method, encouraging students to think, rather than spoon-feeding them answers.

We trialled SmartTest over five test cycles in a criminal law course (which one of us was coordinating) at the University of Wollongong in 2023.

Each cycle introduced varying degrees of complexity. The first three cycles used short hypothetical criminal law scenarios (for example, is the accused guilty of theft in this scenario?). The last two cycles used simple short-answer questions (for example, what’s the maximum sentencing discount for a guilty plea?).

An average of 35 students interacted with SmartTest in each cycle across several criminal law tutorials. Participation was voluntary and anonymous, with students interacting with SmartTest on their own devices for up to ten minutes per session. Students’ conversations with SmartTest – their attempts at answering the question, and the immediate feedback they received from the chatbot – were recorded in our database.

After the final test cycle, we surveyed students about their experience.

An example of SmartTest’s interactions with students. Reproduced with permission from Snowflake Inc., Author provided (no reuse)

What we found

SmartTest showed promise in guiding students and helping them identify gaps in their understanding.

However, in the first three cycles (the problem-scenario questions), between 40% and 54% of conversations had at least one example of inaccurate, misleading, or incorrect feedback.

When we shifted to a much simpler short-answer format in cycles four and five, the error rate dropped significantly, to between 6% and 27%. However, even in these best-performing cycles, some errors persisted. For example, SmartTest would sometimes affirm an incorrect answer before providing the correct one, which risks confusing students.

A significant revelation was the sheer effort required to get the chatbot working effectively in our tests. Far from a time-saving silver bullet, integrating SmartTest involved painstaking prompt engineering and rigorous manual assessments from educators (in this case, us). This paradox – where a tool promoted as labour-saving demands significant labour – calls into question its practical benefits for already time-poor educators.

Inconsistency is a core issue

SmartTest’s behaviour was also unpredictable. Under identical conditions, it sometimes offered excellent feedback and at other times provided incorrect, confusing or misleading information.

For an educational tool tasked with supporting student learning, this raises serious concerns about reliability and trustworthiness.

To assess if newer models improved performance, we replaced the underlying generative AI powering SmartTest (ChatGPT-4) with newer models, such as ChatGPT-4.5, which was released in 2025.

We tested these models by replicating instances where SmartTest provided poor feedback to students in our study. The newer models did not consistently outperform older ones. Sometimes, their responses were even less accurate or useful from a teaching perspective. As such, newer, more advanced AI models do not automatically translate to better educational outcomes.

What does this mean for students and teachers?

The implications for students and university staff are mixed.

Generative AI may support low-stakes, formative learning activities. But in our study, it could not provide the reliability, nuance and subject-matter depth needed for many educational contexts.

On the plus side, our survey results indicated students appreciated the immediate feedback and conversational tone of SmartTest. Some mentioned it reduced anxiety and made them more comfortable expressing uncertainty. However, this benefit came with a catch: incorrect or misleading answers could just as easily reinforce misunderstandings as clarify them.

Most students (76%) preferred having access to SmartTest rather than no opportunity to practise questions. However, when given the choice between receiving immediate feedback from AI or waiting one or more days for feedback from human tutors, only 27% preferred AI. Nearly half preferred human feedback with a delay and the rest were indifferent.

This suggests a critical challenge. Students enjoy the convenience of AI tools, but they still place higher trust in human educators.

A need for caution

Our findings suggest generative AI should still be treated as an experimental educational aid.

The potential is real – but so are the limitations. Relying too heavily on AI without rigorous evaluation risks compromising the very educational outcomes we are aiming to enhance.

Read more https://theconversation.com/researchers-created-a-chatbot-to-help-teach-a-university-law-class-but-the-ai-kept-messing-up-257551
