Daily Bulletin

Researchers created a chatbot to help teach a university law class – but the AI kept messing up

  • Written by: Armin Alimardani, Senior Lecturer in Law and Emerging Technologies, University of Wollongong

“AI tutors” have been hyped as a way to revolutionise education.

The idea is that generative artificial intelligence tools (such as ChatGPT) could adapt to any teaching style set by a teacher. The AI could guide students step-by-step through problems and offer hints without giving away answers. It could then deliver precise, immediate feedback tailored to each student’s individual learning gaps.

Despite the enthusiasm, there is limited research testing how well AI performs in teaching environments, especially within structured university courses.

In our new study, we developed our own AI tool for a university law class. We wanted to know: can it genuinely support personalised learning, or are we expecting too much?

Our study

In 2022, we developed SmartTest, a customisable educational chatbot, as part of a broader project to democratise access to AI tools in education.

Unlike generic chatbots, SmartTest is purpose-built for educators, allowing them to embed questions, model answers and prompts. This is designed to help the chatbot ask relevant questions, deliver accurate and consistent feedback and minimise hallucinations (plausible-sounding but incorrect output). SmartTest is also instructed to use the Socratic method, encouraging students to think rather than spoon-feeding them answers.

We trialled SmartTest over five test cycles in a criminal law course (which one of us was coordinating) at the University of Wollongong in 2023.

Each cycle introduced varying degrees of complexity. The first three cycles used short hypothetical criminal law scenarios (for example, is the accused guilty of theft in this scenario?). The last two cycles used simple short-answer questions (for example, what’s the maximum sentencing discount for a guilty plea?).

An average of 35 students interacted with SmartTest in each cycle across several criminal law tutorials. Participation was voluntary and anonymous, with students interacting with SmartTest on their own devices for up to ten minutes per session. Students’ conversations with SmartTest – their attempts at answering the question, and the immediate feedback they received from the chatbot – were recorded in our database.

After the final test cycle, we surveyed students about their experience.

An example of SmartTest’s interactions with students. Reproduced with permission from Snowflake Inc., Author provided (no reuse)

What we found

SmartTest showed promise in guiding students and helping them identify gaps in their understanding.

However, in the first three cycles (the problem-scenario questions), between 40% and 54% of conversations had at least one example of inaccurate, misleading, or incorrect feedback.

When we shifted to a much simpler short-answer format in cycles four and five, the error rate dropped markedly, to between 6% and 27%. However, even in these best-performing cycles, some errors persisted. For example, SmartTest would sometimes affirm an incorrect answer before providing the correct one, which risks confusing students.

A significant revelation was the sheer effort required to get the chatbot working effectively in our tests. Far from a time-saving silver bullet, integrating SmartTest involved painstaking prompt engineering and rigorous manual assessments from educators (in this case, us). This paradox – where a tool promoted as labour-saving demands significant labour – calls into question its practical benefits for already time-poor educators.

Inconsistency is a core issue

SmartTest’s behaviour was also unpredictable. Under identical conditions, it sometimes offered excellent feedback and at other times provided incorrect, confusing or misleading information.

For an educational tool tasked with supporting student learning, this raises serious concerns about reliability and trustworthiness.

To assess whether newer models improved performance, we replaced the underlying generative AI model powering SmartTest (GPT-4) with newer models, such as GPT-4.5, which was released in 2025.

We tested these models by replicating instances where SmartTest provided poor feedback to students in our study. The newer models did not consistently outperform older ones. Sometimes, their responses were even less accurate or less useful from a teaching perspective. In other words, newer, more advanced AI models do not automatically translate to better educational outcomes.

What does this mean for students and teachers?

The implications for students and university staff are mixed.

Generative AI may support low-stakes, formative learning activities. But in our study, it could not provide the reliability, nuance and subject-matter depth needed for many educational contexts.

On the plus side, our survey results indicated students appreciated the immediate feedback and conversational tone of SmartTest. Some mentioned it reduced anxiety and made them more comfortable expressing uncertainty. However, this benefit came with a catch: incorrect or misleading answers could just as easily reinforce misunderstandings as clarify them.

Most students (76%) preferred having access to SmartTest rather than no opportunity to practise questions. However, when given the choice between receiving immediate feedback from AI or waiting one or more days for feedback from human tutors, only 27% preferred AI. Nearly half preferred human feedback with a delay and the rest were indifferent.

This suggests a critical challenge. Students enjoy the convenience of AI tools, but they still place higher trust in human educators.

A need for caution

Our findings suggest generative AI should still be treated as an experimental educational aid.

The potential is real – but so are the limitations. Relying too heavily on AI without rigorous evaluation risks compromising the very educational outcomes we are aiming to enhance.

Read more https://theconversation.com/researchers-created-a-chatbot-to-help-teach-a-university-law-class-but-the-ai-kept-messing-up-257551