Large language models provide unreliable answers about public services, Open Data Institute finds | Computer Weekly

By Computer Weekly
February 12, 2026


Popular large language models (LLMs) are unable to provide reliable information about key public services such as health, taxes and benefits, the Open Data Institute (ODI) has found.

The research draws on more than 22,000 LLM prompts designed to reflect the kind of questions people would ask artificial intelligence (AI)-powered chatbots, such as “How do I apply for universal credit?”, and raises concerns about whether chatbots can be trusted to give accurate information about government services.

The publication of the research follows the UK government’s announcement of partnerships with Meta and Anthropic at the end of January 2026 to develop AI-powered assistants for navigating public services.

“If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot,” said Elena Simperl, the ODI’s director of research.

Responses from models – including Anthropic’s Claude-4.5-Haiku, Google’s Gemini-3-Flash and OpenAI’s ChatGPT-4o – were compared directly with official government sources. 

The results showed many correct answers, but also a significant variation in quality, particularly for specialised or less-common queries.

They also showed that chatbots rarely admitted when they didn’t know the answer to a question, and attempted to answer every query even when their responses were incomplete or wrong. 
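
As a rough sketch of how such a comparison against official sources can be run, the loop below asks a model each question and checks whether the reply contains facts taken from government guidance. It is illustrative only: ask_model, the prompts.json layout and the substring check are assumptions, not the ODI's benchmark or code.

# Minimal sketch of an accuracy check of the kind described above.
# ask_model, the prompt-file layout and the keyword matching are all
# illustrative assumptions, not the ODI's methodology or code.
import json

def ask_model(prompt: str) -> str:
    # Replace with a real LLM client call; a canned reply is returned here
    # so the sketch runs without network access.
    return "You can apply for Universal Credit online through GOV.UK."

def contains_key_facts(answer: str, key_facts: list[str]) -> bool:
    # Naive check: does the answer mention every fact drawn from the
    # official government source?
    return all(fact.lower() in answer.lower() for fact in key_facts)

def evaluate(prompt_file: str) -> float:
    # Each case pairs a citizen-style question with facts from official guidance,
    # e.g. {"prompt": "How do I apply for universal credit?", "key_facts": ["GOV.UK"]}
    with open(prompt_file) as f:
        cases = json.load(f)
    correct = sum(
        contains_key_facts(ask_model(case["prompt"]), case["key_facts"])
        for case in cases
    )
    return correct / len(cases)

In practice, grading answers about benefits, tax or health rules requires far more nuanced judgement than substring matching, which is part of why such evaluations are difficult to automate.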

Burying key facts

Chatbots also often provided lengthy responses that buried key facts or extended beyond the information available on government websites, increasing the risk of inaccuracy.

Meta’s Llama 3.1 8B incorrectly stated that a court order is essential to add an ex-partner’s name to a child’s birth certificate. If followed, this advice would lead to unnecessary stress and financial cost. 

ChatGPT-OSS-20B incorrectly advised that a person caring for a child whose parents have died is only eligible for Guardian’s Allowance if they are the guardian of a child who has died. 

It also incorrectly stated that the applicant was ineligible if they received other benefits for the child. 

Simperl said that for citizens, the research highlights the importance of AI literacy, while for those designing public services, “it suggests caution in rushing towards large or expensive models, which risk vendor lock-in, given how quickly the technology is developing. We also need more independent benchmarks, more public testing, and more research into how to make these systems produce precise and reliable answers.”

The second International AI safety report, published on 3 February, made similar findings regarding the reliability of AI-powered systems. It noted that while there have been improvements in recalling factual information since the 2025 safety report, “even leading models continue to give confident but incorrect answers at significant rates”.

Following incorrect advice

It also highlighted users’ propensity to follow incorrect advice from automated systems generally, including chatbots, “because they overlook cues signalling errors or because they perceive the automation system as superior to their own judgement”.

The ODI’s research also challenges the idea that larger, more resource-intensive models are always a better fit for the public sector, with smaller models delivering comparable results at a lower cost than large, closed-source models such as ChatGPT in many cases.

Simperl warned that governments should avoid locking themselves into long-term contracts when models only temporarily outperform one another on price or benchmarks.

Commenting on the ODI’s research during a launch event, Andrew Dudfield, head of AI at Full Fact, highlighted that because the government’s position is pro-innovation, regulation is currently framed around principles rather than detailed rules.

“The UK may be adopting AI faster than it is learning how to use it, particularly when it comes to accountability,” he said.

Trustworthiness 

Dudfield noted that what makes this work compelling is that it focuses on real user needs, but that trustworthiness needs to be evaluated from the perspective of the person relying on the information, not from the perspective of demonstrating technical capability.

“The real risk is not only hallucination, but the extent to which people trust plausible-sounding responses,” he said.

Asked at the same event if the government should be building its own systems or relying on commercial tools, Richard Pope, researcher at the Bennett School of Public Policy, said the government needs “to be cautious about dependency and sovereignty”.

“AI projects should start small, grow gradually and share what they are learning,” he said, adding that public sector projects should prioritise learning and openness rather than rapid expansion.

Simperl highlighted that AI creates the potential to tailor information for different languages or levels of understanding, but that those opportunities “need to be shaped rather than left to develop without guidance”.

With new AI models launching every week, a January 2026 Gartner study found that the increasingly large volume of unverified and low-quality data generated by AI systems was a clear and present threat to the reliability of LLMs.

Large language models are trained on data scraped from the web, books, research papers and code repositories. Many of these sources already contain AI-generated data and, at the current rate of expansion, they may eventually all be populated with it. 

Highlighting how future LLMs will be trained more and more with outputs from current ones as the volume of AI-generated data grows, Gartner said there is a risk of models collapsing entirely under the accumulated weight of their own hallucinations and inaccurate realities. 
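
As a rough intuition for that dynamic, and not a result from the Gartner study, the toy loop below repeatedly fits a simple distribution to data and then replaces the data with samples drawn from the fit. Over many rounds the estimate typically drifts away from the original data and loses spread, a much-simplified analogue of models degrading as they train on one another's output.

# Toy illustration, assumed for intuition only: each "generation" is trained
# solely on samples produced by the previous generation.
import random
import statistics

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(25)]  # stand-in for human-written data

for generation in range(1, 41):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    # The next generation sees only the previous generation's output.
    data = [random.gauss(mu, sigma) for _ in range(25)]
    if generation % 10 == 0:
        print(f"after {generation} generations: mean {mu:+.2f}, std {sigma:.2f}")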

Managing vice-president Wan Fui Chan said that organisations could no longer implicitly trust data, or assume it was even generated by a human.

Chan added that as AI-generated data becomes more prevalent, regulatory requirements for verifying “AI-free” data will intensify in many regions.


