Interview: Using AI agents as judges in GenAI workflows

By Computer Weekly
September 16, 2025


Around 40 years ago, a bank branch manager probably knew the name of every customer and was able to offer personalised advice and guidance. But as Ranil Boteju, chief data and analytics officer at Lloyds Banking Group, points out, in today’s world, that model cannot scale.

“In the world of financial planning, most people in the UK cannot afford to see a financial planner,” he says.

There is also an insufficient number of trained financial advisers to help everyone seeking advice, which is why financial institutions are looking at how they can deploy generative artificial intelligence (GenAI) to support customers directly.

But the large language models (LLMs) and GenAI services from hyperscalers are rather like black boxes and can deliver incorrect responses, known in AI terms as hallucinations. Neither the opacity nor the errors is acceptable in a sector regulated by the Financial Conduct Authority (FCA).

What excites Boteju is the ability to scale that 40-year-old bank manager model to meet current demand using artificial intelligence, deployed in a way that gives the bank confidence the AI understands what people need, offers them the right guidance, and can be assessed against FCA guidelines.



“It would be a great ‘unlock’ for the UK in terms of giving access to high-quality financial guidance to a much broader and larger set of the population,” he says.

As Boteju notes, banks have been using AI for many years. “We’ve been using all sorts of machine learning algorithms for things like credit risk assessments and fraud screening for more than 15 years,” he says. “We’ve also been using chatbots for at least 10 years.”

As such, AI is a well-established capability in financial services. What is new, however, are generative AI and agentic AI. “Generative AI burst on the scene in late 2022 with ChatGPT. It’s been about for almost two-and-a-half years now,” says Boteju.

While banks have experience with AI, they have needed to figure out how to use generative AI and large language models. Speaking of his own experience, Boteju says: “We think about things like model performance and whether we are using the right algorithm.”

There is also transparency, ethics, guardrails and how the AI models are deployed. Boteju says: “These are common both to large language models and traditional AI. But generative AI has specific challenges in financial services because we are a regulated industry.”

Since generative AI can often lead to hallucinations, he says banks have to be very cautious about how they expose large language models directly to customers. “We put a lot of effort into ensuring that the outputs of the large language models are correct, accurate and transparent, and there’s no bias.”

In a regulated industry, it is vital to ensure the AI models are not hallucinating. “That’s probably one of the key things we need to be really cognisant of,” he says.

The need for specialist AI models

As Boteju notes, a model like Google Gemini is trained on everything. “If you ask it a question, the output will be based on its knowledge of everything. It’s been trained on lots and lots of data.”

Not all of this data is relevant to financial services, however. If the AI model is restricted to data specific to financial services, it should, in theory, hallucinate less.

“We felt quite strongly that we wanted to use a language model or a group of models that were specifically trained on financial services data relevant to the UK,” says Boteju.

This led to Lloyds Banking Group approaching Scottish startup Aveni to support the development of FinLLM, a financial services-specific large language model. In 2024, the company secured £11m of investment from Puma Private Equity, with participation from Lloyds and Nationwide.

Discussing the work with Aveni, Boteju says Lloyds Banking Group did not want to be tied to one specific model, so it decided to take an open approach to foundation models. From an AI sovereignty perspective, he says: “We don’t want to be limited to the large hyperscale models. There’s a fantastic ecosystem of open source models that we want to encourage, and the fact that we could create a FinLLM that is UK-centric in the UK is something we found very appealing.”

The bank has been testing FinLLM in its audit team, where an audit chatbot developed by Group Audit & Conduct Investigations (GA&CI) is transforming how auditors access and interact with audit intelligence. The chatbot integrates generative AI with the group’s internal documentation system, Atlas, making information retrieval faster, smarter and more intuitive.

Boteju says the bank effectively trained the chatbot using FinLLM and its knowledge of audits, based on all the audit data it has collected.
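In rough terms, a retrieval-grounded assistant of this kind fetches the most relevant internal documents and then asks the model to answer only from that material. The sketch below is a minimal illustration under assumed names: the in-memory document store, search_docs and call_finllm are placeholders, not the bank's Atlas system or a real FinLLM API.

# Minimal sketch of a retrieval-grounded audit assistant.
# The document store and call_finllm are hypothetical stand-ins for the
# Atlas documentation system and FinLLM described in the article.

AUDIT_DOCS = [
    {"id": "AUD-001", "text": "Audit scope and controls testing for retail lending."},
    {"id": "AUD-002", "text": "Findings on model risk governance and validation."},
]

def search_docs(query: str, docs: list[dict], top_k: int = 2) -> list[dict]:
    """Naive keyword retrieval: rank documents by terms shared with the query."""
    terms = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(terms & set(d["text"].lower().split())),
                    reverse=True)
    return scored[:top_k]

def call_finllm(prompt: str) -> str:
    """Placeholder for a call to a domain-tuned model such as FinLLM."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def answer_audit_question(question: str) -> str:
    context = search_docs(question, AUDIT_DOCS)
    context_block = "\n".join(f"[{d['id']}] {d['text']}" for d in context)
    prompt = (
        "Answer using only the audit documents below. "
        "If they do not contain the answer, say so.\n"
        f"{context_block}\n\nQuestion: {question}"
    )
    return call_finllm(prompt)

print(answer_audit_question("What did the audit find on model risk governance?"))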

He describes the approach Lloyds Banking Group has taken to reduce errors as “agent as a judge”. “You may have a specific model or agent that comes up with a specific outcome,” he says. “Then we’ll develop different models and different agents that review those outcomes and effectively score them.”

The bank has been working closely with Aveni to develop the approach of using AI agents as judges to assess the output of other AI models.

Each outcome is independently assessed by a set of different models. The review of the outputs from the AI models enables Lloyds to ensure they are aligned with FCA guidelines as well as the bank’s internal regulations.
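One way to picture the “agent as a judge” pattern is as a drafting model whose answer must clear a panel of independent judge agents, each scoring it against a criterion such as accuracy or compliance, before it is released. The sketch below is illustrative only; the criteria, threshold and call_model stub are assumptions, not details of the Lloyds and Aveni implementation.

# Illustrative "agent as a judge" loop: one model drafts an answer, several
# independent judge agents score it against criteria, and the answer is only
# released if every judge clears a threshold. All model calls are stubbed.

from dataclasses import dataclass

def call_model(role: str, prompt: str) -> str:
    """Placeholder for a call to an LLM playing the given role."""
    return f"[{role} output for: {prompt[:40]}...]"

@dataclass
class Judgement:
    criterion: str
    score: float   # 0.0 (fail) to 1.0 (pass)
    rationale: str

def judge(criterion: str, question: str, draft: str) -> Judgement:
    """A judge agent scores the draft on a single criterion.
    The score is hard-coded here; a real judge would parse the model's reply."""
    rationale = call_model(f"judge:{criterion}",
                           f"Question: {question}\nDraft: {draft}\nScore this for {criterion}.")
    return Judgement(criterion, score=1.0, rationale=rationale)

def answer_with_guardrails(question: str, threshold: float = 0.8) -> str:
    draft = call_model("drafter", question)
    criteria = ["factual accuracy", "regulatory compliance", "absence of bias"]
    judgements = [judge(c, question, draft) for c in criteria]
    if all(j.score >= threshold for j in judgements):
        return draft
    # Failing drafts are escalated rather than shown to the customer.
    return "Escalated to a human adviser for review."

print(answer_with_guardrails("Should I move my savings into a stocks and shares ISA?"))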

Checking the outputs of AI models is an effective way to verify that the customer is not being given bad advice, according to Boteju, who adds: “We’re in the process of refining these guardrails, and it’s imperative that we have [this process] in place.”

Boteju points out that having a human in the loop will remain important regardless of the “agent as a judge” approach. “There is still very much a place for humans in the loop in the future,” he says.

The power of different AI models in agentic AI

While an AI model like FinLLM has been tuned to understand the ins and outs of banking, Boteju says other models are much better at understanding human behaviour. This means the bank could, for instance, use one of the AI models from a hyperscaler, such as ChatGPT 5 or Google Gemini, to understand what the customer is actually saying.

“We would then use different models to break down what they’re saying into component parts,” he says. Different models are then tasked with tackling each distinct part of the customer query. “The way we think about this is that there are different models with different strengths, and what we want to do is to use the best model for each task.”
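That decompose-and-route idea can be sketched as a small orchestration loop: one step splits the customer query into parts, and each part is dispatched to whichever model is registered as strongest for that kind of task. The routing table, model names and call_model stub below are illustrative assumptions, not the bank's actual configuration.

# Illustrative decompose-and-route orchestration: a query is split into
# sub-tasks, each sub-task is routed to the model registered as best suited
# for it, and the partial answers are combined. All names are placeholders.

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a call to the named model."""
    return f"[{model}: {prompt[:40]}...]"

# Which model handles which kind of sub-task (illustrative only).
ROUTING_TABLE = {
    "understand_intent": "general-hyperscaler-model",
    "product_rules": "domain-tuned-model",
    "tone_and_wording": "general-hyperscaler-model",
}

def decompose(query: str) -> dict[str, str]:
    """In practice a model would split the query; here the split is fixed."""
    return {
        "understand_intent": f"What is the customer asking? {query}",
        "product_rules": f"Which product rules apply to: {query}",
        "tone_and_wording": f"Draft a plain-English reply to: {query}",
    }

def handle_query(query: str) -> str:
    parts = decompose(query)
    answers = {task: call_model(ROUTING_TABLE[task], prompt)
               for task, prompt in parts.items()}
    return "\n".join(f"{task}: {answer}" for task, answer in answers.items())

print(handle_query("Can I overpay my mortgage without a penalty?"))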

This approach is how the bank sees agentic AI being deployed. With agentic AI, says Boteju, problems are broken down into smaller and smaller parts, where different agents respond to each part. Here, having an agent as a judge is almost like a second-line colleague acting as an observer.


