Ptechhub

AI model performance improvements show no signs of slowing down

By CIO Dive
April 7, 2025

Dive Brief:

  • AI model performance improved significantly over the past two years, according to the latest AI Index report from the Stanford Institute for Human-Centered AI. The research, education, industry and policy group analyzed 29 benchmarks, evaluations and leaderboards to create the 400-plus-page study.
  • On one benchmark evaluating a model’s ability to resolve GitHub issues from popular open-source Python repositories, the best-performing model at the end of 2023 scored 4.4%, while OpenAI’s o3, released to researchers and developers in December, solved nearly 72% of problems by early 2025.
  • OpenAI’s o1, which was introduced in September, took the top spot on a multidiscipline benchmark evaluating multimodal models on deliberate reasoning and college-level subject knowledge. The o1 model scored 4.4 points below the human baseline and 18.8 points higher than last year’s state-of-the-art score.

Dive Insight:

Even as analysis of model performance points to drastic improvement, AI model costs, accessibility and other areas still have room to grow.

The Stanford Institute for Human-Centered AI research found energy efficiency has increased by 40% each year, while hardware costs have declined by 30% annually. Models are also becoming smaller and more efficient. Microsoft’s 3.8-billion-parameter Phi-3-mini scored higher than 60% on a widely used benchmark on which, in 2022, the smallest model to reach that threshold had 540 billion parameters.

Cost and accessibility have moved to the forefront of criteria enterprises are assessing in model decisions. China-based AI startup DeepSeek captured attention earlier this year when it claimed its R1 model rivaled leading U.S. models at a fraction of the training cost, underlining the enterprise friction with existing cost structures. 

Responsible AI is another area where CIOs are taking a closer look. Researchers have created new benchmarks and sounded the alarm on poorly constructed tests, according to Stanford’s analysis. 

HELM Safety, which provides a comprehensive evaluation of language models, and AIR-Bench, which focuses on government regulations, are two examples of benchmarks that evaluate models on responsible AI metrics. Anthropic’s Claude 3.5 Sonnet ranked as the safest model on HELM Safety, followed closely by OpenAI’s o1.

Analysts have cautioned CIOs against going all-in on one model or vendor, and instead recommended striving for model-agnostic platforms as the pace of innovation persists. 

Expedia Group developed an internal experimentation platform with that in mind. 

“We really want to make sure we can take advantage of the latest, coolest model,” Shiyi Pickrell, SVP of data and AI at Expedia Group, told CIO Dive. “Some of them have better infrastructure or capabilities, so we built this generic layer allowing us to use different models based on the use case or cost.”

The overall performance gaps between model competitors have narrowed, too, underlining the need for flexibility. 

There was an 11.9% performance gap between the highest and 10th-ranked model in one assessment included in Stanford’s AI Index report last year. The difference shrank to 5.4% this year. Meanwhile, the gap between the top U.S. models and the best Chinese model was 9.26% last year and dwindled to 1.70% in a February assessment.
