Ptechhub

AI model performance improvements show no signs of slowing down

By CIO Dive
April 7, 2025

Dive Brief:

  • AI model performance improved significantly over the past two years, according to the latest AI Index report from the Stanford Institute for Human-Centered AI. The research, education, industry and policy group analyzed 29 benchmarks, evaluations and leaderboards to create the 400-plus page study.
  • One benchmark evaluating a model’s ability to resolve GitHub issues from popular open-source Python repositories found the best-performing model at the end of 2023 scored 4.4%, while OpenAI’s o3, released to researchers and developers in December, solved nearly 72% of problems by early 2025.
  • OpenAI’s o1, which was introduced in September, landed the top spot in a multidiscipline task benchmark evaluating multimodal models on deliberate reasoning and college-level subject knowledge. The o1 model scored 4.4 points below the human benchmark and 18.8 points higher than last year’s state-of-the-art score. 

Dive Insight:

Model performance has improved drastically, according to the analysis, but cost, accessibility and other areas still have room to improve.

The Stanford Institute for Human-Centered AI research found energy efficiency has increased by 40% each year, while hardware costs have declined by 30% annually. Models are also becoming smaller and more efficient. Microsoft’s 3.8 billion-parameter Phi-3-mini scored higher than 60% on a widely used benchmark on which the smallest model to previously reach that threshold had 540 billion parameters.
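To put those annual rates in perspective, the short Python sketch below compounds them over several years and computes the parameter-count ratio between the two models. The three-year horizon and the helper name `compound` are assumptions chosen for illustration; the report figures quoted above are the only inputs.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Return the overall multiplier after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


# Assumption: a three-year horizon, picked only to illustrate compounding.
YEARS = 3

# Energy efficiency improving 40% per year compounds multiplicatively.
efficiency_multiplier = compound(0.40, YEARS)

# Hardware cost falling 30% per year leaves 70% of the cost each year.
cost_fraction_remaining = compound(-0.30, YEARS)

# Parameter counts in billions, as quoted in the article.
param_ratio = 540 / 3.8

print(f"Efficiency after {YEARS} years: {efficiency_multiplier:.3f}x")
print(f"Hardware cost after {YEARS} years: {cost_fraction_remaining:.1%} of original")
print(f"Parameter reduction: roughly {param_ratio:.0f}x smaller")
```

Under these assumptions, the rates imply efficiency a little under three times higher and hardware costs at roughly a third of their starting level after three years, while the smaller model has about 142 times fewer parameters.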

Cost and accessibility have moved to the forefront of criteria enterprises are assessing in model decisions. China-based AI startup DeepSeek captured attention earlier this year when it claimed its R1 model rivaled leading U.S. models at a fraction of the training cost, underlining the enterprise friction with existing cost structures. 

Responsible AI is another area where CIOs are taking a closer look. Researchers have created new benchmarks and sounded the alarm on poorly constructed tests, according to Stanford’s analysis. 

HELM Safety, which provides a comprehensive evaluation of language models, and AIR-Bench, which focuses on government regulations, are two examples of benchmarks that evaluate models based on responsible AI metrics. Anthropic’s Claude 3.5 Sonnet is considered the safest in the HELM Safety test, followed closely by OpenAI’s o1.

Analysts have cautioned CIOs against going all-in on one model or vendor, and instead recommended striving for model-agnostic platforms as the pace of innovation persists. 

Expedia Group developed an internal experimentation platform with that in mind. 

“We really want to make sure we can take advantage of the latest, coolest model,” Shiyi Pickrell, SVP of data and AI at Expedia Group, told CIO Dive. “Some of them have better infrastructure or capabilities, so we built this generic layer allowing us to use different models based on the use case or cost.”
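A generic layer of the kind Pickrell describes could look something like the Python sketch below. It is not Expedia Group's implementation: the class names, the cost units and the two stand-in backends are all hypothetical, and the lambdas take the place of real vendor SDK calls. The idea is only that callers ask for a use case and the layer picks the cheapest registered model that supports it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Backend:
    """One model behind the generic layer (all fields are hypothetical)."""
    name: str
    cost_per_call: float            # relative cost units, not real pricing
    use_cases: FrozenSet[str]       # tasks this model is approved for
    generate: Callable[[str], str]  # stand-in for a vendor SDK call


class ModelRouter:
    """Routes each request to the cheapest backend that supports the use case."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, backend: Backend) -> None:
        self._backends[backend.name] = backend

    def pick(self, use_case: str) -> Backend:
        candidates = [b for b in self._backends.values() if use_case in b.use_cases]
        if not candidates:
            raise LookupError(f"no backend registered for {use_case!r}")
        return min(candidates, key=lambda b: b.cost_per_call)

    def generate(self, use_case: str, prompt: str) -> str:
        return self.pick(use_case).generate(prompt)


# Hypothetical backends; swapping vendors only changes these registrations.
router = ModelRouter()
router.register(Backend("large-model", 10.0,
                        frozenset({"summarize", "reasoning"}),
                        lambda prompt: f"[large] {prompt}"))
router.register(Backend("small-model", 1.0,
                        frozenset({"summarize"}),
                        lambda prompt: f"[small] {prompt}"))

print(router.generate("summarize", "Condense this itinerary"))
print(router.generate("reasoning", "Plan a multi-city trip"))
```

Because application code depends only on `ModelRouter`, a newly released model can be registered, or an old one dropped, without touching the callers.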

The overall performance gaps between model competitors have narrowed, too, underlining the need for flexibility. 

There was an 11.9% performance gap between the highest and 10th-ranked model in one assessment included in Stanford’s AI Index report last year. The difference shrank to 5.4% this year. Meanwhile, the gap between the top U.S. models and the best Chinese model was 9.26% last year and dwindled to 1.70% in a February assessment.
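The narrowing can be expressed both as an absolute drop and as a share of the original gap. The Python sketch below does that arithmetic using only the four figures quoted above; the function name `gap_change` is an assumption made for illustration.

```python
from typing import Tuple


def gap_change(previous: float, current: float) -> Tuple[float, float]:
    """Return (absolute drop, drop as a fraction of the previous gap)."""
    drop = previous - current
    return drop, drop / previous


# Gap between the highest and 10th-ranked model, as quoted in the article.
top10_drop, top10_relative = gap_change(11.9, 5.4)

# Gap between the top U.S. models and the best Chinese model.
us_china_drop, us_china_relative = gap_change(9.26, 1.70)

print(f"Top vs. 10th: down {top10_drop:.2f}, a {top10_relative:.1%} reduction")
print(f"U.S. vs. China: down {us_china_drop:.2f}, a {us_china_relative:.1%} reduction")
```

On these figures the top-to-10th gap roughly halved, while the U.S.-China gap shrank by about four-fifths.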


