Ptechhub

OpenAI Finally Launched GPT-5. Here’s Everything You Need to Know

By Wired
August 7, 2025


OpenAI’s blog post claims that GPT-5 beats its previous models on several coding benchmarks, including SWE-Bench Verified (scoring 74.9 percent), SWE-Lancer (GPT-5-thinking scored 55 percent), and Aider Polyglot (scored 88 percent), which test the model’s ability to fix bugs, complete freelance-style coding tasks, and work across multiple programming languages.

During the press briefing on Wednesday, OpenAI post-training lead Yann Dubois prompted GPT-5 to “create a beautiful, highly interactive web app for my partner, an English speaker, to learn French.” He asked the AI to include features like daily progress and a variety of activities such as flashcards and quizzes, and noted that he wanted the app wrapped up in a “highly engaging theme.” After a minute or so, the AI-generated app popped up. While it was just one on-rails demo, the result was a sleek site that delivered exactly what Dubois asked for.

“It’s a great coding collaborator, and also excels at agentic tasks,” Michelle Pokrass, a post-training lead, says. “It executes long chains and tool calls effectively [which means it better understands when and how to use functions like web browsers or external APIs], follows detailed instructions, and provides upfront explanations of its actions.”

OpenAI also says in its blog post that GPT-5 is “our best model yet for health-related questions.” The system card (a document that describes the product’s technical capabilities and other research findings) states that on three OpenAI health-related LLM benchmarks, namely HealthBench, HealthBench Hard, and HealthBench Consensus, GPT-5-thinking outperforms previous models “by a substantial margin.” The thinking version of GPT-5 scored 46.2 percent on HealthBench Hard, up from o3’s 31.6 percent score. These scores are validated by two or more physicians, according to the system card.

According to Pokrass, the model also hallucinates less. Hallucination is a common issue in which an AI provides false information. OpenAI’s safety research lead Alex Beutel adds that they’ve “significantly decreased the rates of deception in GPT-5.”

“We’ve taken steps to reduce GPT-5-thinking’s propensity to deceive, cheat, or hack problems, though our mitigations are not perfect and more research is needed,” the system card says. “In particular, we’ve trained the model to fail gracefully when posed with tasks that it cannot solve.”

The company’s system card says that after testing GPT-5 models without access to web browsing, researchers found that its hallucination rate (which they defined as “percentage of factual claims that contain minor or major errors”) was 26 percent lower than that of the GPT-4o model. GPT-5-thinking’s hallucination rate is 65 percent lower than o3’s.
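To make the arithmetic behind those figures concrete, here is a minimal Python sketch of how a hallucination rate under that definition, and a relative reduction between two models, could be computed. The function names and the claim counts are hypothetical and chosen only for illustration. They are not taken from the system card, which does not publish the underlying counts in this article.

```python
def hallucination_rate(claims_with_errors: int, total_claims: int) -> float:
    """Percentage of factual claims that contain minor or major errors."""
    if total_claims <= 0:
        raise ValueError("total_claims must be positive")
    return 100.0 * claims_with_errors / total_claims


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Percent by which new_rate is lower than baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 100.0 * (baseline_rate - new_rate) / baseline_rate


# Hypothetical counts (assumptions, not OpenAI data): an older model with
# 200 erroneous claims out of 1,000, a newer one with 148 out of 1,000.
baseline = hallucination_rate(200, 1000)
newer = hallucination_rate(148, 1000)
reduction = relative_reduction(baseline, newer)

print(f"baseline rate: {baseline:.1f}%")
print(f"newer rate: {newer:.1f}%")
print(f"relative reduction: {reduction:.1f}%")
```

Note that a "26 percent lower" rate is a relative figure: in this made-up example the rate falls by 5.2 percentage points, which is 26 percent of the baseline rate.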

For prompts that could be dual-use (potentially harmful or benign), Beutel says GPT-5 uses “safe completions,” which prompts the model to “give as helpful an answer as possible, but within the constraints of remaining safe.” OpenAI did over 5,000 hours of red teaming, according to Beutel, and testing with external organizations to make sure the system was robust.

OpenAI says it now boasts nearly 700 million weekly active users of ChatGPT, 5 million paying business users, and 4 million developers utilizing the API.

“The vibes of this model are really good, and I think that people are really going to feel that,” head of ChatGPT Nick Turley says. “Especially average people who haven’t been spending their time thinking about models.”




Tags: Artificial Intelligence, ChatGPT, OpenAI, Sam Altman
Copyright © 2025 | Powered By Porpholio
