OpenAI says ChatGPT went down because of a ‘new telemetry service’ – Redoma Tech

  • news
  • September 4, 2024

The malfunction affected critical resources that many of the company’s services depend on for DNS resolution, which converts domain names to IP addresses, enabling users to access websites through familiar addresses like “Google.com.”
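To make the direction of a DNS lookup concrete (name in, addresses out), here is a minimal Python sketch using the standard library's `socket.getaddrinfo`. The helper name `resolve` is an illustrative assumption, and the sketch has nothing to do with OpenAI's internal tooling.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    # getaddrinfo asks the system resolver, which consults DNS (and local
    # sources such as the hosts file) to map the name to addresses.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve("localhost")` typically works without network access, while a public name such as "Google.com" needs a reachable DNS resolver. When that dependency fails, `getaddrinfo` raises `socket.gaierror` and services that rely on name lookups can no longer find each other.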

The trouble began around 3 p.m. Pacific time on Wednesday, affecting OpenAI’s AI-based chatbot platform, ChatGPT, its video generator Sora, and the developer-facing API. It was triggered by the misconfiguration of a telemetry service deployed on Wednesday to gather Kubernetes metrics. The company said it identified the issue shortly before customers experienced its effects, but the overwhelmed Kubernetes servers hindered a swift resolution. OpenAI’s use of DNS caching further complicated the situation, delaying full visibility into the problem. To prevent similar occurrences in the future, the company plans to improve phased rollouts with better monitoring of infrastructure changes.
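How DNS caching can delay visibility into a failure can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of OpenAI's actual infrastructure: the class `CachingResolver`, the hostname `api.example.test`, the 30-second TTL, and the failure at t=10 are all hypothetical.

```python
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy DNS cache: answers are reused until their TTL expires."""

    def __init__(self, upstream: Callable[[str], str], ttl: float,
                 clock: Callable[[], float]):
        self._upstream = upstream
        self._ttl = ttl
        self._clock = clock
        # name -> (address, expiry time)
        self._cache: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: upstream is never contacted
        address = self._upstream(name)  # raises if upstream is down
        self._cache[name] = (address, now + self._ttl)
        return address


def simulate() -> list:
    """Break the upstream at t=10 and record what a client observes."""
    state = {"now": 0.0, "upstream_up": True}

    def upstream(name: str) -> str:
        if not state["upstream_up"]:
            raise RuntimeError("upstream DNS unavailable")
        return "192.0.2.10"  # documentation-only address (RFC 5737)

    resolver = CachingResolver(upstream, ttl=30.0, clock=lambda: state["now"])
    observations = []
    for t in (0.0, 10.0, 20.0, 29.0, 30.0):
        state["now"] = t
        if t >= 10.0:
            state["upstream_up"] = False  # hypothetical failure point
        try:
            resolver.resolve("api.example.test")
            observations.append((t, "ok"))
        except RuntimeError:
            observations.append((t, "fail"))
    return observations
```

In this run the upstream breaks at t=10, yet lookups keep succeeding from the cache until the entry expires at t=30. Only then does the client see an error, which is the kind of gap between cause and symptom that makes such a failure harder to spot early.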

OpenAI described the incident as a convergence of multiple systems and processes failing and interacting in unexpected ways. The company promptly acknowledged the issue and began working on a fix, but it took about three hours to fully restore all services.

OpenAI attributed the outage, one of the longest disruptions in the company’s history, to a malfunction in a “new telemetry service.” In a postmortem report released on Thursday, the company explained that the outage was not caused by a security breach or a recent product launch. Kubernetes is a widely used open-source program for managing containers, which run software in isolated environments.

The new telemetry service unintentionally overwhelmed OpenAI’s Kubernetes API servers, disrupting the Kubernetes control plane across most of the company’s large clusters. Going forward, OpenAI says new measures will ensure that its engineers can access Kubernetes API servers under any circumstances.

The company expressed regret for the disruptions caused to all customers, from ChatGPT users to developers and businesses relying on OpenAI products, acknowledging that they did not meet their own standards in handling the situation.
