According to Ars Technica, US District Judge Sidney Stein has denied OpenAI’s objections to a previous order, forcing the company to produce 20 million de-identified ChatGPT user logs to news organizations like The New York Times. The news plaintiffs allege these logs will show evidence of copyright infringement, and they are now calling for sanctions against OpenAI. They claim OpenAI destroyed a “quite substantial” fraction of user conversation data after litigation began, including about one-third of all logs in the month after The New York Times filed suit. The plaintiffs want the court to order OpenAI to explain the scope of the destroyed data and whether those millions of deleted chats can be retrieved. Meanwhile, co-defendant Microsoft has agreed to share 8.1 million Copilot logs but hasn’t specified a timeline, leading to further frustration from the plaintiffs.
The privacy argument crumbles
Here’s the thing: OpenAI’s main defense against handing over the logs was user privacy. The company argued that running the plaintiffs’ search terms across the logs itself would be the “least burdensome” path. But Judge Stein wasn’t buying it. He pointed out that the sample had already been reduced from tens of billions of conversations to 20 million, and that all identifying information is stripped out. So the privacy shield they’re waving feels a bit thin now. The judge also noted that even logs without direct reproductions of news articles are relevant to OpenAI’s “fair use” defense. Basically, the plaintiffs get to see the whole sample. OpenAI says it’s reviewing options, but this looks like the end of the road for that fight.
The deleted chat problem
Now, this is where it gets messy. The news groups are furious, accusing OpenAI of running a “playbook” to dodge their claims. They say OpenAI kept destroying user logs for 11 months after the lawsuit started. And get this: they allege the data deleted at a “disproportionately higher rate” likely came from free-tier users, which is exactly where you’d expect to find people trying to skirt paywalls. OpenAI blamed “technical issues” for two spikes in mass deletion and said conversation volume was low around New Year’s 2024. But come on. That’s a pretty weak excuse for losing a third of your relevant data. The real kicker? The plaintiffs claim OpenAI carefully preserved chats from the news organizations’ own test accounts (which help its defense) but didn’t extend that care to third-party user data (which could hurt it). That looks… bad.
What happens next
So what’s the trajectory here? First, the 20 million logs are almost certainly getting handed over; Judge Stein’s full order is publicly available. Second, the court will have to decide on sanctions. The plaintiffs want answers: How much data was really deleted? Can it be restored? They’ve laid out their arguments in a supplemental memo. If the judge agrees OpenAI acted in bad faith, the penalties could be significant. And let’s not forget Microsoft, which is dragging its feet on those 8.1 million Copilot logs. The plaintiffs want them “immediately” in a searchable format. This case is becoming a brutal discovery slog, and it’s exposing how these companies handle (or mishandle) user data when under legal pressure. OpenAI has published its public stance on its blog, but the courtroom filings tell a much more aggressive story.
A broader reckoning
Look, this is about more than just 20 million chat logs. It’s a precedent. If the plaintiffs succeed in forcing OpenAI to retrieve deleted data, it sends a shockwave through the entire tech industry. How permanent is “deleted”? Users might assume their ephemeral chats are gone forever, but this litigation suggests otherwise. The technical capability to restore mass deletions likely exists. And if the court orders it, we’re talking about pulling millions more conversations into evidence that people never expected to resurface. This case is slowly turning into a forensic examination of AI’s training and output pipeline. It’s not just about copyright anymore; it’s about data governance, preservation obligations, and what “user privacy” really means once a giant legal battle kicks off. The stakes for OpenAI, and for how all AI companies operate, just got a lot higher.
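Nothing in the court filings describes OpenAI’s actual storage internals, but the reason “deleted” data is often recoverable is mundane: many production systems use a soft-delete pattern, where a delete request only flags a record with a tombstone timestamp and the underlying bytes survive until a separate purge job runs. A minimal, entirely hypothetical sketch of that pattern (the `ChatLog` and `LogStore` names are invented for illustration, not anything from the case):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ChatLog:
    user_id: str
    text: str
    deleted_at: Optional[datetime] = None  # None means the log is live


class LogStore:
    """Toy store illustrating soft deletes; real systems use a database."""

    def __init__(self) -> None:
        self._rows: list[ChatLog] = []

    def add(self, log: ChatLog) -> None:
        self._rows.append(log)

    def delete(self, user_id: str) -> None:
        # "Delete" only stamps a tombstone; the data stays on disk.
        now = datetime.now(timezone.utc)
        for row in self._rows:
            if row.user_id == user_id and row.deleted_at is None:
                row.deleted_at = now

    def visible(self) -> list[ChatLog]:
        # Normal queries filter out tombstoned rows, so the user
        # experiences the data as gone.
        return [r for r in self._rows if r.deleted_at is None]

    def restore(self, user_id: str) -> None:
        # Recovery is trivial as long as no purge job has run.
        for row in self._rows:
            if row.user_id == user_id:
                row.deleted_at = None
```

Under this kind of design, whether “deleted” chats can be produced in discovery comes down to whether the final purge step ever executed, which is precisely the question the plaintiffs are asking the court to make OpenAI answer.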
