Everyone Knows: AI's Safety Problem
Why 30+ AI safety researchers quit, what they were afraid of, and the Pentagon story that broke last week.
I started in media, spent a few years building ML matching systems for music royalties, then last year co-founded an AI data marketplace where I went deep on RAG pipelines, evaluation frameworks, retrieval architecture. I use Claude every day to build things. I’m not an AI safety researcher. I’m a PM who ships AI products.
A few weeks ago I started seeing headlines about safety researchers leaving AI companies. Not exec reshuffles. The people whose job is to make sure these systems don’t go sideways. Everyone’s talking about it, but most of what I was reading felt surface-level. So naturally I went down a rabbit hole. 🕳️
I spent two weeks reading primary sources. Not the takes, the actual documents. Resignation letters posted on X. Interviews in TIME and the New York Times. Leaked internal memos. Archived web pages that companies quietly deleted. Academic papers on what AI dependency is doing to people’s brains. Government filings. Equity forfeiture records.
What I found is worse than the headlines, and not for the reasons most people think.
Then this past week happened. Anthropic’s head of safeguards research quit, said the world is in peril, and enrolled in a poetry program. OpenAI dissolved its third dedicated safety team in roughly 18 months and removed the word “safely” from its mission statement. Half of xAI’s founding team is now gone. And AI companies were collectively raising tens of billions in new funding.
Then yesterday, the Pentagon threatened to cut ties with Anthropic because the company won’t drop its last two safety restrictions: no mass surveillance of Americans and no fully autonomous weapons.
Here’s what I found.
What they’re actually afraid of
Most people hear “AI safety” and picture I, Robot, Ex Machina, all those robot-takeover movies. The concerns driving researchers out the door are more specific than that, better documented, and in some cases already happening.
Your most intimate conversations, monetized. Zoë Hitzig was an economics researcher at OpenAI. She resigned in February 2026, the same day OpenAI started testing ads in ChatGPT, and published a NYT op-ed on her way out.
She wrote that people tell chatbots about their medical fears, their relationship problems, their beliefs about God and the afterlife. Building an advertising engine on top of that data creates potential for manipulation we don’t have tools to understand, let alone prevent. Stanford HAI research found that all six leading US chatbot companies feed user conversations back into training data by default. Medical questions, financial anxieties, religious doubts, relationship crises. All of it going back in.
AI making you worse at thinking. A study of 666 participants (yes, that number is real and creepy) found a significant negative correlation between frequent AI use and critical thinking, driven by cognitive offloading. MIT’s Media Lab used EEG monitoring and found that over four months, regular chatbot users showed reduced brain connectivity. Students performed better with AI assistance but performed worse when it was removed. We’re building dependency on systems that erode the cognitive skills we’d need to function without them. I read all of this and I still opened Claude the next morning. That’s the trap.
Your chatbot knows you better than your therapist, and it’s optimized to keep you talking. 28% of Americans report having had an intimate relationship with an AI chatbot. When Replika restricted its romantic features, users experienced symptoms researchers described as clinically similar to breakup grief: insomnia, loss of appetite, intrusive thoughts. An analysis of 17,000+ conversations found AI companions dynamically track and mimic user emotions, amplifying positive feelings. That’s not companionship. That’s engagement optimization.
AI reshaping how people perceive reality. The week before Mrinank Sharma resigned as Anthropic’s safeguards research lead, his team published a study finding that “thousands” of daily chatbot interactions produce distorted perceptions of reality in users. He called them “disempowerment patterns”: the chatbot gradually shapes how someone sees the world, their relationships, their own judgment. Not through some sci-fi mind control. Through optimization. The system learns what keeps you engaged and gives you more of that, and over time your grip on what’s real loosens. Rates were highest around relationships and wellness, the topics where people are most vulnerable. Sharma saw his own data, published it, and quit the next week.
Systems getting too capable too fast with nobody at the wheel. Daniel Kokotajlo, a former governance researcher at OpenAI, told TIME:
“A sane civilization would not be proceeding with the creation of this incredibly powerful technology until we had some better idea of what we were doing.”
Steven Adler, who spent four years evaluating dangerous capabilities at OpenAI, wrote:
“When I think about where I’ll raise a future family, or how much to save for retirement, I can’t help but wonder: will humanity even make it to that point?”
36% of NLP researchers surveyed by Stanford believe AGI could cause nuclear-level catastrophe.
AI tools deployed in actual wars, right now. Claude was used during the US military operation to capture Venezuela’s Nicolás Maduro, deployed through Anthropic’s partnership with Palantir. Google’s Project Nimbus, a $1.2 billion cloud contract with the Israeli government, has generated $525 million from the Ministry of Defense. When employees staged sit-in protests against the contract, Google fired 28 of them. Yikes. Google’s 2018 pledge never to build weapons or surveillance technology was deleted from their website in February 2025.
These aren’t abstract fears. Some are already measurable, some are already deployed. And the people closest to the technology are the ones sounding the alarm.
The people who couldn’t stay
More than 30 safety researchers have left major AI companies since mid-2024. What got me was reading what they said on the way out.
Jan Leike co-led OpenAI’s Superalignment team, which was supposed to solve the hardest problem in AI: how to keep systems smarter than us aligned with what we actually want. OpenAI had promised 20% of its compute to the effort. Six sources told Fortune the compute was never delivered. Requests were repeatedly rejected. Leike left in May 2024 and posted on X:
“Safety culture and processes have taken a backseat to shiny products.”
His team had been “sailing against the wind.”
Kokotajlo risked roughly $2 million in vested equity, 85% of his family’s net worth, by refusing to sign OpenAI’s exit agreement, which required a lifetime non-disparagement clause. Sign it or lose the money you already earned. He walked. William Saunders, another Superalignment researcher, told TIME that by speaking publicly he might never access equity worth millions. The clauses even prohibited acknowledging they existed. When Vox broke the story, Sam Altman posted that he didn’t know this was happening. The forfeiture language appeared on documents signed by his COO two weeks earlier. OpenAI removed the clauses and restored equity, but only after public pressure forced their hand.
OpenAI has now created and dissolved three dedicated safety teams in roughly 18 months. Superalignment, AGI Readiness, Mission Alignment. Each time, the company said safety would be “embedded across all teams.” Which is corporate for: it’s everyone’s responsibility now, and therefore no one’s.
This wasn’t just OpenAI. Microsoft eliminated its Ethics & Society team on March 6, 2023, the team that had been identifying risks in their integration of OpenAI’s models. Team members told Platformer they were cut because leadership wanted to ship faster. Days later, Microsoft unveiled Office Copilot. Meta dissolved two responsible AI teams in 14 months. Twitter/X eliminated its ML Ethics team within days of Musk’s acquisition.
Miles Brundage left what he called “essentially my dream job” leading OpenAI’s AGI Readiness team. His parting message:
“Neither OpenAI nor any other frontier lab is ready, and the world is also not ready.”
He urged the people still inside to speak their minds and warned against groupthink.
Mrinank Sharma’s resignation hit different because it came from Anthropic, the company that exists to be the safe alternative. Founded by people who left OpenAI over exactly these concerns. His letter went viral, over 10 million views. He wrote:
“Throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions. I’ve seen this within the organization, where we constantly face pressures to set aside what matters most.”
He’s not going to another AI company. He’s leaving tech to study poetry.
That one got to me. When the safety lead at the safety-first company quits and doesn’t go to a competitor or start a nonprofit or join the government but just leaves the field entirely, that says something about what he thinks can be fixed from inside.
Geoffrey Hinton left Google in May 2023 and later won the Nobel Prize for the neural network research he now has complicated feelings about. He told the New York Times:
“I console myself with the normal excuse: if I hadn’t done it, somebody else would have.”
Christopher Nolan said that when he spoke to leading AI researchers, “they literally refer to this as their Oppenheimer moment.” A journalist visiting Anthropic’s offices spotted “The Making of the Atomic Bomb” on a coffee table. An employee had an Oppenheimer sticker on his laptop.
Yoshua Bengio, another AI pioneer, wrote in 2024:
“I worry that we could collectively sleepwalk, even race, into a fog behind which could lie a catastrophe that many knew was possible, but whose prevention wasn’t prioritized enough.”
It’s not just that these researchers are afraid of what they’re building. It’s that what they’re building is also the most important scientific work of their lives. The thing they fear is also their greatest contribution. Hinton didn’t just stumble into AI. He spent decades on neural networks when the rest of the field thought it was a dead end. Now it works, it works better than anyone expected, and he’s terrified. But the research that terrifies him is also the research that won him the Nobel Prize.
The Right to Warn letter addressed this directly. The signatories described themselves as “among the few people who can hold these companies accountable.” Leaving feels like abandonment. Staying means building the thing you’re warning everyone about. That’s not hypocrisy. That’s a trap with no clean exit. The people most capable of slowing this down are the ones most intellectually captivated by it.
Everyone knows.
Why nobody can stop
This is where I expected to find a villain. Corporate greed. Government corruption. Reckless engineers. Something I could point to and say: fix that.
It’s not that clean. The thing that kept me up wasn’t that bad people are making bad decisions. It’s that the structure makes it nearly impossible for anyone (companies, researchers, governments, users) to do the right thing even when they see the danger.
The race nobody can exit.
A game theory paper published in Nature formally modeled AI development as a competitive race and found that safety-first behavior is what everyone would collectively prefer, but it’s not what the market selects for. The Nash equilibrium, meaning the outcome that actually happens when everyone acts in their own interest, favors cutting safety corners. Not because anyone wants to, but because anyone who doesn’t falls behind.
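The dynamic is easiest to see as a toy two-lab game. This sketch uses my own illustrative payoff numbers, not the Nature paper’s actual model; the point is only the shape of the incentives: mutual safety is what everyone prefers, but cutting corners is each lab’s best response no matter what the other does.

```python
# Toy "safety race" between two AI labs. Each lab picks SAFE or CUT (corners).
# Payoff numbers are hypothetical, chosen only to illustrate the structure.
SAFE, CUT = "safe", "cut"

# payoff[my_move][their_move] = my payoff
payoff = {
    SAFE: {SAFE: 3, CUT: 0},  # I stay safe: fine if you do too, ruinous if you race ahead
    CUT:  {SAFE: 4, CUT: 1},  # I cut corners: I win the market, or we both degrade safety
}

def best_response(their_move):
    """The move that maximizes my payoff, given the other lab's move."""
    return max((SAFE, CUT), key=lambda my_move: payoff[my_move][their_move])

# Whatever the rival does, cutting corners pays more for me:
assert best_response(SAFE) == CUT
assert best_response(CUT) == CUT

# So (CUT, CUT) is the Nash equilibrium, paying 1 each, even though
# (SAFE, SAFE) would pay 3 each. Everyone prefers the safe outcome;
# no one can unilaterally choose it without falling behind.
print("equilibrium:", (best_response(CUT), best_response(CUT)))
```

Change the numbers however you like: as long as racing beats restraint against both a careful and a reckless rival, the equilibrium stays stuck at mutual corner-cutting.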
Stuart Russell at Berkeley put a number on it: companies spend roughly $100 billion on AI capability development. Global public AI safety research gets about $10 million. That’s a 10,000-to-1 ratio.
Sergey Brin circulated an internal memo to Google’s Gemini team in February 2025:
“Competition has accelerated immensely and the final race to AGI is afoot.”
He urged 60-hour work weeks and told engineers to stop “building nanny products.”
VP JD Vance, at the Paris AI summit:
“The AI future is not going to be won by hand-wringing about safety.”
In case that’s not clear, he’s saying caring about safety is weakness.
Anthropic’s founding logic contains the paradox. Their safety document argues that rapid AI progress could trigger dangerous races between companies and nations deploying untrustworthy systems. Their solution: build frontier models anyway, because safety research requires access to the most powerful systems, which requires competing commercially. Critics have pointed out this just adds another player to the race. One noted that ChatGPT was built in part because someone at OpenAI saw a demo of Claude.
Anthropic’s founders left OpenAI over safety concerns. Now researchers are leaving Anthropic with the same warnings. The problem isn’t the company. It’s the competitive dynamics that no single company can escape. Even the one that was built specifically to try.
Combined, OpenAI, Anthropic, and xAI represent roughly $1 trillion in private market value. You can’t raise $57 billion against $44 billion in projected losses and then voluntarily slow down. The capital won’t allow it.
The dependency nobody can reverse.
ChatGPT hit 800 million weekly active users by late 2025. Combined with Gemini, Claude, and smaller tools, nearly a billion people now use AI chatbots regularly. 88% of enterprises report regular AI use. When ChatGPT went down for 12 hours in June 2025, businesses across sectors reported cascading failures. We built critical infrastructure on top of these systems in under three years.
The oversight that was built to be dismantled.
This part requires a little context. In 2023, the US government created something called the AI Safety Institute, housed inside NIST (the standards agency). It was supposed to be the government’s technical capacity to evaluate whether frontier AI models are safe before deployment. Its director, Elizabeth Kelly, built a consortium of more than 280 member organizations and conducted joint evaluations with the UK. Budget: $10 million. Senators told her it should have been $100 million.
Trump revoked the executive order that created it on his first day back in office. Kelly left in early February 2025 and wrote:
“There is no other group with the technical skill or subject matter expertise to match AISI across the entire US government.”
By June, the institute was renamed and its mission rewritten to say “innovators will no longer be limited.”
The UK did the same thing, renaming and narrowing their safety institute. Both countries refused to sign the Paris declaration on responsible AI development. The EU’s AI Act won’t be fully enforceable until 2027. California’s SB 1047, the first real US safety law for frontier AI, was vetoed despite 78% public support. OpenAI lobbied against it.
82% of Americans don’t trust tech executives to self-regulate. The institutions that could have provided oversight were either never funded, actively dismantled, or haven’t started enforcing yet.
Companies that can’t slow down. Users who can’t walk away. Governments that chose not to act. Everyone sees the problem. The structure won’t let anyone fix it.
Yesterday
Everything I just described happened again this week. In real time.
On Saturday, Axios reported that the Pentagon is considering cutting ties with Anthropic. The dispute: Anthropic maintains two restrictions on military use of Claude. No mass surveillance of Americans. No fully autonomous weapons.
The Pentagon wants all four major AI labs to allow their models for “all lawful purposes,” including weapons development, intelligence collection, and battlefield operations. OpenAI agreed. Google agreed. xAI agreed.
Anthropic hasn’t. And the Pentagon is losing patience.
A senior administration official told Axios:
“Any company that would jeopardize the operational success of our warfighters in the field is one we need to reevaluate our partnership with.”
The contract is worth up to $200 million. Claude is currently the only AI model on classified Pentagon networks.
I’ll be honest, I’m rooting for them to hold this line.
Google, the company that agreed to unrestricted military use, is the same company that in 2018 pledged never to build weapons technology. The same company that deleted that pledge in 2025. The same company providing $525 million in AI infrastructure to the Israeli Ministry of Defense through Project Nimbus. The same company that fired employees who protested. Yikes.
Anthropic’s two red lines are not ambitious positions. No mass surveillance. No autonomous weapons. That’s the floor. And a Pentagon official is saying the floor is too high.
Days before this story broke, Anthropic’s safeguards research head had resigned warning the world is in peril. The company has a $380 billion valuation, $14 billion in annual revenue, and investors who expect growth. The structural pressures I described above aren’t theoretical for Anthropic. They’re this quarter’s board meeting.
I don’t know whether they hold this line. I don’t know what that means for the tools I use every day.
I do know we’ve arrived somewhere specific. A place where “we won’t build autonomous weapons” and “we won’t do mass surveillance” is treated as an obstacle to doing business. Where three of four leading AI labs already said yes. Where the safety researchers keep leaving and the funding keeps flowing and every single person involved can describe exactly what’s going wrong.
Everyone knows.



I really like the way you present objective facts, quotes, and personal anecdotes, leaving it up to the reader to decide what they think should (and can) be done to help reduce the chances of letting our own progress as a society lead to its demise. The headwinds are real, the villains are not comic book characters but hard-working aspirational technologists, and we as users are willing participants feeding the system even when we know what's really going on.
It struck me back in 2019 when I asked a bunch of rideshare drivers how they felt about feeding the system: the more they drove, the more data they collected, and the faster they would usher in their autonomous driving replacements. I thought most drivers would focus on how it was never gonna happen, or simply not realize the implications, but the vast majority knew and elected to do so because it was better than their other money-making options. While autonomous ride sharing is still in its commercial infancy, it's now here in some major cities, and we are all doing the same thing at mass scale: a billion users feeding social media and AI, with the effects magnified when the two are combined.
While I use ChatGPT/Claude sparingly for intimate conversations, I am still feeling the effects, whether it's FOMO when going to bed without an agent running in the background, looking at my phone first thing in the morning, or feeling like I'm always behind because of the frenetic pace of innovation and news that no one can keep up with. Not sure what to do myself, since I also make my earnings in tech and need to stay ahead of the curve for my own livelihood, and I don't think I'm a good enough poet to make a living that way. As someone who is big into stoicism and focusing on what we can control, I'm taking small steps: running models locally so my conversations aren't captured as training data, limiting AI use to productivity and not intimate conversations, disconnecting from most social media, avoiding AI slop, and joining the occasional protest (like Scott Galloway's against big tech). It won't move the needle individually, but when we do things collectively, it can drive change through public pressure. It doesn't feel like enough, and maybe it won't be, but small actions can compound over time, so it's not gonna stop me from taking them, and I hope others feel the same way or take even bigger actions.
Thank you Heba for writing this!