OpenAI’s Deep Research Agent Is Coming for White-Collar Work


Isla Fulford, a researcher at OpenAI, had a hunch that Deep Research would be a hit even before it was released.

Fulford had helped build the artificial intelligence agent, which autonomously explores the web, deciding for itself what links to click, what to read, and what to collate into an in-depth report. OpenAI first made Deep Research available internally; whenever it went down, Fulford says, she was inundated with queries from colleagues eager to have it back. “The number of people who were DMing me made us pretty excited,” says Fulford.

Since going live to the public on February 2, Deep Research has proven to be a hit with many users outside the company too.

“Deep Research has written 6 reports so far today,” Patrick Collison, the CEO of Stripe, posted on X a few days after the product was released. “It is indeed excellent. Congrats to the folks behind it.”

“Deep Research is the AI product that really got a meaningful chunk of the policymaking community in DC to start feeling the AGI,” wrote Dean Ball, a fellow at George Mason University who specializes in AI policy.

Deep Research is available as part of the ChatGPT Pro plan, which costs $200 per month. It takes a query, such as “Write me a report on the Massachusetts health insurance industry,” or “Tell me about WIRED’s coverage of the Department of Government Efficiency,” and then comes up with a plan, searching for relevant websites, combing through their content, and deciding which links to click and what information deserves further investigation. After exploring, sometimes for tens of minutes, it synthesizes its findings into a detailed report, which may include citations, data, and charts.
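
In rough outline, that workflow is an iterative loop: propose a plan, fetch and read pages, decide which links merit a follow-up, and finally synthesize a cited report. The Python sketch below is a minimal, hypothetical illustration of that loop, not OpenAI’s implementation; the function names, the stubbed-out search and reading steps, and the step budget are all assumptions made for the example.

```python
# Hypothetical sketch of a plan -> search -> read -> synthesize research loop.
# The functions below are illustrative stand-ins, not OpenAI's actual system.
from dataclasses import dataclass, field

@dataclass
class Finding:
    url: str
    note: str

@dataclass
class ResearchState:
    query: str
    queue: list[str] = field(default_factory=list)      # links still to explore
    findings: list[Finding] = field(default_factory=list)

def plan(query: str) -> list[str]:
    # Stand-in for the model proposing initial search directions.
    return [f"https://example.com/search?q={query.replace(' ', '+')}"]

def read_page(url: str) -> tuple[str, list[str]]:
    # Stand-in for fetching a page; returns (summary, outgoing links).
    return (f"summary of {url}", [])

def worth_following(link: str, state: ResearchState) -> bool:
    # Stand-in for the model judging whether a link deserves a click.
    return link not in {f.url for f in state.findings}

def research(query: str, budget: int = 20) -> str:
    state = ResearchState(query=query, queue=plan(query))
    while state.queue and budget > 0:
        url = state.queue.pop(0)
        summary, links = read_page(url)
        state.findings.append(Finding(url=url, note=summary))
        state.queue.extend(l for l in links if worth_following(l, state))
        budget -= 1
    # Stand-in for the final synthesis into a cited report.
    citations = "\n".join(f"- {f.url}: {f.note}" for f in state.findings)
    return f"Report on: {query}\n\nSources consulted:\n{citations}"

print(research("Massachusetts health insurance industry"))
```

In a real agent, the stubs would be replaced by live web search, page fetching, and model calls that judge relevance and draft the report; the control flow above is only meant to make the iterative structure concrete.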

Many tools currently branded as AI agents are essentially chatbots connected to simple programs, without much sophistication. The Deep Research model, by contrast, goes through an artificial kind of reasoning before devising a plan and working through each step of it, and it displays details of this reasoning in a side window as it researches.

“Sometimes it’s like ‘I need to backtrack, this doesn’t seem that promising,’” says Josh Tobin, another OpenAI researcher involved in building Deep Research. “It’s pretty cool to read some of those trajectories, just to understand how the model is thinking.”

OpenAI evidently sees Deep Research as a tool that could take on more office work. “This is a thing that we can scale,” Tobin says, adding that the agent could be trained to complete specific kinds of white-collar work. An agent with access to a company’s internal data could quickly prepare a report or presentation, for instance. Tobin says the longer-term goal is to “build an agent that is not just good at building reports through searching the web, but is good at many other types of tasks too.”

Because Deep Research was trained to analyze and summarize human-written text, Tobin says his team was surprised to see many people using it to generate code. “It’s an interesting thread to pull,” he says. “We’re not totally sure what to make of it.”

Tobin admits, however, that the tool still has important blind spots. “It may struggle with distinguishing authoritative information from rumors,” he says. “It currently shows a weakness in confidence calibration, often failing to convey uncertainty accurately.”

Age of Reasoning

Deep Research shows how more-capable AI models could automate white-collar work, says Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania who studies business adoption of AI.

Mollick, who uses Deep Research regularly, says that although the tool is imperfect and most effective when used by experts who can check its work, it has impressed professionals he has spoken to. “For senior-level people it’s not that it’s flawless or that it beats the best people,” Mollick says. “It’s that it can do 40 hours of medium-level work, and it only takes an hour to check.”

Whether companies will view such tools as a way to augment their workers or simply replace them wholesale remains to be seen. “That’s what worries me the most,” Mollick says.

The prospect of selling tools that can automate large amounts of highly skilled office work perhaps explains why OpenAI is considering offering advanced agents at a steep premium. The company has told investors that agents capable of doing “PhD-level work” could eventually cost $20,000 per month, according to a recent report from The Information, although details of such a plan remain unclear. OpenAI spokesperson Kayla Wood describes the report as “purely speculation.”

Besides hinting at changes in white-collar work, Deep Research illustrates how frontier AI research is increasingly focused on both agents and so-called reasoning models that break problems down into constituent parts in order to better parse and solve them.

OpenAI’s main rivals are all developing reasoning models of their own, as well as tools similar to Deep Research. Google DeepMind released a web research agent with the same name as OpenAI’s tool on December 10, 2024. Elon Musk’s Grok offers a similar feature.

Deep Research appears to be the most sophisticated offering currently, partly because it is based on OpenAI’s most advanced reasoning model, called OpenAI o3. While a conventional large language model just generates text in response to a query, Deep Research uses a form of simulated reasoning to decide what actions to take next. Such “agentic” abilities are widely seen as the next evolutionary step for AI, although getting models to take actions without making mistakes remains challenging.

“Deep Research is a natural extension of these reasoning models,” says Ruslan Salakhutdinov, a computer scientist at Carnegie Mellon University who is also working on web agents. Salakhutdinov cautions, however, that AI agents are still at an early stage and still error prone, and that a lot of experimentation and innovation likely lies ahead.

OpenAI hired graduate students and other highly skilled professionals to help train Deep Research. These trainers pose queries and then correct the model’s mistakes, providing training data for a reinforcement learning algorithm that helps the model learn to become a better research assistant.
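
In broad strokes, that setup echoes reinforcement learning from human feedback: expert corrections become a signal that rewards better research behavior. The sketch below is a deliberately toy, hypothetical illustration of that idea; the data format, word-overlap scoring, and single-parameter update are stand-ins invented for the example, not OpenAI’s training pipeline, which updates the full model using far richer reward signals.

```python
# Hypothetical sketch: turning expert corrections into a reward signal for a
# policy update. Everything here (data format, scoring, update rule) is an
# illustrative assumption, not OpenAI's training pipeline.
from dataclasses import dataclass

@dataclass
class Example:
    query: str
    model_report: str
    expert_correction: str   # the trainer's fixed-up version

def reward(example: Example) -> float:
    # Toy metric: the closer the model's report is to the corrected version,
    # the higher the reward. A real system would use a learned reward model
    # or a graded rubric rather than word overlap.
    model_words = set(example.model_report.split())
    expert_words = set(example.expert_correction.split())
    return len(model_words & expert_words) / max(len(expert_words), 1)

def reinforce_step(policy_params: dict[str, float],
                   batch: list[Example],
                   lr: float = 0.01) -> dict[str, float]:
    # Toy policy-gradient-style update: nudge a single scalar "skill" parameter
    # by the average reward. Real training adjusts billions of model weights.
    avg_reward = sum(reward(ex) for ex in batch) / len(batch)
    policy_params["skill"] += lr * avg_reward
    return policy_params

params = {"skill": 0.0}
batch = [Example("WIRED coverage of DOGE", "draft report text", "corrected report text")]
print(reinforce_step(params, batch))
```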

WIRED spoke to several Deep Research trainers who also seemed impressed by the tool. “The first thing it does now, it asks for clarification and that’s huge,” says Olga Schrivner, a linguist at the Rose-Hulman Institute of Technology who is helping train Deep Research. “It’s almost like communication, and all of a sudden it becomes like your assistant.”

“My grandpa is a mathematician,” says Alexander Zerkle, a graduate student in microbiology at UC San Diego who has been providing training data for Deep Research. “He wanted it to prove what’s called the Schroeder-Bernstein theorem. I gave that to Deep Research, and it spat out a very long proof. I don’t understand any of it, but it’s very exciting to him as a mathematician.”
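
For reference, the theorem Zerkle mentions (often written Schröder–Bernstein) has a compact standard statement, which is what the model would have needed to establish; the statement below is textbook mathematics, not part of the model’s output.

```latex
% Schröder–Bernstein theorem (standard statement):
% if each of two sets injects into the other, a bijection exists between them.
\[
\text{If there are injections } f : A \hookrightarrow B \text{ and } g : B \hookrightarrow A,
\text{ then there exists a bijection } h : A \to B.
\]
```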

As tools like Deep Research become more widespread, they may start to change how many people use the web, even as the mania that accompanied the chatbot boom starts to fade.

Amelia Glaese, who leads work on alignment at OpenAI, says that no matter how clever a chatbot is, a model that goes beyond generating text by taking actions and doing valuable work is a different proposition. “You have a model that has this very big utility—that has learned how to do some of the manual work involved with research,” she says. “Then I think there’s a new set of people that are like, ‘Wow, this is really useful.’”

How do you feel about AI agents like Deep Research? Are there tasks you’d be interested to see them perform? Tell me all about it in the comments below.
