Artificial Intelligence Is Working Remotely
A new tool from OpenAI shows how COVID-19 set the scene for a revolution in white-collar work.
In 2023, I wrote that AI and remote work were a match made in heaven. The COVID-19 lockdowns forced and expedited the online transition of many work processes. Today, most white-collar work and collaboration happen within digital environments: in a Slack conversation or a hybrid Zoom meeting, inside a Google Sheet or a GitHub repository. Whether we work at an office, hybrid, or fully remote, work happens in environments that are digitally accessible.
This accessibility makes it easier for humans to work from anywhere. But, more importantly, it makes it easier for non-humans to step right in and pick up human tasks. Specifically, it means that new AI models can plug directly into our conversations, documents, and work environments and make things happen. In 2023, that meant connecting a Google Sheet to ChatGPT or bringing an AI assistant to summarize a Zoom meeting. In 2025, it means much more.
OpenAI recently showcased two new tools that illustrate what this means in practice. In late January, it announced "Operator," an AI agent that can use a browser to perform tasks on the user's behalf. For example, you can ask Operator to "buy some healthy dog food for my 150-pound Great Dane dog," and it will browse to Target on its own, search and compare different products, fill in the necessary forms, and complete the order for you. You can also ask it to order food, book an Uber, book hotels and flight tickets, and more.
Yesterday, OpenAI announced "deep research," a new tool that can browse the web and use advanced reasoning to conduct detailed, multi-step research tasks and summarize them in detailed reports with tables and citations, just like a human would. In OpenAI's demonstration, an employee asks deep research to evaluate and compare mobile adoption rates and usage patterns across different markets.
Just like a human employee, the tool responds to a request with a series of clarifying questions. It wants to make sure it understands you perfectly.
The tool is available in the $200/month Pro tier of ChatGPT and will be gradually rolled out to more users. And just like a human, it can handle vague instructions as well as explicit requests to rely on its own judgment. Once it is ready to begin, the model displays a feed that allows the user to see what it is doing: the websites and sources it visits, the questions it is pondering, and which sections of the final answer it is working on.
The ability to adopt a human workstyle — to respond to human requests, use tools that humans use, and deliver reports in the same formats and channels humans use — is a huge advantage. The quality of the work is also impressive.
In January, a group of researchers launched a new benchmark to test the quality of AI models. Named "Humanity's Last Exam," it consists of around 3,000 challenges on a hundred topics. The challenges range from solving math and computer science problems to deciphering ancient Roman inscriptions and recalling details from Greek mythology.
In January, OpenAI's latest GPT-4o model scored 3.3% on Humanity's Last Exam, while comparable models like Claude 3.5 Sonnet and Google Gemini Thinking achieved 4.3% and 6.2%, respectively. DeepSeek-R1, the Chinese model that made waves last week, achieved 9.4%. Now, the new OpenAI deep research model has managed to achieve 26.6% on the test, and it did so without browsing the web to look for answers and without using code to solve math problems.
This is an incredible result and an incredible amount of progress in a matter of weeks. How long would it take for AI models to accurately answer nearly all questions that humans can come up with? By the looks of it, not very long.
Employers and landlords are still arguing with employees about "return to the office," with Elon Musk leading the latest battle to get government employees back at their desks — or out of their jobs. While the battle rages on, we might be missing the bigger picture. As we look back on the COVID-19 lockdowns, their most significant impact will be in paving the way for the replacement rather than the displacement of many white-collar jobs. It doesn't mean everyone will be unemployed, but it does mean people will be doing very different things, and they will likely do them in spaces that are designed and managed differently.
Deep research is available in ChatGPT's $200/month Pro tier. Have you tried it? I would love to hear about your experience.
Have a great week,
Dror Poleg Newsletter