ChatGPT can now read some of your Mac’s desktop apps

OpenAI’s ChatGPT is starting to work with other apps on your computer.

On Thursday, the startup announced that the ChatGPT desktop app for macOS can now read code in a handful of developer-focused apps, such as VS Code, Xcode, TextEdit, Terminal, and iTerm2.

That means developers will no longer have to copy and paste their code into ChatGPT, a workflow that has become a common way to use the chatbot. Now, when the feature is enabled, OpenAI will automatically send the section of code you’re working on to its chatbot as context, alongside your prompt.

However, unlike popular AI coding tools such as Cursor or GitHub Copilot, ChatGPT is currently unable to write code directly into developer apps on your behalf.

The feature, called Work with Apps, is far from an AI agent, but OpenAI says getting ChatGPT to understand other apps is a “key building block” towards building agentic systems. One of the biggest challenges facing AI agents today is getting them to understand the rest of your computer screen, as opposed to prompts or their own responses.

OpenAI says it’s focusing this feature on coding apps to start, likely because AI coding assistants have taken off as one of the most popular use cases for LLMs. The feature is available to Plus and Team users today, and will roll out to Enterprise and Edu in the next few weeks. OpenAI says ChatGPT will be able to work with other types of apps in the future, specifically text-based apps that could be used for writing tasks.

In a demo with TechCrunch, an OpenAI employee opened the ChatGPT app and an Xcode environment containing a simple project modeling the solar system – although it was missing the Earth. The employee selected an Xcode tab within ChatGPT, which tells the AI chatbot to look at the app, and prompted the chatbot to “add the missing planets.” The chatbot was able to complete the task, writing a line of code to represent the Earth that matched the rest of the project’s format. They still had to paste ChatGPT’s answer back into their environment, though.

To read different apps, OpenAI is mostly relying on the macOS Accessibility API to read text and pass it to ChatGPT, according to OpenAI desktop product lead Alexander Embiricos. The API, which underpins Apple’s VoiceOver screen reader, has been around for nearly two decades. It’s generally considered pretty reliable for most common apps, but not all of them.

For some apps, such as Microsoft’s VS Code, Work with Apps requires users to install a special extension to query content. And, as the name suggests, Apple’s screen reader can only read text, so it can’t help ChatGPT understand visual elements – such as photos, the orientation of objects, or videos.

For certain apps, Work with Apps will send your last 200 lines of code to ChatGPT alongside every prompt. For others, all the code in your frontmost window will be used as input for the chatbot. You can highlight sections of code or text to help ChatGPT focus on the right part of the project, but ChatGPT will also include the text surrounding it. This all sounds like it will use a lot of input tokens.
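To make that behavior concrete, here is a minimal Python sketch of how a tool might pick the lines that travel with each prompt. It is not OpenAI's implementation, which has not been published. The 200-line tail comes from the description above; the function names, the 20-line margin around a highlighted selection, and the four-characters-per-token rule of thumb are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

TAIL_LINES = 200        # figure reported in the article
SELECTION_MARGIN = 20   # hypothetical: surrounding lines kept around a selection


def build_context(
    lines: List[str],
    selection: Optional[Tuple[int, int]] = None,
    tail_lines: int = TAIL_LINES,
    margin: int = SELECTION_MARGIN,
) -> List[str]:
    """Pick the lines that would accompany a prompt.

    `selection` is a half-open (start, end) range of 0-based line indexes.
    Without a selection, the last `tail_lines` lines are returned.
    """
    if selection is None:
        # Guard against lines[-0:], which would return the whole list.
        return lines[-tail_lines:] if tail_lines > 0 else []
    start, end = selection
    lo = max(0, start - margin)           # widen upward, clamped to file start
    hi = min(len(lines), end + margin)    # widen downward, clamped to file end
    return lines[lo:hi]


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate (assumed ratio); real tokenizers vary with content."""
    return int(len(text) / chars_per_token)


def build_request(prompt: str, context: List[str]) -> str:
    """Join the chosen context and the user's prompt into one input string."""
    return "\n".join(context) + "\n\n" + prompt
```

Calling `build_context(source_lines)` on a 500-line file would return only the final 200 lines, while passing `selection=(100, 105)` would return the five highlighted lines plus up to 20 lines on either side. Running `rough_token_count` over the result of `build_request` gives a ballpark for how much input each prompt carries.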

It’s unclear how OpenAI plans to branch this feature out to other apps that are not compatible with Apple’s screen reader. Anthropic, one of OpenAI’s competitors, released an AI system that analyzes screenshots of a user’s desktop to understand and use other apps. To be frank, Anthropic’s approach leaves a lot to be desired in its current state: it’s slow and makes a lot of mistakes. However, it’s a more general purpose version of an AI agent that doesn’t rely on APIs, and can do more than just read text in another window.

“This isn’t meant to be an agent, it’s a way to collaborate with coding tools to start, and there will be more tools coming soon,” said OpenAI desktop product lead Alexander Embiricos in a briefing with TechCrunch. “On the side of agents, I think this is a really key building block. This idea that ChatGPT understands or can work with all the content that you have so that it can help with it.”

This step towards agents is especially notable given recent reports that OpenAI is nearing the release of a general purpose AI agent, codenamed “Operator,” according to Bloomberg. The tool is expected to arrive in early 2025, and would rival other early attempts at general purpose AI agents, such as Anthropic’s Computer use or Google’s reported “Jarvis” agent.

OpenAI is first releasing these features on macOS, shortly before Apple launches an integration with ChatGPT in December. It’s unclear when Work with Apps will come to Windows, the operating system created by OpenAI’s largest backer, Microsoft.
