Nvidia unveiled a prototype AI avatar at CES 2025 that lives on your PC’s desktop. The AI assistant, R2X, looks like a video game character, and it can help you navigate apps on your computer.
The R2X avatar is rendered and animated using Nvidia’s AI models, and users can run it on the popular LLM of their choice, such as OpenAI’s GPT-4o or xAI’s Grok. Users can talk with R2X through text and voice, upload files to it for processing, or even let the assistant view what’s happening live on their screen or camera.
Tech companies have been creating a lot of AI avatars lately, not just in video games but also for enterprise and consumer customers. The early demos are strange, but some see these avatars as a promising user interface for AI assistants. With R2X, Nvidia is trying to combine its generative video game technology with cutting-edge AI models to create an assistant that looks and feels like a human.
Much like Microsoft’s Recall feature (which has been delayed due to privacy concerns), R2X can continually capture screenshots of your screen and run them through an AI model for processing, though this feature is turned off by default. When enabled, it can offer feedback on applications running on your computer and, for example, help you work through a complex coding task.
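Nvidia hasn’t published R2X’s actual pipeline, but the behavior it describes maps onto a simple loop: capture the screen, send the frame to a vision-capable model, and read back the advice. Here’s a minimal sketch in Python, assuming the mss and openai libraries and GPT-4o as the backend; the five-second polling interval is an arbitrary choice.

import base64
import time

import mss          # third-party: pip install mss
import mss.tools
from openai import OpenAI  # third-party: pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def screenshot_png() -> bytes:
    """Grab the primary monitor and return it as PNG bytes."""
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])
        return mss.tools.to_png(shot.rgb, shot.size)

def ask_about_screen(question: str) -> str:
    """Send the current screen plus a question to a vision-capable LLM."""
    image_b64 = base64.b64encode(screenshot_png()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Poll the screen every few seconds, roughly the behavior R2X's opt-in mode describes.
while True:
    print(ask_about_screen("What app is on screen, and what should I do next?"))
    time.sleep(5)

A real assistant built this way would need explicit opt-in before any frame leaves the machine, which is presumably why Nvidia ships the feature off by default.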
R2X is still a prototype, and even Nvidia admits there are bugs to work out. In demos with TechCrunch, Nvidia’s avatar had an uncanny-valley feel to it — its face sometimes got stuck in odd positions, and its tone felt a little aggressive at times. And more broadly, I think it’s odd to have a little humanoid avatar stare at me while I do my work.
It generally offered helpful instructions and accurately viewed what was on the screen. But at one point, the avatar gave us incorrect instructions, and later on, the avatar stopped being able to view the screen at all. This may be an issue with the underlying AI model (in this case, GPT-4o), but the example shows the limitations of this early technology.
In one demo, an Nvidia product lead showed how R2X can view, and assist users with, the apps on their screen. Specifically, R2X helped us use Adobe Photoshop’s generative fill feature. The photo we selected showed Nvidia CEO Jensen Huang standing in an Asian restaurant with two restaurant workers. Nvidia’s avatar hallucinated, giving the wrong instructions for where to find the generative fill feature. But after we switched the underlying AI model to xAI’s Grok, the avatar regained its screen-viewing abilities.
In another demo, R2X was able to ingest a PDF from the desktop and then answer questions about it. This process is powered by a local retrieval-augmented generation (RAG) feature, which lets these AI avatars pull information from a document and process it with their underlying LLM.
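Nvidia hasn’t detailed how its local RAG works, but the general shape of such a pipeline is well established: split the document into chunks, rank the chunks against the question locally, and hand only the best matches to the LLM. The sketch below is a toy version, assuming pypdf for extraction, scikit-learn’s TF-IDF for local retrieval (a production system would more likely use a neural embedding model), and GPT-4o for the final answer.

from pypdf import PdfReader  # pip install pypdf
from sklearn.feature_extraction.text import TfidfVectorizer  # pip install scikit-learn
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI  # pip install openai

def load_chunks(pdf_path: str, chunk_chars: int = 1000) -> list[str]:
    """Extract text from the PDF and split it into fixed-size chunks."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks against the question with TF-IDF cosine similarity, all on-device."""
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    chunk_vecs = vectorizer.transform(chunks)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, chunk_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [chunks[i] for i in top]

def answer(question: str, pdf_path: str) -> str:
    """Answer a question using only the most relevant PDF passages as context."""
    context = "\n---\n".join(retrieve(question, load_chunks(pdf_path)))
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What are the key findings?", "report.pdf"))

The point of keeping retrieval local is that the full document never leaves the machine; only the handful of retrieved passages is sent to the model.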
Nvidia is using some AI models from its video game division to power the way these avatars look. To generate avatars, Nvidia uses its RTX Neural Faces algorithm. To animate the face, lip, and tongue movement, Nvidia is using a new model called Audio2Face-3D. That model seemed to stall at some points, holding the avatar’s face in awkward positions.
The company also says these R2X avatars will be able to join Microsoft Teams meetings, acting as a personal assistant.
An Nvidia product lead says the company is working to give these AI avatars agentic abilities as well, so that R2X could one day take actions on your desktop. Those abilities seem to be a long way out, and they would likely require partnerships with software makers such as Microsoft and Adobe, which are developing similar agentic systems themselves.
It’s not immediately clear how Nvidia is generating the voices in these products. When running on GPT-4o, R2X’s voice doesn’t match any of ChatGPT’s preset voices, and xAI’s Grok chatbot doesn’t have a voice mode at all yet.
The company plans to open-source these avatars in the first half of 2025. Nvidia sees this as a new user interface for developers to build with, allowing users to plug in their favorite AI software products or even run these avatars locally.
Maxwell Zeff is a senior reporter at TechCrunch specializing in AI and emerging technologies. Previously with Gizmodo, Bloomberg, and MSNBC, Zeff has covered the rise of AI and the Silicon Valley Bank crisis. He is based in San Francisco. When not reporting, he can be found hiking, biking, and exploring the Bay Area’s food scene.