With the rise of AI-based applications like ChatGPT, the capabilities of natural language models have expanded into areas such as providing long-term memory, enabling more seamless and personalized interactions. However, this convenience comes with a hidden risk—long-term memory features can become attack vectors for sophisticated cyber threats.
In this blog post, we’ll dive into a specific attack chain dubbed “SpAIware,” which targets the ChatGPT macOS application. This attack involves injecting spyware into ChatGPT’s memory through prompt injection from untrusted data sources, resulting in the persistent exfiltration of user data over time.
Background on SpAIware and the ChatGPT Memory Feature
Earlier this year, OpenAI introduced a memory feature to ChatGPT, designed to enhance user experience by remembering past conversations and preferences. However, this feature inadvertently increased the attack surface, particularly when combined with the vulnerability of prompt injection—a technique where attackers manipulate input fields to execute hidden commands.
In 2023, OpenAI implemented a mitigation for common data exfiltration methods by introducing the url_safe API, which aimed to prevent malicious URLs or images from being rendered in chat. Despite these efforts, vulnerabilities persisted, especially in mobile and desktop applications like macOS and Android, which did not fully leverage these protections.
With the introduction of the memory feature, attackers discovered a new opportunity. Not only could they inject malicious instructions into individual sessions, but they could also persist those instructions in ChatGPT’s long-term memory. This allows spyware-like behavior, continuously exfiltrating user data across multiple sessions.
Hacking ChatGPT’s Memories
The crux of the SpAIware attack lies in how the ChatGPT memory feature handles input from untrusted sources. Through prompt injection on a compromised website or document, an attacker can plant malicious instructions into ChatGPT’s long-term memory. This injected data remains active, even as the user engages in new conversations in future sessions.
When users visit an untrusted website or interact with malicious data, the attack is triggered:
- Prompt Injection: The attacker injects specific commands into ChatGPT’s memory through a prompt embedded in a website or document.
- Persistent Spyware: These commands create long-term, hidden instructions within ChatGPT’s memory, enabling continuous data exfiltration without the user’s knowledge.
- Exfiltration Loop: As users continue chatting with ChatGPT, all interactions, including queries and ChatGPT responses, are secretly sent to an attacker-controlled server.
This vulnerability turns the chatbot into an unwitting informant, continuously leaking sensitive information over time.
Demonstration of SpAIware Attack
Let’s walk through a step-by-step demonstration of how SpAIware works:
- Step 1: Untrusted Website – The user visits a compromised website that contains a prompt injection payload designed to interact with ChatGPT.
- Step 2: Memory Insertion – Through this interaction, the payload takes advantage of the memory feature in ChatGPT, injecting spyware instructions that persist across sessions.
- Step 3: Continuous Exfiltration – From this point on, all new conversations the user has with ChatGPT are siphoned to the attacker’s server via hidden query parameters embedded in image URLs or other channels.
The attack is silent and difficult to detect because the exfiltrated data is sent invisibly, without any visual indication to the user. In the proof-of-concept exploit, images rendered for exfiltration purposes are invisible, ensuring stealth.
The Exploitation Technique: Old Tactics with New Consequences
The data exfiltration technique at the heart of this attack isn’t entirely new. In fact, attackers have used similar methods to exploit weak input validation for years. However, what makes SpAIware particularly concerning is its combination with ChatGPT’s memory feature, enabling persistent data theft across multiple sessions.
The technique involves requesting ChatGPT to render an invisible image or execute a command, embedding the user’s sensitive data as a query parameter. Once the image is loaded, this information is sent to an attacker-controlled server, allowing continuous data harvesting.
Has OpenAI Addressed This Vulnerability?
In September 2024, OpenAI released a patch for this vulnerability, specifically addressing the persistent spyware instructions within the macOS application. Users are encouraged to update their ChatGPT app to the latest version to avoid falling victim to SpAIware or similar attacks.
While the url_safe API mitigates some aspects of data exfiltration, it is not a holistic fix. Some bypasses for this feature have been identified, particularly around the handling of images and memory-based injections.
Furthermore, while the exfiltration vulnerability has been addressed, the ability to inject malicious memories remains a concern. Attackers can still manipulate ChatGPT’s memory tool via prompt injections from untrusted websites or documents.
Protecting Yourself from SpAIware
While OpenAI has implemented updates to mitigate this vulnerability, there are additional steps users can take to protect themselves from such attacks:
- Update Regularly: Ensure that your ChatGPT application is always updated to the latest version, as new patches may contain fixes for vulnerabilities like SpAIware.
- Monitor ChatGPT Memories: Regularly review the memories stored by ChatGPT, especially if you notice any unexpected behavior. OpenAI provides tools to manage and delete memories or disable them entirely.
- Use Temporary Chats: Consider using temporary chat sessions that don’t leverage the memory feature, especially when interacting with untrusted sources. This reduces the risk of malicious memory injections.
- Beware of Untrusted Sources: Avoid interacting with suspicious websites or documents that could carry prompt injection payloads.
Conclusion: A Sobering Lesson in AI Memory Security
The SpAIware exploit reveals a significant challenge in the world of AI applications—how to balance personalization features with security. The ability of attackers to inject persistent spyware into ChatGPT’s memory raises concerns about the potential for continuous data exfiltration and long-term exploitation.
This attack chain illustrates the need for ongoing vigilance as AI systems become more integrated into daily operations. For now, users can protect themselves by keeping their ChatGPT apps updated, regularly reviewing their memory settings, and being cautious with untrusted data sources.