OpenAI’s current ChatGPT watermark feature is 99.9 percent accurate, but it can be fooled
OpenAI reportedly has a system to watermark ChatGPT-generated text, along with a fairly accurate tool to identify such text. But while it has had the technology for some time, OpenAI isn’t willing to make it public just yet.
The tool works by tweaking the way ChatGPT generates words. In doing so, it leaves a trail, essentially watermarking the text. Importantly, the watermark isn’t visible to readers and persists even if the text is copied elsewhere. Nor does it noticeably affect the quality of the output.
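OpenAI hasn’t published the details of its scheme, but the general technique, described in academic work on language-model watermarking, is to pseudo-randomly split the vocabulary into a “green” and a “red” list at each step (seeded by the preceding tokens) and nudge sampling toward green tokens. The sketch below is a minimal, hypothetical illustration of that idea; the toy vocabulary and the GREEN_FRACTION and GREEN_BOOST parameters are invented for illustration, not OpenAI’s actual implementation:

```python
import hashlib
import math
import random

# Toy stand-ins: a real system would use the model's own vocabulary and logits.
VOCAB = [f"tok{i}" for i in range(1000)]
GREEN_FRACTION = 0.5  # hypothetical share of the vocabulary marked "green" each step
GREEN_BOOST = 2.0     # hypothetical logit bonus nudging sampling toward green tokens

def green_list(prev_token: str) -> set:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def sample_watermarked(prev_token: str, logits: dict) -> str:
    """Sample the next token after boosting the logits of green-listed tokens."""
    greens = green_list(prev_token)
    boosted = {t: l + (GREEN_BOOST if t in greens else 0.0) for t, l in logits.items()}
    tokens = list(boosted)
    weights = [math.exp(boosted[t]) for t in tokens]  # softmax numerators suffice
    return random.choices(tokens, weights=weights, k=1)[0]
```

Because the boost only reshuffles choices among tokens the model already considered plausible, the text still reads naturally, which is why such a watermark can stay invisible to readers.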
A detector tool, built specifically for this purpose, then picks up the watermark. According to OpenAI’s internal documents, the detector is 99.9 percent effective at identifying watermarked text.
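Under the scheme sketched above, detection is purely statistical: count how many tokens land on their step’s green list and check whether that count is improbably high for human-written text. A hypothetical detector, continuing the earlier sketch:

```python
# Continues the sketch above (green_list, GREEN_FRACTION and math are defined there).
def detect(tokens: list) -> float:
    """Return a z-score: high values mean far more green tokens than chance."""
    n = len(tokens) - 1                      # number of scored transitions
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev))
    expected = GREEN_FRACTION * n            # what human text would average
    stddev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / stddev
```

Figures like 99.9 percent then come down to where the provider sets the z-score threshold and how long the analysed text is.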
Pros and cons of the ChatGPT watermark
While AI-generated text is popping up everywhere, its use in academia is particularly alarming. Earlier this year, plagiarism detector Turnitin revealed it had found millions of papers laced with AI-generated content. Not surprisingly, educators would jump at a system such as the ChatGPT watermark tool.
However, one of the reasons OpenAI hasn’t released the watermarking tool is that it isn’t yet perfect. While 99.9 percent is far better than the accuracy of current AI detector tools, at ChatGPT’s scale even a 0.1 percent error rate still adds up to a lot of false positives.
OpenAI also reportedly conducted surveys on the idea, and the results give it pause. A majority of respondents felt a less-than-perfect tool could falsely accuse honest students, and nearly 30 percent of ChatGPT users said they would stop using the chatbot if it started watermarking its generated text. OpenAI also worries that watermarking might hurt non-native English speakers who use ChatGPT as a legitimate writing tool.
In a recently updated blog post, OpenAI says its watermarking tool can withstand paraphrasing. Yet it isn’t completely tamper-proof, at least in its current form.
For instance, running watermarked ChatGPT-generated text through a translation system like Google Translate would fool the detector. The detector would also fail if someone asked ChatGPT to insert emojis into the watermarked text and then deleted them manually.
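With the toy scheme above it’s easy to see why: the watermark lives entirely in which tokens were chosen, so any transformation that rewrites enough tokens (translation being an extreme case) dilutes the statistical signal. A hypothetical demonstration, continuing the earlier sketch:

```python
# Continues the sketch above: generate watermarked tokens, then crudely
# "paraphrase" by swapping out roughly half of them, a stand-in for
# translation or emoji-insertion attacks.
def generate(n: int) -> list:
    tokens = ["tok0"]
    for _ in range(n):
        # Fake model: random logits over a random 50-token candidate set.
        logits = {t: random.gauss(0, 1) for t in random.sample(VOCAB, 50)}
        tokens.append(sample_watermarked(tokens[-1], logits))
    return tokens

text = generate(200)
attacked = [t if random.random() < 0.5 else random.choice(VOCAB) for t in text]
print(f"original z-score: {detect(text):.1f}")      # far above any threshold
print(f"attacked z-score: {detect(attacked):.1f}")  # signal heavily diluted
```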
The metadata method
It’s easy to understand why OpenAI hasn’t rolled out the ChatGPT watermark feature yet. But the company is exploring several other mechanisms to identify AI-generated text. One such method involves adding cryptographically signed metadata to its output.
That’s quite a mouthful, but OpenAI is confident the method won’t lead to any false positives, since a cryptographic signature either verifies or it doesn’t. “We expect this will be increasingly important as the volume of generated text increases,” writes OpenAI.
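OpenAI hasn’t said how such a scheme would work in practice, but the standard building block is a digital signature: the provider signs each output with a private key, and anyone holding the matching public key can check provenance, with no statistical grey zone. A minimal, hypothetical sketch using the Python cryptography package (the payload fields, such as the model name, are invented for illustration):

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # held secret by the provider
public_key = private_key.public_key()       # published for verifiers

def sign_output(text: str) -> bytes:
    """Sign the generated text (plus metadata) with the provider's key."""
    payload = json.dumps({"text": text, "model": "example-model"}).encode()
    return private_key.sign(payload)

def verify_output(text: str, signature: bytes) -> bool:
    """True only if the text is byte-for-byte what the provider signed."""
    payload = json.dumps({"text": text, "model": "example-model"}).encode()
    try:
        public_key.verify(signature, payload)
        return True
    except InvalidSignature:
        return False

sig = sign_output("Hello from an AI model.")
print(verify_output("Hello from an AI model.", sig))   # True
print(verify_output("Hello from an AI model!", sig))   # False: any edit breaks it
```

The flip side, and the likely reason this complements rather than replaces watermarking, is brittleness: change a single character, or strip the metadata entirely, and verification simply fails.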
OpenAI explains that the solution would be similar to how it watermarks AI-generated images from DALL-E 3. There, it attaches C2PA metadata, which helps people identify when and how images were modified.
OpenAI illustrates this with an AI-generated picture of a caterpillar, which is then edited so the caterpillar wears a Santa hat. Every time the image is edited, the C2PA metadata is updated, essentially keeping a log of the changes. Looking at the metadata reveals the entire history of the image.
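Real C2PA manifests are signed structures embedded in the image file itself, but the core idea behind the caterpillar example can be sketched as a hash-chained edit log, where each entry is bound to the one before it (a toy illustration, not the actual C2PA format):

```python
import hashlib
import json

# Toy, hash-chained edit log in the spirit of C2PA provenance manifests.
def add_entry(history: list, action: str, image_bytes: bytes) -> list:
    """Append an edit record that commits to both the image and the prior entry."""
    prev_hash = history[-1]["entry_hash"] if history else ""
    entry = {
        "action": action,                                    # e.g. "created", "edited"
        "image_hash": hashlib.sha256(image_bytes).hexdigest(),
        "prev_hash": prev_hash,                              # chains entries together
    }
    serialized = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(serialized).hexdigest()
    return history + [entry]

history = add_entry([], "created: AI-generated caterpillar", b"<original image bytes>")
history = add_entry(history, "edited: added Santa hat", b"<edited image bytes>")
for e in history:
    print(e["action"], e["image_hash"][:12])
```

Because each entry commits to the previous one, tampering with any step breaks the chain from that point on, which is what lets an inspector reconstruct a trustworthy edit history.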
There’s still time
The European Union’s AI Act applies to providers of all AI systems, including general-purpose AI systems that generate synthetic audio, image, video or text content.
Chapter IV, Article 50 of the Act says providers of such systems must “ensure that the outputs of the AI system are marked in a machine-readable format and detectable as artificially generated or manipulated”.
The Act came into force on August 1, 2024, but it will be implemented in a phased manner.
The provisions dealing with watermarking AI-generated content become fully applicable 24 months after the Act’s entry into force. This means it will be mandatory for developers of AI systems to help users identify synthetic content sometime around August 2026.
In a recent post, OpenAI promised it is “committed to complying” with the EU’s AI Act. So while the current ChatGPT watermark tool might not be up to scratch, the company will likely have to ship one that is within the next two years.
Elsewhere, Google’s AI research lab DeepMind has developed a text-watermarking tool that is currently in beta testing. Its SynthID toolkit can watermark and identify all kinds of AI-generated content. It does this by embedding digital watermarks that are “imperceptible to humans but detectable for identification”.