Be wary! ChatGPT data privacy is nightmarish
You are at risk and vulnerable if you have ever posted online
ChatGPT has taken the world by storm. Within two months of its release it reached 100 million active users, making it the fastest-growing consumer application ever launched. Users are attracted to the tool's advanced capabilities – and concerned by its potential to cause disruption in various sectors.
However, a much less discussed implication is the privacy risk ChatGPT poses to each and every one of us. Just the other day, Google unveiled its own conversational AI, called Bard, and others will surely follow. Technology companies working on AI have well and truly entered an arms race. The problem is that this race is fuelled by our personal data. 300 billion words! How many are yours?
ChatGPT is underpinned by a large language model that requires massive amounts of data to function and improve. The more data the model is trained on, the better it gets at detecting patterns, anticipating what will come next and generating plausible text. OpenAI, the company behind ChatGPT, fed the tool some 300 billion words systematically scraped from the internet: books, articles, websites and posts – including personal information obtained without consent. If you've ever written a blog post or product review, or commented on an article online, there's a good chance this information was consumed by ChatGPT. So why is that an issue?
The data collection used to train ChatGPT is problematic for several reasons. First, none of us were asked whether OpenAI could use our data. This is a clear violation of privacy, especially when data are sensitive and can be used to identify us, our family members or our location. Even when data are publicly available, their use can breach what we call contextual integrity.
This is a fundamental principle in legal discussions of privacy. It requires that individuals' information is not revealed outside of the context in which it was originally produced. Further, OpenAI offers no procedures for individuals to check whether the company stores their personal information, or to request it be deleted. This is a guaranteed right under the European General Data Protection Regulation (GDPR) – although it's still under debate whether ChatGPT is compliant with GDPR requirements. This 'right to be forgotten' is particularly important in cases where information is inaccurate or misleading, which seems to be a regular occurrence with ChatGPT.
Moreover, the scraped data ChatGPT was trained on can be proprietary or copyrighted. For instance, when I prompted it, the tool produced the first few paragraphs of Peter Carey's novel "True History of the Kelly Gang" – a copyrighted text. Finally, OpenAI did not pay for the data it scraped from the internet. The individuals, website owners and companies that produced it were not compensated. This is particularly noteworthy considering OpenAI was recently valued at $29 billion, more than double its value in 2021. OpenAI has also just announced ChatGPT Plus, a paid subscription plan that will offer customers ongoing access to the tool, faster response times and priority access to new features. This plan will contribute to expected revenue of $1 billion by next year. None of this would have been possible without data – our data – collected and used without our permission.
(The author is Professor in Business Information Systems, University of Sydney)