The AI language model story continues, as it will. Faster Networks has previously written at length about the need to monitor and evaluate AI language models and content generators, to maintain a healthy scepticism and to find the truth where it can be found.
We had particular concerns and reservations about where and how the data was being collected, collated and, ultimately, regurgitated. In February, the publishing site The Conversation asked, ‘how many of the 300 billion words would be yours?’ AI content generators are trained on existing internet content: everything from books, blogs and articles to art and status updates on Facebook or Instagram, whether personal or public information. The AI language models rely on posts, comments and a great deal of copyrighted material, because they need access to the broadest possible range of content to give context to the questions they answer. Yet there are tight rules around context and how much content can be stripped from its original source and used elsewhere. When you type a question into ChatGPT, the answer does not include a reference or source. The Conversation piece includes an example of a block of text reproduced from The Catcher in the Rye, a copyrighted work. Given that OpenAI is valued in the billions, would it be fair to assume that it could pay for its data?
Then there is the privacy issue, which is why Italy has recently taken steps to ban OpenAI’s generative text tool, ChatGPT, from operating in its jurisdiction, for now. A Wired piece here explains Italy’s privacy concerns: private information has been used as part of the data set for ChatGPT. Europe has strict data privacy regulations, and the Italian regulator, the Garante per la Protezione dei Dati Personali, has opened an investigation. Other European countries are following that investigation closely and could follow suit.
The Future of Life Institute, established in 2015 with a mission to ‘steer transformative technology towards benefitting life and away from extreme large-scale risks’, published an Open Letter on March 22 asking the big tech companies developing AI models more powerful than GPT-4 to pause and reflect for six months.
The rush to roll out AI language and content models has meant that some of the ordinary checks and balances that keep things behind closed doors have not been properly tested, leaving these models at high risk of being hacked. The attacks do not use code; instead, they use carefully worded prompts that can disrupt the language model and get in behind the scenes.
For example, according to WIRED, a jailbreak can trick these systems into explaining how to hot-wire a car or make methamphetamine by subverting the program’s intended behaviour, such as having AI characters talk to one another one word at a time to get around the filters that block illegal content, including hate speech. It took barely a day after GPT-4’s release in March 2023 for a security expert to coax the software into spouting homophobic messages, creating phishing emails and producing other inappropriate content. It is worth remembering that there is no age verification for using ChatGPT.
These are good reasons to pause the development of language generators for a moment and check whether we are on the right path, or whether this technology has the power to go horribly wrong. You can become a signatory to the open letter here.