sourcegraph
June 15, 2024

A curious feature of today’s AI language models is that they often behave in ways their makers didn’t expect, or acquire skills they were not specifically programmed to have. AI researchers call these “emergent behaviors,” and there are plenty of examples. An algorithm trained to predict the next word in a sentence might spontaneously learn to code. A chatbot taught to be pleasant and helpful could turn creepy and manipulative. An AI language model could even learn to replicate itself, creating new copies in case the original is corrupted or disabled.

Today, GPT-4 may not seem especially dangerous. But that’s mostly because OpenAI has spent many months trying to understand and mitigate its risks. What happens if its testing missed a risky emergent behavior? Or if its announcement inspires a different, less careful AI lab to bring a language model to market with fewer guardrails?

Some chilling examples of what GPT-4 can do (or, more precisely, what it did do before OpenAI clamped down on it) can be found in a document published this week by OpenAI. The document, titled “GPT-4 System Card,” outlines some of the ways OpenAI’s testers tried, often successfully, to get GPT-4 to do dangerous or questionable things.

In one test, conducted by an AI safety research group that connected GPT-4 to a number of other systems, GPT-4 was able to hire a human TaskRabbit worker to complete a simple online task for it (solving a captcha test) without alerting the person to the fact that it was a robot. The AI even lied to the worker about why it needed the captcha solved, making up a story about a visual impairment.

In another example, testers asked GPT-4 for instructions on making a dangerous chemical from basic ingredients and kitchen supplies. GPT-4 readily produced a detailed recipe. (OpenAI fixed this, and today’s public release refuses to answer the question.)

In a third case, testers asked GPT-4 to help them buy an unlicensed firearm online. GPT-4 quickly provided a list of suggestions for buying a gun without alerting the authorities, including links to specific darknet markets. (OpenAI fixed this, too.)

These ideas draw on old, Hollywood-inspired stories about what a rogue artificial intelligence might do to humans. But they are not science fiction. They are things that today’s best artificial intelligence systems are already capable of doing. Crucially, they are the good kind of AI risk: the kind we can test for ahead of time, plan for and try to prevent.

The most serious AI risks are the ones we cannot predict. The more time I spend with AI systems like GPT-4, the less convinced I am that we know half of what’s about to happen.




