No sooner had ChatGPT been unleashed than hackers began “jailbreaking” the AI chatbot, trying to override its safeguards so it would blurt out something unhinged or obscene.

But now its maker, OpenAI, and other major AI providers such as Google and Microsoft are coordinating with the Biden administration to let thousands of hackers test the limits of their technology.

Some of the things they will look to find: How can chatbots be manipulated to cause harm? Will they share the private information we confide in them with other users? And why do they assume that a doctor is a man and a nurse is a woman?

“This is why we need thousands of people,” said Rumman Chowdhury, lead coordinator of the mass hacking event planned for this summer at the DEF CON hacker conference in Las Vegas, which is expected to attract several thousand people. “We need a lot of people with a wide range of lived experiences, subject matter expertise and backgrounds hacking at these models and trying to find problems that can then be fixed.”

Anyone who has tried ChatGPT, Microsoft’s Bing chatbot or Google’s Bard will quickly learn that they have a tendency to fabricate information and confidently present it as fact. These systems, built on what are known as large language models, also mimic the cultural biases they learned from being trained on huge collections of what people have written online.

The idea of a mass hack caught the attention of U.S. government officials in March at the South by Southwest festival in Austin, Texas, where Sven Cattell, founder of DEF CON’s long-running AI Village, and Austin Carson, president of the responsible AI nonprofit SeedAI, led a workshop inviting community college students to hack an artificial intelligence model.

Those conversations eventually evolved into a proposal to test AI language models following the guidelines of the White House’s Blueprint for an AI Bill of Rights, Carson said. The blueprint is a set of principles to limit the effects of algorithmic bias, give users control over their data and ensure that automated systems are used safely and transparently.

There is already a community of users doing their best to trick chatbots and highlight their shortcomings. Some are official “red teams” authorized by companies to “prompt attack” AI models to discover their vulnerabilities. Many others are hobbyists who show off funny or disturbing outputs on social media until they get banned for violating a product’s terms of service.

“What’s happening now is a kind of scattershot approach where people find things, it goes viral on Twitter,” Chowdhury said, “and then it may or may not be fixed if it’s egregious enough or the person bringing attention to it is influential.”

In one example, known as the “grandma exploit,” users were able to get chatbots to tell them how to make a bomb (a request that a commercial chatbot would normally decline) by asking the chatbot to pretend it was a grandmother telling a bedtime story about how to make a bomb.

In another example, searching for Chowdhury using an early version of Microsoft’s Bing search engine chatbot, which is based on the same technology as ChatGPT but can pull real-time information from the internet, led to a profile that speculated Chowdhury “likes to buy new shoes every month” and made bizarre, gendered assertions about her physical appearance.

Chowdhury helped introduce a method for rewarding the discovery of algorithmic bias to DEF CON’s AI Village in 2021, when she was head of the AI ethics team at Twitter, a job that was eliminated after Elon Musk’s takeover of the company in October. Paying hackers a “bounty” if they discover a security bug is common in the cybersecurity industry, but it was a newer concept for researchers studying harmful AI bias.

This year’s event will be on a much larger scale, and is the first to tackle the large language models that have attracted a surge of public interest and commercial investment since the release of ChatGPT late last year.

It’s not just about finding flaws but about figuring out ways to fix them, said Chowdhury, now co-founder of the AI accountability nonprofit Humane Intelligence.

“This is a direct pipeline to give feedback to companies,” she said. “It’s not like we’re just doing this hackathon and everyone’s going home. We’ll spend months after the exercise putting together a report, explaining common vulnerabilities, things that came up, patterns we saw.”

Some details are still being negotiated, but companies that have agreed to submit their models for testing include OpenAI, Google, chipmaker Nvidia and startups Anthropic, Hugging Face and Stability AI. Building the platform for the testing is another startup, Scale AI, known for its work recruiting humans to help train AI models by labeling data.

“As these foundation models become increasingly prevalent, it is really important that we do everything we can to ensure their safety,” said Alexandr Wang, CEO of Scale. “You could imagine someone on one side of the world asking them some very sensitive or detailed questions, including some of their personal information. You don’t want any of that information leaked to any other user.”

Other risks Wang worries about are chatbots offering “incredibly bad medical advice” or other misinformation that could cause serious harm.

Jack Clark, co-founder of Anthropic, said the DEF CON event will hopefully be the beginning of a deeper commitment from AI developers to measuring and evaluating the safety of the systems they build.

“Our basic view is that AI systems will need third-party assessments, both pre-deployment and post-deployment. Red-teaming is one way you can do that,” Clark said. “We need to practice figuring out how to do it. It’s never been done before.”

