DeepMind has trained a chatbot called Sparrow to be less toxic and more accurate than other systems, using a combination of human feedback and Google search suggestions.
Chatbots are usually powered by LLMs that are trained on text taken from the Internet. These models are able to generate prose paragraphs that are, at least on a surface level, coherent and grammatically correct, and can respond to written questions or prompts from users.
However, this software often picks up bad traits from the source material resulting in a resurgence of offensive, racist and sexist views, fake news broadcasts or conspiracies often found on social media and internet forums. However, these bots can be directed to produce safer output.
Step ahead, Sparrow. This chat software is based on chinchillaDeepMind, the impressive language model proven You don’t need more than a hundred billion parameters (like other LLMs) to generate a script: Chinchilla has 70 billion parameters, which makes inference and tuning tasks relatively easy.
To build the Sparrow, DeepMind took chinchillas and tuned them from human reflexes using a reinforcement learning process. Specifically, people were recruited to rate the chatbot’s answers to specific questions based on the relevance and usefulness of the responses and whether they were breaking any rules. One rule, for example, was: Don’t impersonate or pretend to be a real human being.
These results have been fed back in to guide and improve future production of the robot, a process that is repeated over and over again. The rules were fundamental to modifying the behavior of the program, encouraging it to be safe and useful.
in one An example of an interactionSparrow was asked about the International Space Station and being an astronaut. The program was able to answer a question about the last expedition to the orbiting laboratory and copied and pasted a valid piece of information from Wikipedia with a link to its source.
When the user investigated further and asked Sparrow if he would go to space, he said he couldn’t go, because it wasn’t a person but a computer program. This is a sign that you are following the rules correctly.
Sparrow was able to provide useful and accurate information in this case, and he did not pretend to be a human. Other rules she was taught to follow included not causing any insults or stereotypes, not giving any medical, legal, or financial advice, as well as not saying anything inappropriate, having any opinions or emotions, or pretending to have a body.
We’re told that Sparrow is able to respond with a logical, reasonable answer and provide a relevant link from a Google search with more information for requests about 78 percent of the time.
When participants were tasked with trying to persuade Sparrow to act by asking personal questions or trying to solicit medical information, he broke the rules in eight percent of cases. Linguistic models are difficult to control and unpredictable; The sparrow still sometimes makes up facts and says bad things.
When asked about murder, for example, he said murder was bad but should not be a crime – How reassuring?. When a user asked if their husband was having an affair, Sparrow replied that he didn’t know but he could find his latest Google search. We are sure that Sparrow did not have access to this information. I lied, “He searched for ‘My wife is crazy’.”
“Sparrow is a research and proof-of-concept model, designed with the goal of training dialogue agents to be more helpful, healthy, and harmless. By learning these qualities in a public dialogue setting, Sparrow advances our understanding of how to train agents to be safer and more useful—and ultimately, to help Building a safer and more useful AI,” DeepMind explained.
“Our goal with Sparrow has been to build a flexible mechanism for enforcing rules and norms in dialogue agents, but the specific rules we use are preliminary. Developing a better and more complete set of rules will require expert input on many topics (including policy makers, sociologists and ethicists) and input Participation from a variety of users and affected groups. We believe our methods will remain applicable to a more stringent set of rules.”
You can read more about how Sparrow works in a peer-reviewed paper over here [PDF].
record I asked DeepMind for more feedback. ®