Center for Strategic Assessment and Forecasts

Autonomous non-profit organization

How to mislead a computer: the artful science of deceiving artificial intelligence
Publication date: 27-08-2017
In the early twentieth century, Wilhelm von Osten, a German horse trainer and mathematician, announced to the world that he had taught a horse to count. For years von Osten traveled around Germany demonstrating the phenomenon. He would ask his horse, named Clever Hans (an Orlov Trotter), to work out the answers to simple equations, and Hans would answer by stomping a hoof. Two plus two? Four stomps.

But scientists didn't believe Hans was as smart as von Osten claimed. The psychologist Carl Stumpf carried out a thorough investigation with what became known as the "Hans Commission." He found that Clever Hans wasn't solving equations at all; he was reacting to visual cues. Hans would tap his hoof until he reached the correct answer, at which point his trainer and the excited crowd would burst into cheers, and then he simply stopped. When he couldn't see those reactions, he just kept tapping.

Computer science has much to learn from Hans. The accelerating pace of development in the field suggests that much of our AI has learned enough to give correct answers without truly understanding the information. And that makes it easy to deceive.

Machine learning algorithms have quickly become the all-seeing shepherds of the human flock. Software connects us to the Internet, filters spam and malicious content out of our mail, and will soon drive our cars. Deceiving these algorithms would shift the tectonic foundations of the Internet and threaten our security in the future.

A small community of researchers, from Penn State, from Google, and from the U.S. military, is developing plans to protect against potential attacks on AI. The theories put forward in their work suggest that an attacker could change what a self-driving car "sees." Or activate voice recognition on your phone and force it to visit a malicious website, using sounds that a human would hear as mere noise. Or let a virus slip through a network firewall.

Left: an image of a building. Right: the modified image, which a deep network classifies as an ostrich. Middle: the changes applied to the original image.

Instead of seizing control of a self-driving car, this technique shows it something like a hallucination: an image of something that isn't actually there.

Such attacks use adversarial examples: images, sounds, or text that look normal to people but are perceived quite differently by machines. Small changes made by an attacker can cause a deep neural network to draw the wrong conclusions about what it is being shown.

"Any system that uses machine learning to make decisions critical to the security of potentially vulnerable for such attacks," said Alex Kanchalan, a researcher from UC Berkeley, studying attacks on machine learning with deceptive images.

Knowing about these weaknesses at this early stage of AI development gives researchers the tools to understand how to correct them. Some are already doing so, and say their algorithms have become more effective as a result.

Most mainstream AI research today rests on deep neural networks, which in turn build on the broader field of machine learning. Machine learning techniques use calculus and statistics to create the software most of us rely on, such as email spam filters or Internet search. Over the past 20 years, researchers have begun applying these techniques to a newer idea, neural networks: software structures that mimic the brain. The idea is to decentralize computation across thousands of small equations ("neurons") that take in data, process it, and pass it on to the next layer of thousands of small equations.
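That layered structure of "small equations" can be sketched in a few lines. This is a toy illustration only; the layer sizes, random weights, and ReLU activation below are arbitrary choices, not any system discussed in the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # each "neuron" computes a small equation: a weighted sum plus a bias,
    # passed through a nonlinearity, then handed on to the next layer
    return np.maximum(0.0, w @ x + b)  # ReLU activation

# a toy 3-layer network: 4 inputs -> 8 neurons -> 8 neurons -> 3 output scores
weights = [rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(3, 8))]
biases = [rng.normal(size=8), rng.normal(size=8), rng.normal(size=3)]

x = rng.normal(size=4)  # a tiny stand-in "image" of 4 pixels
for w, b in zip(weights, biases):
    x = layer(x, w, b)   # data flows through the layers

print(x.shape)  # three class scores come out the far end
```

A real image classifier has the same shape of computation, just with millions of such equations stacked in many more layers.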

These AI algorithms are trained the same way as other machine learning systems, which in turn mimic human learning: show them examples of different things along with labels. Show a computer (or a child) a picture of a cat, say that this is what a cat looks like, and the algorithm learns to recognize cats. But to do so, the computer has to review thousands, even millions, of cat images.
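The show-and-label loop can be sketched with a minimal stand-in classifier. This is a toy logistic regression on made-up two-pixel "images"; real systems use deep networks and vastly more examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy labeled data: "cat" images cluster around +1, "not cat" around -1
X = np.concatenate([rng.normal(+1.0, 0.5, size=(50, 2)),
                    rng.normal(-1.0, 0.5, size=(50, 2))])
y = np.array([1] * 50 + [0] * 50)

w, b = np.zeros(2), 0.0
for _ in range(200):                      # show the labeled examples repeatedly
    p = 1 / (1 + np.exp(-(X @ w + b)))    # the model's current guesses
    grad_w = X.T @ (p - y) / len(y)       # nudge the weights toward the labels
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

acc = np.mean(((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(acc)  # fraction of training examples now labeled correctly
```

The point of the sketch is the loop itself: guess, compare with the label, adjust, repeat.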

The researchers found that these systems can be attacked with specially crafted fraudulent data, which they call "adversarial examples."

In 2015, researchers at Google showed that a deep neural network could be made to classify this image of a panda as a gibbon.

"We show you a photo, which clearly shows the school bus, and forced to think that it is an ostrich," says Ian Goodfellow [Ian Goodfellow], a researcher from Google are actively working in the field of such attacks to the neural network.

By altering the images fed to neural networks by only 4%, the researchers were able to trick them into misclassifying 97% of the time. Even when they didn't know exactly how a network processed images, they could deceive it in 85% of cases. That last variant, deception without knowing the network's architecture, is called a "black box" attack. It was the first documented functional attack of its kind on a deep neural network, and it matters because this is roughly the scenario a real-world attack would follow.

In that work, researchers from Penn State, Google, and the U.S. Army Research Laboratory attacked an image-classifying neural network hosted by the MetaMind project, which serves as an online tool for developers. The team built and trained the attacked network themselves, but their attack algorithm works regardless of architecture. With it, they were able to fool the "black box" network with a success rate of 84.24%.

Top row: photos of signs recognized correctly.
Bottom row: the network was forced to recognize the signs completely incorrectly.

Feeding machines incorrect data is not a new idea, but Doug Tygar, a professor at Berkeley who has studied adversarial machine learning for 10 years, says the attack technique has evolved from simple machine learning systems to complex deep neural networks. Malicious hackers have used it against spam filters for years.

Tygar's research traces back to a 2006 paper on attacks of this kind against machine learning systems, work that was expanded in 2011 together with researchers from UC Berkeley and Microsoft Research. The Google team was the first to target deep neural networks, publishing its first paper in 2014, two years after discovering that such attacks were possible. They wanted to make sure this was not some anomaly but a real possibility. In 2015 they published another paper describing how to protect networks and improve their robustness, and Ian Goodfellow has since advised other research in the area, including the black-box attack.

Researchers call the more general idea of unreliable information "Byzantine data," and in the course of their studies it led them to deep learning. The term comes from the famous "Byzantine generals problem," a thought experiment in computer science in which a group of generals must coordinate their actions through messengers without knowing whether one of them is a traitor. They cannot trust the information they receive from their colleagues.

"These algorithms are designed to cope with random noise, but not with Byzantine data," says Tigar. To understand how such attacks work, goodfella offers to imagine the neural network in the form of scatterplots.

Each dot on the chart represents one pixel of an image the network is processing. Normally the network tries to draw a line through the data that best represents the totality of the points. In practice it is a bit trickier than that, because different pixels matter differently to the network; in reality it is a complex, multidimensional graph crunched by a computer.

But in our simple scatterplot analogy, the line drawn through the data determines what the network thinks it sees. To attack such a system successfully, researchers need to change only a small fraction of those points to force the network into a decision the data doesn't actually support. In the example of the bus that looks like an ostrich, the photo of the school bus is peppered with pixels arranged in a pattern associated with the distinctive characteristics of the ostrich photos the network has seen. The pattern is invisible to the eye, but when the algorithm processes and simplifies the data, those extreme "ostrich" data points look to it like the right classification. In the black-box variant, the researchers probed the system with different inputs to learn how the algorithm sees certain objects.
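This geometry can be demonstrated on even a trivial linear classifier: nudge every input dimension slightly in the direction that lowers the correct class's score, and the point crosses the decision line while changing only a little. The weights, input, and step size below are invented for illustration; attacks on deep networks use the same sign-of-the-gradient idea at far larger scale:

```python
import numpy as np

# a tiny pre-"trained" linear classifier: score > 0 means "bus", else "ostrich"
w = np.array([1.0, -2.0, 0.5, 3.0])
b = 0.1

def predict(x):
    return "bus" if x @ w + b > 0 else "ostrich"

x = np.array([0.5, -0.2, 0.1, 0.3])   # clearly a "bus": its score is positive

# adversarial step: shift each coordinate a little against the score,
# i.e. along -sign(gradient of the score with respect to x) = -sign(w)
eps = 0.4                              # no coordinate moves by more than 0.4
x_adv = x - eps * np.sign(w)

print(predict(x), "->", predict(x_adv))  # prints: bus -> ostrich
```

Each pixel moved by at most 0.4, yet the accumulated effect across all dimensions drags the score past the decision line. In a real image with millions of pixels, the per-pixel change can be far smaller and still flip the label.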

By feeding the classifier forged input data and studying the decisions the machine made, the researchers were able to reconstruct how the algorithm works closely enough to fool the image recognition system. Potentially, a self-driving car's vision system could in this way be made to see a "yield" sign instead of a stop sign. Once they understood how the networks worked, they could make the machines see anything.
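That probing step can be sketched as training a substitute model on the black box's answers. This is a toy version with an invented linear "secret" classifier; the actual research trained a substitute deep network, but the principle is the same:

```python
import numpy as np

rng = np.random.default_rng(2)

# a "black box" whose internals the attacker cannot see
secret_w = np.array([2.0, -1.0])
def black_box(x):
    return int(x @ secret_w > 0)   # only the final label comes back

# probe it with many inputs and record the answers
X = rng.normal(size=(500, 2))
y = np.array([black_box(x) for x in X])

# train a substitute model (logistic regression) on the query results
w = np.zeros(2)
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.5 * (X.T @ (p - y) / len(y))

# the substitute's decision direction lines up with the secret one, so
# adversarial examples crafted against it transfer to the black box
cos = w @ secret_w / (np.linalg.norm(w) * np.linalg.norm(secret_w))
print(round(cos, 2))  # close to 1.0: the directions nearly coincide
```

Once the substitute agrees with the black box, the attacker can craft perturbations against the substitute's gradients, which they can see, and replay them against the target, which they cannot.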

An image classifier draws different lines depending on the different objects in the image. Adversarial examples can be thought of as extreme values on the chart.

The researchers say such an attack could be injected directly into the image processing system, bypassing the camera, or that the manipulations could be carried out on a real, physical sign.

But Alison Bishop, a security expert at Columbia University, says such a forecast is unrealistic and depends on the system used in the self-driving car. If attackers already have access to the data stream from the camera, they can feed it any input anyway.

"If they can get to the entrance of the camera, such complexity is not needed, she says. – You can just show her the stop sign".

Other attack methods that don't bypass the camera, such as applying visual marks to a real sign, strike Bishop as improbable. She doubts the low-resolution cameras used in self-driving cars would ever be able to make out small changes on a sign.

The pristine image on the left is classified as a school bus; the modified image on the right, as an ostrich. In the middle are the changes made to the picture.

Two groups, one at Berkeley and the other at Georgetown University, have successfully developed algorithms that can issue voice commands to digital assistants like Siri and Google Now while sounding like meaningless noise. To a person such commands seem like random noise, but they can issue instructions, unintended by the owner, to devices like Alexa.

Nicholas Carlini, one of the researchers behind these Byzantine audio attacks, says that in their tests they were able to activate open-source audio recognition software, as well as Siri and Google Now, with better than 90% accuracy.

The noise sounds like the alien chatter of science fiction: a mixture of white noise and human voice that doesn't resemble a voice command at all.

According to Carlini, any phone that hears the noise during such an attack (separate attack plans are required for iOS and Android) could be made to visit a web page that also plays the noise, in turn infecting nearby phones. Or that page could quietly download malware. The sounds could also be broadcast over the radio, hidden in white noise or layered alongside other audio.

These attacks are possible because machines are trained to assume that almost all input carries meaningful information, and that whatever they see is one of the things that occur most frequently, Goodfellow explains.

It is easier to trick a network into believing it sees a common object, since it believes it ought to see such objects often. That is how Goodfellow and another group, from the University of Wyoming, were able to get networks to classify images that weren't there: the networks identified objects in white noise, randomly generated black-and-white pixels.

In Goodfellow's study, random white noise passed through the network was classified as a horse. Which, coincidentally, brings us back to the story of Clever Hans, the not-so-mathematically-gifted horse.

Goodfellow says that neural networks, like Clever Hans, don't really learn ideas; they only learn to recognize when they have found the right one. The difference is subtle but important. The lack of fundamental understanding makes it easier for malicious actors to recreate the appearance of the algorithm finding the "right" answer when that answer is in fact false. To understand what something is, a machine also needs to understand what it is not.

After training an image-sorting network on both natural images and processed (fake) ones, Goodfellow found that he could not only reduce the effectiveness of such attacks by 90%, but also make the network better at its original task.

"Forcing to explain really unusual to fake the image, you can achieve even more reliable explanations of the underlying concepts," says Goodfellow.

Both audio research groups used an approach similar to the one Google uses to defend its neural networks against its own attacks: retraining on adversarial inputs. They achieved similar success, reducing the effectiveness of the attack by more than 90%.

Not surprisingly, this field of study has interested the U.S. military. The Army Research Laboratory even sponsored two of the new papers on the subject, including the black-box attack. And although the department funds the research, that doesn't mean the technology is headed for the battlefield. According to a department representative, it can take up to 10 years for research to become technology a soldier can use.

Ananthram Swami, a researcher at the U.S. Army Research Laboratory, took part in several of the recent papers on deceiving AI. The Army is interested in detecting and stopping fraudulent data in a world where not every source of information can be trusted. Swami points to datasets gathered from public sensors located at universities and running as open-source projects.

"We don't always control all the data. Our enemy is quite easy to trick us, says Swami. In some cases the consequences of this deception may be frivolous, some the opposite."

He also says the Army is interested in autonomous robots, tanks, and other vehicles, so the purpose of such studies is obvious. By studying these questions, the Army hopes to win itself a head start in developing systems that aren't susceptible to this kind of attack.

But any group using neural networks should worry about potential AI-deceiving attacks. Machine learning and AI are in their infancy, and security lapses made now can have terrible consequences. Many companies entrust highly sensitive information to AI systems that haven't stood the test of time. Our neural networks are still too young for us to know everything we need to know about them.

A similar oversight is what so quickly turned Microsoft's Twitter bot, Tay, into a racist with a penchant for genocide. A flood of malicious data and a "repeat after me" function pushed Tay far off its intended path. The bot was deceived by low-quality input data, and it serves as a convenient example of a bad machine learning deployment.

Kantchelian says he doesn't believe the possibilities for such attacks have been exhausted by the Google teams' successful research.

"In the field of computer security, the attacker is always ahead of us, says Cancelan. Is quite dangerous to claim that we solved all the problems with the fraud neural networks using re-training".

