Center for Strategic Assessment and Forecasts

Autonomous non-profit organization

Fundamental limitations of machine learning
Publication date: 10-09-2017
Recently my aunt sent her colleagues an email with the subject "Math problem! What's the right answer?" The email contained a deceptively simple puzzle:

1 + 4 = 5
2 + 5 = 12
3 + 6 = 21
8 + 11 = ?

To her, the solution was obvious. But her colleagues were sure that their own solution was the right one, and it did not match hers. Was the problem with one of their answers, or with the puzzle itself?

My aunt and her colleagues had stumbled upon a fundamental problem of machine learning, the discipline that studies computers that learn. Almost all of the learning that we expect of computers, and much of the learning we do ourselves, consists of reducing information to underlying patterns, which can then be used to draw conclusions about something unknown. Her puzzle was no different.

For a human, the challenge is to find any pattern at all. Of course, our intuition limits the range of our guesses. But computers have no intuition. From the computer's perspective, the difficulty in recognizing patterns lies in their abundance: if there are infinitely many equally valid patterns, what makes some of them right and others wrong?

This problem only recently became a practical concern. Before the 1990s, AI systems rarely engaged in machine learning. For example, the chess computer Deep Thought, the predecessor of Deep Blue, did not learn chess by trial and error. Instead, chess grandmasters and programming wizards carefully crafted rules for judging how good or bad a given chess position was. This meticulous hand-tuning was typical of the "expert systems" of that time.

To tackle my aunt's puzzle with the expert-systems approach, a person would have to squint at the first three rows of examples and notice the following pattern:

1 * (4 + 1) = 5

2 * (5 + 1) = 12

3 * (6 + 1) = 21

The person would then instruct the computer to follow the rule x * (y + 1) = z. Applying this rule to the last row gives the solution: 96.
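As a minimal sketch of this expert-system style, the hand-found rule can be written directly in Python. The names `aunt_rule` and `examples` are illustrative assumptions and do not come from the original puzzle; the point is only that the person supplies the pattern and the computer merely applies it.

```python
# Hand-coded rule in the spirit of an expert system: a person found the
# pattern x * (y + 1) = z, and the program only applies it.
def aunt_rule(x, y):
    return x * (y + 1)

# The three solved rows from the puzzle, as (x, y, z).
examples = [(1, 4, 5), (2, 5, 12), (3, 6, 21)]

# Check that the rule reproduces every known row.
rule_fits = all(aunt_rule(x, y) == z for x, y, z in examples)

# Apply the rule to the unsolved row.
answer = aunt_rule(8, 11)
print(rule_fits, answer)  # True 96
```

Nothing in this program was learned from data: if the rule is wrong, only a person can change it.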

Despite the early successes of expert systems, the manual labor required to design, tune, and update them became prohibitive. Researchers instead turned their attention to designing machines that could recognize patterns on their own. A program could, for example, examine thousands of photos or market transactions and derive the statistical signals corresponding to a face or a surge in market prices. This approach quickly became dominant, and it has since become the basis of everything from automatic mail sorting and spam filtering to credit card fraud detection.

But for all their successes, these machine learning systems still require a programmer somewhere in the process. Take my aunt's puzzle as an example. We assumed that each row has three relevant components (the three numbers in the line). But there is a potential fourth element: the result from the previous line. If that property of a row is allowed, another plausible pattern emerges:

0 + 1 + 4 = 5

5 + 2 + 5 = 12

12 + 3 + 6 = 21

By this logic, the final answer should be equal to 40.

Which answer is right? Both, of course, and neither. It all depends on which patterns are allowed. You could, for example, build a pattern by multiplying the first number by the second, adding one-fifth of the sum of the previous answer and three, and rounding the result to the nearest integer (very strange, but it works). And if we allow features based on the visual appearance of the digits, there may be some pattern involving their strokes and lines. The search for patterns depends on the assumptions of the observer.
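To make the ambiguity concrete, here is a hedged Python sketch that encodes three candidate rules and checks each against the three solved rows. The function names and the convention that the "previous answer" is 0 before the first row are assumptions made for illustration; the rules themselves are the ones described in the text.

```python
# Three candidate rules that all reproduce the three solved rows.
# Each takes (x, y, prev), where prev is the previous row's result
# (assumed to be 0 before the first row).
def multiply_rule(x, y, prev):
    return x * (y + 1)

def running_sum_rule(x, y, prev):
    return prev + x + y

def strange_rule(x, y, prev):
    # x times y, plus one-fifth of (previous answer + 3), rounded to nearest.
    return round(x * y + (prev + 3) / 5)

solved_rows = [(1, 4, 5), (2, 5, 12), (3, 6, 21)]

def fits_and_predicts(rule):
    prev = 0
    fits = True
    for x, y, z in solved_rows:
        fits = fits and rule(x, y, prev) == z
        prev = z  # carry the known result into the next row
    return fits, rule(8, 11, prev)

results = {rule.__name__: fits_and_predicts(rule)
           for rule in (multiply_rule, running_sum_rule, strange_rule)}
for name, (fits, prediction) in results.items():
    print(name, fits, prediction)
```

All three rules fit the known rows, yet they predict 96, 40, and 93 for the last row. The data alone cannot choose among them.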

The same is true of machine learning. Even when machines teach themselves, the preferred patterns are chosen by people. Should face recognition software contain explicit if/then rules, or should it treat each feature as an additional piece of evidence for or against each possible person the face might belong to? Which features of the image should it process? Should it work with individual pixels? Or perhaps with the edges between light and dark areas? Choices like these constrain which patterns the system deems likely, or even possible. Finding the perfect combination has become the new job of machine learning specialists.


But the automation did not stop there. Just as programmers had once tired of writing rules, they now grew reluctant to design new features. "Wouldn't it be nice if the computer could figure out for itself which features it needs?" So they developed deep-learning neural networks, a machine learning technique able to infer higher-level features from more basic information. Feed a neural network a set of pixels, and it will learn to take edges, curves, and textures into account, all without explicit instructions.

So, have the programmers lost their jobs to the One Algorithm to Rule Them All?

Not yet. Neural networks are still not ideally suited to every task. Even in the best cases, they need tuning. A neural network consists of layers of "neurons", each of which performs a calculation on its input and passes the result to the next layer. But how many neurons and how many layers are needed? Should every neuron accept input from every neuron in the previous layer, or should some neurons be more selective? What transformation should each neuron apply to its input to produce its output? And so on.
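The following Python sketch makes those design choices visible in a tiny fully connected network. Everything specific here is an assumption chosen for illustration: the layer sizes, the sigmoid transformation, and the constant weights, which are fixed by hand rather than learned. It shows only the forward pass from layer to layer, not training.

```python
import math

# Design choices a person still has to make (all values are arbitrary
# illustrations): how many layers, how many neurons per layer, and which
# transformation each neuron applies.
layer_sizes = [4, 3, 2]  # 4 inputs -> 3 hidden neurons -> 2 outputs

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def make_layer(n_inputs, n_neurons, weight=0.1):
    # Fully connected: each neuron has one weight per input, plus a bias.
    return [{"weights": [weight] * n_inputs, "bias": 0.0}
            for _ in range(n_neurons)]

def forward(layers, inputs, activation=sigmoid):
    values = inputs
    for layer in layers:
        # Every neuron computes a weighted sum of the previous layer's
        # outputs, transforms it, and hands the result to the next layer.
        values = [activation(sum(w * v for w, v in zip(n["weights"], values))
                             + n["bias"])
                  for n in layer]
    return values

layers = [make_layer(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
outputs = forward(layers, [1.0, 0.0, 0.5, 0.25])
print(len(outputs), outputs)
```

Changing `layer_sizes`, swapping `sigmoid` for another function, or dropping some weights to make neurons more selective would each yield a different network, and each is a decision made by a person rather than by the learning process.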

These questions constrain attempts to apply neural networks to new problems; a network that is excellent at distinguishing faces is quite incapable of automatic translation. Once again, design elements chosen by a person implicitly push the network toward certain patterns and away from others. A knowledgeable person understands that not all patterns are created equal. For now, programmers will not be left without work.

Of course, the next logical step would be neural networks that can work out for themselves how many neurons to include, which connections to use, and so on. Research projects on this topic have been underway for many years.

How far can this go? Will machines learn to work so well on their own that external tuning becomes an old-fashioned relic? In theory, one could imagine a perfect universal learner, one that can decide everything for itself and always chooses the best scheme for the task at hand.

But in 1996, computer scientist David Wolpert proved that no such learner can exist. In his famous "no free lunch" theorems, he showed that for every pattern a learner is good at learning, there is another pattern that the same learner will be terrible at learning. This brings us back to my aunt's puzzle, and to the infinite number of patterns that can arise from finite data. Choosing a learning algorithm means choosing which patterns the machine will handle poorly. Perhaps all tasks of a given kind, pattern recognition for example, will eventually fall to one comprehensive algorithm. But no single learning algorithm can learn everything equally well.
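A toy illustration of the flavor of this result, and not Wolpert's actual proof, can be sketched in Python. The setup is an assumption made for illustration: inputs are the eight three-bit strings, every possible 0/1 labeling of them counts as an equally likely target pattern, five inputs are used for training, and accuracy is measured on the three unseen inputs. The two learners, `majority_learner` and `contrarian_learner`, are hypothetical.

```python
from itertools import product

# Toy setting (an assumption for illustration): inputs are the 8 three-bit
# strings, labels are 0 or 1, and every possible labeling is equally likely.
inputs = list(product([0, 1], repeat=3))
train_inputs, test_inputs = inputs[:5], inputs[5:]

def majority_learner(train_labels):
    # Predict whichever label was more common in training.
    guess = int(sum(train_labels) * 2 > len(train_labels))
    return lambda x: guess

def contrarian_learner(train_labels):
    # Predict the opposite of the majority learner.
    guess = 1 - int(sum(train_labels) * 2 > len(train_labels))
    return lambda x: guess

def average_test_accuracy(learner):
    correct = total = 0
    # Enumerate all 2**8 possible target patterns over the 8 inputs.
    for labels in product([0, 1], repeat=len(inputs)):
        target = dict(zip(inputs, labels))
        predict = learner([target[x] for x in train_inputs])
        for x in test_inputs:
            correct += predict(x) == target[x]
            total += 1
    return correct / total

majority_score = average_test_accuracy(majority_learner)
contrarian_score = average_test_accuracy(contrarian_learner)
print(majority_score, contrarian_score)  # 0.5 0.5
```

Averaged over every possible pattern, the sensible-looking learner and its deliberate opposite score the same on unseen inputs. A learner only does better than chance when the patterns it actually faces are the ones its built-in assumptions favor.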

This makes machine learning unexpectedly similar to the human brain. Although we like to consider ourselves smart, our brains do not learn perfectly either. Each part of the brain has been carefully tuned by evolution to recognize certain kinds of patterns, whether in what we see, in the language we hear, or in the behavior of physical objects. But at finding patterns in the stock market we are not so good; there, the machines beat us.

The history of machine learning suggests many patterns. But the most probable is this: we will be teaching machines to learn for many years to come.

