Center for Strategic Assessment and Forecasts

Autonomous non-profit organization

Burning the Library of Alexandria, Part 2: Google has digitized 25 million books, so why can't anyone read them?
Posted by: Administrator. Publication date: 01-09-2017

A fascinating story of how human naivety and greed strangled the most ambitious IT project of the millennium: the project to digitize every book in the world. It was published in The Atlantic; we offer a somewhat shortened version.

Google itself grew out of the idea of digitizing books and being able to instantly search any passage of their text. Larry Page and Sergey Brin originally conceived of a search engine not for the Internet but for books. Things turned out differently, and they returned to the idea of digitizing all books only in the early 2000s.

The project to digitize first all American books, and then all books everywhere, received the code name "Project Ocean". Even within Google, employees who were not involved in it considered the idea barely compatible with reality, something like Elon Musk's current ambition to send humans to Mars. But the project had the backing of Page and Brin, so it got far more than a green light.

Starting in 2002, Google eagerly set about scanning every book it could get its hands on. To do this, it made agreements with major libraries in the United States and set up special scanning centers, to which books from the libraries were delivered by truck. This is not a figure of speech: the logistics of "Project Ocean" were no less complex than the technology.

Indeed, Google had to devise special hardware and software for the project, because at the time no one had yet solved the problem of scanning millions of books quickly.

The book being scanned was held firmly in a special cradle, several cameras looked down on it from above, and a lidar ("three-dimensional radar") measured the exact position of each page in space. Special software later used these measurements to "flatten" the curved pages in the photographs.

In this way, Google solved the biggest problem in digitizing books: holding each page perfectly flat so that the scan comes out clean and even. That headache was shifted from people to the software and its algorithms.

Interestingly, for all the technological sophistication of the scanning stations, the pages were turned by hand: a machine could not do it fast enough while also being gentle enough. Old and very old books had to be digitized too, and they needed to be handled with great care.

The operator turned a page, pressed a pedal on the floor, the cameras took a picture, and then he turned the next page, up to a thousand times an hour.

By August 2010, Google had spent a total of $400 million on the project. It announced that, by its calculations, there were 129,864,880 books in the world, and that it wanted to digitize all of them.

It should be made clear that Google never initially intended to open full access to the books: the company's lawyers would never have allowed it, since they are not suicidal. The original idea was to let users search all the books and see a short snippet from each. Google's legal department was confident that this fell under the definition of "fair use", and, to get ahead of the story, the courts eventually agreed, after many years of litigation, that the company did have the right to use books in this way.

It is also worth mentioning that while in most European countries a book becomes freely available to the public 50 years after the author's death, things do not work that way in the US. Under American copyright law, once a book is out of print, no one may republish it without first settling every question of rights with the author, the publisher or their heirs. So the book simply sits on the shelf gathering dust, and giving it a second life, even a digital one, would take so much time and money that it is easier to do nothing.

When publishers and authors realized that Google was serious about "digitizing everything", they were instantly up in arms. And no wonder: the company had simply gone and copied the contents of the largest American libraries without asking permission from anyone except the libraries themselves! In short, lawsuits followed, from a group of publishers and from the Authors Guild.

The separate lawsuits were later combined into a single class action filed on behalf of, and to protect the rights of, all authors and publishers in the United States. This is an important, one might even say the key, point in the entire legal side of the story.

At some point all the parties involved suddenly realized that what Google had done could open up a huge new market for books, especially for books already out of print.

Meanwhile the lawsuit had been filed and the hearings were under way, and with them came the realization that if matters were allowed to run their course, everyone would lose. If the authors and publishers won in court, Google would pay them something and stop scanning, but the books would still not become available to readers, because no one would have the right to open them up. If Google won, it would be able to show readers snippets, but not sell complete electronic copies of the books, because, again, the law forbids it.

And so the parties conceived what was probably the most ambitious class-action settlement in history.

A peculiarity of the American judicial system is that a settlement in a class action, which represents the interests of one or more whole groups of people or companies, can effectively "extend" the provisions of the law, provided the Department of Justice does not intervene and the judge hearing the case approves it. The independence of the judiciary in all its glory.

For two and a half years, the lawyers for Google, the libraries, the publishers and the Authors Guild conducted extremely difficult negotiations, which one participant described briefly but aptly as "four-dimensional chess": the interests of every party had to be taken into account.

The main problem the negotiators faced was this. Suppose Google builds a grand online store of digital books, including books by authors who died long ago, whose publishers have closed, and whose rights holders no one can identify. How is the fee due to them to be paid? Establishing who holds the rights in each individual case would cost far more than any possible payout, so, purely economically, it made no sense.

The solution was to create a single agency that would collect payments for all the old books. Heirs of authors and publishers could contact it to claim their share, and part of the proceeds would be spent on tracking down rights holders. Since, of course, not everyone would come forward, the scheme made economic sense: in effect, those who never claimed their money would "sponsor" those who did. Moreover, rights holders and authors would in any case receive 69% of the price of each e-book, and Google would keep the rest.

Most importantly, this would get around the American laws prohibiting the republication of books whose rights had lapsed into limbo and had never been re-registered.

The sheer scale of the agreement attracted the attention of the U.S. Department of Justice, which launched an investigation and invited everyone who opposed the agreement to "speak now or forever hold your peace."

Objections, of course, came in: from Microsoft and Amazon on the technology side, but also from several thousand authors, many of whom apparently did not fully understand what the agreement was about. Many people highly respected in the book world also spoke out against it.

In the view of those who took part in the negotiations, it was the active opposition of these "authorities" that decided the matter. The Department of Justice would hardly have heeded Microsoft's argument alone that Google was "unfairly" gaining access to every printed book ("Boo-hoo, our main competitor objects!"), nor would it have listened to Amazon, which at the time controlled 80% of the e-book market ("Boo-hoo, a monopolist objects to a new player!").

As several participants in the negotiations suggest, the influential people who spoke out against the agreement believed that once the deal was rejected, the U.S. Congress would still make the necessary amendments to the law. What they did not realize is that lawmakers have no interest whatsoever in old books: old books do not win elections and do not create new jobs. "They don't seem to understand how the real world works," one participant in those negotiations says bitterly.

In the end, the U.S. Justice Department delivered its highly authoritative opinion: the judge should not approve the deal because it (a) went beyond the substance of the lawsuit (which was about whether Google could display excerpts from books), and (b) was too exclusive and would set a very bad precedent.

Indeed, if Google reached an agreement with its enemies-turned-partners through a class-action settlement, any other technology company wanting the rights to create such electronic books would have to go the same route all over again: digitize the books, get sued by the rights holders and authors, then settle with them. In the view of Justice Department officials, this was no good at all. Deliberately breaking the law in order to get around the law?! That was going too far.

Nor was there any chance of adding Microsoft, Amazon and anyone else who wanted to build a digital library of the same scale to the lawsuit as defendants after the fact. That would have been far too severe a test for the U.S. class-action system, and it would not have survived it.

In the end, the judge did not approve the settlement, citing the U.S. Justice Department's position in the ruling.

Technically, Google won in the end: as mentioned at the beginning, it is allowed to show excerpts from the digitized books. But everyone lost. Readers never got a huge digital library of every book ever published. Publishers and authors never got the chance to earn a steady trickle of money from sales. Google's $400 million investment was effectively frozen. Even though it won, the company lost interest in the project and no longer scans books. The fire had gone out.

Today, somewhere on Google's servers, sit 50 to 60 petabytes of digitized books. There they are, within arm's reach. But access to them is limited to a handful of engineers whose job is to make sure no one else gets access to these books.

The last two paragraphs of the article are so good, and so painful to read, that we simply translate them:

I asked someone who used to do this job [at Google]: "What would it take to make these books available to everyone?" I wanted to know how hard it would be to open access to them. What stands between us and a digital public library of 25 million volumes?

"You'd get into a lot of [legal] trouble," I was told, "but all you'd have to do is write a single database query. Access would be switched from 'off' to 'on'. It might take a few minutes for the command to propagate."

Vitaly Klapkowski

