followthemedia.com - a knowledge base for media professionals
All Things Digital

The War Of Words Heats Up Between Google And Newspaper Publishers Wanting To Protect Their Online News Copy

Google continues to say that robots.txt gives newspaper publishers all the protection they need to stop Google accessing their online news, but the publishers, who have developed their own new coding system to give them more control, are getting ever angrier that Google won’t play ball.

Putting the cat among the pigeons this week was Google’s Rob Jonas, European head of media and publisher partnerships, who told a media meeting in London that Google saw no reason to adopt the newly developed Automated Content Access Protocol (ACAP). The new system lets a web site block indexing of specific pages, or an entire site. It extends what was available from the robots.txt convention, developed in 1994 to block content at the server level, and the robots meta tag, developed later to allow page-by-page blocking.
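As an aside, the binary nature of the 1994 convention is easy to see with Python’s standard-library robots.txt parser. The site, crawler rules and URLs below are hypothetical, chosen only to illustrate the mechanism:

```python
# Minimal sketch of how a crawler evaluates robots.txt rules
# (hypothetical site and paths; the rules are parsed from literal
# lines, so no network access is needed).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /premium/",   # block one directory from Google's crawler
])

# robots.txt can only answer "may fetch" or "may not fetch" per URL:
print(rp.can_fetch("Googlebot", "https://example.com/premium/report.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/news/story.html"))      # True
```

The answer is always a plain yes or no per URL; nothing in the format expresses what a crawler may do with a page once fetched, which is the gap the publishers set out to fill.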

Google says that newspapers that don’t want Google picking up their copy, for copyright or other reasons, just have to apply robots.txt, but publishers say robots.txt is just a blocking tool that says either “yes” or “no”, whereas ACAP communicates automatically with the search engines, telling the search engine robots what they can do with each page of copy – publish it entirely, publish only extracts, or not touch it at all. But if the search engines’ robots don’t “talk” to ACAP, and Google reconfirmed this week it is not going to, then the system won’t work.
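For comparison, ACAP version 1 was published as a set of extensions to the robots.txt syntax itself, adding prefixed directives that carry permissions rather than a bare allow/deny. The lines below are an illustrative sketch, not a verbatim excerpt – consult the ACAP 1.0 specification for the exact field names and the further fields covering indexing and snippet display:

```text
# ACAP version=1.0  -- illustrative sketch of the crawl directives
ACAP-crawler: *
ACAP-allow-crawl: /news/
ACAP-disallow-crawl: /archive/
```

Because the directives sit in the same robots.txt file, a crawler that has not implemented ACAP simply ignores them – which is exactly the publishers’ complaint about Google.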

And that brought some really strong language from Gavin O’Reilly, President of the World Association of Newspapers and chairman of the media consortium that developed ACAP. “It’s rather strange for Google to be telling publishers what they should think about robots.txt, when publishers worldwide – across all sectors – have already and clearly told Google that they fundamentally disagree. If Google’s reason for not (apparently) supporting ACAP is built on its own commercial self-interest, then it should say so, and not glibly throw mistruths about.”

WAN says publishers in 16 countries have started to apply ACAP, but it may be that not as many organizations are picking up on the new system as originally hoped, prompting O’Reilly to send out a special message to WAN members at the end of February saying, “This is a decisive moment for the Automated Content Access Protocol, the new standard devised by the newspaper, magazine and book industries to protect our digital publishing interests and make us masters of our own content. We have done the hard work, we have defined a new set of rules for working online and now we need to ensure that they become a part of the Internet landscape.”

He added, “We started the ACAP project because we knew that we had to take responsibility not just for identifying the problem of managing permissions on the Internet, but also for providing the solution. We have successfully created a new protocol and tested it, and we are moving into the next phase of work.
 
“This is why I am writing, in my capacity as President of the World Association of Newspapers and Chairman of the ACAP Board. I want to make sure that you are aware of just how important it is right now to show your support publicly by using ACAP on your own websites. Whether you take advantage of the new rules which ACAP can be used to create, or just implement the standard in a "neutral" way without any immediate effect, is up to you. Either way, it is very simple to implement.”

The note had a flavor of “here’s what we have done for you so why aren’t you taking advantage of it” desperation behind it, although a WAN spokesman says not so. “It hasn’t been in the market very long and we’re currently informing publishers about it and making a case for implementation,” the spokesman said. “The response from publishers has been positive – there are arguably different ways one could approach this issue, but there is an industry consensus that something has to be done to address the issue. It isn’t only about Google -- there are hundreds and hundreds of sites that crawl content.”

Google maintains it does not see how publishers are hurt by the search engine promoting their news content and sending readers directly to those sites. Publishers take the view they want to decide what material Google may access.

To make its point about how Google actually helps newspapers, Jonas says that since last October, when the Financial Times opened its site to Google News and made 30 stories a month available to everyone for free, the site has seen a 75% traffic increase and gained an additional 230,000 registered users. Why wouldn’t other publishers want similar opportunities to increase their readership?

O’Reilly said that Google was part of “12 months of intensive cross industry consideration and active development” discussing what was wrong with robots.txt, so it seemed a little strange when Jonas told the UK’s Press Gazette there seemed to be a communications problem.

Asked whether traditional media consider Google to be the enemy, he said, “The one thing I have learned over the last couple of years is that most of those fears and concerns come from a misunderstanding. If we had time to sit down with them and explain what are our aims we could talk them through our way of doing things. But as it is we can’t really do that. It’s just a lack of detailed understanding over what we are trying to achieve.”

This couldn’t have been achieved during the 12 months of “intensive” talks with the ACAP consortium?

Jonas emphasized that all Google really does is drive traffic to news web sites around the world, and why should publishers complain about that? An executive for the UK Daily Telegraph’s web site last year put forward Google’s case -- “I want people to find Telegraph content in any way they choose. Be it through Google News, RSS, some obscure map mash-up I’ve never heard of (and need never become aware of), a link from a widget on someone else’s blog, I really don’t care. Come one, come all. The very idea of exclusion is ridiculous to any publisher with an advertising-based model that relies on traffic to pay the bills.”

It’s a point the ACAP people still haven’t really answered.

 



related ftm articles

Have You Installed ACAP On Your Website – The Protocol That Can Control Yahoo And Google News Searches? No Problem, Neither Search Engine Is Using It Yet
Is the ability of Google News to search a news web site’s content and list that site among search results good or bad for newspapers? Is it good that Google can publish on its news site the first paragraph or so of a news item, fully credited to the originating site, with a link sending the reader to that site? Much of the news media seems to believe all of that infringes upon copyright, limiting profits, and so they have come up with a new protocol that controls what the search engines can and cannot do.

A Belgian Court Pokes A Giant Hole In Google News’ Payment-Free Business Model And Orders, Without Hearing From Google, That It Eliminate All Links To Belgian Newspapers Or Pay a €1 Million Daily Fine
To hear Google tell it, the search engine didn’t know that a Belgian court was even considering a case that found in favor of the Belgian newspaper industry, with a ruling that could possibly stand as a precedent to thwart the current Google News way of doing business in Europe.

Google Captures About 15% of All Online Advertising Which Continues to Grow as Search Engine Advertising Revenues Are Forecast to Overtake Internet Display Advertising by 2010
There can be no question that when it comes to online advertising search revenue is on a roll, and the latest estimates from Jupiter Research say it will outpace even display advertising by 2010. Overall US Internet advertising is forecast to more than double from $9.3 billion at the end of 2004 to reach $18.9 billion by 2010.




ftm followup & comments

no followup as of March 14, 2008

on March 14, 2008 Mark Bide ACAP Project Director wrote:

As the Project Director of ACAP, I hope you will allow me the space to respond to your final point.

Of course newspaper publishers can put all their content online, get lots of traffic and hope that one day the advertising revenue will be enough to provide an economic payback. ACAP does not prevent the Telegraph from following an entirely open reuse policy for their site if they choose to do so; nor does it prevent anyone else from following any policy they might choose to pursue.

But it does mean that not all publishers have to choose the same thing.

Does anyone – except perhaps someone who really rejects the whole concept of copyright – believe that publishers don’t need to exert any control over what happens to their content? The same Telegraph commentator says elsewhere: "No one thinks that publishers shouldn't be able to control their content, everyone is free to publish online or not and once it's published no one but a few extreme libertarians would say that the text is entirely in the public domain for anyone to do what they want with."

However, many people and companies do indeed do whatever they want with it, treating the content exactly as if it were in the public domain – and building complete businesses on that basis. How are publishers supposed to exercise this control which he believes publishers should have?

Is it just down to the choice of publishing online or not?

The whole point of ACAP is to expand the options beyond just that simplistic "all or nothing" scenario, and encourage more of those who currently choose "nothing" to get online. Newspaper publishers might not be able to imagine not being online, but those publishers who don't create a new product every day, and don't sell millions of copies (or page impressions), choose not to be online because it would be commercial suicide. Wouldn't it be good if all their content was accessible online too?

Take a look at a typical log for a large website and you’ll find maybe 300 spiders accessing the site, most of them ignoring robots.txt and only perhaps five of them delivering any significant, measurable traffic to the site owner. What are they all doing? How is it benefiting the publisher? Is it fair, legal or reasonable? How can the publisher even find out?

ACAP isn't simply about search engines but it started there because that is where the biggest current issue arises. It was started, and funded, by publishers because doing something practical about a problem is far better than just complaining about it. Nobody pretends that it currently does everything that it one day will, nobody pretends that it's perfect. The ACAP pilot project was entirely open, invited collaboration and input from anyone who wanted to offer it, was intended to prove the concept and the technology and it produced the first version which can now be implemented by anyone. It will continue to grow and develop in response to the needs of the stakeholders, and is still open for input and comment by anyone.

Post your comment here

copyright ©2004-2007 ftm partners, unless otherwise noted