Google News Blog - The Official Blog from the team at Google News

Same Protocol, More Options for News Publishers

Wednesday, December 2, 2009 8:10 AM



There are more than 25,000 publishers from around the world in Google News today. That's because Google News is a great source of readers, sending publishers about 1 billion clicks every month. Each of those clicks is an opportunity for publishers to show ads, sell subscriptions and introduce readers to the great content they produce every day. While we think this offers a tremendous opportunity for any publisher who wants new readers, publishers create the content and they're in control of it. If they decide they don't want to be in Google, it's easy for them to opt out. Today, we're making it even easier with a web crawler specifically for Google News.

Publishers have always had the ability to block Google from including their content in Google's index. How? With the Robots Exclusion Protocol (REP), a web-wide standard supported by all major search engines and any reputable company that crawls the web. When our crawler arrives at a site, it first checks for a robots.txt file to make sure we have permission to crawl the site. With this file, or with similar REP directives on individual pages, publishers can block their entire site, certain sections or individual pages. They can also tell us how they want their content indexed, for example by excluding images or snippets of text. Furthermore, they can apply different instructions to different crawlers, giving access to some while blocking others.
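
As an illustration of these rules, here is a minimal sketch that uses Python's standard-library urllib.robotparser to evaluate a robots.txt file held in memory. The crawler name ExampleBot, the paths and www.example.com are hypothetical, and Python's parser is a simplified REP implementation, not the logic Google's crawlers use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleBot" is a made-up crawler name.
ROBOTS_TXT = """\
# Block one crawler from the entire site
User-agent: ExampleBot
Disallow: /

# All other crawlers: block one section and one individual page
User-agent: *
Disallow: /archive/
Disallow: /drafts/unpublished-story.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "https://www.example.com"
    checks = [
        ("ExampleBot", "/news/today.html"),               # whole site blocked
        ("Googlebot", "/news/today.html"),                # allowed
        ("Googlebot", "/archive/2009/story.html"),        # section blocked
        ("Googlebot", "/drafts/unpublished-story.html"),  # single page blocked
    ]
    for agent, path in checks:
        print(f"{agent:12} {path:35} {rp.can_fetch(agent, base + path)}")
```

Python's parser always consults the * group last, after any named group that matches the crawler, so ExampleBot gets its own rules while every other crawler falls through to the shared ones.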

The new Google News web crawler extends these controls to Google News. It has always been easy for publishers to keep their content out of Google News while remaining in Google Search: they just had to fill out a simple contact form in our Help Center. Now, with the news-specific crawler, publishers who want to opt out of Google News don't even have to contact us. They can add instructions for the user-agent Googlebot-News to the same robots.txt file they use today. In addition, once this change is fully in place, publishers will be able to do more than just allow or disallow access to Google News: they'll also be able to apply the full range of REP directives to Google News alone. Want to block images from Google News, but not from Web Search? Go ahead. Want to include snippets in Google News, but not in Web Search? Feel free. All of this will soon be possible with the same standard protocol, REP.
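
The sketch below models both halves of this idea in Python, under stated assumptions. Part 1 uses urllib.robotparser to show that a robots.txt group for Googlebot-News blocks the news crawler while leaving Googlebot (Web Search) unaffected. Part 2 is a toy model, not Google's implementation: it assumes a crawler combines directives from a meta tag named "robots" with those in a meta tag carrying its own lowercase name (here googlebot-news). The page, URL and helper names (allowed, MetaDirectiveParser, directives_for) are hypothetical, and noimageindex simply stands in for an image-exclusion directive.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Part 1: robots.txt that opts out of Google News only. There is no other
# group, so every other crawler (including Googlebot for Web Search) is allowed.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Part 2 (toy model): page-level directives addressed to a single crawler.
# Hypothetical page that keeps its images out of Google News only.
PAGE = """\
<html><head>
<meta name="googlebot-news" content="noimageindex">
</head><body>Story text</body></html>
"""


class MetaDirectiveParser(HTMLParser):
    """Collect comma-separated directives from <meta name=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.by_name: dict[str, set[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if not name:
            return
        tokens = {t.strip().lower() for t in (attr.get("content") or "").split(",")}
        self.by_name.setdefault(name, set()).update(t for t in tokens if t)


def directives_for(html: str, crawler: str) -> set[str]:
    """Assumed rule: generic "robots" directives plus those named for this crawler."""
    parser = MetaDirectiveParser()
    parser.feed(html)
    parser.close()
    generic = parser.by_name.get("robots", set())
    specific = parser.by_name.get(crawler.lower(), set())
    return generic | specific


if __name__ == "__main__":
    url = "https://www.example.com/2009/12/story.html"
    print(allowed(ROBOTS_TXT, "Googlebot", url))       # Web Search crawler
    print(allowed(ROBOTS_TXT, "Googlebot-News", url))  # News crawler
    print(directives_for(PAGE, "Googlebot"))
    print(directives_for(PAGE, "Googlebot-News"))
```

One caveat on Part 1: Python's parser matches user-agent names by substring and uses the first named group that matches, so a Googlebot group listed before a Googlebot-News group would also capture the news crawler there. Google's crawlers instead select the most specific matching group. The example file has a single group, so both interpretations agree.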

Our users shouldn't notice any difference. Google News will keep helping people discover the news they're looking for, different perspectives from across the world and new sources of information they might not otherwise have found.

While this means even more control for publishers, the effect of opting out of Google News is the same as it has always been: the content won't appear in Google News or in the parts of Google that are powered by the News index. For example, if a publisher opts out of Google News but stays in Web Search, its content will still show up as natural web search results, but it won't appear in the block of news results that sometimes shows up in Web Search, called Universal search, since those results come from the Google News index.

Most people put their content on the web because they want it to be found, so very few choose to exclude their material from Google. But we respect publishers' wishes: if publishers don't want their content to appear in web search results or in Google News, we want to give them easy ways to remove it. We're excited about this change and will start rolling it out today. You can learn more about the details of this change on our Webmaster Central blog. If you see any problems or have any questions, please let us know.