Monday, September 8, 2008
Aggregators like Google News were themselves in the news today, this time over a 2002 story about United Airlines. For those of you who are interested in the nuts and bolts of why Google News indexed this story, here are the details:
On Saturday, September 6th at 10:36 PM Pacific, the Google crawler discovered a new link on the Florida Sun-Sentinel website in a section of the most viewed stories labeled "Popular Stories: Business." The link appeared in that section sometime after Googlebot's last crawl at 10:17 PM. Because the link was new, the crawler followed it to an article titled "UAL Files for Bankruptcy."
The only date found in the context of the article indicated that the article was from September 7, 2008.
The article was indexed and then available through Google News search, but was not shown on our headlines pages.
We removed this story from the Google News index as soon as we were notified that it had been linked to in error.
It has been widely reported that many readers were unable to determine the original date of publication of this article, and our crawler was similarly unable to recognize that the article was old.
Since our last post, some have asked why Google News didn't recognize that an old story relating to United Airlines' 2002 bankruptcy was outdated. We thought that a brief chronology would be helpful.
On Saturday, September 6th at 10:36 PM Pacific Daylight Time (or Sunday, September 7th at 1:36 AM Eastern Daylight Time), the Google crawler detected a new link on the Florida Sun-Sentinel's website in a section of the most viewed stories labeled "Popular Stories: Business." The link had newly appeared in that section since the last time Google News' Googlebot webcrawler had visited the page (nineteen minutes earlier), so the crawler followed the link and found an article titled "UAL Files for Bankruptcy." The article did not include a standard newspaper dateline, but the Sun-Sentinel page displayed a current date, "September 7, 2008" (Eastern), at the top of the page above the article.
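For readers who want to check the time zone arithmetic, here is a minimal sketch using Python's standard zoneinfo module. The variable names are ours for illustration only; the times are the ones given above.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Crawl time as reported: Saturday, September 6, 2008, 10:36 PM Pacific.
crawl_pacific = datetime(2008, 9, 6, 22, 36, tzinfo=ZoneInfo("America/Los_Angeles"))

# The same instant expressed in Eastern time, the Sun-Sentinel's local zone.
crawl_eastern = crawl_pacific.astimezone(ZoneInfo("America/New_York"))

print(crawl_pacific.strftime("%A, %B %d, %Y %I:%M %p %Z"))
print(crawl_eastern.strftime("%A, %B %d, %Y %I:%M %p %Z"))
```

The second line printed falls on Sunday, September 7, which is why a page dated "September 7, 2008" (Eastern) matched the day of the crawl.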
Because the Sun-Sentinel included a link to the story in its "Popular Stories" section, and provided a date on the article page of September 7, 2008, the Google News algorithm indexed it as a new story. We removed this story as soon as we were notified that it was posted in error.
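To illustrate the reasoning in the previous paragraph, here is a deliberately simplified sketch. It is not the Google News algorithm; the function name, the one-day threshold, and the placeholder 2002 date are all assumptions made up for this example. It only shows how a newly appeared link plus a current page date could together make an old article look new.

```python
from datetime import date
from typing import Optional

def looks_like_new_story(newly_linked: bool,
                         page_date: Optional[date],
                         crawl_date: date,
                         max_age_days: int = 1) -> bool:
    """Hypothetical freshness check, for illustration only."""
    if not newly_linked or page_date is None:
        return False
    age_days = (crawl_date - page_date).days
    return 0 <= age_days <= max_age_days

# Crawl date expressed in Eastern time, matching the date shown on the page.
crawl_date = date(2008, 9, 7)

# What the crawler saw: a new link and a page dated September 7, 2008.
print(looks_like_new_story(True, date(2008, 9, 7), crawl_date))  # True

# Placeholder date standing in for a 2002 dateline, had the page carried one.
print(looks_like_new_story(True, date(2002, 1, 1), crawl_date))  # False
```

Under these assumptions, the only signal that could have marked the article as old, an original dateline, was the one the page did not provide.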
While we don't know why the Sun-Sentinel's website included the link in its "Popular Stories" section, our timestamps show that Google News first crawled the UAL story after following the link from the Sun-Sentinel's "Popular Stories" box:
- At 10:17:35 PM PDT, our crawler retrieved a copy of the Sun-Sentinel business section page.
As you can see, no UAL story appears at this time.
- At 10:36:38 PM PDT, our crawler retrieved an updated copy of the same section. This updated version included a new link in the "Popular Stories: Business" section to a story titled "UAL Files for Bankruptcy."
- At 10:36:57 PM PDT, our crawler followed the new link and fetched this copy of the UAL story.
At that point, our index was updated to include the article with the date that the story was crawled, and the story became searchable on Google News.
- At 10:39:57 PM PDT, the Sun-Sentinel received its first referral to the UAL story from Google News, with a user clicking on a Google News link to the Sun-Sentinel's UAL story.
The Tribune Co. (owner of the Sun-Sentinel) has confirmed in its September 9, 2008 press release that the first referral from Google News to the article came after the UAL story appeared in the "Popular Stories" section.
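The crawl sequence in the chronology above comes down to comparing the links found in two successive copies of the same page. A minimal sketch of that comparison follows, using only Python's standard library. The HTML snippets and URLs are invented for illustration and are not the Sun-Sentinel's actual markup, and the helper names are hypothetical rather than part of any Google system.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links

def new_links(previous_html, current_html):
    """Links present in the current copy but absent from the previous one."""
    return extract_links(current_html) - extract_links(previous_html)

# Invented stand-ins for the 10:17 PM and 10:36 PM copies of the section page.
earlier_copy = '<div><a href="/business/story-a">Story A</a></div>'
later_copy = (
    '<div><a href="/business/story-a">Story A</a>'
    '<a href="/business/ual-story">UAL Files for Bankruptcy</a></div>'
)

for url in sorted(new_links(earlier_copy, later_copy)):
    print("new link to follow:", url)
```

A set difference is the simplest way to express "appeared since the last visit." A production crawler would also need to normalize URLs and respect crawl policies, which this sketch leaves out.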
We hope that this sheds some light on the situation from our perspective.