WAN-IFRA

A publication of the World Editors Forum

Date

Wed - 23.05.2012


Google News Archive Search now has four times the number of articles

MG Siegler at TechCrunch has spotted an announcement on the Google News Blog that the news search site has quadrupled the number of newspaper articles in its archives. While the company does not give exact figures for how many pages are currently available, the original total of "millions" prompted Siegler to quip that it is now somewhere in the range of "millions times four."

The News Archive Search goes back over two centuries, with links to digitized copies of the original texts for the older publications (where available, Google links to the newspapers' own online archives). Newspapers added in the most recent update include the Sydney Morning Herald, the Village Voice and the Manila Standard.

While the vast repertoire of information is impressive, claims that the texts are completely searchable appear to be misleading. For instance, the Google blog boasts of an edition of the Halifax Gazette dating back to July 2, 1753, as one of the oldest documents in its records. Yet a search for "Halifax Gazette" in the archives finds nothing for the years 1750-1759. Searches for "Halifax" and for "London Gazetteer," which is mentioned in the first line of the newspaper's front page, also return nothing. The article finally appears in a search for "Proclamation," the headline of an announcement within the text that is visible in the view of the edition the Google blog links to. If the texts are not completely searchable, will users actually be able to find them?

As Siegler points out, the benefit for Google in bringing these newspapers to the digital masses is that it can sell AdSense ads against the texts. The connection between the ads and the texts can be a bit ambiguous, but Google must be making some profit. Is this revenue that newspaper sites could be earning? The Village Voice's own online archives only go back as far as 1997, so anything before that is in Google's domain. In contrast, publications like The New York Times and the Chicago Tribune, with complete digital archives, have a paywall for articles appearing before a certain date. Perhaps certain newspapers lack the resources to create a full archive of all editions from the days before the Internet. Still, it is disappointing that Google can make money off the content if the newspapers themselves could have profited from it.

Google's archive is not the only digital repository of newspapers. The British Library recently put up two million pages of papers from 1800-1900, and the Library of Congress project Chronicling America, with newspapers from 1880-1920, announced its one millionth page at around the same time. Both offer texts that are completely searchable, along with tips for use and a full account of what the archives actually contain, although neither is quite as extensive as Google's catalog.

For the general public, the ability to peruse historical information to which they had no prior access is clearly a good thing. Google's efforts to democratize the world of archival search are thus quite commendable. What will the next announcement be? Millions of pages times four, times four?

Source: TechCrunch, Google News Blog


Author

Liz Webber

Date

2009-08-04 17:54

The World Editors Forum is the organization within the World Association of Newspapers devoted to newspaper editors worldwide. The Editors Weblog (www.editorsweblog.org), launched in January 2004, is a WEF initiative designed to facilitate the diffusion of information relevant to newspapers and their editors.


© 2012 WAN-IFRA - World Association of Newspapers and News Publishers
