It sounds like the stuff of futuristic science fiction, but article-writing robots are very much a 21st-century trend. The practice of using algorithms to generate news content has existed for some time, but the technology behind it has become increasingly sophisticated, leading to fears in some quarters that one day newspapers will be almost entirely staffed by untiring, uncomplaining computers. At present, algorithms are particularly suited to producing articles on statistics-heavy subjects like sports and finance, but as our lives become increasingly dominated by data, computer writing systems will extend their range.
Narrative Science, a company that turns data into readable articles, is arguably at the forefront of the robot revolution. In the two years since it was officially launched, the Chicago-based team of 30 staffers has managed to attract illustrious clients such as Forbes with the quality of its automated journalism software. Developed by Stuart Frankel, Kris Hammond and Larry Birnbaum between 2009 and 2010, Narrative Science promises media and publishing companies “an innovative and cost-effective solution for creating high-quality, timely stories.”
Using Quill, software that integrates Artificial Intelligence and Big Data Analytics, the company is able to produce articles that read as though they were penned by a flesh-and-blood author. After collecting high-quality data, Narrative Science’s algorithms must then place that data within a wider understanding of the topic. The company’s engineers develop a set of rules that govern each topic, so that the computer systems are able, for example, to understand the criteria required for a team or individual to win in any given sport. The next step is to transform the resulting deductions into text, for which the company employs a team of "meta-writers" – journalists who work alongside the company’s engineers to produce a set of templates that give the story its “angle,” the most interesting element of the event it is writing up. To construct sentences, the algorithms draw on topic-specific lists of vocabulary provided by the meta-writers, and then place these sentences within pre-set article frameworks.
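The pipeline described above (data, topic rules, an angle, vocabulary and a pre-set framework) can be illustrated with a toy sketch. This is not Narrative Science's code or Quill's actual design: the names `GameData`, `decide_result`, `pick_angle`, `VOCABULARY` and `FRAMEWORK`, and the score-margin thresholds, are all hypothetical and chosen only to make the steps concrete.

```python
from dataclasses import dataclass


@dataclass
class GameData:
    """Structured input data for one game (hypothetical schema)."""
    home: str
    away: str
    home_score: int
    away_score: int


def decide_result(game):
    """Topic rule (assumed): the side with the higher score wins.

    Returns (winner, loser, margin); winner and loser are None for a draw.
    """
    if game.home_score == game.away_score:
        return None, None, 0
    if game.home_score > game.away_score:
        return game.home, game.away, game.home_score - game.away_score
    return game.away, game.home, game.away_score - game.home_score


def pick_angle(margin):
    """Choose the story's angle from the margin (thresholds are assumptions)."""
    if margin == 0:
        return "draw"
    if margin >= 20:
        return "blowout"
    if margin <= 3:
        return "nail_biter"
    return "routine"


# Topic-specific vocabulary of the kind a meta-writer might supply (illustrative).
VOCABULARY = {
    "blowout": "routed",
    "nail_biter": "edged",
    "routine": "beat",
}

# Pre-set article framework into which the generated sentence is placed.
FRAMEWORK = "{headline}. The final score was {high}-{low}."


def write_report(game):
    """Run the full sketch: rules, then angle, then vocabulary, then framework."""
    winner, loser, margin = decide_result(game)
    angle = pick_angle(margin)
    high = max(game.home_score, game.away_score)
    low = min(game.home_score, game.away_score)
    if angle == "draw":
        headline = f"{game.home} and {game.away} finished level"
    else:
        headline = f"{winner} {VOCABULARY[angle]} {loser}"
    return FRAMEWORK.format(headline=headline, high=high, low=low)


print(write_report(GameData("Wildcats", "Badgers", 24, 21)))
```

A production system would presumably draw on far richer data and many more angles and templates; the sketch only shows how rules, an angle and a vocabulary list can combine to turn a score line into a sentence.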
Clients are also able to select the tone in which articles are to be composed, flitting between straightforward reporting, irreverence, disbelief and breathlessness as the situation dictates. The result is a report, produced within seconds of an event occurring, that is virtually indistinguishable from the work of a trained journalist. For each article of 500 words, Narrative Science charges $10; needless to say, a journalist with bills to pay and food to buy would not be able to compete with such low charges.
Unsurprisingly, algorithms have incited strong reactions from many quarters. Wired’s Steven Levy is not alone in thinking of them as “potentially job-killing technology,” while others fear the negative social impact of articles that could one day be tailor-made to suit the online interests and habits of each individual reader. Journalist and social commentator Evgeny Morozov fears that the desire to use algorithms to increase the individualisation of journalistic content would lead to readers being trapped in a “vicious news cycle,” prevented from engaging with different viewpoints and opinions. Eli Pariser, former executive director of MoveOn.org and author of The Filter Bubble, shares this concern. During a TED lecture, Pariser warned that the increasing use of algorithms to tailor content to the interests of individual readers is leading to “a world in which the Internet is showing us what it thinks we want to see but not necessarily what we need to see.” He goes further, insisting that “if algorithms are going to curate the world for us and going to decide what we get to see and don’t get to see, then we need to make sure that they’re not just keyed to relevance, we need to make sure that they also show us things that are uncomfortable and important.”
Pioneers of the "robonews" technology are quick to assert that they are not in the business of replacing journalists. Professor Kathy McKeown, director of Newsblaster, an algorithm-dependent news aggregation site developed by the Columbia Natural Language Processing (NLP) Group, insists that the technology she uses in her projects is not intended to do away with professional reporters. Newsblaster "reads" a number of news sources and synthesises them to produce a well-written summary of the topics hitting the headlines each day.
In the 10 years that the automatic weblogging site has been running, reaction from the journalism world has been largely positive – probably because most reporters realise that without their articles Newsblaster would have very little to aggregate. Prof. McKeown has received several invitations to Columbia’s journalism school to discuss the site with students of Computer Science and Journalism who are interested in digital media. According to McKeown, Columbia’s new institute for data sciences and engineering will be interdisciplinary, involving multiple schools at the university’s campus, and could see an increasing number of prospective journalists being exposed to, and innovating with, algorithm technology.
Already there are signs that algorithms are being developed to perform ever more advanced journalistic functions. Funding permitting, McKeown would like to develop Newsblaster’s question-and-answer capabilities, saying: “One of the things that we’re doing now is that we’re using it for research, collecting the summaries that we built over periods of time to do other kinds of things. For example we’re using all the summary article pairs to generate data to be able to answer questions about events.” Ideally, McKeown would also revisit a multilingual version of the news site that was trialled a few years ago. For a period of four years, Newsblaster was able to draw from 10 or 15 different languages and then, with the help of online translators, present a summarisation page in English. Newsblaster would show a page of all the news in the world in English but would also allow users to drill down to the source to read it in the original language.
Over at Narrative Science, Hammond and co. have even greater ambitions. The company’s founders are professors of both journalism and computer science at Northwestern’s Medill School of Journalism, and like true reporters hope that one day their software will be able to break major stories. Hammond has boldly claimed that “in five years’ time a computer program will win a Pulitzer Prize – and I’ll be damned if it’s not our technology.” Previously he has also suggested that by 2027 more than 90 percent of news will be written by computer algorithms. Although it may seem that Hammond is throwing down the gauntlet to traditional journalism, he is adamant that the world of reporting is, or will be, big enough for all participants. Rather than ousting journalists, algorithms will continue to cover subjects that journalists have neither the time nor the inclination to follow, such as local school sports events.
Still, for Hammond’s predictions of computer dominance to come to fruition, algorithms will first have to shake off the stigma that continues to hamper them. Publications willing to admit to using computer-generated content are rare. Although titles like Forbes are happy to acknowledge using articles provided by Narrative Science, the many articles written about the company frequently mention that other clients in the news publishing business were unwilling to go on the record. News outlets perhaps fear that readers will not connect with articles lacking a genuine human touch, no matter how much effort companies put into mimicking a human tone. Reader aversion could be an impediment to any widespread use of algorithms to process non-statistical information in the future.
Regardless of whether or not they will one day be capable of producing award-winning prose, automatic writing systems could turn out to be something of a blessing for news organisations and their journalists. The Big Ten Network’s website, dedicated to university sports in the U.S., saw traffic increase by 40% between 2009 and 2011 after employing Narrative Science to write its post-match reports. News titles struggling to attract both readers and advertising revenue may ultimately find that turning to low-cost algorithm services generates enough income to bolster editorial projects and employ greater numbers of (living, breathing) journalists.