Curation, Aggregation, and Web 2.0
Tools for curating, sorting, and managing web content usually take the form of social aggregators such as Digg or Reddit. The act of curating is not one of careful selection by a trained expert, but rather the weighted consensus of the masses promoting or up-voting content they find notable.
Web 2.0, and indeed the entire information profession, has a problem: the barriers to information creation and storage have fallen in recent years. As a result, the amount of information on the Web has proliferated beyond all expectations. Finding the right information among the endless supply of trivial and irrelevant data has become almost impossible. The rational response would be to trust our curation to trained professionals, able to assess and sort through this wealth of information and categorise it on the merits of accuracy and quality.
Instead, popular aggregators and the wisdom of crowds have emerged as the arbiters of quality on the Web.
There is a very real risk that the Web (the most powerful source of knowledge available) is mislabelling, misrepresenting, and misplacing important data, and is unable to distinguish it from the unfiltered noise of the masses. We have trusted the most important resource in human history to the collective rule of enthusiastic amateurs.
This pollution of data threatens to erode and fragment any real information stored on the Web. Users have come to rely on the anonymous and amorphous ‘rest of the Web’ as their authoritative filter. Content aggregators remix information drawn from multiple sources and republish it free of context or editorial control. These aggregated opinions of the masses are vulnerable to misinformation because users have too much control and too little accountability. The risk of aggregating information is the risk of privileging the inaccurate, banal, and trivial over the truth.
Digg.com, founded in 2004, was the first notable aggregator of Web 2.0 content. Voting content up or down is the core of the site: respectively ‘digging’ and ‘burying’ material based on contributors’ input. This supposedly democratic system allows content of merit to be promoted and displayed. But this assumes that all opinions and user-generated regulations are equally valuable and relevant in determining merit.
The collective judgements of a group, the clichéd ‘wisdom of the crowds’, can be an effective measure of certain types of quantitative data. Called upon to guess at the number of jellybeans in a jar, the aggregated guesses of a thousand contributors would provide a relatively accurate figure. However, if that same group were called upon to judge the value of a news story, their opinions would not represent a collective truth about the value or merits of the piece. The voting process of Digg or Reddit is transparent and instant, and causes contributors to cluster around popular opinions, promoting sensationalism and misinformation. Content that grabs the attention of users will quickly be promoted and rise to be seen by more users, regardless of its accuracy.
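The jellybean case can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_jar`, the jar size, the number of guessers, and the spread of the guesses are all hypothetical, and each guess is modelled as the true count plus independent, unbiased noise. That independence is exactly what a transparent voting system removes.

```python
import random
import statistics

def simulate_jar(true_count=1000, n_guessers=1000, spread=300, seed=42):
    """Hypothetical model: each guess is the true count plus independent noise."""
    rng = random.Random(seed)
    guesses = [rng.gauss(true_count, spread) for _ in range(n_guessers)]

    # Error of the aggregated (averaged) guess.
    crowd_error = abs(statistics.mean(guesses) - true_count)
    # Average error of a single guesser acting alone.
    typical_error = statistics.mean(abs(g - true_count) for g in guesses)
    return crowd_error, typical_error

crowd_error, typical_error = simulate_jar()
print(f"crowd error: {crowd_error:.1f}, typical individual error: {typical_error:.1f}")
```

Under these assumptions the averaged guess lands far closer to the true count than a typical individual does, because independent errors cancel out. The sketch says nothing about judgements of merit, where there is no single true number for the errors to cancel around.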
The momentum of a popular story is exponential: the more users see something, the more popular it becomes, exposing it to even more users. The infinite shelf-space and shelf-life of the Web mean that once a piece of information has seen any exposure it is almost impossible to control. A lie can instantly spread across the Web by the zeal of its promoters, and be cross-referenced by a dozen news aggregators. Lies become widespread and pollute enough aggregation sites that they become the valid, supposedly authoritative, result of any Google search on the topic. The wisdom of the crowds is fickle and closer to a mob mentality; it is impossible to aggregate their wisdom without aggregating their madness as well. After all, there is a fine line between the wisdom of crowds and the ignorance of mobs.
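The feedback loop described above can be sketched as a toy model. The assumptions are loud ones: `simulate_momentum` and its parameters are hypothetical, and the model simply supposes that each round a story gains new votes in proportion to the votes it already has. It is not a description of how Digg or Reddit actually rank content.

```python
def simulate_momentum(initial_votes, growth_per_round, rounds):
    """Toy model (assumption): new votes each round are proportional to current votes."""
    history = [initial_votes]
    votes = initial_votes
    for _ in range(rounds):
        # More votes mean more exposure, which brings more votes.
        votes += votes * growth_per_round
        history.append(votes)
    return history

print(simulate_momentum(10, 0.5, 3))  # [10, 15.0, 22.5, 33.75]
```

Each round multiplies the total by the same factor, which is what makes the growth exponential. Note that accuracy appears nowhere in the model: an early lead in attention is the only input, so a sensational falsehood and a careful report compound in exactly the same way.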
However, non-trivial and important content is still being created, promoted, and viewed on the Web; aggregated information services do capture these notable pieces of data in their trawling. In practice an old problem remains: time and effort must be manually expended to sort out the real information from the useless noise. Exactly the sort of time and effort that professional curators, librarians, and information professionals were traditionally employed to expend.
Digital media theorist Andrew Keen, in his book The Cult of the Amateur (2007), likens the community of Web 2.0 to evolutionary biologist T.H. Huxley’s humorous theory that infinite monkeys on infinite typewriters would eventually create a masterpiece such as Shakespeare. Keen sees this infinite community of empowered amateurs as undermining expertise and destroying content control on the Web. He argues that their questionable knowledge, credentials, biases, and agendas mean they are incapable of guiding the public discourse of the Web with any authority at all.
Another perspective on this comes from the 1986 book Amusing Ourselves to Death, wherein television commentator Neil Postman theorised about the erosion of the public discourse by the onslaught of the media. He frames the media in terms of the dystopian scenarios offered by Huxley’s grandson—science fiction author Aldous Huxley—in the novel Brave New World, and compares them to the similar dystopia of George Orwell’s 1984:
‘There are two ways by which the spirit of a culture may be shrivelled. In the first—the Orwellian—culture becomes a prison. In the second—the Huxleyan—culture becomes a burlesque’ (Postman, 1986, p.155).
In one dystopia, Orwell feared those who would deliberately deprive us of information; in another, Huxley feared those who would give us so much information that the truth would be drowned in a sea of irrelevance.
And the culture of Web 2.0 is essentially realising Huxley’s dystopia. It is cannibalising the content it was designed to promote, and making expert opinions indistinguishable from those of amateurs.
User-generated content is creating an endless digital wasteland of mediocrity: uninformed political commentary; trivial home videos; indistinguishable amateur music; and unreadable poems, essays, and novels. This unchecked explosion of poor content is devaluing the work of librarians, knowledge managers, professional editors, and content gatekeepers. As Keen suggests, ‘What is free is actually costing us a fortune. By stealing away our eyeballs, the blogs and wikis are decimating the publishing, music, and news-gathering industries that created the original content these Websites “aggregate”’ (Keen, 2007, p.32).
In a world with fewer and fewer professional editors or curators, knowing what and whom to believe is impossible. Because much of the user-generated content on the Web is posted anonymously or under pseudonyms, nobody knows who its real authors are.
No one is being paid to check contributors’ credentials or evaluate their material on wikis, aggregators, and collaboratively edited websites. The equal voice afforded to amateurs and experts alike has devalued the role of experts in controlling the quality and merit of information. So long as information is aggregated and recompiled anonymously, everyone is afforded an equal voice. As Keen dramatically states, ‘the words of wise men count for no more than the mutterings of a fool’ (2007, p.36).
We need professional curation of the internet now more than ever. We need libraries and information organisations to embrace the idea of developing collections that include carefully evaluated and selected web resources that have been subject to rigorous investigation. Once upon a time we relied on publishers, booksellers, and news editors to do the sorting for us. Now we leave it to anonymous users. Such a user could be a marketing agency hired to plant corporate promotions; it could be an intellectual kleptomaniac, copy-pasting others’ work together and claiming it as their own; or it could be, as Keen fears, a monkey.
Without professional intervention, the future is a digital library where all the great works of human history sit side-by-side with the trivial and banal under a single, aggregated category labelled ‘things’. And we would have no one to blame but ourselves.