What will the Library of Congress do with all those tweets?
Twitter is making its entire record of tweets, which are personal broadcasts of 140 characters or fewer, available to the Library of Congress (LOC). Although the people behind these messages range from the famous to the fatuous, the LOC is excited by the opportunity to gain a huge amount of data on what people are thinking as they react both to extraordinary events and to their daily lives (Stross, 2010).
Data dump
Twitter processes around 50 million tweets every day and plans to turn over every tweet posted since it began in 2006. In data terms, this isn't a huge storage problem; the initial archive is estimated at less than five terabytes, which can easily be stored on drives that would fit into a standard file cabinet drawer. To data miners, however, this information is anthropological and sociological gold.
Real-time research
One of the great gifts of this data is that it captures users' first thoughts, unedited by significant time for reflection. In the Twitter world, if it happened more than 24 hours ago, the world has already moved on. In contrast, most historical records are either filtered through the memories of individuals before being written down or published as articles that have been edited for a newspaper or other publication. The immediacy of the text makes it unique in archival terms. Because it is all digital, it is quick and easy to search, and it can be searched by multiple people all over the world.
Possible uses
Twitter is used mainly by people with smartphones that allow instant access to the Internet at low prices. The investment in a tweet, of both time and money, is small. Historians may be interested in how this section of society operates at this point in history. As all Twitter users know, you can receive posts on anything from what people are eating for breakfast to when their divorce becomes final. The defining characteristic is that they are all very personal moments that seem to merit being written down and shared, almost like a one-sided conversation. Combing the data for what is not included may be just as illuminating as cataloguing what is there.
Privacy concerns
Twitter specifically states that it is a broadcast service, existing only to make public what users choose to comment on to their followers (people who request all their tweets). Users are always identified by a user name of their choice, so anonymity is built into the system; still, Twitter will remove tweets from those users who have requested their highest privacy settings. Additionally, the LOC won't allow access to tweets until they are six months old, hoping to ensure that all possible privacy concerns are addressed. However, given that Google, Yahoo! and Microsoft already archive tweets in real time for their search engines and are developing aggregator systems, the LOC does seem to be taking extraordinary measures to protect its integrity and reputation as the digital archive of choice.
Digital documentation
The LOC has been collecting and storing Internet data since the beginning of the World Wide Web (LOC Blog, 2010) and is actively exploring ways to preserve digital records for posterity. Its Web servers were almost overwhelmed by the interest that followed the announcement of the Twitter archive, which should show any skeptics that people are intensely curious about our online lives.