Loose tweets: an analysis of privacy leaks on Twitter
Twitter has become one of the most popular microblogging sites for people to broadcast (or "tweet") their thoughts to the world in 140 characters or fewer. Since these messages are available for public consumption, one might expect tweets not to contain private or incriminating information. Nevertheless, we observe a large number of users who unwittingly post sensitive information about themselves and about other people, for whom there may be negative consequences. While some awareness exists of such privacy issues on social networks such as Twitter and Facebook, there has been no quantitative, scientific study addressing this problem. In this paper we make three major contributions. First, we characterize the nature of privacy leaks on Twitter to understand what types of private information people reveal there. We specifically analyze three types of leaks: divulging vacation plans, tweeting under the influence of alcohol, and revealing medical conditions. Second, using this characterization, we build automatic classifiers that detect incriminating tweets on these three topics in real time, demonstrating the real threat posed to users by, e.g., burglars and law enforcement. Third, we characterize who leaks information and how. We study both self-incriminating primary leaks and secondary leaks that reveal sensitive information about others, as well as the prevalence of leaks in status updates and in conversation tweets. We also conduct a cross-cultural study of the prevalence of leaks in tweets originating from the United States, the United Kingdom, and Singapore. Finally, we discuss how our classification system can be used as a defense mechanism to alert users to potential privacy leaks.