This is a companion post to my earlier “Back. Up. Your. Stuff.”, where I said:
You possibly have a lot of stuff out in the Web, scattered among social networking sites, blogs, Flickr, Google apps … and some of you don’t have local copies or backups of it, right? […] This is just a friendly reminder that you really should consider what you’re gonna do if any of those sites shut down or for some other reason become inaccessible.
But what if you don’t want that stuff out there any more? Maybe it’s embarrassing or confidential; or you’re being stalked or your identity has been stolen; or you’re just tired of a site and want to disappear from it.
Well, you’ve got a problem, as Bruce Schneier points out in “File Deletion”.
The days of your data just being on your local, non-networked computer are long gone. You have stuff in Google Mail; your phone’s email and SMS logs; Flickr, Facebook, and Twitter; your blog posts and comments on other blogs; online backup and file sharing services like CarbonCopy and DropBox; media services like iTunes and the Kindle. (While not directly related to this post’s topic, think about all the information about you stored in corporate databases of all the companies you’ve ever done business with.) There are several issues here.
First, just because you close out an account doesn’t mean that the provider immediately, or ever, deletes your stuff. Schneier points out that companies aren’t generally interested in doing so — notice that they usually use terms like “deactivate” or “disable”, rather than “delete”, if you’re given an option at all — and as an example links to the complicated procedure (formerly?) needed to “really” delete a Facebook account.
Second, Google, the Internet Archive, and other entities cache web-accessible data. How often have you seen comments like “this link is dead, but you can see a cached copy here”? Also, individuals will make personal copies, or even public mirrors, of your stuff that they find interesting.
Third, any responsible company makes backups of its data, so even if your data has really been deleted from the live system and the various caches here and yon have expired, it may linger in backup sets for days, months, or even years. For some types of data, such as phone records or email exchanged with a financial service, the company may be legally required to retain it.
Fourth, businesses get taken over by larger companies, sell off divisions, go into bankruptcy, or otherwise crash and burn with their assets (including data) sold to the highest bidder. Terms of service (TOS) often don’t survive such transitions, no matter how well intentioned or how much they guarantee your privacy.
Finally, it may not be the provider that crashes, but you. What happens if you can’t access the service any longer due to life circumstances, get “banned” by the provider, or even die? You (or your family) have lost control over the account and there may be no simple way to delete it or restrict ongoing public access to it.
Schneier mentions a research project that attempts to address some of these issues:
Vanish is a research project by Roxana Geambasu and colleagues at the University of Washington. They designed a prototype system that automatically deletes data after a set time interval. So you can send an email, create a Google Doc, post an update to Facebook, or upload a photo to Flickr, all designed to disappear after a set period of time.
However, Vanish doesn’t address the problem where a recipient simply copies the content into a non-protected form.
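To make that limitation concrete, here is a toy Python sketch of the general idea of self-expiring data. It is emphatically not Vanish's actual design, which isn't described here; the names `ExpiringKeyStore`, `seal`, and `unseal` are made up for illustration, and the "key store" is just an in-memory dictionary standing in for whatever would really hold the keys. The message is encrypted with a random one-time key, and the key is forgotten after a time limit, so the ciphertext becomes unreadable. A plaintext copy made before the deadline, of course, does not.

```python
import secrets
import time


class ExpiringKeyStore:
    """Hypothetical key holder that forgets each key after its time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the demo can fake time
        self._keys = {}              # key_id -> (key_bytes, expiry_time)

    def put(self, key, ttl_seconds):
        key_id = secrets.token_hex(8)
        self._keys[key_id] = (key, self._clock() + ttl_seconds)
        return key_id

    def get(self, key_id):
        entry = self._keys.get(key_id)
        if entry is None:
            return None
        key, expires_at = entry
        if self._clock() >= expires_at:
            del self._keys[key_id]   # the key is gone for good
            return None
        return key


def xor_bytes(data, key):
    """XOR two equal-length byte strings (a one-time pad)."""
    return bytes(d ^ k for d, k in zip(data, key))


def seal(message, store, ttl_seconds):
    """Encrypt message with a fresh random key that the store will forget."""
    key = secrets.token_bytes(len(message))
    return store.put(key, ttl_seconds), xor_bytes(message, key)


def unseal(key_id, ciphertext, store):
    """Return the plaintext, or None once the key has expired."""
    key = store.get(key_id)
    return None if key is None else xor_bytes(ciphertext, key)


if __name__ == "__main__":
    fake_now = [0.0]
    store = ExpiringKeyStore(clock=lambda: fake_now[0])
    key_id, blob = seal(b"meet at noon", store, ttl_seconds=60)

    copied = unseal(key_id, blob, store)   # recipient reads it and keeps a copy
    fake_now[0] = 61.0                     # the deadline passes

    print(unseal(key_id, blob, store))     # the protected form is unreadable now
    print(copied)                          # but the recipient's copy lives on
```

The expiry only protects the sealed form. Anything a recipient pastes into a plain file or a screenshot is outside the scheme entirely, which is exactly the gap noted above.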
The bottom line is that once you share or publish something, or simply store it on a system you don’t own, you lose some, or all, control over it. Just as with backing up your data, you have to think of the consequences, then decide how much you are willing to trade convenience for loss of control.
Update 2009-09-17: also see “Liberate your stuff”
image: B.gliwa, Axt Handwerk.jpg, Wikipedia
I can imagine that standing at the pearly gates might resemble participation in some of the long-standing chat board communities… as St. Peter points out an archived post made on October 19, 1992 at 3:08 am… then explains that although the ID isn’t directly identifiable, a cross match of spelling errors and grammar had been made and you were in fact using a sock.
(Fictitious date, in case anybody is searching.)