Data Mining: "Well, It's Not Like I've Got Anything to Hide...!"

…is what most people say as they unsuspectingly shop online or spend time on social networks; to them, there’s nothing wrong with divulging their personal details. But there’s one thing we all need to be clear about: the more information we voluntarily give up about ourselves, the more we play into the hands of those who wish to keep us on a leash, whether as consumer slaves or as mindless victims of political manipulation. How else do you explain why Facebook is already said to be worth 100 billion dollars?

Internet data gobblers move stealthily.

Google currently makes a quarterly profit of more than two billion dollars, and other computer and Internet companies are also raking it in: the estimated value of Facebook, for example, hit 33 billion dollars in the summer of 2010. By 2 December 2011 this figure had already tripled, as NZZ Online reported (“Facebook worth 100 billion dollars”). Think there’s nothing extraordinary in that? Well, if you consider that both companies offer their services free of charge, you begin to wonder where this astronomical sum comes from.

Most of us know that “it’s got something to do with advertising”, but only a handful of people understand precisely how that works. To come straight to the point: we pay for all these practical, seemingly free services with our personal details (e-mail addresses, profiles, postcodes, phone numbers, photos, etc.), which are worth a great deal of money. We have long been used to entering personal data for all sorts of purposes, and of course we all know that many of our everyday activities leave a massive data trail behind them. Most of the time, this doesn’t seem much of a problem: “What can they do with my name, address or date of birth?” we shrug. While this might seem true at first glance, the technical possibilities are far more advanced than we realise, and “they” know considerably more than we think.

A Wide Selection of Tiny “Cookies”

The first step in the merry process of data collection is normally the use of small files called cookies, which a website stores on the visitor’s hard drive during a visit. A cookie serves mainly to tell individual visitors apart and to recognise them when they return to the website. However, cookies can also tell the website operator which areas the visitor is using, how often and for how long. Furthermore, it is not only the website’s own cookies that can be stored on the user’s hard drive, but also cookies from the operator’s advertising partners. In a very short time, a whole load of these little files can thus accumulate on the hard drive, largely unnoticed.

In themselves, cookie files (if they are sold, for example) provide only small morsels of information and do not contain any names. However, very little is needed to assign a name to this data, i.e. to find out who it’s about. If the cookie data is linked to an existing database that contains your date of birth, gender and postcode, for example, it’s all over. Throw some additional cookies from advertising partners into the mix, and a user’s behaviour can be tracked across several websites. Add information about browsing habits, buying preferences or even computer settings, and even psychological and geographic analyses of the user become possible.
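
How little is needed can be sketched in a few lines of Python. The sketch assumes a hypothetical tracker holding nameless cookie profiles that happen to include date of birth, gender and postcode, plus a separate hypothetical database of named people with the same three fields. All records, names, site names and field names are invented for illustration; real trackers are far more elaborate.

```python
from collections import defaultdict

# Hypothetical, invented cookie profiles: no names, only an ID, browsing data
# and three seemingly harmless attributes.
cookie_profiles = [
    {"cookie_id": "a91f", "birth_date": "1984-03-12", "gender": "f", "postcode": "8004",
     "sites_visited": ["shoes.example", "travel.example"]},
    {"cookie_id": "c2d7", "birth_date": "1990-11-02", "gender": "m", "postcode": "3011",
     "sites_visited": ["loans.example"]},
]

# Hypothetical named database (e.g. a customer list) with the same three fields.
named_records = [
    {"name": "Anna Muster", "birth_date": "1984-03-12", "gender": "f", "postcode": "8004"},
    {"name": "Ben Beispiel", "birth_date": "1975-06-30", "gender": "m", "postcode": "3011"},
]

def quasi_id(record):
    """Combine date of birth, gender and postcode into one linking key."""
    return (record["birth_date"], record["gender"], record["postcode"])

def link(profiles, people):
    """Attach a name to every cookie profile whose key matches exactly one person."""
    index = defaultdict(list)
    for person in people:
        index[quasi_id(person)].append(person["name"])
    linked = {}
    for profile in profiles:
        names = index.get(quasi_id(profile), [])
        if len(names) == 1:  # a unique match re-identifies the visitor
            linked[profile["cookie_id"]] = (names[0], profile["sites_visited"])
    return linked

print(link(cookie_profiles, named_records))
```

Only the first visitor is re-identified here: the second profile’s combination of values matches nobody in the named list. The rarer a combination of values, the more likely it is to point to exactly one person.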

But herein lies the problem. As a result of all this collecting and linking, apparently harmless information is turned into an individual dossier with a precise behavioural and purchasing profile, which can be used to target consumers very specifically. The advertising industry thus enjoys much more direct access to its potential customers. If you think you know how to resist the allure of advertising, think again: our data is worth billions to them…

What is more, the industry can now send customised offers straight to our homes, postage-free. When we are sitting wearily in front of the computer, it is very tempting to take a quick look at these offers, precisely the ones that match the things we have a weakness for. And as it only takes a couple of clicks to make a purchase, the road to shopping addiction is a very short one. Not to mention that online shopping doesn’t limit us to the cash we have in hand…

The line between advertising and scamming users is often blurred, and ethical considerations are casually tossed overboard in favour of profit and company value¹. Although we sometimes give up our details willingly (something that is also skilfully encouraged), this is only the tip of the iceberg. Most of the time, we have no idea who is collecting what data about us and to whom they are passing it on, i.e. selling it. If a company goes under or is sold, its data warehouse becomes a hot commodity, and we lose all track of where our details are disappearing to on the World Wide Web.

Moreover, it is not only companies and advertisers that profit from this collecting mania: the more accurate the data sets and the more detailed the information, the more attractive they are to criminals, and the more sophisticated data crime becomes. Payment data, for example from hotels, shops or online stores, is particularly at risk, and data leaks and losses occur fairly regularly while the public remains blissfully unaware.

All Power to Algorithms

Algorithms are step-by-step sets of instructions that a computer follows, for example when performing spreadsheet calculations. They can also be mathematical methods for deriving new findings and classifications. As such, they can be used not only to calculate the household budget, for example, but also to turn your computer into an oracle, provided they are fed the right data. The data we leave behind on the Net breaks our personality down into its components: our preferences, habits, purchasing characteristics, etc. Algorithmic calculations can use these to generate a digital shadow of a person. Although this is “merely” a blurred reflection of our own identity, it assumes huge significance in various places: as mentioned, advertisers are able to “serve” us in a tailored way, companies can better assess the “quality” of (future) employees, and authorities can better manage their citizens. Last but not least, it is also possible to calculate prognostic values, so that an insurance company can estimate, for example, how much a policyholder will cost it in the future.

These prognostic profiles are of course even more valuable than the current profiles derived from cookie analysis. An intelligent analysis of the information left behind by billions of Google users even makes it possible to predict stock market movements and to recognise where one should invest heavily in the near future! The value of this kind of information monopoly is massive!

But if we’re not careful, this digital shadow will loom ever larger until, under certain circumstances, it develops a life of its own whose effects we can no longer even guess at. No one knows the long-term risks of a life in which most information is digitised and stored. Digital data is highly mobile, you see: it glides swiftly through the Net. Yet it also remains stubbornly in place for a long time across many sites. There are already people whose digital shadow has grown longer than they feel comfortable with, and who now wonder why they never got that apprenticeship or job they had planned on. By now, it’s an open secret that two thirds of HR managers routinely consult the Internet to learn more about their candidates. So maybe posting that photo from the last holiday in Mallorca wasn’t such a good idea after all…!

Incidentally, these algorithms are adaptive, and their learning improves the more data they receive. It therefore makes no difference whether new data confirms or refutes previously calculated values, such as a person’s characteristics: either way, the algorithm learns. The computer also learns from its errors and adjusts its formulas, so every search query is not only a question put to Google, but also an answer…
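
The idea that an algorithm learns whether its guesses are confirmed or refuted can be sketched in Python with online logistic regression, swapped in here as one simple example of a learning algorithm. It is not a description of how Google or any advertiser actually works; the features, outcomes and learning rate are invented for illustration.

```python
import math

# Hypothetical features for one page view: [visited late at night, viewed shoes before]
# Hypothetical outcome: 1 = clicked a shoe advert, 0 = ignored it
observations = [
    ([1, 1], 1),
    ([0, 1], 1),
    ([1, 0], 0),
    ([0, 0], 0),
]

def predict(weights, bias, features):
    """Estimated click probability from a weighted sum (logistic model)."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-score))

def update(weights, bias, features, clicked, rate=0.5):
    """One online learning step, made after the real outcome is observed."""
    # The error is small when the guess is confirmed and large when it is refuted;
    # either way the weights (the "formula") are nudged towards the outcome.
    error = clicked - predict(weights, bias, features)
    new_weights = [w + rate * error * x for w, x in zip(weights, features)]
    return new_weights, bias + rate * error

weights, bias = [0.0, 0.0], 0.0
for _ in range(50):  # replay the hypothetical data stream 50 times
    for features, clicked in observations:
        weights, bias = update(weights, bias, features, clicked)

print("shoe viewer:", round(predict(weights, bias, [0, 1]), 2))
print("no shoe history:", round(predict(weights, bias, [1, 0]), 2))
```

After training, the model rates a past shoe viewer as a likely clicker and everyone else as unlikely. Feeding it one more observation that merely confirms its guess still shifts the weights slightly, which is the point made above.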