The Internet is Watching You


The owner of the corner shop has known you for a long time. He knows what you eat, that you like to drink Italian wine, and that you usually watch action movies on Sundays. That’s how he can offer you things you need: a new crime thriller, the perfect bottle for your next party, and bags of your favorite snacks set aside for you when you forget to order them.
What sounds like the pleasant community store of the past happens every day on the Internet. Our “corner shop owner” does not stand behind a counter. Instead, he runs a successful online business that offers exactly what its customers need. Thirty years ago he would have had to know you personally, but today the business’s computers simply analyze your browsing habits.

Now imagine that the man in the store follows you around, reminding you to buy a gift for the party you said on Facebook last week that you’d be attending. Or imagine that he has seen your status updates about starting a diet and starts telling you about the store’s low-fat foods section. This is pretty much what is happening online these days. GLOBE shows how online shops and Internet service providers use techniques such as Deep Packet Inspection to screen customers so that they can offer exactly what each customer wants. We also give you the lowdown on how behavioral targeting makes behavior-based advertising work.
Online shops collect data en masse

Online shops such as Amazon swear by one rule: get to know everything about your customers. The more information a shop has, the more specific its user profiles become, and the more effective its advertisements. Products that a customer has viewed on Amazon therefore influence which other products are displayed. For instance, someone who buys a Wii game console will be offered accessories for it in the future.
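Amazon does not publish exactly how its recommendations work, but the basic “customers who bought this also bought” idea can be sketched in a few lines. The following Python example is a minimal illustration, assuming recommendations come purely from counting which products appear together in past orders; the product names and the functions build_co_purchase_counts and recommend are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(orders):
    """Count how often each pair of products appears in the same order."""
    co_counts = defaultdict(Counter)
    for order in orders:
        # sorted(set(...)) drops duplicates and keeps the iteration order stable
        for a, b in permutations(sorted(set(order)), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, product, k=3):
    """Return up to k products most often bought together with `product`."""
    return [item for item, _ in co_counts.get(product, Counter()).most_common(k)]


# Hypothetical order history
orders = [
    ["wii_console", "wii_remote", "wii_sports"],
    ["wii_console", "wii_remote"],
    ["wii_console", "charging_dock"],
    ["wii_remote", "charging_dock"],
]
co_counts = build_co_purchase_counts(orders)
print(recommend(co_counts, "wii_console"))
```

The Wii remote comes first because it shares the most orders with the console. Real shops presumably combine many more signals, such as viewed pages and ratings, but the principle of learning from what other customers did is the same.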

Analyzing surfing habits

Behavioral targeting, which many marketing professionals consider a wonder weapon, takes this idea further. Behavior-based advertisements take into account where the user comes from, which websites he has visited previously, and what he has clicked on.

For a long time, Google’s AdWords service has displayed advertisements based on keywords detected on a web page. Since March 2009, however, the search giant has also offered behavioral targeting, which lets it show specific advertisements to groups of users with shared interests. For instance, if a user browses a sportswear website for a football shirt in August, he might be shown ads from another website with Christmas offers on similar products in December. Google itself describes the technique as relying on cookies that store tracking information on users’ computers.
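Google has not published its ad-selection logic, so the sketch below only illustrates the general cookie mechanism described above. It assumes a single ad network whose code runs on many sites, so the same tracking cookie is sent back from the sportswear shop in August and from an unrelated site in December. The store, the ad inventory and all function names are hypothetical.

```python
import uuid
from collections import Counter, defaultdict

# Hypothetical server-side store: tracking ID -> counts of interest categories
interest_log = defaultdict(Counter)

# Hypothetical ad inventory keyed by interest category
ADS = {
    "football": "Christmas deals on football shirts",
    "wine": "Italian wine gift sets",
}
DEFAULT_AD = "Generic holiday sale"


def track_visit(cookies, page_category):
    """Record a page view and return the visitor's tracking ID.

    `cookies` holds the cookies the browser sent. If there is no tracking
    cookie yet, a new random ID is created; the server would send it back
    in a Set-Cookie header so the browser returns it on later visits.
    """
    tracking_id = cookies.get("tracking_id") or uuid.uuid4().hex
    interest_log[tracking_id][page_category] += 1
    return tracking_id


def choose_ad(cookies):
    """Pick the ad matching the visitor's most frequent interest, if known."""
    interests = interest_log.get(cookies.get("tracking_id"))
    if not interests:
        return DEFAULT_AD
    top_category, _ = interests.most_common(1)[0]
    return ADS.get(top_category, DEFAULT_AD)


# August: browsing football shirts on a sportswear site
browser_cookies = {}
browser_cookies["tracking_id"] = track_visit(browser_cookies, "football")
track_visit(browser_cookies, "football")

# December: an unrelated site that shows ads from the same network
print(choose_ad(browser_cookies))  # Christmas deals on football shirts
```

Deleting the cookie breaks the chain: a browser without it gets the generic ad, which is why refusing cookies can defeat this kind of targeting.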

Online shops also apply marketing tricks from the real world. For instance, a retailer that wants to attract only well-to-do customers puts leaflets with attractive offers only into mailboxes in upmarket neighborhoods with well-off residents. Similarly, geolocation information can reveal where a surfer is connecting from, so that specific offers can be recommended to him or her. The location obtained from an IP address is fairly coarse, but modern cellphones supply precise GPS coordinates and certain desktop browsers can report a fairly precise location too, which can even be used to guess at a surfer’s spending power.
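IP geolocation is essentially a lookup of the visitor’s address in a table of address blocks, which is why its precision depends entirely on how finely those blocks are mapped. The sketch below assumes a tiny, made-up table (using the IP ranges reserved for documentation) and picks the most specific matching block; real services rely on large commercial databases, and the functions locate and pick_offer are hypothetical.

```python
import ipaddress

# Hypothetical geolocation table built from the IP ranges reserved for
# documentation. Each entry: (address block, city, area profile).
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Hamburg", "mixed"),
    (ipaddress.ip_network("192.0.2.128/25"), "Hamburg", "upmarket"),
    (ipaddress.ip_network("198.51.100.0/24"), "Berlin", "mixed"),
]


def locate(ip_string):
    """Return (city, area_profile) for the most specific matching block, or None."""
    ip = ipaddress.ip_address(ip_string)
    matches = [entry for entry in GEO_TABLE if ip in entry[0]]
    if not matches:
        return None
    # A longer prefix means a smaller, more specific address block
    _network, city, profile = max(matches, key=lambda entry: entry[0].prefixlen)
    return city, profile


def pick_offer(ip_string):
    """Show the premium offer only to visitors from upmarket areas."""
    location = locate(ip_string)
    if location is not None and location[1] == "upmarket":
        return "premium offer"
    return "standard offer"


print(locate("192.0.2.200"))     # ('Hamburg', 'upmarket')
print(pick_offer("192.0.2.10"))  # standard offer
```

This is the digital version of choosing which mailboxes get the leaflets: the offer depends only on which address block the visitor happens to come from.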

In-depth analysis divulges too much information

Deep Packet Inspection, or DPI for short, is a technological continuation of this personalized advertising strategy. While a surfer can in theory avoid behavioral targeting by refusing all cookies, DPI traces a user’s activities on the Internet as if he or she were under surveillance. In theory, every website a user calls up can be recorded and every email can be scanned in real time, and the keywords found in them can be used to build an individual profile through which advertisers can send the user specific offers. For instance, if an advertiser detects a number of messages from a customer to a car dealer inquiring about certain accessories, ads for those very products can be inserted into advertising spaces as he or she browses the Web. However, online shops cannot use DPI by themselves. They need Internet service providers to offer it, and the providers seem wary of violating their users’ privacy agreements. Governments will likely have to formulate policies to regulate this practice soon.
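To see why DPI is more invasive than cookies, consider what a provider can read from an ordinary, unencrypted web request. The sketch below is a simplified illustration, not how any particular DPI product works: it assumes the payload of a single HTTP request has already been reassembled from packets, pulls the host and search words out of it, and counts matches against a hypothetical advertiser keyword list. Encrypted (HTTPS) traffic would not expose this content.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical keywords an advertiser has paid to match
AD_KEYWORDS = {"roof", "rack", "tires", "spoiler", "diet"}


def inspect_http_payload(payload):
    """Return (host, words) from the payload of an unencrypted HTTP request.

    Routing a packet only needs its IP header; DPI also reads this
    application-layer content.
    """
    head = payload.decode("latin-1").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    url = urlsplit(target)
    words = [w for w in url.path.lower().split("/") if w]
    for values in parse_qs(url.query).values():
        for value in values:
            words.extend(value.lower().split())
    return headers.get("host", ""), words


def update_profile(profile, payload):
    """Add any advertiser keywords found in the request to a user's profile."""
    _host, words = inspect_http_payload(payload)
    profile.update(w for w in words if w in AD_KEYWORDS)
    return profile


packet = (b"GET /search?q=roof+rack+for+my+car HTTP/1.1\r\n"
          b"Host: cardealer.example\r\n\r\n")
profile = update_profile(Counter(), packet)
print(profile)
```

Run over every request a subscriber makes, a loop like this would build exactly the kind of keyword profile described above, without the user ever accepting a cookie.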

DPI can be misused, but at present there are no known cases that give cause for alarm. It is possible for providers to analyze data traffic and to manipulate it as well, just as cybercriminals do when they attempt to send malicious code to a victim.
When a user calls up a website, he may receive more than just its source code. The Internet service provider can use DPI to slip in JavaScript that displays an advertisement, even if the website’s owner designed the site to be ad-free. Spyware installed on your computer can do the same. In the worst case, the website owner is not even aware that an advertisement has been embedded in his site. ISPs could also determine which users generate the most peer-to-peer file-sharing traffic and which mostly use the service for email, and throttle their bandwidth accordingly.
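Slipping an ad into a page that passes through the provider’s network can be as simple as editing the HTML on the way through. The following hypothetical sketch inserts a script tag before the closing body tag; the ad URL and the function inject_ad are made up, and a real middlebox would also have to adjust the HTTP headers, such as Content-Length, to match the modified page.

```python
import re

# Hypothetical ad script; the URL is made up for illustration
AD_SNIPPET = '<script src="https://ads.isp.example/banner.js"></script>'


def inject_ad(html, snippet=AD_SNIPPET):
    """Insert `snippet` just before the last </body> tag, or append it."""
    closing_tags = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not closing_tags:
        return html + snippet
    index = closing_tags[-1].start()
    return html[:index] + snippet + html[index:]


original_page = "<html><body><p>An ad-free blog</p></body></html>"
delivered_page = inject_ad(original_page)
print(delivered_page)
```

The site owner’s server still sends original_page; only the reader sees delivered_page, which is why the owner may never notice the change.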

DPI has a negative connotation because it is also used to monitor and manipulate specific Web content: countries such as China and Iran use it to filter and censor the Web, which alarms free-speech advocates and political campaigners.

Web 2.0 follows specific identities

Social networks are also ideal data sources for marketing professionals. Data collectors have been known to make the most of Facebook and its open API. Developers can program applications that convince users to grant them access to personal information, including details about their friends. Less ethical methods include persuading people to accept a fake profile as a “friend”, thereby granting it access to more of their user profile, which most people leave fully visible to their friends. Through the Facebook API, programmers can access information about members, including details such as their employers, religious affiliations, and sexual orientation. According to the Facebook developer Wiki, applications can access over 50 sets of user information, which is interesting for marketers and hackers alike.

While Facebook is a prime example, all of this also applies to other services that identify individuals, such as OpenID and Google Accounts. These let users log in to dozens of websites with a single username and password. For example, members with a valid Facebook account can use the Facebook Connect system to log in to the video-sharing portal Vimeo, which also lets them publish their “likes” on their wall. This is convenient for users, and it opens up new ways for companies, if they act ethically, to court customers. Online shops are experimenting with ways to display products that friends have bought or looked at often (although this famously spoiled many people’s Christmas surprises when Facebook demonstrated the capability with its highly criticized and short-lived Beacon advertising program in late 2007).
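Showing “what your friends bought” only requires joining the social graph with purchase histories. The sketch below uses small hypothetical data sets and a hypothetical function friends_bought, and ranks products by how many of a user’s friends bought them. It also shows how Beacon-style features can spoil surprises: anything bob buys is shown to alice, even a gift meant for her.

```python
from collections import Counter

# Hypothetical data a shop might obtain through a social login
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
}
purchases = {
    "bob": ["board_game", "headphones", "scarf_gift_for_alice"],
    "carol": ["headphones", "cookbook"],
    "dave": ["drone"],
}


def friends_bought(user):
    """Rank products by how many of `user`'s direct friends bought them."""
    counts = Counter()
    for friend in sorted(friends.get(user, set())):
        # set() so a friend buying the same item twice counts once
        counts.update(set(purchases.get(friend, [])))
    return [product for product, _ in counts.most_common()]


print(friends_bought("alice"))
```

Headphones rank first because two of alice’s friends bought them; dave’s drone does not appear, since he is only a friend of a friend.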

Users become advertising figures

People are more receptive to recommendations from friends than from strangers, so companies try to reach customers personally by creating so-called fan pages. Any user can become a fan of products, people, companies, and even designs. With Facebook’s Open Graph tool, companies can even place advertisements on external websites that carry endorsements from members of the fan page, gaining advertising exposure through those members’ profile pictures.

Data and the Google juggernaut

Of course, no discussion of online privacy is complete without analyzing Google’s data-mining habits. The search giant is in a position to use its many online properties to gather enormous amounts of information, and possibly even to link these profiles to individuals in the real world. The company’s motto has long been “Don’t be evil”, but it is difficult to ascertain where exactly the company draws the line between acceptable and too much. Incidents of anti-Google dissent are growing more common, from strangers being able to follow you on Google Buzz, to protests in the publishing industry against the mass digitization of books, to rumblings of antitrust cases over the company’s dominance in online advertising. Jeff Jarvis, blogger and author of the book “What Would Google Do?”, sharply criticizes the company for a product called Sidewiki, which collects user comments about websites and stores them on Google servers. Site operators, including an irate Jeff Jarvis, have no control over it. Google copies entire libraries and has detailed photographs of the entire planet, covering all countries and cities, many streets and houses, the oceans, the Moon and Mars. Google offers an operating system for mobile phones, and soon there will be one for netbooks as well. The company’s stated mission is “to organize the world’s information and make it universally accessible and useful”.

Even the Chrome browser doesn’t have a clean record when it comes to privacy—it identifies each user with a unique ID. The ID is purged when a user first downloads an update, but it should not be there at all.

Brilliant ideas underlie most Google services. They are easy to use, technically solid, and best of all, nearly all of them are free. Google does a lot of good as a company by investing in alternative energy production and giving employees an allowance if they buy a hybrid car. But Google is also greedy for data. It commands the largest Web index available and has insight into almost every website, photo and video.

Google tracks 80 percent of all websites

One can hardly escape Google today. It does not help to stop using Google Search, YouTube or Picasa, or even the services that require registration, such as Gmail, Docs and Calendar. With its astoundingly wide network, Google is present on 80 percent of all websites, often invisibly to lay users. Since acquiring the advertising network DoubleClick, Google serves around half of all banner ads on the Web. The less conspicuous but even more widespread text ads also come from Google AdWords. The Google Analytics service works entirely in the background, allowing website operators to analyze the click-paths of their visitors. Whenever a surfer lands on a site that uses this service, Google sets a cookie with a unique ID and records his or her IP address. Thanks to this extremely dense network, Google can see exactly who moves where on the Internet. Every click or search query generates a log entry with an IP address, a unique cookie ID and a timestamp. By mid-2008, YouTube’s log file alone was over 12 terabytes in size.
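Reconstructing click-paths from such logs is mostly a matter of grouping entries by cookie ID and sorting them by time. The sketch below assumes a hypothetical tracker that logs the cookie ID, IP address, timestamp and page on every site where its code runs; the data and the function click_paths are made up and do not reflect Google’s actual log format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical tracker log: (cookie ID, IP address, timestamp, page visited)
log = [
    ("c42", "198.51.100.7", "2010-03-01T09:15:00", "news.example/politics"),
    ("c42", "198.51.100.7", "2010-03-01T09:02:00", "search.example/?q=mortgage"),
    ("c99", "203.0.113.5", "2010-03-01T09:05:00", "video.example/cats"),
    ("c42", "198.51.100.7", "2010-03-01T09:30:00", "bank.example/loans"),
]


def click_paths(entries):
    """Group entries by cookie ID and order each visitor's pages by time."""
    visits = defaultdict(list)
    for cookie_id, _ip, timestamp, page in entries:
        visits[cookie_id].append((datetime.fromisoformat(timestamp), page))
    return {
        cookie_id: [page for _, page in sorted(pages)]
        for cookie_id, pages in visits.items()
    }


for cookie_id, path in click_paths(log).items():
    print(cookie_id, " -> ".join(path))
```

Even this toy log tells a story: visitor c42 searched for a mortgage, read political news and then looked at bank loans, all within half an hour.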
For reasons of database security and privacy, the databases of the individual Google services are not necessarily tied to each other, but linking them is technically possible, and Google certainly has the know-how. Even when there are no actual names, the records contain enough pieces to build a picture of the person sitting at the PC: where he lives, what interests he has, and how much money he spends.
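Linking two databases does not require names; a shared IP address at around the same time can be enough. The sketch below is a hypothetical illustration of that kind of join between a login log and an analytics log, using an assumed 30-minute window; the data and the function link_records are invented. It also shows the weakness of the method: several people behind one router share an IP address, so links made this way can be wrong.

```python
from datetime import datetime, timedelta

# Hypothetical, separately stored logs of two services
mail_logins = [  # (account name, IP address, time)
    ("jane.doe", "198.51.100.7", "2010-03-01T08:55:00"),
]
analytics_hits = [  # (cookie ID, IP address, time)
    ("c42", "198.51.100.7", "2010-03-01T09:02:00"),
    ("c99", "203.0.113.5", "2010-03-01T09:05:00"),
    ("c77", "198.51.100.7", "2010-03-02T18:40:00"),
]


def link_records(logins, hits, window=timedelta(minutes=30)):
    """Map cookie IDs to accounts seen from the same IP within `window`."""
    links = {}
    for account, login_ip, login_time in logins:
        logged_in_at = datetime.fromisoformat(login_time)
        for cookie_id, hit_ip, hit_time in hits:
            same_ip = hit_ip == login_ip
            close_in_time = abs(datetime.fromisoformat(hit_time) - logged_in_at) <= window
            if same_ip and close_in_time:
                links[cookie_id] = account
    return links


print(link_records(mail_logins, analytics_hits))  # {'c42': 'jane.doe'}
```

Cookie c77 came from the same IP address a day later and is not linked here; widening the window would link it too, whether or not it belongs to the same person.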

How much is too much?

One can pick up interesting tidbits from the official company blogs, such as the fact that some employees are excited about the idea of building a 3D model of every building on the planet. Google Building Maker already provides the required tools. Most of the people in senior positions at the company are computer scientists, mathematicians and statisticians, many of them top graduates of prestigious universities. For them, the masses of collected data are like toys they can play with freely. They work on them obsessively, writing algorithms that recognize patterns and structures in what seems like random chaos. There are no limits. Suggestions for projects that might seem outlandish are particularly welcome at Google.
