Privacy on the Web: The Beacons Know You

 

Have you ever noticed how a web site you have never visited before seems to know enough about your interests to show you targeted advertisements? Sometimes the ads are based on the content of the site, but other times there appears to be no connection. The explanation is a way of collecting user information that crosses web site boundaries and maintains a history of your preferences.

You may ask: how is this possible? Are these companies sharing information? Is there adware on my computer that’s giving out this information? No. The answer is much simpler: outsourced advertising and analytics.

Many companies can’t afford to maintain a department that attracts advertisers, manages advertising sales, and tracks ad performance. As a result, they outsource their advertising to a specialized company. In the same way, most companies do not have the tools or expertise to track their own web site metrics, so they outsource to large companies that specialize in web analytics.

To display appropriate advertising, track ad performance, and measure overall visitor behavior, the companies that provide these services require the publisher or advertiser to put a small piece of code on its web site. This may be a snippet of JavaScript or a simple image request. It is this image request that lets the advertising or analytics company track user behavior across multiple web sites: every site that carries the company’s beacon sends it information about you.

How Beacons Work

Web beacons are snippets of JavaScript or HTML that request a 1×1 pixel image from a different web site, the one that collects the data. This single pixel is invisible to the viewer of the web site. The beacon is usually placed just before the closing “body” tag of the page, although some analytics companies recommend placing it just after the opening “body” tag to improve accuracy, so that the request is sent even if the visitor leaves before the rest of the page loads.
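To make this concrete, here is a minimal Python sketch of the kind of image tag a publisher’s page might end up containing. The collector URL tracker.example/pixel.gif and the parameter names acct, page, and res are hypothetical; every analytics company uses its own endpoint and naming.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical collection endpoint; real services each use their own URL and parameter names.
COLLECTOR = "https://tracker.example/pixel.gif"


def beacon_tag(account_id: str, page: str, **custom: str) -> str:
    """Return an invisible 1x1 <img> tag whose URL carries the tracking data."""
    params = {"acct": account_id, "page": page, **custom}
    src = f"{COLLECTOR}?{urlencode(params)}"  # values are URL-encoded
    # A 1x1 size and an empty alt text make the image effectively invisible.
    return f'<img src="{escape(src)}" width="1" height="1" alt="" style="border:0">'


if __name__ == "__main__":
    print(beacon_tag("UA-12345", "/products/shoes", res="1920x1080"))
```

When the browser renders this tag, it requests the "image" from tracker.example, and everything in the query string travels along with that request.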

There are three types of information that are collected using this image:

1. Data embedded in the URL: At a minimum, this includes some form of account ID that identifies the publisher of the web site the user is viewing. It may include any variables that can be retrieved using JavaScript, such as screen resolution. It may also include custom variables that better identify the user, such as a user account number, email address, or any other data that the web site publisher collects from you during your visit.

2. Normal HTTP data: This is collected by the web server that hosts the 1×1 pixel image. It includes your IP address, the date and time, the page you are viewing (sent as the referrer of the image request), your browser type and version, and any session ID.

3. Persistent data: This is collected in session cookies, which track your navigation through the web site you are viewing, and in persistent cookies, which connect your information between visits and, most interestingly, across other sites.
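The collecting side can be sketched the same way. The Python function below is a simplified, hypothetical collector, not any company’s actual code: it takes one pixel request and sorts it into the three kinds of data listed above, issuing a long-lived cookie the first time it sees a browser. The cookie name visitor_id and the two-year lifetime are assumptions. A real service would wrap something like this in a web server that also returns the 1×1 image.

```python
from __future__ import annotations

import uuid
from http.cookies import SimpleCookie
from typing import Optional
from urllib.parse import parse_qs, urlsplit

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def record_hit(url: str, headers: dict, client_ip: str) -> tuple[dict, Optional[str]]:
    """Split one beacon request into URL data, HTTP data, and persistent data.

    Returns the stored record and, for a first-time browser, a Set-Cookie value.
    """
    # 1. Data embedded in the URL (account ID, screen size, custom variables).
    url_data = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

    # 2. Normal HTTP data that arrives with every request.
    http_data = {
        "ip": client_ip,
        "user_agent": headers.get("User-Agent"),
        "referer": headers.get("Referer"),  # the page that embedded the beacon
    }

    # 3. Persistent data: reuse the browser's cookie, or issue a new ID.
    jar = SimpleCookie(headers.get("Cookie", ""))
    set_cookie = None
    if COOKIE_NAME in jar:
        visitor_id = jar[COOKIE_NAME].value
    else:
        visitor_id = uuid.uuid4().hex
        cookie = SimpleCookie()
        cookie[COOKIE_NAME] = visitor_id
        cookie[COOKIE_NAME]["max-age"] = 2 * 365 * 24 * 3600  # about two years
        cookie[COOKIE_NAME]["path"] = "/"
        # Modern browsers only send cookies in a third-party context with these set.
        cookie[COOKIE_NAME]["samesite"] = "None"
        cookie[COOKIE_NAME]["secure"] = True
        set_cookie = cookie[COOKIE_NAME].OutputString()

    record = {"url": url_data, "http": http_data, "visitor_id": visitor_id}
    return record, set_cookie
```

Because the visitor_id cookie belongs to the collector’s domain, every later pixel request from that browser carries it back, whichever site the beacon sits on.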

The company that receives the 1×1 pixel requests stores the raw data for each one. It may provide tools that let advertisers and web analytics customers aggregate the data, graph it, display it in tables, and create custom reports. The output may be used to test marketing strategies, improve site navigation, or report on the success of a campaign. Companies often also export the data for use in recommendation engines.

Want to see it in more detail? Here are two things you can try. First, go to the web site of a major retailer, view the source of the page in your web browser, and scroll to the bottom, near the closing “body” tag. Look for comments or snippets of JavaScript that may be a web beacon. If you find a plain image tag, open its URL in your web browser to see whether it is an invisible 1×1 image. Second, if you want to see some real action, try the Tamper Data plugin for Firefox. It lets you inspect the requests made by your web browser and identify requests that are not for the site you are visiting.
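If you would rather not scroll through page source by hand, a short script can flag likely candidates. This Python sketch is a rough heuristic, not a standard tool: it reports image tags that declare a 1×1 size and load from a host other than the page’s own. It will miss beacons that JavaScript builds at run time or that are sized with CSS, which is where inspecting live requests with a plugin like Tamper Data helps.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlsplit


class BeaconFinder(HTMLParser):
    """Collect <img> tags that look like web beacons: 1x1 and served from another host."""

    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.suspects: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        # Relative URLs have no hostname, so they count as the page's own host.
        host = urlsplit(src).hostname or self.page_host
        tiny = a.get("width") == "1" and a.get("height") == "1"
        if tiny and host != self.page_host:
            self.suspects.append(src)


def find_beacons(html: str, page_host: str) -> list[str]:
    """Return the src URLs of likely beacon images in a page's HTML source."""
    finder = BeaconFinder(page_host)
    finder.feed(html)
    finder.close()
    return finder.suspects
```

Save a page’s source to a file and call find_beacons with its text and the site’s host name; each URL it returns is worth opening in the browser, as described above.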

The important thing to note is that, to track users across web sites, the companies that provide advertising or analytics services must use a persistent cookie set by their own domain. Browsers send a cookie back only to the domain that set it, so if publishers and retailers set the cookie on their own domains, cross-site tracking will not work.
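The toy simulation below shows why the cookie’s domain matters. The Browser class stands in for a real browser’s cookie handling, reduced to one rule: a cookie goes back only to the domain that set it. The site names and ID format are hypothetical.

```python
from __future__ import annotations

import itertools
from collections.abc import Iterator


class Browser:
    """Toy browser: a cookie is only sent back to the domain that set it."""

    def __init__(self):
        self.jar: dict[str, str] = {}  # cookie domain -> visitor ID

    def request(self, domain: str, ids: Iterator[str]) -> str:
        # Send the existing cookie for this domain, or accept a newly issued one.
        if domain not in self.jar:
            self.jar[domain] = next(ids)
        return self.jar[domain]


def track(visits: list[tuple[str, str]], third_party: bool) -> dict[str, list[str]]:
    """Group page visits by the visitor ID that the collector sees."""
    browser = Browser()
    ids = (f"id{n}" for n in itertools.count(1))
    profiles: dict[str, list[str]] = {}
    for site, page in visits:
        # Third-party beacon: the cookie lives on the tracker's domain.
        # First-party alternative: the cookie lives on each publisher's domain.
        cookie_domain = "tracker.example" if third_party else site
        visitor = browser.request(cookie_domain, ids)
        profiles.setdefault(visitor, []).append(f"{site}{page}")
    return profiles
```

With the cookie on tracker.example, visits to shop.example and news.example land in one profile; with each publisher’s own cookie, the same visits split into unrelated IDs.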

Analytics, Recommendations, Summarizing, and Anonymity

Since web beacons from outsourcing companies may be able to track your every move, you may wonder what they are doing with your information. This post discusses the positive side of collecting and using this information. It also touches on the issue of anonymity and privacy.

Analytics

If you are running a web site today, you are probably using some form of web analytics. From the multi-billion-dollar retailer to the blogger who publishes his rants, web analytics are easy to implement and provide a gold mine of information about your visitors. For this web site, I use a web beacon (some JavaScript provided by a major analytics company) to collect traffic data that answers questions like:

  • What is the most popular content?
  • How did users get to the web site?
  • What keywords were used in search engines to get to the site?
  • What was the most used landing page?
  • How many pages per visit did users view?
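Questions like these are answered by aggregating raw beacon hits. The Python function below is a hypothetical sketch of that summarizing step. It assumes each hit records a visit ID and the page viewed, in time order, with the referrer recorded on the first page of each visit; it also assumes search engines pass the query in a q parameter, which not every engine does.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import parse_qs, urlsplit


def summarize(hits: list[dict]) -> dict:
    """Aggregate raw beacon hits (in time order) into site-wide answers."""
    if not hits:
        return {}
    visits: dict[str, list[dict]] = {}
    for hit in hits:
        visits.setdefault(hit["visit_id"], []).append(hit)

    landing = Counter(v[0]["page"] for v in visits.values())
    sources, keywords = Counter(), Counter()
    for v in visits.values():
        ref = urlsplit(v[0].get("referrer") or "")
        sources[ref.hostname or "(direct)"] += 1  # how users got to the site
        # Assumption: the search engine puts the query in a "q" parameter.
        for q in parse_qs(ref.query).get("q", []):
            keywords[q] += 1

    return {
        "popular_content": Counter(h["page"] for h in hits).most_common(3),
        "traffic_sources": sources.most_common(),
        "search_keywords": keywords.most_common(),
        "top_landing_page": landing.most_common(1)[0][0],
        "pages_per_visit": len(hits) / len(visits),
    }
```

Note that the output contains only counts and averages across all visitors, even though the input is a record of each individual visit.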

Analytics, when used locally by a web site publisher, allow the publisher to improve content and better reach an audience. For the publisher, gathering these metrics with a JavaScript web beacon is not only the easiest approach but also more accurate than analyzing the web server’s own logs. Legitimate web indexing services crawl web sites regularly, and their requests inflate the traffic counts in server logs. JavaScript beacons do not record these visits, because most indexing services do not execute JavaScript.

Analytic data is summarized data. Although the raw data contains information about individuals and their behaviors on the web, companies who use the data aggregate it and use it to draw conclusions about all users — not individuals.

Recommendations

Companies feed data collected from web beacons, along with other sources of data, into their recommendation engines to present products that you may be interested in. Some recommendation engines use the data to group users into virtual communities of people with the same interests, which broadens what they can recommend: products that others in your community are buying.
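A very simple version of the community idea can be sketched in Python. This is an illustrative neighborhood approach, in which users who share at least one item with you form your community, not how any particular company’s engine works; the users and items are made up.

```python
from __future__ import annotations

from collections import Counter


def recommend(target: str, histories: dict[str, set[str]], k: int = 3) -> list[str]:
    """Recommend items bought by users whose interests overlap with the target's.

    Each neighbor's items are weighted by how many items they share with the target.
    """
    mine = histories[target]
    scores: Counter[str] = Counter()
    for user, items in histories.items():
        if user == target:
            continue
        overlap = len(mine & items)
        if overlap == 0:
            continue  # no shared interests: not in the target's community
        for item in items - mine:  # only suggest things the target doesn't have
            scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]


if __name__ == "__main__":
    histories = {
        "alice": {"tent", "stove"},
        "bob": {"tent", "stove", "lantern"},
        "carol": {"tent", "boots"},
        "dave": {"novel", "lamp"},
    }
    print(recommend("alice", histories))
```

Here bob shares two items with alice and carol shares one, so bob’s lantern ranks above carol’s boots, while dave’s purchases are ignored entirely.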

Recommendation data is information about the individual user. An automated process works with the data to identify you personally and serve you recommendations. The user’s web behavior is probably never reviewed by people, unless someone is debugging problems with the recommendation engine.

Targeting the Individual

Some web beacons collect information that you have submitted to the web site you are viewing, such as your email address, user name, or account number. Companies may use this to follow up. Once you have been identified personally, your purchases may be tracked, and the company may contact you individually or use the information for targeted email marketing.

Crossing the Boundaries

None of this may seem out of line to you. Most organizations that use web beacons to collect information about you have no harmful intent; they aim to make your experience better. The potential issue lies with the collection of this data by large companies that operate across company boundaries. Because they are a common collection point, they can match data from multiple web sites.

One privacy issue is that the web beacon data collection company may not honor the privacy policy of the web site you visit. The information may be provided to third parties or used in ways you have not agreed to.

Another privacy issue is that this creates more repositories of rich user data that may or may not be protected by adequate security controls. These repositories are exposed to insider threats and may be misused for corporate espionage or unsolicited email.

How to Hide from Web Beacons

Why would I want to hide from web beacons and consolidated web traffic analysis? I don’t have anything to hide. We each decide how much privacy and security to give up in exchange for convenience. Web browser settings that save passwords, accept third-party cookies, and keep authenticated sessions alive for many days and across many sites make using the web easier. For some people this is an acceptable trade-off. For others it is a more serious matter.

You may want to consider some privacy measures if you are:

  • A regular user who wants to keep your web browsing habits out of the hands of marketers
  • A parent who wants additional protection for your children online by hiding their IP address
  • A member of the military or someone involved in covert activities
  • A citizen in a country that monitors the web and enforces information standards
  • A whistleblower who wants to remain anonymous
  • A journalist, writer, or blogger who publishes sensitive information
  • An activist concerned about privacy

At the most basic level, hiding from web beacons is as easy as turning off cookies in your web browser. Unfortunately, many web sites won’t work if you do not enable cookies. You can limit the exposure of your web browsing by clearing your cookies frequently, such as each time you close your web browser. This segments the trail of information about your web surfing habits and makes your browsing less identifiable across web sites and over a period of time.
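The effect of clearing cookies can be shown with a small simulation from the tracker’s point of view. It assumes every page carries the same third-party beacon and that the tracker issues a new ID whenever a request arrives without its cookie; the page names are hypothetical.

```python
from __future__ import annotations

import itertools


def tracker_view(sessions: list[list[str]], clear_on_close: bool) -> dict[str, list[str]]:
    """Return the browsing history the tracker can link under each cookie ID.

    Each entry in `sessions` is one browser session: the pages visited
    (all carrying the same beacon) between opening and closing the browser.
    """
    new_ids = (f"id{n}" for n in itertools.count(1))
    cookie = None  # the tracker's cookie as stored in this browser
    history: dict[str, list[str]] = {}
    for pages in sessions:
        if clear_on_close:
            cookie = None  # cookies were wiped when the browser last closed
        for page in pages:
            if cookie is None:
                cookie = next(new_ids)  # tracker issues a fresh ID
            history.setdefault(cookie, []).append(page)
    return history
```

Keeping cookies yields one ID holding every page; clearing them on each close yields one short, unlinked history per session. The fragments can still be tied together by other data, such as your IP address.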

This approach only reduces your exposure to web beacons. It does not protect the normal web traffic that is part of the HTTP protocol. HTTP traffic is the network information that passes from your web browser, over the Internet, to a web server, and back again. It is what makes the web work. Each request reveals your IP address and the page you are requesting. If you followed a link to get there, it also contains the address of the page you came from.

If you are really serious about web privacy and need to hide from both web beacons and HTTP traffic analysis, you need a more complete solution that routes your web traffic through several relays and manages cookies. One such solution combines Tor and Privoxy.

Tor is short for “The Onion Router”. It routes your traffic through relays distributed across the Internet, hiding your IP address from the web sites you visit. When configured correctly, it provides a high degree of privacy on the web. It does not, however, protect you from web beacons, which are part of the content of the web pages themselves and usually run as JavaScript.

Privoxy is a filtering proxy that offers flexible handling of cookies and can block various types of content. Used together with Tor, it provides content-level protection from web beacons.
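For programs rather than a browser, the chain looks like this in Python: send requests to Privoxy, and let Privoxy forward them to Tor. This sketch assumes both run locally on their default ports (Privoxy listening on 127.0.0.1:8118, Tor’s SOCKS port on 127.0.0.1:9050) and that the Privoxy config file contains a forwarding line such as "forward-socks5t / 127.0.0.1:9050 .".

```python
import urllib.request

# Assumption: Privoxy's default listen address. Privoxy itself must be
# configured to forward to Tor (e.g. "forward-socks5t / 127.0.0.1:9050 .").
PRIVOXY = "http://127.0.0.1:8118"


def private_opener() -> urllib.request.OpenerDirector:
    """Build an opener that sends both HTTP and HTTPS requests through Privoxy."""
    proxy = urllib.request.ProxyHandler({"http": PRIVOXY, "https": PRIVOXY})
    return urllib.request.build_opener(proxy)
```

Calling private_opener().open("https://check.torproject.org/") is one way to confirm the chain works, since that page reports whether the request arrived through Tor. Keep in mind that by default Privoxy can only filter the content of unencrypted HTTP traffic; HTTPS requests pass through it as an opaque tunnel.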

Both Tor and Privoxy are freely available, but they may require some time to learn and configure correctly. There are also commercial solutions available that may simplify the setup and configuration.

How much web anonymity is right for you? You need to decide the right balance of convenience and privacy for yourself. When writing this series of posts, I tried some of the measures described here, but found that I’m more in favor of convenience. I have gone back to allowing cookies, web beacons, and HTTP traffic that can be traced to me.

How much privacy is right for you? Please share your thoughts in the comments section.

29 Oct 2012, written by Administrator