Your Web site analytics solution generates a lot of data, potentially gigabytes a day if you run one or more busy sites. But who really owns all that rich data? It's a complex issue that often gets overlooked during Web analytics vendor selection and contract negotiations. As more customers turn to SaaS-based solutions (where the vendor stores your traffic data) and as Google and Yahoo continue to broaden this marketplace with their free hosted analytics offerings, the question of data ownership becomes increasingly germane.
Unfortunately, many analysts and Web managers we encounter at large enterprises either don't read or don't have access to their vendor service terms, and they generally don't ask about data ownership during the vendor evaluation process. Most Web analytics customers simply assume that they fully own their Web analytics data and are granting the vendor only a limited license to generate reports. Depending on what "full ownership" means to you, that may not be totally true.
Within the enterprise, most companies have data ownership sorted out pretty clearly: all data belongs to "the employer." In fact, the fear of conferring ownership to an employee or any other entity spawned the phrase "data stewardship," rather than "data ownership." You may work with enterprise data for eight hours every day, but you are only the steward of that data; your employer actually owns it.
To give you a better sense of what truly matters to your enterprise, this article breaks ownership down into four more specific dimensions: use of your data, data security and availability, retention and disposition, and access.
Before we consider the four dimensions of ownership, let's have a look at the contracts. Hosted Web analytics vendors are providing you a service by ingesting, aggregating and parsing your traffic data, which they accumulate on your behalf. As part of its research for the Web Analytics Report, CMS Watch examined agreements from Google and Yahoo as well as contracts submitted privately from customers of two traditional fee-based Web analytics providers -- call them Vendor X and Vendor Y.
The two fee-based vendors explicitly confirm your ownership of the data.
Here are Vendor X's terms:
"As between [Vendor X] and Customer, Customer exclusively owns all rights, title and interest in and to all Customer Data. Customer Data is deemed Confidential Information..."
And Vendor Y, too:
"Customer Data, other Customer Confidential Information and any other Customer information and materials, and all worldwide intellectual property rights in the foregoing, are your exclusive property."
What about the free services? Google's ToS (which is, to Google's credit, publicly available online) also refers to the collected data as "customer data." But the agreement doesn't clarify whether "customer" in this context means you, and not your Web site visitors. Yahoo's terms are silent on the topic of data ownership.
What does data ownership really mean? One legal dictionary defines ownership, alternately, as "Legal title coupled with exclusive legal right to possession," or "The right by which a thing belongs to someone in particular, to the exclusion of all other persons." When a SaaS vendor says you own your data, which rights are you conveying and which are you withholding?
USE OF YOUR DATA
Both Google and Yahoo have built powerful platforms, but "the real value is in the data," says new media guru Scott Karp. In exchange for the free service, they both give themselves expansive usage rights to your data.
Here's what Yahoo says:
"As a condition of using Analytics, you will: (i) obtain on behalf of the Yahoo! Entities all rights and permissions necessary for the Yahoo! Entities to use the Analytics data, including statistical and traffic information collected by us and/or provided by you..."
Yahoo mandates that you put strict notice of this in your Web site's privacy statement, including this clause:
"...(iii) a statement that expressly identifies Yahoo! and its use of the Analytics data to improve Yahoo!'s products and services and to provide advertisements about goods and services that may be of interest to end users..."
Your privacy statement must also link visitors to an Analytics opt-out form.
Google is equally vague, but it does at least severely restrict third-party access to the data. Google and many fee-based analytics vendors will privately combine your data with that of other customers for benchmarking or industry-average information -- and then share those reports with you -- but you can typically opt out of this.
Accumulating data across customers relates to the retention and disposition topic below. If you leave the service, what data usage rights does the vendor retain? Some analytics vendors may purge your raw data but still keep your aggregate information to inform their benchmarking warehouse.
DATA SECURITY AND AVAILABILITY
All of the large Web analytics vendors go to great lengths to protect your data from unauthorized access. You may get careless with passwords, but that's your problem. Of course, to the extent that most data thefts are inside jobs, those vendors with more fine-grained access controls (hint: Google is not among them) may provide a greater degree of safety in this regard.
As with security, most (though not all) Web analytics vendors invest in back-up, redundancy, and failover systems for optimal availability. Yet if there's a data loss, you're on your own. In cases where you don't have access to the raw data, you may never even know about a blip unless the roll-up reports tip you off. All agreements that we've seen absolve the vendors of any liability here. So, in this case, what does data ownership really do for you? Not much, unless you have access rights and retrieve it regularly. More about that, below.
We don't know of any instances of major data loss in this marketplace. It's worth noting, though, that just such a nightmare recently befell users of the Ma.gnolia bookmarking service (a competitor to Del.icio.us). A critical failure of the main and backup data stores wiped out everyone's bookmarks. Ma.gnolia is not a large, commercial vendor, but the fact remains that sometimes the cloud can fail you.
Vendor X's terms and conditions in this regard seem instructive:
"[Vendor X] cannot guarantee that any Customer Data Customer stores or transmits through the Service will not be subject to unauthorized access by others or that others will not gain access to the Service. [Vendor X] performs regular system-wide back up procedures for the Service, however Customer understands that there is an inherent risk in electronic storage and agrees to rely solely on its own backup copies of any Customer Data stored in or transmitted through the Service should the Customer Data become lost or damaged for any reason. At no time and for no reason will [Vendor X] be responsible for recovering or retrieving any Customer Data stored and/or transmitted by Customer using the Service unless such recovery or retrieval results from an event or occurrence that requires a Service-wide restoration (which shall be at [Vendor X's] sole determination.)"
DATA RETENTION AND DISPOSITION
There are basically two types of Web analytics data you can retain: data that is used in creating summary tables that form the basis of the fancy reports you retrieve, and unaggregated source data -- complete records of all captured activity for each individual visit. Think of this distinction as summaries versus raw data.
Vendors will typically retain these two types of data for different periods of time. Raw data gets unwieldy quickly, so they may keep it for only a brief period (long enough to aggregate it) and commit contractually only to formal retention periods for your aggregate data.
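To make the summaries-versus-raw distinction concrete, here is a minimal sketch in Python. The record layout and field names (`visit_id`, `date`, `page`) are hypothetical, invented for illustration; no vendor's actual schema is implied.

```python
from collections import Counter

# Hypothetical raw records: one row per page view. Field names are made up.
raw_hits = [
    {"visit_id": "v1", "date": "2009-02-01", "page": "/home"},
    {"visit_id": "v1", "date": "2009-02-01", "page": "/pricing"},
    {"visit_id": "v2", "date": "2009-02-01", "page": "/home"},
    {"visit_id": "v3", "date": "2009-02-02", "page": "/home"},
]


def summarize(hits):
    """Roll raw hits up into page-view counts keyed by (date, page)."""
    return Counter((hit["date"], hit["page"]) for hit in hits)


summary = summarize(raw_hits)
```

The roll-up is one-way. Once the raw hits are purged, the summary can still tell you how many times a page was viewed on a given day, but not which visit viewed it or what else that visitor looked at.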
Most vendors start with default retention terms. For example, they may grant one month for raw data, three months for aggregate summaries. If you want more than the default, you have to pay for it.
Consider WebTrends' standard retention schedule. By default, the hosted service retains report data for four months, unless you upgrade to "Extended Data Retention," which lasts thirteen months. Raw data gets stored for 14 days.
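As a rough illustration of what such a schedule means in practice, the sketch below computes the oldest date still covered under the figures quoted above (14 days of raw data, four or thirteen months of report data). Counting in calendar months is an assumption; a vendor may define a "month" differently, and the function names are ours.

```python
import calendar
from datetime import date, timedelta


def subtract_months(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped when the target month is shorter.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def retention_cutoffs(today, extended=False):
    """Oldest dates still retained, assuming calendar-month counting."""
    report_months = 13 if extended else 4
    return {
        "raw": today - timedelta(days=14),
        "reports": subtract_months(today, report_months),
    }


cutoffs = retention_cutoffs(date(2009, 3, 31))
```

Anything older than `cutoffs["raw"]` would exist only in aggregate form, and anything older than `cutoffs["reports"]` would be gone entirely unless you had exported it yourself.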
As e-discovery specialists know, retention could theoretically matter for legal or regulatory reasons. If you wanted to go back and prove or disprove that a visitor came to your site and accessed certain pages on a particular date and time, you might need to review the raw traffic data. (That data may still prove inconclusive regarding an individual visitor session, but that's another story.)
Then you also have to consider the issue of disposition. What happens to your raw data, let alone your aggregate data, when you leave a service? If the vendor says they've deleted it, is it really gone? The more important issue here is that you should not take perpetual ownership for granted. Unless you negotiate otherwise, at some point you no longer own your data (raw or aggregated) -- because it will no longer exist.
DATA ACCESS
Most Web analytics vendors point out that you can export your data at any time. However, not every vendor lets you export your raw data, and not always all of it at one time. You might have noticed in the WebTrends terms above that access to raw data costs extra, and carries certain limitations.
Given the size of these datasets, you can understand some of those limitations. Sometimes vendors will request that you perform large data exports only at certain times, and only in certain formats (like CSV files) that may or may not prove convenient. And of course, you need to figure out how to store all that exported data yourself, but at least you know you have it.
Many customers need more regular access to their data. Enterprises are increasingly looking to integrate online data with offline information. Or they may want to run custom queries, perhaps against raw data, that can't be assembled using the vendor's report-building services. Larger vendors have responded by offering data warehouse functionality -- a term that will lead to some suitably expansive fees. The idea here is that the vendor offers an API to the underlying raw data which you can use to get programmatic access. That is, you can run queries and regularly extract just the data you need. Here again, larger data extractions may queue up at a vendor and their execution could get measured in days.
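For enterprises that do pull raw data out and keep their own copy, custom queries become possible without going back to the vendor. The sketch below loads a CSV export into an in-memory SQLite database using only Python's standard library and asks one ad-hoc question. The CSV columns and sample rows are hypothetical stand-ins, not any vendor's real export format.

```python
import csv
import io
import sqlite3

# Stand-in for a vendor's raw-data CSV export; column names are hypothetical.
EXPORT_CSV = """visit_id,timestamp,page,referrer
v1,2009-02-01T09:15:00,/home,google.com
v1,2009-02-01T09:16:10,/pricing,
v2,2009-02-01T11:02:45,/home,yahoo.com
v3,2009-02-02T14:30:00,/pricing,google.com
"""


def load_hits(csv_text):
    """Load exported rows into an in-memory SQLite table named `hits`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (visit_id TEXT, timestamp TEXT, page TEXT, referrer TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO hits VALUES (:visit_id, :timestamp, :page, :referrer)", rows
    )
    return conn


def visits_to_page(conn, page):
    """Ad-hoc question: which visits reached `page`, and when?"""
    cursor = conn.execute(
        "SELECT visit_id, timestamp FROM hits WHERE page = ? ORDER BY timestamp",
        (page,),
    )
    return cursor.fetchall()


conn = load_hits(EXPORT_CSV)
pricing_visits = visits_to_page(conn, "/pricing")
```

A real export would be far larger and would call for a file-backed database or a proper warehouse, but the principle is the same: a question like this can only be answered from raw records you actually hold.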
But the larger point is this: if "owning" your data means ad-hoc access to arbitrary slices of it on an ongoing basis, it's very likely you'll have to pay extra for that privilege.
WHAT YOU SHOULD DO
So, if you go with a hosted Web analytics service, you'll want to figure out what elements of data ownership are most relevant to your enterprise. If you run an SMB site and feel Google's free service provides decent analytics, you may be quite willing to forgo some traditional privileges of ownership. For larger enterprises, making sure that you can get your hands on your data when you need it, and with as little interference as possible, is a prime consideration.
Above all, you should clarify contractually that your Web analytics vendor is simply a steward of your data for the purposes of ingest, mining, and reporting, for however long you engage them to play that role. You may grant them additional rights, but at the end of the day, the data should be yours.