Recommendation from the Usage Log Retention Policy Workshop
In the course of operation, web servers gather and store voluminous data about
site visitors. In the typical default configuration, this includes the IP address,
last page visited, and browser used, among other information. This log data
is useful for security, usability improvement, and marketing, but contrary
to the operating assumptions of many organizations, it typically includes personally
identifiable information (PII), or information that can be used to generate
PII. PII is governed by a complex set of state, federal, and European laws,
and mismanagement of this information can result in civil liability
or regulatory action by state or federal agencies. This liability may arise
from one of the many laws governing PII, or from inadvertent variance from stated
privacy policies. For example, an organization subject to the Family Educational
Rights and Privacy Act may be inadvertently retaining or failing to safeguard
PII that is an education record. At other times, an organization's privacy policy
may incorrectly state that no PII is being collected. In addition, organizations
concerned with healthcare, financial services, or educational records face additional
requirements as a result of the Gramm-Leach-Bliley Act, the Health Insurance
Portability and Accountability Act, and the Family Educational Rights and Privacy
Act. These rules make it necessary for organizations to know what data they
collect, and to manage it effectively. While simply turning off all logging
features may be a reasonable option for some organizations, it is not an option
for many, and those that choose to retain data need tools that can accurately
describe what data is collected, and eliminate unwanted PII.
Because issues of data logging and retention are complex, and site operators
generally wish to balance the usefulness of log data against these privacy obligations, the following policy guidelines and the
associated log management tool have been developed with the help and advice of representatives
from commercial, academic, and advocacy organizations at a meeting held at the
Computers, Freedom, and Privacy Conference in New York City in April 2003.
These guidelines have been informed by the Logging, Monitoring, and Privacy
(LAMP) Project funded by the National Science Foundation (see
http://www.aacrao.org/publications/catalog/NSF-LAMP.pdf),
by various fair information practice standards (e.g.
http://www.cdt.org/privacy/guide/basic/generic.html)
that serve as the basis for European and U.S. regulation, and by laws specifically applicable to usage logs.
Recommendations
To gain widespread adoption, usage log policies must provide substantial benefits
to the organizations that implement them. Such benefits include compliance with
applicable laws, reduced legal exposure, improved security practices,
and a better understanding of user behavior. To be effective, a usage log
policy must describe:
what data is logged
how it will be managed
when it will be retained or destroyed
what is disclosed to site visitors
For organizations that wish to implement this policy quickly and simply, compliance
may be achieved by ensuring that the final octet of logged IP addresses and
all form data (generally, everything after a ? in a URL) are deleted or
obscured within 30 days or another appropriate, stated, fixed time period. Organizations
that are unable or unwilling to do this may implement their log retention policies
using the framework described in the LAMP study (see
http://www.aacrao.org/publications/catalog/NSF-LAMP.pdf),
which defines three levels of logging based on the type and purpose of data
gathered, and calls them Level I, Level II, and Level III. These roughly correspond
to non-personally identifiable information, derivable PII, and personally identifiable
information. Recommendations regarding the gathering, use, management, and disposal
of this data are in the following table, which can be used as the basis for
organizational log policies. Many of these recommendations have been incorporated
into the design of the Electronic Frontier Foundation's log management tool,
which allows administrators of Apache to automatically implement these recommendations.
This tool also serves as a reference implementation for organizations that operate
other types of web servers.
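The quick-compliance rule described above (removing the final octet of logged IP addresses and everything after a ? in a URL) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Electronic Frontier Foundation's tool. It assumes IPv4 addresses and log lines in a format similar to Apache's combined log format; the names scrub_line, IPV4, and QUERY are hypothetical.

```python
import re

# Matches a dotted-quad IPv4 address and captures its first three octets.
IPV4 = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")

# Matches a query string: a "?" and everything up to the next
# whitespace character or double quote.
QUERY = re.compile(r"\?[^\s\"]*")


def scrub_line(line: str) -> str:
    """Zero the final octet of IPv4 addresses and drop URL query strings."""
    line = IPV4.sub(r"\1.0", line)  # 192.168.10.57 becomes 192.168.10.0
    return QUERY.sub("", line)      # /search?q=term becomes /search


if __name__ == "__main__":
    sample = (
        '192.168.10.57 - - [10/Apr/2003:13:55:36 -0500] '
        '"GET /search?q=jane+doe HTTP/1.0" 200 2326 '
        '"http://example.com/form?id=7" "Mozilla/4.08"'
    )
    print(scrub_line(sample))
```

A script like this would typically be run over rotated log files before the stated retention period (for example, 30 days) expires. The patterns are deliberately blunt: they also rewrite any other text in the line that looks like a dotted quad or contains a ?, such as a user-agent string, and they do not handle IPv6 addresses.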
| | Level I (No personally identifiable information) | Level II (Derivable PII) | Level III (Personally identifiable information) |
| --- | --- | --- | --- |
| Purpose | Level I data is gathered mainly to assist in design, or network and systems management. | Level II data is used for network and systems management, as well as security. | Level III data is mainly used for security, and for transaction systems. |
| Data gathered | Browser type; usage trails within the site | Referring pages; cookies; time and date stamps. Level II data may be combined with other information to create personally identifiable information. | IP addresses; user IDs; email addresses; form data; cookies. Level III data is directly traceable to individuals. |
| Management and access control | | Access controls should be applied to Level II data. | Access controls should be applied to Level III data. Individuals with access to this data should have the authorizations necessary for access documented in their positions. |
| Scrubbing | | Level II data may be scrubbed in various ways to reduce its sensitivity. For example, cookie data may be set to expire within a day. | Level III data may be scrubbed in various ways to reduce its sensitivity. For example, IP addresses may be truncated. |
| Retention and disposal | Unlimited | Level II data should be disposed of as soon as practically possible. | Level III data should be disposed of as soon as practically possible, e.g. 30 days after the period allowed for returns and exchanges has passed. |
| Labeling and disclosure | None | Notice that Level II data is being gathered should be disclosed in a web site's privacy policy. | Notice that Level III data is being gathered should be disclosed in a web site's privacy policy. |
| Modifications to web server default settings | None | Modifications to default settings are needed to enforce retention settings and access control. | Modifications to default settings are needed to enforce retention settings and access control. |
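As a rough illustration of how the table might be turned into an enforceable rule set, the following Python sketch maps log fields to the three levels and decides when a record is due for disposal. Several things here are assumptions rather than part of the recommendations: the field names are hypothetical, cookies (which the table lists under both Level II and Level III) are treated at the stricter level, unknown fields are treated as Level III, and "as soon as practically possible" is stood in for by a fixed 30-day limit.

```python
from datetime import date, timedelta
from enum import IntEnum


class Level(IntEnum):
    I = 1    # no personally identifiable information
    II = 2   # derivable PII
    III = 3  # personally identifiable information


# Hypothetical field names; level assignments follow the table.
FIELD_LEVELS = {
    "browser_type": Level.I,
    "usage_trail": Level.I,
    "referrer": Level.II,
    "timestamp": Level.II,
    "cookie": Level.III,  # assumption: listed under II and III, stricter wins
    "ip_address": Level.III,
    "user_id": Level.III,
    "email": Level.III,
    "form_data": Level.III,
}

# Assumed limits in days; None means unlimited retention (Level I).
RETENTION_DAYS = {Level.I: None, Level.II: 30, Level.III: 30}


def record_level(fields):
    """Return the highest level among the fields present in a record.

    Unknown fields are conservatively treated as Level III.
    """
    return max((FIELD_LEVELS.get(f, Level.III) for f in fields),
               default=Level.I)


def due_for_disposal(fields, logged_on, today):
    """Return True if a record has outlived its assumed retention limit."""
    limit = RETENTION_DAYS[record_level(fields)]
    return limit is not None and today - logged_on > timedelta(days=limit)
```

A record is classified by its most sensitive field, so a log entry that stores only browser type and usage trails can be kept indefinitely, while one that also stores an IP address falls under the Level III limit. An organization adopting this approach would substitute its own stated retention periods for the assumed 30 days.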