Recommendation from the Usage Log Retention Policy Workshop 

In the course of operation, web servers gather and store voluminous data about site visitors. In the typical default configuration, this includes the IP address, last page visited, and browser used, among other information. This log data is useful for security, usability improvement, and marketing, but contrary to the operating assumptions of many organizations, it typically includes personally identifiable information (PII), or information that can be used to generate PII. PII is governed by a complex set of state, federal, and European laws, and mismanagement of this information can result in civil liability or regulatory action by state or federal agencies.

This liability may arise from one of the many laws governing PII, or from inadvertent variance from stated privacy policies. For example, an organization subject to the Family Educational Rights and Privacy Act may be inadvertently retaining or failing to safeguard PII that is an education record. At other times, an organization's privacy policy may incorrectly state that no PII is being collected. In addition, organizations concerned with healthcare, financial services, or educational records face additional requirements as a result of the Gramm-Leach-Bliley Act, the Health Insurance Portability and Accountability Act, and the Family Educational Rights and Privacy Act. These rules make it necessary for organizations to know what data they collect, and to manage it effectively. While simply turning off all logging features may be a reasonable option for some organizations, it is not an option for many, and those that choose to retain data need tools that can accurately describe what data is collected, and eliminate unwanted PII.

Because issues of data logging and retention are complex, and site operators generally wish to balance these competing needs, the following policy guidelines and the associated log management tool have been developed with the help and advice of representatives from commercial, academic, and advocacy organizations at a meeting held at the Computers, Freedom, and Privacy Conference in New York City in April 2003.

These guidelines have been informed by the Logging, Monitoring, and Privacy (LAMP) Project funded by the National Science Foundation (see http://www.aacrao.org/publications/catalog/NSF-LAMP.pdf), various fair information practice standards (e.g. http://www.cdt.org/privacy/guide/basic/generic.html) that serve as the basis for European and U.S. regulation, as well as laws specifically applicable to usage logs.

Recommendations


To gain widespread adoption, usage log policies must provide substantial benefits to the organizations that implement them. Such benefits include compliance with applicable laws, reduced legal exposure, improved security practices, and a better understanding of user behavior. To be effective, a usage log policy must describe:

  • what data is logged
  • how it will be managed
  • when it will be retained or destroyed
  • what is disclosed to site visitors


For organizations that wish to implement this policy quickly and simply, compliance may be achieved by ensuring that the final octet of logged IP addresses, and all form data (generally, everything after a ? in a URL), will be deleted or obscured within 30 days or another appropriate, stated, fixed time period.

Organizations that are unable or unwilling to do this may implement their log retention policies using the framework described in the LAMP study (see http://www.aacrao.org/publications/catalog/NSF-LAMP.pdf), which defines three levels of logging based on the type and purpose of data gathered, and calls them Level I, Level II, and Level III. These roughly correspond to non-personally identifiable information, derivable PII, and personally identifiable information. Recommendations regarding the gathering, use, management, and disposal of this data are in the following table, which can be used as the basis for organizational log policies. Many of these recommendations have been incorporated into the design of the Electronic Frontier Foundation's log management tool, which allows administrators of Apache web servers to implement these recommendations automatically. This tool also serves as a reference implementation for organizations that operate other types of web servers.



     

    Level I (no personally identifiable information)
    Level II (derivable PII)
    Level III (personally identifiable information)

    Purpose

      • Level I: Level I data is gathered mainly to assist in design, or network and systems management.
      • Level II: Level II data is used for network and systems management, as well as security.
      • Level III: Level III data is mainly used for security, and for transaction systems.

    Data gathered

      • Level I: browser type, usage trails within the site, referring pages, cookies, time and date stamps.
      • Levels II and III: IP addresses, user IDs, email addresses, form data, cookies. Level II data may be combined with other information to create personally identifiable information. Level III data is directly traceable to individuals.

    Management and access control

      • Level I: (no entry)
      • Level II: Access controls should be applied to Level II data.
      • Level III: Access controls should be applied to Level III data. Individuals with access to this data should have the authorizations necessary for access documented in their positions.

    Scrubbing

      • Level I: (no entry)
      • Level II: Level II data may be scrubbed in various ways to reduce its sensitivity. For example, cookie data may be set to expire within a day.
      • Level III: Level III data may be scrubbed in various ways to reduce its sensitivity. For example, IP addresses may be truncated.

    Retention and disposal

      • Level I: Unlimited
      • Level II: Level II data should be disposed of as soon as practically possible.
      • Level III: Level III data should be disposed of as soon as practically possible, e.g. 30 days after the period allowed for returns and exchanges has passed.

    Labeling and disclosure

      • Level I: None
      • Level II: Notice that Level II data is being gathered should be disclosed in a web site's privacy policy.
      • Level III: Notice that Level III data is being gathered should be disclosed in a web site's privacy policy.

    Modifications to web server default settings

      • Level I: None
      • Level II: Modifications to default settings are needed to enforce retention settings and access control.
      • Level III: Modifications to default settings are needed to enforce retention settings and access control.
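
The quick-compliance option described earlier (obscuring the final octet of logged IP addresses and dropping everything after a ? in a URL once a fixed period has passed) can be illustrated with a short script. What follows is a minimal sketch, not the Electronic Frontier Foundation's tool. It assumes log lines in the Common Log Format with English month abbreviations, and the names RETENTION, obscure_ipv4, strip_query, scrub_line, is_expired, and apply_policy are hypothetical, chosen only for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed retention window; the text suggests 30 days or another stated, fixed period.
RETENTION = timedelta(days=30)

# Leading fields of a Common Log Format line (assumed input format):
# host ident authuser [timestamp] "METHOD target PROTOCOL" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<target>\S+)(?P<rest>.*)$'
)


def obscure_ipv4(host: str) -> str:
    """Replace the final octet of a dotted-quad IPv4 address with 0."""
    parts = host.split(".")
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        parts[3] = "0"
        return ".".join(parts)
    return host  # hostnames and IPv6 addresses are left untouched in this sketch


def strip_query(target: str) -> str:
    """Drop everything from the first '?' onward in a request target."""
    return target.split("?", 1)[0]


def scrub_line(line: str) -> str:
    """Obscure the client address and remove form data from one log line."""
    m = CLF.match(line)
    if m is None:
        return line  # unrecognized lines pass through unchanged
    # The authuser field is kept as-is here; a fuller policy might scrub it too.
    return '{} {} {} [{}] "{} {}{}'.format(
        obscure_ipv4(m["host"]), m["ident"], m["user"], m["time"],
        m["method"], strip_query(m["target"]), m["rest"],
    )


def is_expired(line: str, now: datetime) -> bool:
    """Return True if the line's timestamp is older than RETENTION."""
    m = CLF.match(line)
    if m is None:
        return False
    # %b relies on English month names, which is an assumption about the locale.
    stamp = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return now - stamp > RETENTION


def apply_policy(lines, now):
    """Scrub only those lines that have passed the retention window."""
    return [scrub_line(l) if is_expired(l, now) else l for l in lines]


if __name__ == "__main__":
    sample = ('192.0.2.57 - alice [10/Apr/2003:13:55:36 -0500] '
              '"GET /search?q=secret HTTP/1.0" 200 2326')
    for out in apply_policy([sample], datetime.now(timezone.utc)):
        print(out)
```

A script along these lines would typically run as a scheduled job over rotated log files. Scrubbing in place, rather than deleting whole files, keeps the non-identifying fields available for design and systems management while removing the fields the policy targets.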