E-resource Proxy Logs
General Information
- Description
-
This data set consists of log events generated by the UW-Madison Libraries e-resource proxy server, presently using the EZProxy software. There are two sets of logs: audit logs and traffic proxy logs. The audit logs record login events, that is, when a browser session is permitted to establish a new proxy session, either because an on-campus IP address was detected or because the user authenticated against the UW-Madison NetID login service. All post-authorization proxy traffic is then captured in the traffic proxy logs, which represent user traffic to licensed electronic databases and content from UW-Madison Libraries e-resource subscriptions.
- Purpose
-
The audit log data is used by Library Technology Group (LTG) staff to troubleshoot and debug patron access problems. The traffic logs are scrubbed for patron identifiers, such as a user's IP address, and then used for collection analysis purposes to determine which library subscriptions have the most use.
- Quick Facts
-
Audit logs consist of the most recent 15 days' worth of log data and are automatically rolled off. Scrubbed/de-identified traffic logs are stored in daily files dating back to 2016 and range in size from a few megabytes to 50 MB.
Data Classifications
- Campus
-
- Sensitive: Raw logs pre-anonymization contain personally identifying information about library patrons, specifically usernames in the form of campus photo IDs and IP addresses.
- Internal: Post-anonymization logs are classified as internal. They are not shared publicly on our website but may be used within the campus Libraries for collection analysis of the proxied e-resources.
- Library
-
- Authentication & Authorization: Proxy audit logs detail the authentication status of library patrons using e-resource subscriptions.
- Reporting & Analytics: Anonymized log data may be used for assessment of e-resource subscriptions.
- Transactional: Traffic log events represent individual web browser access to e-resources.
Data Contacts
- Data Owner/Trustee
- Lee Konrad lee.konrad@wisc.edu
- Data Custodian
- Library Technology Group (LTG)
- Data Architect/Modeler
- OCLC (software vendor)
- Internal Data Client
- Reporting staff
Risk Assessment
Score | Risk Type | Details | Evaluation Date |
---|---|---|---|
4 | Library Impact | Without access to the audit logs, Library Technology Group staff would not be able to troubleshoot access problems for library patrons trying to use library electronic content for teaching, learning and research. | August 26, 2021 |
5 | Data | This data is generated by EZProxy software running on a server and has no analogue source from which it can be recreated. | August 26, 2021 |
2 | Institutional Knowledge | While there is great potential for this data to be used for collection analysis purposes, to date few projects have utilized it. | August 26, 2021 |
Technical Details
- Specifications
-
The audit log data is stored in a custom format defined by EZProxy. The traffic logs are stored in a format that is very close to the default Apache web server log format.
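As a rough illustration of what reading such a traffic log could look like, the sketch below parses one line under the assumption that it follows the Apache Common Log Format (`host ident user [time] "request" status bytes`). The exact format configured on the proxy server is not given here, so the pattern, the function name `parse_clf_line`, and the sample line are all hypothetical.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status bytes.
# Assumption: the traffic logs follow this layout; the real format may differ.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Made-up example line using a documentation IP address (not real patron data).
SAMPLE = ('192.0.2.10 - jdoe [26/Aug/2021:10:15:32 -0500] '
          '"GET https://www.example.com:443/article/123 HTTP/1.1" 200 5120')
fields = parse_clf_line(SAMPLE)
```

Lines that do not match the assumed layout return `None`, so a caller can count or inspect them instead of silently misreading fields.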
- Correctness
-
This data set is correct if it accurately represents the use of e-resources by UW-Madison patrons. There are no quality control steps that ensure the raw data is generated correctly as the logging output is determined by the proxy software and is minimally configurable.
However, the long-term copies of the traffic log data are considered correct only if they contain no personally identifying information that would describe a patron. The qualifying attributes include IP addresses and authenticated usernames, which should be removed from the persistent copy of the data.
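The removal of IP addresses and usernames could be sketched as follows. This is not the Libraries' actual scrubbing process, which is not described in this document; it assumes each line starts with `host ident user` as in the Apache Common Log Format, and the function name `scrub_line` is hypothetical.

```python
import re

# Assumption: lines begin with "host ident user" as in Apache Common Log Format.
LEADING_FIELDS = re.compile(r'^(\S+) (\S+) (\S+) (.*)$')


def scrub_line(line):
    """Replace the client IP address and username with '-' placeholders."""
    match = LEADING_FIELDS.match(line.rstrip("\n"))
    if match is None:
        # Drop lines that cannot be parsed rather than risk keeping identifiers.
        return None
    _host, ident, _user, rest = match.groups()
    return f"- {ident} - {rest}"
```

This sketch only handles the two leading fields. Identifiers that appear elsewhere in a line, for example inside a requested URL, would need separate handling.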
- Representative Record
-
The authoritative copy of this data is stored on a server managed by the Libraries' LTG department. Because the persistent copy of the data contains no sensitive information, this data has no retention schedule and is kept indefinitely for long term analysis projects.
- Dependencies
-
The generation of this data is dependent on the Libraries' EZProxy server.
Access & Use
- Delivery Modalities
-
Audit logs are made accessible only to select Libraries IT staff for troubleshooting purposes. These staff will access the data by logging onto the servers where the data is stored.
Traffic logs are made available to staff with a reporting need for analyzing the use of library e-resource subscriptions. These staff may access the scrubbed logs.
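For staff with a reporting need, a usage count per e-resource host might be computed from the scrubbed logs along these lines. The sketch assumes the quoted request field holds a full URL (`METHOD URL PROTOCOL`), which is typical for proxy logs but is not confirmed in this document; `count_requests_by_host` is a hypothetical helper name.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_requests_by_host(lines):
    """Count requests per destination hostname in Common Log Format style lines."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request field on this line
        request = parts[1].split()
        if len(request) < 2:
            continue  # request field lacks a URL
        host = urlsplit(request[1]).hostname
        if host:
            counts[host] += 1
    return counts
```

Calling `count_requests_by_host(open_file).most_common(10)` on a daily file would then list the ten most requested hosts for that day, under the same assumptions.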
- Lifecycle
-
Audit log data and traffic log data are automatically generated by patron use of e-resources. Audit logs are kept for 15 days and then removed from the server where they are stored, as they no longer serve a troubleshooting/debugging purpose outside of a two-week time frame.
Traffic logs go through an automated anonymization and de-identification process. The original/raw logs containing sensitive personally identifying information are deleted and a reporting copy of the data is retained indefinitely for longitudinal collection analysis of e-resource vendors.
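The 15-day roll-off of audit logs described above could be approximated with a small clean-up routine such as the one below. How the server actually rotates its logs is not stated here, so this is only a sketch: it assumes one file per log period in a single directory and uses file modification time as the age, and the function names are hypothetical.

```python
import time
from pathlib import Path

RETENTION_DAYS = 15  # retention window stated for audit logs


def expired_logs(log_dir, now=None, retention_days=RETENTION_DAYS):
    """Return files in log_dir whose modification time is older than the window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    return sorted(
        path for path in Path(log_dir).iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def remove_expired_logs(log_dir, **kwargs):
    """Delete expired files and return the paths that were removed."""
    removed = expired_logs(log_dir, **kwargs)
    for path in removed:
        path.unlink()
    return removed
```

Keeping the selection step (`expired_logs`) separate from deletion makes it possible to review what would be removed before anything is deleted.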
- Disposition
-
Data is stored within the Libraries' Linux computing environment. The servers are part of the Libraries production backup processes.
- Relevant Processes
-
Data is automatically generated. Periodic review of the data should take place to ensure that the anonymization processes are working correctly.
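One way such a periodic review might be partly automated is to scan the scrubbed files for anything that still looks like an IP address. The sketch below checks for IPv4 dotted-quad patterns only; it does not detect IPv6 addresses or usernames, and `find_ip_leaks` is a hypothetical helper rather than part of any existing process.

```python
import re

# IPv4 dotted-quad pattern; IPv6 addresses and usernames are not covered here.
IPV4_PATTERN = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')


def find_ip_leaks(lines):
    """Return (line_number, matched_text) pairs for text resembling an IPv4 address."""
    leaks = []
    for number, line in enumerate(lines, start=1):
        for match in IPV4_PATTERN.finditer(line):
            leaks.append((number, match.group()))
    return leaks
```

An empty result suggests the IP scrubbing worked for the lines checked. Matches should be reviewed by hand, since dotted version numbers in URLs can trigger false positives.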
- Constraints
-
Raw log data that has not yet been scrubbed is subject to the University's Computer Logging Statement policy:
https://policy.wisc.edu/library/UW-520