App Data & Preservation Inventories

ProQuest ETD Data

General Information

Description

ETD submission packages from ProQuest containing PhD dissertations submitted by UW-Madison students. Each package contains a custom bibliographic metadata XML file, one PDF dissertation document, and zero or more file attachments that supplement the dissertation.

Purpose

Used to create MARC and MODS records to facilitate discovery and display of born-digital PDF dissertations produced by UW-Madison students. The dissertations reside in the Library's FEDORA digital repository and are considered one instance of the Library's archival copies. (A print version is also delivered by ProQuest to the Library.)

Quick Facts

5,226 records and associated digital objects, growing by about 800 new items per year. There are currently 380 associated supplemental objects, spanning 22 file MIME types.

Data Classifications

Campus
  • Sensitive: The exposure of restricted theses and dissertations could result in opportunity loss for the affected graduate.
Library
  • Descriptive: Descriptive metadata about electronic theses and dissertations created by graduating UW-Madison students.
  • Content: Includes actual content of theses and dissertations.
  • Research

Data Contacts

Data Steward
Brian Sheppard brian.sheppard@wisc.edu
Internal Data Client
Tom Lundstrom tom.lundstrom@wisc.edu
Data Owner/Trustee
Lee Konrad lee.konrad@wisc.edu

Risk Assessment

Score: 1
Risk Type: Library Impact
Details: Recent XML data could be (re)supplied by the vendor. Older data has already been archived, and key fields have been transformed into MARC and stored in both Alma and OCLC.
Evaluation Date: February 25, 2019

Technical Details

Specifications

The delivered ProQuest ETD package contains a custom XML file with bibliographic and structural metadata, a PDF dissertation document, and zero or more file attachments that supplement the dissertation.

An automated process transforms the ProQuest metadata file included in the delivered package, via XSLT, into a valid MODS document. The MODS document is in turn transformed into two corresponding MARC21 records, one XML and one binary. The MARC records are forwarded to Cataloging in an e-mail report, and the dissertation PDF and MODS are ingested, along with any attachments, into a FEDORA repository. Technical file metadata is created by a locally maintained file validation service and is included with the digital object.
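To illustrate the shape of the first transformation step, here is a minimal sketch that maps a trimmed ProQuest-style record to a skeletal MODS document. It is not the production workflow: the real process uses XSLT, whereas this stand-in uses Python's standard-library ElementTree. The element names (DISS_submission, DISS_title, and so on) are assumptions based on the public ProQuest DTD, the sample record is invented, and the function name pq_to_mods is hypothetical. A real mapping would cover many more fields and would be validated against the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Invented, heavily trimmed sample; element names are assumed from the
# public ProQuest ETD DTD and may not match the local paDISS.xsd exactly.
SAMPLE_PQ_XML = """\
<DISS_submission>
  <DISS_authorship>
    <DISS_author type="primary">
      <DISS_name>
        <DISS_surname>Doe</DISS_surname>
        <DISS_fname>Jane</DISS_fname>
      </DISS_name>
    </DISS_author>
  </DISS_authorship>
  <DISS_description>
    <DISS_title>An Example Dissertation</DISS_title>
  </DISS_description>
</DISS_submission>
"""


def q(tag: str) -> str:
    """Return the tag qualified with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def pq_to_mods(pq_xml: str) -> ET.Element:
    """Map title and author names from ProQuest-style XML to a MODS element."""
    src = ET.fromstring(pq_xml)
    mods = ET.Element(q("mods"))

    # Title: DISS_description/DISS_title -> titleInfo/title
    title_info = ET.SubElement(mods, q("titleInfo"))
    title = ET.SubElement(title_info, q("title"))
    title.text = src.findtext("DISS_description/DISS_title", default="").strip()

    # Authors: each DISS_author -> name[@type='personal']/namePart
    for author in src.findall("DISS_authorship/DISS_author"):
        surname = author.findtext("DISS_name/DISS_surname", default="").strip()
        fname = author.findtext("DISS_name/DISS_fname", default="").strip()
        name = ET.SubElement(mods, q("name"), {"type": "personal"})
        part = ET.SubElement(name, q("namePart"))
        part.text = f"{surname}, {fname}"

    return mods


if __name__ == "__main__":
    print(ET.tostring(pq_to_mods(SAMPLE_PQ_XML), encoding="unicode"))
```

Running the module prints the serialized MODS element for the sample record. The MODS-to-MARC21 step is omitted here; it would follow the same pattern with a different target vocabulary.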

At the time of ingest, CNRI handles are created to provide persistent URLs.

KB Documentation: https://kb.wisc.edu/library/internal/page.php?id=21652

ETD Example:

Dissertation: A pianist's introduction to Gary Powell Nash

http://digital.library.wisc.edu/1711.dl/VOG6NUAUMIPBT9C

Forward record, with "Related Electronic Resources" linking (via MARC 856) to the dissertation attachments:

https://search.library.wisc.edu/catalog/9910133265102121

Correctness

All XML files are validated against their respective schemas. File formats are validated via JHOVE, but files are ingested regardless of validation status. The existence of files referenced in the PQ metadata is verified.

Schemas:

MODS: http://digital.library.wisc.edu/1711.dl/XMLSchema-MODS-3.6

MARC21: https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd

ProQuest ETD XML: locally maintained schema, paDISS.xsd, based on PQ DTD at http://www.etdadmin.com/dtds/etd.dtd
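The check that every file referenced in the PQ metadata actually exists in the package could look something like the sketch below. It is an illustration under stated assumptions, not the local implementation: the element names DISS_binary and DISS_file_name are assumed from the public ProQuest DTD, the sample metadata is invented, and the function name missing_files is hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Invented sample; DISS_binary (the PDF) and DISS_file_name (an attachment)
# are assumed element names, not confirmed against the local schema.
SAMPLE_METADATA = """\
<DISS_submission>
  <DISS_content>
    <DISS_binary type="PDF">dissertation.pdf</DISS_binary>
    <DISS_attachment>
      <DISS_file_name>dataset.csv</DISS_file_name>
    </DISS_attachment>
  </DISS_content>
</DISS_submission>
"""

FILE_REFERENCE_TAGS = ("DISS_binary", "DISS_file_name")


def missing_files(metadata_xml: str, package_dir: Path) -> list[str]:
    """Return referenced file names that are absent from package_dir."""
    root = ET.fromstring(metadata_xml)
    referenced = [
        el.text.strip()
        for el in root.iter()
        if el.tag in FILE_REFERENCE_TAGS and el.text and el.text.strip()
    ]
    return [name for name in referenced if not (package_dir / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp)
        # Only the PDF is present, so the attachment is reported as missing.
        (package / "dissertation.pdf").write_bytes(b"%PDF-1.4\n")
        print(missing_files(SAMPLE_METADATA, package))
```

An empty list means every referenced file was found. How a non-empty list is handled (rejecting the package, flagging it in the report to Cataloging, and so on) is a workflow decision not described here.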

Representative Record

MARC records reside on a local network drive managed by LTG (Pete Boguszewski) and in the Alma LMS.

The digital objects reside in the FEDORA repository managed by UWDCC/SDG.

Dependencies

The source is the ProQuest ETD SFTP submission package. Downstream uses are bibliographic records, via Cataloging, in the Discovery system, and digital objects in the FEDORA repository.

Access & Use

Delivery Modalities

Data is delivered automatically by ProQuest to an SFTP server. Transformed bibliographic data is delivered to Cataloging staff for MARC quality control checks and import into Alma.

Dissertations and supplemental files are immediately available to users, but per Grad School policy, are restricted to UW-Madison campus-only access for one year. Once the restriction period has expired, an automated process removes the restriction and updates the associated CNRI handle values (URLs).
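The date logic behind the one-year restriction can be sketched as follows. This is a hedged illustration only: it assumes the restriction period is counted from the ingest date, which this record does not state, and the function names restriction_end and is_public are hypothetical. Updating the repository access policy and the CNRI handle values is not shown.

```python
from datetime import date


def restriction_end(ingest: date, years: int = 1) -> date:
    """Return the first day on which the item is no longer restricted.

    Assumes the period starts on the ingest date (an assumption, not
    confirmed by the source).
    """
    try:
        return ingest.replace(year=ingest.year + years)
    except ValueError:
        # Ingested on Feb 29 and the target year is not a leap year.
        return ingest.replace(year=ingest.year + years, month=3, day=1)


def is_public(ingest: date, today: date) -> bool:
    """True once the campus-only restriction period has expired."""
    return today >= restriction_end(ingest)


if __name__ == "__main__":
    ingested = date(2019, 2, 25)
    print(restriction_end(ingested))              # 2020-02-25
    print(is_public(ingested, date(2020, 2, 24)))  # False
    print(is_public(ingested, date(2020, 2, 25)))  # True
```

An automated job could run a check like this daily and release any item for which is_public returns True. ETDs restricted to administrative-only access would need to be excluded from such a job.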

Lifecycle

The digital objects in the FEDORA repository are considered preservation objects and have associated version and audit information. It's anticipated these will feed into a more formal preservation workflow once that is established.

Disposition

No Information

Relevant Processes

Data arrives via the ProQuest workflow/SFTP submission. The only vetting of data is via Cataloging's workflow.

Constraints

The UW-Madison Graduate School is the arbiter regarding access to ETDs. The default policy specifies a one-year campus-only restriction, after which the ETDs are publicly available. In rare instances, specific ETDs are restricted to administrative-only access.