ProQuest ETD Data
General Information
- Description
-
ETD submission packages from ProQuest that contain PhD dissertations submitted by UW-Madison students. Each package contains a custom bibliographic metadata XML file, one PDF dissertation document, and zero or more file attachments that supplement the dissertation.
- Purpose
-
Used to create MARC and MODS records to facilitate discovery and display of born-digital PDF dissertations produced by UW-Madison students. The dissertations reside in the Library's FEDORA digital repository and are considered one instance of the Library's archival copies. (A print version is also delivered by ProQuest to the Library.)
- Quick Facts
-
5,226 records and associated digital objects, growing by about 800 new items per year, with a current total of 380 associated supplemental objects. The supplemental objects currently span 22 file MIME types.
Data Classifications
- Campus
-
- Sensitive: The exposure of restricted theses and dissertations could result in opportunity loss for the affected graduate.
- Library
-
- Descriptive: Descriptive metadata about electronic theses and dissertations created by graduating UW-Madison students.
- Content: Includes actual content of theses and dissertations.
- Research
Data Contacts
- Data Steward
- Brian Sheppard brian.sheppard@wisc.edu
- Internal Data Client
- Tom Lundstrom tom.lundstrom@wisc.edu
- Data Owner/Trustee
- Lee Konrad lee.konrad@wisc.edu
Risk Assessment
Score | Risk Type | Details | Evaluation Date |
---|---|---|---|
1 | Library Impact | Recent XML data could be (re)supplied by the vendor. Older data has already been archived, and key fields have been transformed into MARC and stored in both Alma and OCLC. | February 25, 2019 |
Technical Details
- Specifications
-
The delivered ProQuest ETD package contains a custom XML file with bibliographic and structural metadata, a PDF dissertation document, and zero or more file attachments that supplement the dissertation.
An automated process transforms the ProQuest metadata file included in the delivered package, via XSLT, into a valid MODS document, which is in turn transformed into two corresponding MARC21 records: one XML and one binary. The MARC records are forwarded to Cataloging via an e-mail report, and the dissertation PDF and MODS are ingested, along with any attachments, into a FEDORA repository. Technical file metadata is created by a locally maintained file validation service and is included with the digital object.
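The first step of the metadata transformation described above can be sketched as follows. The production workflow uses XSLT; this sketch swaps in plain Python (`xml.etree.ElementTree`) only to show the shape of the mapping. The `DISS_*` element names, the sample record, and the choice of MODS fields are assumptions for illustration and are not taken from the local stylesheets.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical, heavily trimmed ProQuest-style input; real files carry many more fields.
SAMPLE_PQ_XML = """\
<DISS_submission>
  <DISS_authorship>
    <DISS_author>
      <DISS_name>
        <DISS_surname>Doe</DISS_surname>
        <DISS_fname>Jane</DISS_fname>
      </DISS_name>
    </DISS_author>
  </DISS_authorship>
  <DISS_description>
    <DISS_title>An Example Dissertation</DISS_title>
  </DISS_description>
</DISS_submission>
"""


def pq_to_mods(pq_xml: str) -> ET.Element:
    """Map title and author names from the (assumed) ProQuest layout to MODS."""
    src = ET.fromstring(pq_xml)
    mods = ET.Element(f"{{{MODS_NS}}}mods")

    # Title: DISS_description/DISS_title -> mods:titleInfo/mods:title
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    title = ET.SubElement(title_info, f"{{{MODS_NS}}}title")
    title.text = src.findtext("DISS_description/DISS_title", default="").strip()

    # Authors: one mods:name per DISS_author, with family and given name parts
    for author in src.iterfind("DISS_authorship/DISS_author"):
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name", {"type": "personal"})
        for pq_tag, part_type in (("DISS_surname", "family"), ("DISS_fname", "given")):
            value = author.findtext(f"DISS_name/{pq_tag}")
            if value:
                part = ET.SubElement(name, f"{{{MODS_NS}}}namePart", {"type": part_type})
                part.text = value.strip()
    return mods


mods_root = pq_to_mods(SAMPLE_PQ_XML)
print(ET.tostring(mods_root, encoding="unicode"))
```

A real pipeline would still validate the result against the MODS schema before deriving the MARC21 records; that step is omitted here because the standard library has no schema validator.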
At the time of ingest, CNRI handles are created to provide persistent URLs.
KB: Documentation: https://kb.wisc.edu/library/internal/page.php?id=21652
ETD Example:
Dissertation: A pianist's introduction to Gary Powell Nash
http://digital.library.wisc.edu/1711.dl/VOG6NUAUMIPBT9C
Forward record, with "Related Electronic Resources" linking (via MARC 856) to the dissertation attachments:
https://search.library.wisc.edu/catalog/9910133265102121
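For readers unfamiliar with the 856 linkage mentioned above, the following is a minimal sketch of building one MARCXML 856 field for an attachment. The indicator values (`4` for HTTP access, `2` for a related resource), the use of subfield `z` for the link text, and the URL and note in the usage line are illustrative assumptions; the local workflow may encode these differently.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def make_856(url: str, note: str) -> ET.Element:
    """Build a MARCXML 856 datafield pointing at a related electronic resource."""
    field = ET.Element(
        f"{{{MARC_NS}}}datafield",
        {"tag": "856", "ind1": "4", "ind2": "2"},  # assumed indicators
    )
    # $u carries the URI, $z a public note shown next to the link
    for code, value in (("u", url), ("z", note)):
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field


# Hypothetical handle URL and note, for illustration only.
field = make_856("http://digital.library.wisc.edu/1711.dl/EXAMPLEHANDLE", "Supplemental audio file")
print(ET.tostring(field, encoding="unicode"))
```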
- Correctness
-
All XML files are validated against their respective schemas. File formats are validated via JHOVE, but files are ingested regardless of validation status. The existence of files referenced in the PQ metadata is verified.
Schemas:
MODS: http://digital.library.wisc.edu/1711.dl/XMLSchema-MODS-3.6
MARC21: https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
ProQuest ETD XML: locally maintained schema, paDISS.xsd, based on the PQ DTD at http://www.etdadmin.com/dtds/etd.dtd
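The file-existence check mentioned above could look roughly like the sketch below. It assumes the metadata names the PDF in a `DISS_binary` element and each attachment in a `DISS_file_name` element, and that all files sit directly in the package directory; both the element names and the flat layout are assumptions, and the function name is hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical metadata fragment naming one PDF and one attachment.
SAMPLE = """<DISS_submission>
  <DISS_content>
    <DISS_binary type="PDF">dissertation.pdf</DISS_binary>
    <DISS_attachment>
      <DISS_file_name>dataset.csv</DISS_file_name>
    </DISS_attachment>
  </DISS_content>
</DISS_submission>"""


def missing_files(metadata_xml: str, package_dir: Path) -> list[str]:
    """Return the referenced file names that are absent from package_dir."""
    root = ET.fromstring(metadata_xml)
    referenced = []
    for tag in ("DISS_binary", "DISS_file_name"):  # assumed element names
        for el in root.iter(tag):
            if el.text and el.text.strip():
                referenced.append(el.text.strip())
    return [name for name in referenced if not (package_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp)
    (package / "dissertation.pdf").write_bytes(b"%PDF-1.4\n")
    # The attachment was never written, so it is reported as missing.
    print(missing_files(SAMPLE, package))  # ['dataset.csv']
```

An empty list means every referenced file was found. Format validation (JHOVE) and schema validation are separate steps and are not modeled here.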
- Representative Record
-
MARC records reside on a local network drive managed by LTG (Pete Boguszewski) and in the Alma LMS.
The digital objects reside in the FEDORA repository managed by UWDCC/SDG.
- Dependencies
-
The source is the ProQuest ETD SFTP submission package. Downstream uses are bibliographic records, via Cataloging, in the Discovery system, and digital objects in the FEDORA repository.
Access & Use
- Delivery Modalities
-
Data is delivered automatically by ProQuest to an SFTP server. Transformed bibliographic data is delivered to Cataloging staff for MARC quality control checks and import into Alma.
Dissertations and supplemental files are available to users immediately but, per Graduate School policy, are restricted to UW-Madison campus-only access for one year. Once the restriction period has expired, an automated process removes the restriction and updates the associated CNRI handle values (URLs).
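The one-year restriction logic can be illustrated with a small date calculation. This is a sketch only: it assumes the year is counted from the ingest date, which the policy text does not state, and the function names are hypothetical. It does not touch FEDORA or the handle system.

```python
from datetime import date


def release_date(ingested: date) -> date:
    """One calendar year after ingest; a Feb 29 ingest falls back to Feb 28."""
    try:
        return ingested.replace(year=ingested.year + 1)
    except ValueError:  # Feb 29 does not exist in the following year
        return ingested.replace(year=ingested.year + 1, day=28)


def is_campus_only(ingested: date, today: date) -> bool:
    """True while the campus-only restriction is still in force."""
    return today < release_date(ingested)


print(release_date(date(2019, 2, 25)))                       # 2020-02-25
print(is_campus_only(date(2019, 2, 25), date(2019, 12, 1)))  # True
```

A scheduled job could run a check like `is_campus_only` over restricted objects and lift the restriction on those for which it returns `False`. ETDs restricted to administrative-only access would need to be excluded from such a job.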
- Lifecycle
-
The digital objects in the FEDORA repository are considered preservation objects and have associated version and audit information. It's anticipated these will feed into a more formal preservation workflow once that is established.
- Disposition
- No Information
- Relevant Processes
-
Data arrives via the ProQuest workflow/SFTP submission. The only vetting of data is via Cataloging's workflow.
- Constraints
-
The UW-Madison Graduate School is the arbiter regarding access to ETDs. The default policy specifies a one-year campus-only restriction, after which the ETDs become publicly available. In rare instances, specific ETDs are restricted to administrative-only access.