docs.rockarch.org
Rockefeller Archive Center Documentation

RAC Archivematica | Appendix

Processing Configuration

Digitized - Automated

These should be the default settings for digitized materials:

| Step | Action | Rationale |
| --- | --- | --- |
| Scan for viruses? | No | |
| Assign UUIDs to directories | No | |
| Generate transfer structure report | No | |
| Perform file format identification (Transfer) | No | Digitized ingests contain a small set of known file formats |
| Extract packages | No | Digitized ingests should not contain packages (e.g., zip files) |
| Delete packages after extraction | Yes | |
| Perform policy checks on originals | No | Policy checks refers to MediaConch, which is only used for A/V |
| Examine contents | Skip examine contents | Runs Bulk Extractor - not necessary for digitized materials, which are already processed |
| Create SIP(s) | Create single SIP and continue processing | |
| Perform file format identification (Ingest) | No | |
| Normalize | Do not normalize | Access derivatives are created in a separate application |
| Approve normalization | Yes | |
| Choose thumbnail mode | No | |
| Perform policy checks on preservation derivatives | No | |
| Perform policy checks on access derivatives | No | |
| Bind PIDs | No | This functionality is not currently used |
| Document empty directories | No | Empty directories do not exist in RAC-created transfers |
| Reminder: add metadata if desired | Continue | Digitized ingests include a rights.csv file, so metadata does not need to be added manually |
| Transcribe SIP contents | No | This uses Tesseract to OCR TIFs; OCR workflows for digitization happen outside of Archivematica |
| Perform file format identification command (Submission documentation & metadata) | No | |
| Select compression algorithm | 7z using bzip2 | |
| Select compression level | 5 - normal compression mode | |
| Store AIP | Yes | |
| Store AIP location | Store AIP in standard Archivematica directory | |
| Upload DIP | Do not upload DIP | |
| Store DIP | Reject DIP | Access copies are created outside of Archivematica |
| Store DIP location | None | |
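
Several of the rationales above rest on assumptions about what a digitized transfer contains: no packages, no empty directories, and a rights.csv file. A minimal sketch of a check that could be run before ingest is shown below. The function name, the list of package extensions, and the decision to accept rights.csv anywhere in the transfer are all illustrative assumptions, not part of Archivematica or any RAC tooling.

```python
from pathlib import Path

# Hypothetical list of package extensions; adjust to local practice.
PACKAGE_SUFFIXES = {".zip", ".7z", ".tar", ".gz", ".tgz"}


def preflight_digitized_transfer(transfer_dir):
    """Return a list of problems that contradict the table's assumptions."""
    root = Path(transfer_dir)
    problems = []
    has_rights = False
    for path in sorted(root.rglob("*")):
        if path.is_dir():
            # A directory with no children at all counts as empty.
            if not any(path.iterdir()):
                problems.append(f"empty directory: {path.relative_to(root)}")
        elif path.suffix.lower() in PACKAGE_SUFFIXES:
            problems.append(f"package file: {path.relative_to(root)}")
        elif path.name == "rights.csv":
            # Assumption: rights.csv is accepted at any depth in the transfer.
            has_rights = True
    if not has_rights:
        problems.append("no rights.csv found")
    return problems
```

An empty result suggests the transfer matches the assumptions behind these settings; any reported problem is a cue to review the transfer before relying on this configuration.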

Digitized AV - Automated

These should be the default settings for digitized audiovisual materials:

| Step | Action | Rationale |
| --- | --- | --- |
| Scan for viruses? | No | |
| Assign UUIDs to directories | No | |
| Generate transfer structure report | No | |
| Perform file format identification (Transfer) | No | Digitized ingests contain a small set of known file formats |
| Extract packages | No | Digitized ingests should not contain packages (e.g., zip files) |
| Delete packages after extraction | Yes | |
| Perform policy checks on originals | No | Policy checks refers to MediaConch, which is only used for A/V |
| Examine contents | Skip examine contents | Runs Bulk Extractor - not necessary for digitized materials, which are already processed |
| Create SIP(s) | Create single SIP and continue processing | |
| Perform file format identification (Ingest) | No | |
| Normalize | Do not normalize | Access derivatives are created in a separate application |
| Approve normalization | Yes | |
| Choose thumbnail mode | No | |
| Perform policy checks on preservation derivatives | No | |
| Perform policy checks on access derivatives | No | |
| Bind PIDs | No | This functionality is not currently used |
| Document empty directories | No | Empty directories do not exist in RAC-created transfers |
| Reminder: add metadata if desired | Continue | Digitized ingests include a rights.csv file, so metadata does not need to be added manually |
| Transcribe SIP contents | No | This uses Tesseract to OCR TIFs; OCR workflows for digitization happen outside of Archivematica |
| Perform file format identification command (Submission documentation & metadata) | No | |
| Select compression algorithm | 7z using bzip2 | |
| Select compression level | 5 - normal compression mode | |
| Store AIP | Yes | |
| Store AIP location | Digitized AV AIP Store | Digitized AV should be stored in an S3 bucket |
| Upload DIP | Do not upload DIP | |
| Store DIP | Reject DIP | Access copies are created outside of Archivematica |
| Store DIP location | None | |
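
Compared row by row with the Digitized - Automated settings, this configuration differs only in the Store AIP location step. The sketch below illustrates one way to confirm that kind of difference by modelling each configuration as a plain dictionary. Only a subset of steps is included, and the dictionary layout and the `diff_configs` helper are hypothetical; Archivematica does not store or expose its processing configurations in this form.

```python
# Subset of steps only; actions are copied from the Digitized - Automated table.
DIGITIZED = {
    "Normalize": "Do not normalize",
    "Select compression algorithm": "7z using bzip2",
    "Store AIP": "Yes",
    "Store AIP location": "Store AIP in standard Archivematica directory",
    "Store DIP": "Reject DIP",
}

# Start from the Digitized settings and override the one documented difference.
DIGITIZED_AV = {**DIGITIZED, "Store AIP location": "Digitized AV AIP Store"}


def diff_configs(base, other):
    """Map each step whose action differs to a (base, other) pair."""
    steps = sorted(set(base) | set(other))
    return {
        step: (base.get(step), other.get(step))
        for step in steps
        if base.get(step) != other.get(step)
    }


# Reports only the "Store AIP location" step for these two dictionaries.
print(diff_configs(DIGITIZED, DIGITIZED_AV))
```

Deriving the AV dictionary from the base one keeps the shared settings in a single place, so a future change to the digitized defaults would not have to be copied by hand.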

Legacy Born Digital - Automated

Legacy born digital materials are those that have been recovered from digital media items or otherwise accessioned outside of Aurora, and that have been fully processed by a processing archivist before ingest into Archivematica. These should be the default settings for legacy born digital materials:

| Step | Action | Rationale |
| --- | --- | --- |
| Scan for viruses? | No | |
| Assign UUIDs to directories | No | |
| Send transfer to quarantine | No | |
| Remove from quarantine after __ days | 28 | |
| Generate transfer structure report | No | |
| Perform file format identification (Transfer) | Yes | |
| Extract packages | No | |
| Delete packages after extraction | Yes | |
| Perform policy checks on originals | No | |
| Examine contents | Skip examine contents | |
| Create SIP(s) | Create single SIP and continue processing | |
| Perform file format identification (Ingest) | Yes | |
| Normalize | Normalize for preservation and access | |
| Approve normalization | Yes | |
| Choose thumbnail mode | No | |
| Perform policy checks on preservation derivatives | No | |
| Perform policy checks on access derivatives | No | |
| Bind PIDs | No | |
| Document empty directories | No | |
| Reminder: add metadata if desired | Continue | |
| Transcribe SIP contents | No | |
| Perform file format identification command (Submission documentation & metadata) | No | |
| Select compression algorithm | 7z using bzip2 | |
| Select compression level | 5 - normal compression mode | |
| Store AIP | Yes | |
| Store AIP location | Store AIP in standard Archivematica directory | |
| Upload DIP | Do not upload DIP | |
| Store DIP | Reject DIP | Access copies are created outside of Archivematica |
| Store DIP location | None | |

Born Digital - Automated

This is the processing configuration named “automated” for born digital materials ingested from Aurora:

| Step | Action | Rationale |
| --- | --- | --- |
| Scan for viruses? | No | Transfers coming through Aurora undergo virus checking |
| Assign UUIDs to directories | Yes | Provides more contextual information about folders within a transfer, which can assist processing archivists |
| Generate transfer structure report | No | Since the structure of the transfer is not changed during ingest, this is not necessary |
| Perform file format identification (Transfer) | Yes | Provides information about the file types within the transfer, which allows archivists to decide on access and preservation steps |
| Extract packages | No | If there are any compressed folders within a transfer, decisions on extraction will be made by the processing archivist |
| Delete packages after extraction | No | Because no directories are uncompressed within a transfer, there are no extracted packages to delete |
| Perform policy checks on originals | No | Policy checks refers to MediaConch, which is only used for A/V |
| Examine contents | Skip examine contents | Runs Bulk Extractor; the benefits of doing this at ingest are unclear |
| Create SIP(s) | Create single SIP and continue processing | Avoids sending packages to temporary storage and creating a backlog for processing |
| Perform file format identification (Ingest) | No | No changes have occurred to the transfer packages (such as extraction of packages), so the file format identification would be the same; existing data will be used |
| Normalize | Normalize for preservation | Access copies are created and uploaded to access systems with other microservices outside of Archivematica |
| Approve normalization | Yes | Normalization rules are set in the Format Policy Registry |
| Choose thumbnail mode | No | Thumbnails are not used in any RAC systems |
| Perform policy checks on preservation derivatives | No | This step applies only to AV materials, which are out of scope for this configuration |
| Perform policy checks on access derivatives | No | This step applies only to AV materials, which are out of scope for this configuration |
| Bind PIDs | No | PIDs in Archivematica are only Handle.Net IDs, which are not used by the RAC |
| Document empty directories | No | In line with physical material processing guidelines, the RAC does not maintain records of empty folders |
| Reminder: add metadata if desired | Continue | Metadata has been packaged with the content in Aurora |
| Transcribe SIP contents | No | The OCR tool in Archivematica does not meet RAC needs |
| Perform file format identification command (Submission documentation & metadata) | No | Creates unnecessary documentation about non-collection materials to be preserved |
| Select compression algorithm | 7z using bzip2 | Default algorithm in Archivematica and in line with other RAC processing configurations |
| Select compression level | 5 - normal compression mode | Normal compression mode meets the RAC’s needs for a moderately compressed file |
| Store AIP | Yes | Archivematica AIPs are the RAC’s primary archival copy for born-digital materials |
| Store AIP location | Store AIP in standard Archivematica directory | Storage location defined for maintaining AIPs for born digital content packages |
| Upload DIP | Do not upload DIP | Archivematica does not have integrations with the access systems at the RAC |
| Store DIP | Reject DIP | Access copies are created outside of Archivematica |
| Store DIP location | None | The RAC does not store Archivematica DIPs |
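
For readers who want a concrete sense of the compression settings above, the sketch below builds a roughly equivalent 7-Zip command line. The `-t7z`, `-m0=bzip2`, and `-mx=5` switches are standard 7-Zip options, but the helper name and paths are hypothetical, and the exact command Archivematica runs internally may include further switches not shown here.

```python
def build_7z_command(aip_dir, archive_path, level=5):
    """Build (but do not run) a 7-Zip command using bzip2 at the given level."""
    return [
        "7z", "a",        # add files to an archive
        "-t7z",           # 7z container format
        "-m0=bzip2",      # use bzip2 as the compression method
        f"-mx={level}",   # 5 is 7-Zip's "normal" compression level
        str(archive_path),
        str(aip_dir),
    ]


# Hypothetical paths, for illustration only.
print(" ".join(build_7z_command("my-aip", "my-aip.7z")))
```

The list form can be passed to `subprocess.run` on a machine where the `7z` binary is installed; the sketch deliberately stops short of executing anything.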

FPR Customizations

The following customizations have been added to the Format Policy Registry (the “Preservation Planning” tab):

Characterization Rules

  • ffprobe and MediaInfo should be disabled for JPG and TIFF files, as ExifTool is sufficient for characterizing these formats
  • ffprobe and ExifTool should be disabled for moving image files, as MediaInfo is sufficient for characterizing these formats
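
The effect of these two rules can be sketched as a small lookup. This is an illustration only: it assumes that all three tools would otherwise run for every format, and the format-group labels and the `enabled_tools` helper are hypothetical, since the FPR identifies formats and rules through its own interface rather than through code like this.

```python
# Tool names are taken from the rules above.
ALL_TOOLS = {"ffprobe", "MediaInfo", "ExifTool"}

# Hypothetical format-group labels mapped to the tools disabled for them.
DISABLED = {
    "jpg": {"ffprobe", "MediaInfo"},
    "tiff": {"ffprobe", "MediaInfo"},
    "moving image": {"ffprobe", "ExifTool"},
}


def enabled_tools(format_group):
    """Return the tools left enabled for a format group after customization."""
    return sorted(ALL_TOOLS - DISABLED.get(format_group, set()))


print(enabled_tools("tiff"))          # ['ExifTool']
print(enabled_tools("moving image"))  # ['MediaInfo']
```

Format groups without an entry in `DISABLED` keep every tool, which mirrors the idea that only the listed formats have been customized.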