In Archivematica, metadata schemas other than Dublin Core (DC) can be embedded into the AIP’s METS.xml. This guide explains how to include non-DC metadata such as EAD and MODS in a Transfer using source-metadata.csv, and verifies via API that they are correctly stored in the AIP.

Table of Contents

  1. Background and Purpose
  2. How source-metadata.csv Works
  3. XML Validation Feature
  4. Verification 1: MODS-Only Metadata Registration
  5. Verification 2: Simultaneous EAD + MODS Registration
  6. Storage Format of Non-DC Metadata in METS.xml
  7. Verification 3: Metadata Update via Reingest
  8. Summary

Background and Purpose

In a standard Archivematica Transfer, Dublin Core metadata described in metadata/metadata.csv is stored as <dmdSec> in METS.xml. However, in actual digital archive operations, there are use cases where metadata schemas other than DC need to be handled:

  • EAD (Encoded Archival Description): A widely used standard for hierarchical archival description
  • MODS (Metadata Object Description Schema): A schema used for detailed library material description
  • LIDO: A descriptive standard for museum and gallery materials
  • MARC21: A catalog data format for libraries

Archivematica provides functionality to associate arbitrary XML metadata with a Transfer through a CSV file called source-metadata.csv, storing it as <dmdSec> in the AIP’s METS.xml. This guide verifies this functionality via API.

How source-metadata.csv Works

CSV Format

source-metadata.csv is a CSV file placed in the metadata/ directory of a Transfer, consisting of 3 columns.

    filename,metadata,type
    objects,ead.xml,EAD
    objects,mods.xml,MODS
    objects/dir/file.pdf,file_metadata.xml,CustomType

Column   | Description
---------|------------
filename | Relative path of the file or directory the metadata applies to (starting from objects/)
metadata | Path to the XML metadata file (relative to the metadata/ directory)
type     | Metadata type identifier. Used for the OTHERMDTYPE attribute in METS.xml

Transfer Directory Structure

    my-transfer/
        objects/
            test-document.txt       Digital object to be preserved
        metadata/
            source-metadata.csv     Metadata-to-object mapping definition
            ead.xml                 EAD metadata
            mods.xml                MODS metadata

Associating Multiple Metadata with the Same File

In source-metadata.csv, multiple metadata entries with different type values can be associated with the same filename across multiple rows.

    filename,metadata,type
    objects,ead.xml,EAD
    objects,mods.xml,MODS

In this example, both EAD and MODS are associated with the objects directory (all files within it), and each is stored as a separate <dmdSec> in METS.xml.
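The row-to-object association can be sketched in a few lines of Python. This is an illustrative reader, not Archivematica's own parser; the function name group_source_metadata is hypothetical, and the column names are the three described for source-metadata.csv.

```python
import csv
import io
from collections import defaultdict


def group_source_metadata(csv_text):
    """Map each filename to the (metadata file, type) pairs listed for it."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["filename"]].append((row["metadata"], row["type"]))
    return dict(grouped)


# Two rows that point at the same target, as in the example above.
sample = """filename,metadata,type
objects,ead.xml,EAD
objects,mods.xml,MODS
"""

for filename, entries in group_source_metadata(sample).items():
    print(filename, entries)
```

Each entry in the list for a given filename corresponds to one <dmdSec>, which is why two rows for objects yield two separate sections.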

Processing Flow

  1. source-metadata.csv is read during Transfer
  2. XML files specified in the metadata column are parsed
  3. If XML Validation is enabled, validation against schemas is performed
  4. XML that passes validation is embedded as <dmdSec> in METS.xml

XML Validation Feature

Overview

Archivematica can validate the XML metadata listed in source-metadata.csv against schemas. The feature is disabled by default and is considered experimental.

To enable it, set the following MCP Client environment variables.

    ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_METADATA_XML_VALIDATION_ENABLED=true
    METADATA_XML_VALIDATION_SETTINGS_FILE=/path/to/xml_validation.py

Validation Configuration File

Validation settings are described in a Python file. Below is an example configuration used in Archivematica’s test environment.

    from pathlib import Path

    X_DIR = Path(__file__).parents[0] / "schemas"

    XML_VALIDATION = {
        "http://www.openarchives.org/OAI/2.0/oai_dc/": (X_DIR / "oai_dc.xsd").as_posix(),
        "http://www.lido-schema.org": (X_DIR / "lido-v1.1.xsd").as_posix(),
        "http://www.loc.gov/MARC21/slim": (X_DIR / "MARC21slim.xsd").as_posix(),
        "http://www.loc.gov/mods/v3": (X_DIR / "mods.xsd").as_posix(),
        "http://slubarchiv.slub-dresden.de/rights1": (X_DIR / "rights1.xsd").as_posix(),
        "alto": (X_DIR / "alto-v2.0.xsd").as_posix(),
        "metadata": None,
        "bag-info": None,
    }

    XML_VALIDATION_FAIL_ON_ERROR = False

The keys of the XML_VALIDATION dictionary are matched against XML documents in the following order:

  1. Value of the xsi:noNamespaceSchemaLocation attribute
  2. Last value of the xsi:schemaLocation attribute
  3. Namespace URI of the root element
  4. Local name of the root element

When the value for a matching key is None, validation is skipped, but the metadata is still stored in <dmdSec>. When no key in the dictionary matches, the metadata is skipped entirely: it is neither validated nor stored in <dmdSec>, and the only trace is an error in the MCP Client log.
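The lookup order and the three possible outcomes can be sketched as follows. This is a minimal illustration of the rules described here, not Archivematica's implementation: the function names candidate_keys and decide are hypothetical, and the standard-library ElementTree parser stands in for whatever parser the MCP Client uses.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def candidate_keys(xml_text):
    """List the lookup keys for a document, in the order described above."""
    root = ET.fromstring(xml_text)
    keys = []
    # 1. xsi:noNamespaceSchemaLocation
    no_ns_location = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns_location:
        keys.append(no_ns_location.strip())
    # 2. last whitespace-separated value of xsi:schemaLocation
    schema_location = (root.get(f"{{{XSI}}}schemaLocation") or "").split()
    if schema_location:
        keys.append(schema_location[-1])
    # 3. namespace URI of the root element, 4. its local name
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
        keys.append(namespace)
    else:
        local_name = root.tag
    keys.append(local_name)
    return keys


def decide(xml_text, validation):
    """Return what happens to a document under a given XML_VALIDATION dict."""
    for key in candidate_keys(xml_text):
        if key in validation:
            if validation[key] is None:
                return "store without validation"
            return "validate, then store"
    return "skip (not stored)"


# Hypothetical, shortened configuration for illustration only.
validation = {"http://www.loc.gov/mods/v3": "mods.xsd", "metadata": None}
print(decide('<mods xmlns="http://www.loc.gov/mods/v3"/>', validation))
print(decide("<metadata/>", validation))
print(decide('<ead xmlns="urn:isbn:1-931666-22-9"/>', validation))
```

The third call shows the case that matters most in practice: an EAD document whose namespace has no entry falls through every key and is dropped.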

Supported Schemas in Test Environment

Namespace / Key                              | Metadata Type         | Validation
---------------------------------------------|-----------------------|-----------
http://www.openarchives.org/OAI/2.0/oai_dc/  | Dublin Core (OAI-PMH) | XSD
http://www.lido-schema.org                   | LIDO                  | XSD
http://www.loc.gov/MARC21/slim               | MARC21                | XSD
http://www.loc.gov/mods/v3                   | MODS                  | XSD
http://slubarchiv.slub-dresden.de/rights1    | SLUB Rights           | XSD
alto                                         | ALTO (OCR)            | XSD
metadata                                     | Generic metadata      | Skip
bag-info                                     | BagIt information     | Skip

Note: EAD (urn:isbn:1-931666-22-9) is not included in the default test environment configuration. To use EAD, an entry must be added to the configuration file.

Verification 1: MODS-Only Metadata Registration

Administration > Processing configuration screen – managing processing profiles (default / automated / backlog) used when submitting Transfers

Verification Environment

  • Archivematica 1.19 (Docker environment)
  • Dashboard: http://127.0.0.1:62080
  • Storage Service: http://127.0.0.1:62081
  • XML Validation: Enabled (test environment default settings)

Creating the Transfer Package

Create a Transfer package with the following structure.

    metadata-test/
        objects/
            test-document.txt
        metadata/
            source-metadata.csv
            ead.xml
            mods.xml

source-metadata.csv:

    filename,metadata,type
    objects,ead.xml,EAD
    objects,mods.xml,MODS

mods.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <mods xmlns="http://www.loc.gov/mods/v3" version="3.6">
      <titleInfo>
        <title>Test Document for Metadata Validation</title>
      </titleInfo>
      <name type="corporate">
        <namePart>Nakamura Test Organization</namePart>
        <role>
          <roleTerm type="text">creator</roleTerm>
        </role>
      </name>
      <typeOfResource>text</typeOfResource>
      <language>
        <languageTerm type="code" authority="iso639-2b">jpn</languageTerm>
      </language>
      <abstract>Test document for verifying MODS metadata registration.</abstract>
    </mods>

Executing the Transfer via API

    curl -X POST http://127.0.0.1:62080/api/v2beta/package/ \
      -H "Authorization: ApiKey test:test" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "metadata-test",
        "type": "standard",
        "path": "<base64-encoded-path>",
        "processing_config": "automated",
        "auto_approve": true
      }'
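The same request can be assembled in Python, which makes the base64 step explicit. This is a sketch under assumptions: the source only shows a placeholder for the path, so the encoding of "<location UUID>:<relative path>" follows my understanding of the v2beta package API and should be checked against the Archivematica API documentation. The function name build_package_request and the location UUID placeholder are hypothetical.

```python
import base64
import json
import urllib.request


def build_package_request(dashboard_url, user, api_key, location_uuid, relative_path, name):
    """Build (but do not send) a POST request that starts a standard Transfer."""
    # Assumption: "path" is base64("<transfer source location UUID>:<relative path>").
    raw_path = f"{location_uuid}:{relative_path}"
    encoded_path = base64.b64encode(raw_path.encode("utf-8")).decode("ascii")
    payload = {
        "name": name,
        "type": "standard",
        "path": encoded_path,
        "processing_config": "automated",
        "auto_approve": True,
    }
    return urllib.request.Request(
        f"{dashboard_url}/api/v2beta/package/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {user}:{api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_package_request(
    "http://127.0.0.1:62080", "test", "test",
    "00000000-0000-0000-0000-000000000000",  # hypothetical location UUID
    "metadata-test", "metadata-test",
)
print(request.get_method(), request.full_url)
# To actually submit the Transfer: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the Dashboard.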

Results

Transfer tab – Transfer processing for metadata-test and metadata-test2 is complete, with each Microservice status displayed

The Transfer and Ingest completed, and the AIP was created successfully. In METS.xml, however, only the MODS metadata was stored as a <dmdSec>.

    <mets:dmdSec ID="dmdSec_2" CREATED="2026-02-17T01:51:41" STATUS="original">
      <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="MODS">
        <mets:xmlData>
          <mods xmlns="http://www.loc.gov/mods/v3" version="3.6">
            <!-- Full MODS metadata -->
          </mods>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:dmdSec>

Reason EAD was not stored: the MCP Client log recorded an error for ead.xml. Because the EAD namespace (urn:isbn:1-931666-22-9) was not registered in the XML Validation configuration file, the file was skipped during validation processing and was not stored in <dmdSec>.

Verification 2: Simultaneous EAD + MODS Registration

Modifying the XML Validation Configuration

To handle EAD, the EAD namespace was added to the XML_VALIDATION dictionary in the configuration file.

    "urn:isbn:1-931666-22-9": None,

With None as the value, XSD schema validation is not performed, but the metadata is still stored in the <dmdSec> of METS.xml.

Executing the Transfer via API

A new Transfer package (metadata-test2) was submitted via API with the same structure as before.

Results: METS.xml dmdSec

Archival Storage list – AIPs for metadata-test (Verification 1) and metadata-test2 (Verification 2) have been successfully stored

This time, three dmdSecs were generated:

  • dmdSec_1: PREMIS:OBJECT (standard)
  • dmdSec_2: EAD
  • dmdSec_3: MODS

structMap Association

In the structMap, both EAD and MODS are associated with the objects directory via DMDID="dmdSec_2 dmdSec_3".
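A check like this can be scripted against a downloaded METS.xml. The sketch below lists each dmdSec with its STATUS, MDTYPE, and OTHERMDTYPE, and collects the DMDID references per structMap div. The function names are hypothetical, and SAMPLE_METS is a hand-written, heavily trimmed stand-in for a real AIP METS file, not output copied from Archivematica.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hand-written, trimmed example for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="dmdSec_2" STATUS="original">
    <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="EAD"><mets:xmlData/></mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="dmdSec_3" STATUS="original">
    <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="MODS"><mets:xmlData/></mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="Directory" LABEL="objects" DMDID="dmdSec_2 dmdSec_3"/>
  </mets:structMap>
</mets:mets>"""


def summarize_dmdsecs(mets_xml):
    """Return one dict per top-level dmdSec with its key attributes."""
    root = ET.fromstring(mets_xml)
    summary = []
    for dmd in root.findall("mets:dmdSec", METS_NS):
        wrap = dmd.find("mets:mdWrap", METS_NS)
        summary.append({
            "id": dmd.get("ID"),
            "status": dmd.get("STATUS"),
            "mdtype": wrap.get("MDTYPE") if wrap is not None else None,
            "othermdtype": wrap.get("OTHERMDTYPE") if wrap is not None else None,
        })
    return summary


def dmdids_by_label(mets_xml):
    """Map each structMap div LABEL to the list of dmdSec IDs it references."""
    root = ET.fromstring(mets_xml)
    return {
        div.get("LABEL"): div.get("DMDID").split()
        for div in root.iter("{http://www.loc.gov/METS/}div")
        if div.get("DMDID")
    }


for entry in summarize_dmdsecs(SAMPLE_METS):
    print(entry)
print(dmdids_by_label(SAMPLE_METS))
```

DMDID is a space-separated list of IDs, so splitting it gives one entry per associated dmdSec.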

Storage Format of Non-DC Metadata in METS.xml

MDTYPE Attribute Handling

Values specified in the type column of source-metadata.csv are not written to the MDTYPE attribute directly. According to Archivematica's source code (archivematicaCreateMETSMetadataXML.py), metadata supplied via source-metadata.csv is always stored with MDTYPE="OTHER", and the type column value is written to the OTHERMDTYPE attribute. A row with type MODS, for example, produces MDTYPE="OTHER" OTHERMDTYPE="MODS".

This differs from Dublin Core supplied via metadata.csv, which is stored with MDTYPE="DC".
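The attribute mapping can be illustrated with a small Python sketch. It is not taken from archivematicaCreateMETSMetadataXML.py; the helper name build_mdwrap is hypothetical, and the element is built with the standard library purely to show where the type column value ends up.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)


def build_mdwrap(md_type):
    """Build an mdWrap as described above: MDTYPE is fixed, OTHERMDTYPE varies."""
    wrap = ET.Element(
        f"{{{METS}}}mdWrap",
        {"MDTYPE": "OTHER", "OTHERMDTYPE": md_type},
    )
    # The parsed metadata document would be appended under xmlData.
    ET.SubElement(wrap, f"{{{METS}}}xmlData")
    return wrap


for md_type in ("EAD", "MODS", "CustomType"):
    print(ET.tostring(build_mdwrap(md_type), encoding="unicode"))
```

Tools that read these AIPs therefore need to look at OTHERMDTYPE, not MDTYPE, to tell EAD from MODS.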

STATUS Attribute

  • Initial Ingest: STATUS="original"
  • Re-ingest: STATUS="update"

During Re-ingest, existing dmdSecs with the same type are treated as superseded, and a new dmdSec is added with STATUS="update".

XML File Storage Location

XML files referenced by source-metadata.csv are also saved as files within the AIP.


Verification 3: Metadata Update via Reingest

AIP detail screen – UUID, size, storage location, and METS file download link for metadata-test2 can be confirmed

Purpose of Verification

Metadata re-ingest was performed on the AIP created in Verification 2 (containing EAD + MODS) to verify the following:

  • When existing MODS metadata is updated, is the old metadata retained as superseded and the new metadata added as update?
  • Are XML files added during Re-ingest saved within the AIP?

Starting the Reingest

Re-ingest tab on the AIP detail screen – select Reingest type (Metadata / Partial / Full) and Processing config. In this case, Metadata re-ingest was executed via API

The Storage Service API was used to start a Metadata re-ingest.
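The original request is not reproduced here, so the following Python sketch shows only how such a call might be assembled. Everything in it is an assumption to verify against the Storage Service API documentation: the endpoint path /api/v2/file/<AIP UUID>/reingest/, the body fields pipeline, reingest_type, and processing_config, and the value "metadata" for a metadata-only re-ingest. The function name and the UUIDs are hypothetical placeholders.

```python
import json
import urllib.request


def build_reingest_request(storage_service_url, user, api_key, aip_uuid, pipeline_uuid):
    """Build (but do not send) a request that starts a metadata-only re-ingest."""
    payload = {
        "pipeline": pipeline_uuid,          # assumed field: target pipeline UUID
        "reingest_type": "metadata",        # assumed value for Metadata re-ingest
        "processing_config": "automated",   # assumed field: processing profile
    }
    return urllib.request.Request(
        f"{storage_service_url}/api/v2/file/{aip_uuid}/reingest/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {user}:{api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


reingest = build_reingest_request(
    "http://127.0.0.1:62081", "test", "test",
    "11111111-1111-1111-1111-111111111111",  # hypothetical AIP UUID
    "22222222-2222-2222-2222-222222222222",  # hypothetical pipeline UUID
)
print(reingest.get_method(), reingest.full_url)
# To actually start the re-ingest: urllib.request.urlopen(reingest)
```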


Adding Metadata

When Reingest starts, the AIP is extracted and submitted to Archivematica’s Ingest workflow. Before approving “Approve AIP reingest”, new metadata files are placed in the extracted AIP’s data/objects/metadata/ directory.

Important: During Reingest, only objects/metadata/source-metadata.csv (root level) is processed. CSVs under objects/metadata/transfers/ are only read during the initial Ingest.


The source-metadata.csv prepared for Reingest contained two rows: one with type MODS, matching an existing dmdSec, and one with a new type, DC-CUSTOM, pointing to dc-reingest.xml.

When the type column holds the same value as an existing dmdSec (MODS), the existing dmdSec becomes superseded and a new dmdSec is added as update. When a new type value (DC-CUSTOM) is specified, it is added as a new dmdSec.
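The update rule can be modelled in a few lines of Python. This is a toy model of the behaviour described in this guide, not Archivematica code: the function name apply_reingest is hypothetical, it covers only a first re-ingest of dmdSecs whose status is original, and it assumes that every dmdSec added during re-ingest receives STATUS="update", including ones with a new type.

```python
def apply_reingest(dmdsecs, new_types):
    """Return the dmdSec list after a first re-ingest that adds new_types."""
    result = [dict(d) for d in dmdsecs]  # copy so the input stays unchanged
    next_number = len(result) + 1
    for md_type in new_types:
        # Same type as an existing original dmdSec: mark the old one superseded.
        for existing in result:
            if existing["type"] == md_type and existing["status"] == "original":
                existing["status"] = "original-superseded"
        # Assumption: anything added during re-ingest gets STATUS="update".
        result.append({
            "id": f"dmdSec_{next_number}",
            "type": md_type,
            "status": "update",
        })
        next_number += 1
    return result


before = [
    {"id": "dmdSec_1", "type": "PREMIS:OBJECT", "status": "original"},
    {"id": "dmdSec_2", "type": "EAD", "status": "original"},
    {"id": "dmdSec_3", "type": "MODS", "status": "original"},
]
for dmdsec in apply_reingest(before, ["MODS"]):
    print(dmdsec["id"], dmdsec["status"], dmdsec["type"])
```

The model keeps the superseded section in place and appends the replacement, so older metadata versions remain readable in METS.xml.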

Reingest Approval and Processing


After approval, the Ingest workflow proceeds. Even during Metadata re-ingest, decision points such as Normalize and Transcribe must be passed (manual selection is required if the automated processing config is not applied).

Results: METS.xml After Reingest

After Reingest completion, the METS.xml contained 4 dmdSecs.

dmdSec   | STATUS              | TYPE          | Description
---------|---------------------|---------------|------------
dmdSec_1 | original            | PREMIS:OBJECT | SIP identification information (no change)
dmdSec_2 | original            | OTHER (EAD)   | Initial Ingest EAD (no change)
dmdSec_3 | original-superseded | OTHER (MODS)  | Initial Ingest MODS (changed to superseded)
dmdSec_4 | update              | OTHER (MODS)  | Updated MODS added during Reingest

In METS.xml, dmdSec_3 (the old MODS) now carries STATUS="original-superseded", and dmdSec_4 (the new MODS) carries STATUS="update".

structMap Changes

After Reingest, all dmdSecs, including the superseded one, are referenced in the structMap for the objects directory (DMDID="dmdSec_2 dmdSec_3 dmdSec_4").

Additionally, metadata files added during Reingest are saved as files within the AIP.


Why DC-CUSTOM Was Not Included in dmdSec

dc-reingest.xml (Dublin Core Terms format) was saved as a file within the AIP, but was not stored in dmdSec. The MCP Client log recorded an error for this file.

The only Dublin Core namespace registered in the XML Validation configuration is the OAI-PMH one (http://www.openarchives.org/OAI/2.0/oai_dc/); the DC Terms namespace (http://purl.org/dc/terms/) was not registered. Because XML Validation registration is based on namespace URIs, two serializations of the same standard need separate entries when their namespaces differ.

Summary

Verification Results Summary

Verification Item                             | Result
----------------------------------------------|-------
MODS metadata dmdSec storage                  | Success
EAD metadata dmdSec storage                   | Success (after adding XML Validation configuration)
Multiple metadata association to same object  | Success (EAD + MODS stored simultaneously)
structMap dmdSec association                  | Normal (DMDID="dmdSec_2 dmdSec_3")
MODS metadata update via Reingest             | Success (old: original-superseded, new: update)
structMap update after Reingest               | Normal (DMDID="dmdSec_2 dmdSec_3 dmdSec_4")
AIP storage of files added during Reingest    | Success
Storage of unregistered namespace DC Terms    | Failed (not registered in XML Validation configuration)

Notes

  1. XML Validation Configuration: In environments where XML Validation is enabled, the namespace of every metadata schema in use must be registered in the validation configuration file. Unregistered metadata is skipped without stopping the workflow: the error appears only in the MCP Client log, and Ingest / Re-ingest processing continues.

  2. MDTYPE Handling: Metadata via source-metadata.csv is always stored as MDTYPE="OTHER". It is not stored with METS standard MDTYPE values like MDTYPE="MODS" or MDTYPE="EAD".

  3. source-metadata.csv Location During Reingest: During initial Ingest, objects/metadata/transfers/<transfer-name>/source-metadata.csv is used, but during Reingest, only objects/metadata/source-metadata.csv (root level) is processed.

  4. Metadata Version Management via type Column: The type column in source-metadata.csv functions as an identifier for metadata updates during Reingest. Using the same type value causes the existing dmdSec to become superseded, while using a new type value adds a new dmdSec.

  5. Namespace-Based Registration: Even for the same metadata standard (e.g., Dublin Core), if different namespace URIs are used, each must be registered separately in the XML Validation configuration.
