If you are responsible for digital archives or long-term preservation, you have probably wondered: “Is this file really in the format its extension suggests?” In this post, I introduce “DROID,” a tool that answers exactly that question, and walk through actual analysis results.

What is DROID?

DROID (Digital Record Object Identification) is a file format identification tool developed by The National Archives (UK). Rather than trusting the file extension, it identifies a file's actual format by matching its internal byte patterns (signatures).

Main Features of DROID

  • Identification by binary signature: Examines the file content directly
  • Integration with the PRONOM registry: Uses the format signatures published in The National Archives' PRONOM technical registry
  • Batch processing: Profiles entire folders in a single run
  • Extension mismatch detection: Flags files whose extension disagrees with the identified format
  • CSV output: Exports results as data for further analysis
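To make signature-based identification concrete, here is a minimal Python sketch. It is not how DROID works internally: PRONOM signatures are far richer (byte sequences at fixed or variable offsets, wildcards, end-of-file patterns, priorities between formats), whereas this toy matcher only checks a handful of well-known magic numbers at the start of a file. The names `sniff`, `extension_mismatch` and the `EXPECTED` table are hypothetical, chosen for illustration.

```python
# Toy signature matcher (illustration only; NOT DROID's algorithm).
# It inspects the leading bytes of a file instead of trusting the extension.


def sniff(header: bytes) -> str:
    """Guess a format label from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] in (b"II*\x00", b"MM\x00*"):  # little-/big-endian TIFF
        return "tiff"
    if header[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if header[:4] == b"PK\x03\x04":
        return "zip"  # DOCX, XLSX, EPUB... all look like ZIP at this level
    if header[4:8] == b"ftyp":
        return "mp4"  # ISO base media files open with an 'ftyp' box
    if header[:3] == b"ID3":
        return "mp3"  # MP3 carrying an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync (11 set bits)
    return "unknown"


# Hypothetical mapping from extension to the label sniff() should return.
EXPECTED = {
    "wav": "wav", "tif": "tiff", "tiff": "tiff", "png": "png",
    "docx": "zip", "mp4": "mp4", "mp3": "mp3",
}


def extension_mismatch(filename: str, header: bytes) -> bool:
    """True when the extension implies something other than the bytes show.

    Files with a missing or unlisted extension are also flagged for review.
    """
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return EXPECTED.get(ext) != sniff(header)


if __name__ == "__main__":
    # A WAV header saved under a .tif name
    wav_header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVEfmt "
    print(sniff(wav_header))                           # wav
    print(extension_mismatch("meow.tif", wav_header))  # True
```

Real files need many more signatures than this, which is exactly why DROID relies on a maintained registry instead of a hard-coded table.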

Why is DROID Needed?

Digital files commonly suffer from the following problems:

  1. Intentional extension changes: e.g., to disguise the file's format
  2. Incorrect extension assignment: Caused by human or system error
  3. Extension not updated after format conversion: The file keeps its old extension
  4. Missing or unknown extensions: Common in files migrated from legacy systems

These issues can have serious impacts on long-term preservation plans and migration strategies.

The Power of DROID Through Real Examples

Running DROID over sample files from a digital preservation workshop surfaced several instructive issues.

Major Issues Discovered

1. Audio File Disguised as Image File

  • File name: 412016__skymary__cat-purring-and-meow.tif
  • Extension: .tif (suggests TIFF image format)
  • Actual format: Waveform Audio (PCMWAVEFORMAT)
  • PUID: fmt/141
  • MIME type: audio/x-wav
  • Status: EXTENSION_MISMATCH=true

Issue: The file risks being treated as an image, so it may never be opened with an appropriate audio player.

2. New Format with Old Extension

  • File name: AusPreserves-CodeofConduct_20181115.doc
  • Extension: .doc (suggests Word 97-2003 format)
  • Actual format: Microsoft Word for Windows 2007 onwards (.docx)
  • PUID: fmt/412
  • MIME type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • Status: EXTENSION_MISMATCH=true

Issue: Software that trusts the .doc extension may try to read the file as a Word 97-2003 binary document, and older Word versions without DOCX support may fail to open it.

3. Actually MP4, Not MP3

  • File name: AustralasiaPreserves_iPRES2019.mp3
  • Extension: .mp3 (suggests audio file)
  • Actual format: MPEG-4 Media File
  • PUID: fmt/199
  • MIME type: application/mp4, video/mp4
  • Status: EXTENSION_MISMATCH=true

Issue: The file may not play correctly in audio-only players. In addition, content that the MP4 container can carry, such as video tracks, may be overlooked.

4. Confusion from Custom Extensions

  • File name: FloppyDisks.tif_original
  • Extension: .tif_original (non-standard)
  • Actual format: Tagged Image File Format
  • PUID: fmt/353
  • Status: EXTENSION_MISMATCH=true

Issue: The suffix was probably meant to mark a backup or original copy, but standard image viewers do not recognize the extension and will not open the file by default.

5. Files with Unidentifiable Format

FSFSiiOtlzRaeeMtn:AuaTsm5_:e6C:2OF,UoA3Nru9Tms3:attrb0ayultnaeissdieanPtriefsiearbvlees-modified.png

Issue: Although the file is far from empty, DROID could not match it to any format. It may be corrupted, or it may be in a format too rare to be covered by PRONOM signatures.
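When DROID cannot identify a file, a quick structural check can help decide between "damaged" and "unknown format". The sketch below is a hypothetical first-pass diagnostic for a file that claims to be a PNG. It is based on the PNG specification, not on DROID's signatures: it checks for the 8-byte PNG signature, an IHDR chunk in first position with a valid CRC, and the fixed 12-byte IEND chunk at the very end. Failing any check suggests damage; passing all of them does not prove the file is valid.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# IEND is always the last chunk: zero length, type 'IEND', constant CRC.
IEND_CHUNK = b"\x00\x00\x00\x00IEND\xaeB`\x82"


def png_diagnostics(data: bytes) -> dict:
    """Hypothetical first-pass structural checks for bytes claiming to be PNG."""
    report = {
        "signature_ok": data[:8] == PNG_SIGNATURE,
        # Bytes 8-11 hold the first chunk's length, 12-15 its type
        "ihdr_first": data[12:16] == b"IHDR",
        "ihdr_crc_ok": False,
        "iend_at_end": data[-12:] == IEND_CHUNK,
    }
    if report["ihdr_first"]:
        (length,) = struct.unpack(">I", data[8:12])
        end = 16 + length  # end of the IHDR data field
        if len(data) >= end + 4:
            (stored_crc,) = struct.unpack(">I", data[end:end + 4])
            # PNG CRCs cover the chunk type and data, not the length field
            report["ihdr_crc_ok"] = zlib.crc32(data[12:end]) == stored_crc
    return report
```

To try it on a suspect file (the name is a placeholder): `with open("suspect.png", "rb") as f: print(png_diagnostics(f.read()))`. A bad signature or missing IEND points toward corruption or truncation rather than an exotic format.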

Lessons from DROID Analysis Results

Practical Lessons for Digital Preservation

  1. Extensions cannot be trusted

    • Judging files solely by extension is dangerous
    • Internal structure verification is essential
  2. Early detection is important

    • Finding problems early reduces remediation costs
    • Regular file audits are recommended
  3. Accuracy of metadata

    • Correct file format information is the foundation of preservation strategies
    • Format information needs its own verification step
  4. Impact on migration plans

    • Incorrect format information leads to choosing the wrong migration tools
    • Directly affects the accuracy of risk assessments

DROID Use Cases

1. During Digital Archive Ingestion

Quality-check newly received collections by comparing the supplied metadata with the actual file content.

2. Regular Health Checks

Audit existing collections periodically. Comparing checksums and identification results against earlier runs helps detect corruption or unexpected changes early.
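One way to put such audits into practice is to keep the CSV export from each run and compare checksums between runs. The sketch below assumes both exports were produced with MD5 hash generation enabled, so that they contain FILE_PATH and MD5_HASH columns; the function names are hypothetical.

```python
import csv
import io


def load_hashes(csv_text: str) -> dict:
    """Map FILE_PATH -> MD5_HASH for every row that carries a hash."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows without a hash (for example folders) are skipped
    return {row["FILE_PATH"]: row["MD5_HASH"] for row in reader if row.get("MD5_HASH")}


def compare_runs(old_csv: str, new_csv: str) -> dict:
    """Report files whose checksum changed, disappeared, or newly appeared."""
    old, new = load_hashes(old_csv), load_hashes(new_csv)
    return {
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }
```

To use it on real exports, read each file with `open(path, newline="", encoding="utf-8").read()` (adjust the encoding if your export differs) and pass the text in. A changed checksum on a file nobody has touched is exactly the kind of silent corruption a health check should surface.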

3. Migration Project Preparation

Before format migration, accurately identify the target files’ formats and select appropriate conversion tools.

4. Risk Assessment

Identify formats that are being deprecated or are no longer supported, and prioritize the affected files.

Utilizing DROID Output Data

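DROID can export detailed reports in CSV format. Key columns include:

  • PUID: Unique identifier in the PRONOM registry
  • MIME_TYPE: Standard MIME type
  • FORMAT_NAME: Official format name
  • FORMAT_VERSION: Format version
  • FORMAT_COUNT: Number of formats matched (0 means unidentified)
  • EXTENSION_MISMATCH: true if the extension does not match the identified format
  • MD5_HASH: File checksum (present when hash generation is enabled)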
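As a sketch of how these columns can be used, the following function sorts the rows of an exported CSV into review buckets: extension mismatches, unidentified files (a FORMAT_COUNT of 0), and files with more than one candidate format. It assumes DROID's one-row-per-file CSV layout with NAME, TYPE, EXTENSION_MISMATCH and FORMAT_COUNT columns, and that folder rows are marked TYPE=Folder; `triage` is a hypothetical name.

```python
import csv
import io


def triage(csv_text: str) -> dict:
    """Sort file rows from a DROID CSV export into review buckets."""
    buckets = {"mismatch": [], "unidentified": [], "multiple": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("TYPE") == "Folder":
            continue  # assumption: folder rows carry no format information
        name = row["NAME"]
        if (row.get("EXTENSION_MISMATCH") or "").lower() == "true":
            buckets["mismatch"].append(name)
        # An empty FORMAT_COUNT is treated like 0 (nothing identified)
        count = int(row.get("FORMAT_COUNT") or 0)
        if count == 0:
            buckets["unidentified"].append(name)
        elif count > 1:
            buckets["multiple"].append(name)
    return buckets
```

A file can land in more than one bucket, for example a mismatched file that also has several candidate formats; that overlap is deliberate, since each bucket calls for a different kind of review.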

Analyzing this data lets you:

  • Understand the format distribution of the entire collection
  • Prioritize processing of high-risk files
  • Estimate preservation costs
  • Automatically generate technical metadata
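For the first of these goals, a format profile of the collection takes only a few lines. This hypothetical helper counts files per PUID and format name and reports each share as a percentage. It assumes a DROID per-file CSV export with TYPE, PUID and FORMAT_NAME columns, and that folder rows are marked TYPE=Folder.

```python
import csv
import io
from collections import Counter


def format_distribution(csv_text: str) -> list:
    """Return (PUID, format name, file count, percent) tuples, most common first."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("TYPE") == "Folder":
            continue  # assumption: folders are labelled TYPE=Folder
        key = (row.get("PUID") or "(none)", row.get("FORMAT_NAME") or "Unidentified")
        counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (puid, name, n, round(100 * n / total, 1))
        for (puid, name), n in counts.most_common()
    ]
```

Sorting by count is only a starting point: ranking formats by preservation risk needs an external risk assessment, which this sketch does not attempt.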

Getting Started with DROID

Basic Workflow

  1. Download and install DROID

    • Free download from The National Archives website
    • Java runtime environment required
  2. Select analysis targets

    • Select a single file or an entire folder
  3. Run the profile

    • DROID scans files and identifies formats
  4. Review results

    • Review results in the GUI
    • Focus on extension mismatches
  5. Export report

    • Export in CSV format
    • Use for further analysis or record keeping

Points to Note

  • Keep signature files up to date: New PRONOM releases add formats and refine existing signatures
  • Multiple identifications: Some files match more than one format and need manual review
  • Container formats: For containers such as ZIP and DOCX, DROID also inspects the internal structure, which is how it tells a DOCX apart from a generic ZIP file
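The container point deserves a closer look, because at the byte level a DOCX file is just a ZIP archive. DROID resolves this with dedicated container signatures; the sketch below imitates the idea in a much simplified way, opening the archive with Python's standard `zipfile` module and looking for the part names that Office Open XML packages conventionally use. The member-name rules and the function name are illustrative assumptions, not DROID's actual container signatures.

```python
import io
import zipfile


def classify_zip_container(data: bytes) -> str:
    """Roughly classify a ZIP-based file by the members it contains."""
    if not data.startswith(b"PK\x03\x04"):
        return "not a zip"
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return "damaged zip"
    # Office Open XML packages carry a [Content_Types].xml part. The part
    # names below are conventional; a robust check would follow _rels/.rels.
    if "[Content_Types].xml" in names:
        if "word/document.xml" in names:
            return "docx"
        if "xl/workbook.xml" in names:
            return "xlsx"
        if "ppt/presentation.xml" in names:
            return "pptx"
    return "zip"
```

A plain signature check would stop at "ZIP"; only by looking inside the archive can a tool recognize that a file named .doc is really a Word 2007+ document.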

Summary

DROID is a powerful tool that makes “invisible problems” visible in digital preservation. As the real examples show, a collection may contain more files with incorrect extensions than you expect.

For digital archive managers, I recommend the following:

  • Conduct regular DROID analysis
  • Prioritize fixing files with extension mismatches
  • Record and manage file format information
  • Make DROID analysis a mandatory check during ingestion

Accurate file format information is the foundation for ensuring long-term accessibility of digital assets. Why not use DROID to reveal the “true nature” of your collection?



This blog post is based on sample data from the iPRES2025 workshop and uses actual analysis results to demonstrate DROID's practical value.