If you are responsible for digital archives or long-term preservation, you have probably asked yourself, “Is this file really in the format its extension suggests?” This post introduces DROID, a tool that answers that question, along with results from an actual analysis.
What is DROID?
DROID (Digital Record Object Identification) is a file format identification tool developed by The National Archives (UK). It identifies the true format by analyzing not just the file extension but the internal structure (signature) of the file.
Main Features of DROID
- Identification by binary signature: Direct analysis of file content
- Integration with the PRONOM registry: Draws on a registry of a few thousand file format records, each with a unique identifier (PUID)
- Batch processing: Mass file analysis at the folder level
- Extension mismatch detection: Discovering discrepancies between extensions and actual formats
- CSV output: Making analysis results usable as data
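To make the idea of signature-based identification concrete, here is a minimal sketch that checks the first bytes of a file against a handful of well-known magic numbers. It is only an illustration of the principle: the table and the function names (`MAGIC_NUMBERS`, `sniff_bytes`, `sniff_file`) are assumptions made for this example, and DROID itself relies on the much richer signatures published in PRONOM rather than on a simple prefix check like this.

```python
from typing import Optional

# A few well-known magic numbers, for illustration only.
# DROID's real signatures come from PRONOM and are far more detailed
# than a prefix comparison.
MAGIC_NUMBERS = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP-based container (ZIP, DOCX, XLSX, ...)"),
]


def sniff_bytes(header: bytes) -> Optional[str]:
    """Return a format label if the header starts with a known magic number."""
    for magic, label in MAGIC_NUMBERS:
        if header.startswith(magic):
            return label
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file and sniff them."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))
```

Comparing the label returned by `sniff_file` with the file's extension is, in miniature, what extension mismatch detection does.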
Why is DROID Needed?
Digital files commonly have the following issues:
- Intentional extension changes: To hide the file format
- Incorrect extension assignment: Human error or system errors
- Extension not updated during format conversion: Extension remains old after conversion
- Files without or with unknown extensions: Such as during migration from old systems
These issues can have serious impacts on long-term preservation plans and migration strategies.
The Power of DROID Through Real Examples
Running DROID over the sample files from a digital preservation workshop surfaced several interesting issues.

Major Issues Discovered
1. Audio File Disguised as Image File
Issue: The file may be handled as an image and never opened with an appropriate audio playback tool.
2. New Format with Old Extension
Issue: The file may not open in older versions of Word, causing compatibility problems.
3. Actually MP4, Not MP3
Issue: The file may not play correctly in audio-only players. In addition, information carried by the video container (video tracks, etc.) may be overlooked.
4. Confusion from Custom Extensions
Issue: The extension was likely meant to mark a backup or original copy, but standard image viewers cannot open the file.
5. Files with Unidentifiable Format
Issue: Despite being a relatively large file, DROID could not identify the format. It may be corrupted or in an extremely rare format.
Lessons from DROID Analysis Results
Practical Lessons for Digital Preservation
Extensions cannot be trusted
- Judging files solely by extension is dangerous
- Internal structure verification is essential
Early detection is important
- Finding problems early reduces remediation costs
- Regular file audits are recommended
Accuracy of metadata
- Correct file format information is the foundation of preservation strategies
- A format information verification process is needed
Impact on migration plans
- Incorrect format information leads to wrong migration tool selections
- Directly affects the accuracy of risk assessments
DROID Use Cases
1. During Digital Archive Ingestion
Use DROID to quality-check newly received collections by comparing the supplied metadata with the actual file content.
2. Regular Health Checks
Use DROID for periodic audits of existing collections to catch file deterioration and corruption early.
3. Migration Project Preparation
Before format migration, accurately identify the target files’ formats and select appropriate conversion tools.
4. Risk Assessment
Identify file formats scheduled for deprecation or those whose support has ended, and prioritize them.
Utilizing DROID Output Data
DROID can export its results as a detailed report in CSV format. The main columns are:
- PUID: Unique identifier in the PRONOM registry
- MIME_TYPE: Standard MIME type
- FORMAT_NAME: Official format name
- FORMAT_VERSION: Format version
- EXTENSION_MISMATCH: Extension mismatch flag
- MD5_HASH: File checksum (populated when hash generation is enabled)
Analyzing this data lets you:
- Understand the format distribution of the entire collection
- Prioritize processing of high-risk files
- Estimate preservation costs
- Automatically generate technical metadata
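As a rough sketch of the first point, the snippet below tallies a DROID CSV export by format. It assumes the export has one row per file with the `PUID` and `FORMAT_NAME` columns listed above, plus a `TYPE` column whose value is `Folder` for directory rows; that last column is an assumption of this example. The function names are hypothetical helpers, not part of DROID.

```python
import csv
from collections import Counter


def load_droid_rows(csv_path):
    """Read a DROID CSV export into a list of dictionaries."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def format_distribution(rows):
    """Count files per (PUID, format name); files without a PUID are grouped."""
    counts = Counter()
    for row in rows:
        # Assumption: folder rows are marked TYPE == "Folder" and carry no format.
        if row.get("TYPE") == "Folder":
            continue
        puid = (row.get("PUID") or "").strip() or "unidentified"
        name = (row.get("FORMAT_NAME") or "").strip()
        counts[(puid, name)] += 1
    return counts


# Usage sketch (the file name is a placeholder):
#   rows = load_droid_rows("droid_export.csv")
#   for (puid, name), n in format_distribution(rows).most_common():
#       print(n, puid, name)
```

Sorting with `most_common()` puts the dominant formats first, which is usually the quickest way to see where preservation effort will concentrate.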
Getting Started with DROID
Basic Workflow
1. Download and install DROID
   - Free download from The National Archives website
   - Java runtime environment required
2. Select analysis targets
   - Select a single file or entire folder
3. Run the profile
   - DROID scans files and identifies formats
4. Review results
   - Review results in the GUI
   - Focus on extension mismatches
5. Export report
   - Export in CSV format
   - Use for further analysis or record keeping
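Once the report is exported, the extension mismatches can also be pulled out programmatically. The following is a minimal sketch, assuming the CSV contains the `EXTENSION_MISMATCH`, `FORMAT_NAME`, and `PUID` columns and that the flag is written as the text `true`; the `NAME` and `EXT` columns used for display are likewise assumptions of this example, and the helper names are hypothetical.

```python
import csv


def read_rows(csv_path):
    """Read an exported DROID CSV report into a list of dictionaries."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def extension_mismatches(rows):
    """Keep only rows whose EXTENSION_MISMATCH flag is set."""
    flagged = []
    for row in rows:
        # Assumption: the flag is exported as "true"/"false" (case may vary).
        flag = (row.get("EXTENSION_MISMATCH") or "").strip().lower()
        if flag == "true":
            flagged.append(row)
    return flagged


def describe(row):
    """Build a one-line summary of a flagged file for a review list."""
    return "{} (extension: {}) identified as {} [{}]".format(
        row.get("NAME") or "?",
        row.get("EXT") or "none",
        row.get("FORMAT_NAME") or "unknown format",
        row.get("PUID") or "no PUID",
    )


# Usage sketch (the file name is a placeholder):
#   for row in extension_mismatches(read_rows("droid_export.csv")):
#       print(describe(row))
```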
Points to Note
- Regular PRONOM updates: To support new file formats
- Possibility of multiple formats: Some files may be identified as multiple formats
- Understanding container formats: DROID also analyzes the internal structure of container formats such as ZIP and DOCX
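Files that DROID could not identify, or that matched more than one format, usually deserve a manual look. The sketch below separates the two groups from the rows of a CSV export. It assumes the export includes a `FORMAT_COUNT` column giving the number of formats matched per file and a `TYPE` column marking folder rows; neither column is listed in this post, so treat both as assumptions, and `split_for_review` as a hypothetical helper.

```python
def split_for_review(rows):
    """Split export rows into (unidentified, ambiguous) lists.

    unidentified: files with no format match
    ambiguous:    files matched to more than one format
    """
    unidentified, ambiguous = [], []
    for row in rows:
        # Assumption: folder rows are marked TYPE == "Folder"; skip them.
        if row.get("TYPE") == "Folder":
            continue
        raw = (row.get("FORMAT_COUNT") or "").strip()
        # Treat a missing or non-numeric count as zero matches.
        count = int(raw) if raw.isdigit() else 0
        if count == 0:
            unidentified.append(row)
        elif count > 1:
            ambiguous.append(row)
    return unidentified, ambiguous
```

Keeping the two lists separate helps because they call for different follow-up: unidentified files may be corrupted or rare, while ambiguous ones typically need a person to pick the right format.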
Summary
DROID is a powerful tool that makes “invisible problems” visible in digital preservation. As the real examples show, more files than expected may have incorrect extensions.
If you manage a digital archive, the following practices are recommended:
- Conduct regular DROID analysis
- Prioritize fixing files with extension mismatches
- Record and manage file format information
- Make DROID analysis a mandatory check during ingestion
Accurate file format information is the foundation for ensuring long-term accessibility of digital assets. Why not use DROID to reveal the “true nature” of your collection?
Reference Resources
- DROID official site: The National Archives (UK)
- PRONOM registry: https://www.nationalarchives.gov.uk/PRONOM/
- File format registry: Format definitions in PRONOM are available for reference
Related Articles (Planned)
- In-depth explanation of the PRONOM registry
- Siegfried: Comparing DROID with an alternative tool
- The role of file format identification in digital preservation workflows
This blog post is based on sample data from the iPRES2025 workshop. The actual analysis results illustrate the practical value of DROID.