Overview

While investigating Archivematica, I wanted to verify some aspects of how the File Information Tool Set (FITS) behaves, so I tried running it with Docker. This memo records that process.

https://github.com/harvard-lts/fits

Installation

The Docker-based installation method is described on the following page.

https://github.com/harvard-lts/fits?tab=readme-ov-file#docker-installation

However, when accessing the following page mentioned in the manual, the latest release (1.6.0) that includes the Dockerfile could not be downloaded.

https://projects.iq.harvard.edu/fits/downloads

Instead, the latest zip file could be downloaded from the following GitHub releases page.

https://github.com/harvard-lts/fits/releases/tag/1.6.0

After that, I extracted and built it according to the README instructions.

However, on an M1 Mac, following the steps as written resulted in the following error.

ERROR [...] Error loading native library for this operating system for tool: MediaInfo. ostype=[Linux] jvmModel=[64] nativeLibPath=[/opt/fits/tools/mediainfo/linux] No native MediaInfo library for this OS
[...] cannot open shared object file: No such file or directory

After consulting ChatGPT 4 about this, it instructed me to add the following to the Dockerfile.

# Other dependencies
RUN apt-get update && \
    apt-get install -yqq mediainfo libmediainfo-dev && \
    rm -rf /var/lib/apt/lists/*

After adding the above, it worked correctly.

Trying It Out

This time, since I wanted to target a file with Japanese in the filename, I used “A Very Understandable Guide to Copyright and Classes.pdf” (Hiroshima University Information Media Education Research Center), which is published online under a CC BY license.

https://www.media.hiroshima-u.ac.jp/wp-content/uploads/2023/05/すごくわかる著作権と授業.pdf

Then, I executed the following.

docker run --rm -v `pwd`:/work fits -i すごくわかる著作権と授業.pdf
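When the same command is run from a script, passing the arguments as a list avoids shell quoting problems with non-ASCII filenames. The following is a minimal sketch, not part of FITS itself: the helper names `build_fits_command` and `run_fits` are hypothetical, the image tag `fits` and the `/work` mount point are taken from the command above, and it assumes the container resolves the input path relative to `/work`.

```python
import os
import subprocess


def build_fits_command(host_dir, filename, image="fits"):
    """Build the docker argument list that runs FITS on one file in host_dir.

    Assumption: the image was built with the tag "fits" and reads input
    paths relative to the /work mount, as in the command-line example.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(host_dir)}:/work",  # mount the directory at /work
        image,
        "-i", filename,  # input file, relative to /work inside the container
    ]


def run_fits(host_dir, filename, image="fits"):
    """Run FITS and return its XML output as text (needs Docker and the image)."""
    result = subprocess.run(
        build_fits_command(host_dir, filename, image),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if docker exits with an error
    )
    return result.stdout


# Only build and show the command here; call run_fits() where Docker is available.
cmd = build_fits_command(".", "すごくわかる著作権と授業.pdf")
print(cmd)
```

Because no shell is involved, the Japanese filename is passed to Docker as a single argument without any escaping.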

As a result, the following output was obtained.

<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="1.6.0" timestamp="1/26/24, 12:49 PM">
  [identification, fileinfo, filestatus, metadata, and statistics elements; contents omitted]
</fits>

The following is an explanation by ChatGPT 4.


The provided XML is an example of output generated by FITS (File Information Tool Set). FITS is a tool used to identify and report characteristics and metadata of digital files. This XML output shows the analysis results of a specific PDF file. Each section is explained below:

<identification> Section

  • Contains information about the file format, MIME type (in this case application/pdf), and the tools used to identify it (Droid, Exiftool, Tika).
  • The version tag indicates the PDF file version (PDF/X-4).
  • The externalIdentifier tag shows the PUID (PRONOM Persistent Unique Identifier) of the file format, as identified by the Droid tool.
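The identification fields described above can be read programmatically. The following is a minimal sketch using Python's standard library. The `SAMPLE` document is a hand-written stand-in rather than the real output, its tool versions and PUID are placeholder values, and the function name `summarize_identification` is hypothetical. The element and attribute names follow the description above and should be checked against actual FITS output.

```python
import xml.etree.ElementTree as ET

# Namespace of FITS output documents.
NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written stand-in for a FITS result; all values are placeholders.
SAMPLE = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf">
      <tool toolname="Droid" toolversion="0.0"/>
      <tool toolname="Tika" toolversion="0.0"/>
      <version toolname="Droid" toolversion="0.0">x.y</version>
      <externalIdentifier toolname="Droid" toolversion="0.0" type="puid">fmt/0</externalIdentifier>
    </identity>
  </identification>
</fits>
"""


def summarize_identification(xml_text):
    """Return one dict per <identity> element found under <identification>."""
    root = ET.fromstring(xml_text.strip())
    summaries = []
    for identity in root.findall("fits:identification/fits:identity", NS):
        summaries.append({
            "format": identity.get("format"),
            "mimetype": identity.get("mimetype"),
            # Tools that reported this identity.
            "tools": [t.get("toolname") for t in identity.findall("fits:tool", NS)],
            "versions": [v.text for v in identity.findall("fits:version", NS)],
            # Keep only external identifiers typed as PUIDs.
            "puids": [
                e.text
                for e in identity.findall("fits:externalIdentifier", NS)
                if e.get("type") == "puid"
            ],
        })
    return summaries


info = summarize_identification(SAMPLE)
print(info)
```

Returning a list rather than a single dict leaves room for results in which the tools disagree and more than one identity is reported.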

<fileinfo> Section

  • Contains basic information about the file, such as file size, creating application, last modification date, creation date, file path, filename, MD5 checksum, and filesystem last modification date.

<filestatus> Section

  • Indicates whether the file format is well-formed and valid. This shows whether the file conforms to standards.

<metadata> Section

  • Contains metadata about the document, such as the title, author, language, page count, presence of outlines, presence of annotations, number of images, and fonts used.

<statistics> Section

  • Shows the execution time and status of each tool. This reveals how long each tool took to analyze the file.
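To compare the tools, the statistics section can be turned into a small table. The following is a minimal sketch under assumptions: it supposes each `<tool>` element inside `<statistics>` carries a `toolname` attribute plus either an `executionTime` or a `status` attribute. The `SAMPLE` values are placeholders, and the function name `tool_statistics` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of FITS output documents.
NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written stand-in for the statistics section; all values are placeholders.
SAMPLE = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <statistics fitsExecutionTime="300">
    <tool toolname="MediaInfo" toolversion="0.0" status="did not run"/>
    <tool toolname="Droid" toolversion="0.0" executionTime="120"/>
    <tool toolname="Tika" toolversion="0.0" executionTime="180"/>
  </statistics>
</fits>
"""


def tool_statistics(xml_text):
    """Return (toolname, status, execution_time) tuples, one per tool.

    status or execution_time is None when the attribute is absent.
    """
    root = ET.fromstring(xml_text.strip())
    stats = root.find("fits:statistics", NS)
    if stats is None:
        return []
    rows = []
    for tool in stats.findall("fits:tool", NS):
        raw_time = tool.get("executionTime")
        rows.append((
            tool.get("toolname"),
            tool.get("status"),
            int(raw_time) if raw_time is not None else None,
        ))
    return rows


rows = tool_statistics(SAMPLE)
for name, status, elapsed in rows:
    print(name, status, elapsed)
```

A tool that was skipped shows up with a status and no time, which makes it easy to spot tools that did not run on a given file.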

This output presents a detailed analysis of a PDF file and provides important information for the fields of archiving, libraries, and digital preservation. FITS is used to integrate the results of various tools to provide a comprehensive view of file characteristics.


Summary

Using FITS, I was able to obtain a range of information about a digital file. I plan to keep investigating how FITS can be put to further use.