Overview# While investigating Archivematica, there were aspects of File Information Tool Set (FITS) behavior I wanted to verify, so I tried it using Docker. This is a memo of that process.
https://github.com/harvard-lts/fits
Installation# The installation method using Docker is described at the following page.
https://github.com/harvard-lts/fits?tab=readme-ov-file#docker-installation
However, I could not download the latest release (1.6.0), which includes the Dockerfile, from the downloads page that the manual points to.
https://projects.iq.harvard.edu/fits/downloads
Instead, the latest zip file could be downloaded from the following GitHub releases page.
https://github.com/harvard-lts/fits/releases/tag/1.6.0
After that, I extracted and built it according to the README instructions.
However, on an M1 Mac, following the steps as written produced the following error.
    % docker run --rm -v `pwd`:/work fits -i fits.sh
    2024-01-26 11:41:10 - ERROR - MediaInfo:95 - Error loading native library for this operating system for tool: MediaInfo. ostype=[Linux] jvmModel=[64] nativeLibPath=[/opt/fits/tools/mediainfo/linux] No native MediaInfo library for this OS
    java.lang.UnsatisfiedLinkError: Unable to load library 'mediainfo':
    libmediainfo.so: cannot open shared object file: No such file or directory
    libmediainfo.so: cannot open shared object file: No such file or directory
    /opt/fits/tools/mediainfo/linux/libmediainfo.so.0: cannot open shared object file: No such file or directory
When I consulted ChatGPT 4 about this, it suggested adding the following to the Dockerfile.
    RUN apt-get update && \
        apt-get install -yqq \
        # Other dependencies
        mediainfo libmediainfo-dev \
        && rm -rf /var/lib/apt/lists/*
After adding the above and rebuilding the image, it worked correctly.
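To confirm that a rebuilt image no longer hits this problem, the container's log output can be scanned for the message FITS printed in the failing run. The following is a minimal sketch in Python, assuming the log text has already been captured as a string; the function name `tools_with_native_lib_errors` is hypothetical, and the pattern is based only on the single error line seen in this memo, so other FITS versions may word it differently.

```python
import re

# Pattern for the message FITS logged in the failing run described in this memo.
# The tool name that follows "for tool:" is captured.
NATIVE_LIB_ERROR = re.compile(
    r"Error loading native library for this operating system for tool: (\w+)"
)


def tools_with_native_lib_errors(log_text):
    """Return the names of tools whose native library failed to load."""
    found = []
    for line in log_text.splitlines():
        match = NATIVE_LIB_ERROR.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Log line taken from the failing run on the M1 Mac.
failing_log = (
    "2024-01-26 11:41:10 - ERROR - MediaInfo:95 - Error loading native library "
    "for this operating system for tool: MediaInfo. ostype=[Linux] jvmModel=[64] "
    "nativeLibPath=[/opt/fits/tools/mediainfo/linux] "
    "No native MediaInfo library for this OS"
)

print(tools_with_native_lib_errors(failing_log))
```

An empty list for a given run suggests that the native libraries loaded, at least as far as this one message is concerned.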
Trying It Out# Because I wanted to test a file with Japanese characters in its filename, I used すごくわかる著作権と授業.pdf ("A Very Understandable Guide to Copyright and Classes", Hiroshima University Information Media Education Research Center), which is published online under a CC BY license.
https://www.media.hiroshima-u.ac.jp/wp-content/uploads/2023/05/すごくわかる著作権と授業.pdf
Then, I executed the following.
    docker run --rm -v `pwd`:/work fits -i すごくわかる著作権と授業.pdf
As a result, the following output was obtained.
    <?xml version="1.0" encoding="UTF-8"?>
    <fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="1.6.0" timestamp="1/26/24, 12:49 PM">
      <identification>...</identification>
      <fileinfo>...</fileinfo>
      <filestatus>...</filestatus>
      <metadata>...</metadata>
      <statistics>...</statistics>
    </fits>
The following is an explanation by ChatGPT 4.
The provided XML is an example of output generated by FITS (File Information Tool Set). FITS is a tool used to identify and report characteristics and metadata of digital files. This XML output shows the analysis results of a specific PDF file. Each section is explained below:
<identification> Section# Contains the file format, the MIME type (in this case application/pdf), and the tools that identified it (Droid, Exiftool, Tika). The version tag indicates the PDF file version (PDF/X-4). The externalIdentifier tag shows the PUID (PRONOM Persistent Unique Identifier) of the file format as identified by Droid.
<fileinfo> Section# Contains basic information about the file, such as file size, creating application, last modification date, creation date, file path, filename, MD5 checksum, and filesystem last modification date.
<filestatus> Section# Indicates whether the file is well-formed and valid, that is, whether it conforms to the format's standard.
<metadata> Section# Contains metadata about the document, such as title, author, language, page count, presence of outlines, presence of annotations, number of images, and fonts used.
<statistics> Section# Shows the execution time and status of each tool, which reveals how long each tool took to analyze the file.
This output presents a detailed analysis of a PDF file and provides information that matters in archiving, libraries, and digital preservation. FITS integrates the results of multiple tools to give a comprehensive view of a file's characteristics.
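The sections described above can also be read programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the output uses the `fits_output` namespace and element names such as `identity`, `tool`, `filename`, `size`, and `well-formed`. The sample document and the function name `summarize_fits` are hypothetical; the sample is heavily trimmed, and its values (including the tool versions) are placeholders rather than the actual results for the PDF above.

```python
import xml.etree.ElementTree as ET

# Namespace of the FITS output root element.
NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hypothetical, trimmed sample. Real FITS output has many more elements,
# and every value here is a placeholder.
SAMPLE_XML = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.6.0">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf">
      <tool toolname="Droid" toolversion="0.0" />
      <tool toolname="Tika" toolversion="0.0" />
    </identity>
  </identification>
  <fileinfo>
    <filename>sample.pdf</filename>
    <size>12345</size>
  </fileinfo>
  <filestatus>
    <well-formed>true</well-formed>
    <valid>true</valid>
  </filestatus>
</fits>"""


def summarize_fits(xml_text):
    """Pull a few fields out of a FITS XML document into a dict."""
    root = ET.fromstring(xml_text)
    identity = root.find("fits:identification/fits:identity", NS)
    if identity is None:
        raise ValueError("no <identity> element found")
    size_text = root.findtext("fits:fileinfo/fits:size", namespaces=NS)
    well_formed = root.findtext("fits:filestatus/fits:well-formed", namespaces=NS)
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        # Names of the tools that agreed on this identity.
        "identified_by": [t.get("toolname") for t in identity.findall("fits:tool", NS)],
        "filename": root.findtext("fits:fileinfo/fits:filename", namespaces=NS),
        "size": int(size_text) if size_text else None,
        "well_formed": well_formed == "true",
    }


summary = summarize_fits(SAMPLE_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```

To apply this to a real run, the XML printed by FITS would be saved to a file and its contents passed to `summarize_fits`; any fields that FITS names differently would need the paths adjusted.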
Summary# Using FITS, I was able to extract a range of technical metadata from a digital file. I plan to keep investigating how FITS can be put to further use.