Introduction

Omeka-S is a powerful digital archive system, but by default, Japanese full-text search barely functions. This article explains how to achieve Japanese full-text search by introducing the MroongaSearch module.

Why the MroongaSearch Module is Essential

Omeka-S’s standard full-text search (FullTextSearch module) uses the InnoDB engine, which has the following critical issues:

Example of Japanese word search:

Data: "…"
Search term: "…"
Result: No hits

InnoDB’s full-text search assumes space-delimited languages such as English, so for Japanese text:

  • Word search is impossible: The entire string is treated as a single word
  • Partial matching also fails: FULLTEXT indexes cannot process Japanese correctly
  • Zero search results: Users cannot find anything
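The effect can be illustrated with a deliberately simplified model of a whitespace-delimited tokenizer. This is a Python sketch for illustration only, not InnoDB's actual parser (which has further rules such as stopwords and minimum token sizes), and the sample sentences are made up for the example.

```python
# Simplified model of a space-delimited tokenizer (an assumption for
# illustration; this is not how InnoDB is implemented internally).
def whitespace_tokens(text: str) -> list[str]:
    return text.split()

english = "digital archive of Tokyo"
japanese = "東京のデジタルアーカイブ"  # roughly "digital archive of Tokyo"

print(whitespace_tokens(english))   # four separate tokens
print(whitespace_tokens(japanese))  # a single token: the whole string

# A word lookup only succeeds when the query equals a whole token.
print("Tokyo" in whitespace_tokens(english))   # the English word is found
print("東京" in whitespace_tokens(japanese))    # the Japanese word is not
```

Because the Japanese sentence contains no spaces, the only indexed "word" is the entire sentence, so a search for a word inside it finds nothing.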

The MroongaSearch Module Solution

The MroongaSearch module solves this problem in two stages:

1. Fallback Feature (Active Immediately After Module Installation)

Important: Simply installing the MroongaSearch module enables Japanese search to work without any special configuration.

Data: "…"
Search: "…"

[Without MroongaSearch module]
  Zero results

[With MroongaSearch module (even without Mroonga configured)]
  Fallback to LIKE '%…%'
  Search results returned!

The MroongaSearch module’s fallback feature:

  • Automatically detects CJK (Japanese, Chinese, Korean) single-word searches
  • Automatically falls back to LIKE '%term%' search
  • Works even without Mroonga configured
  • Without this, Japanese full-text search simply does not work properly
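The fallback idea described above can be sketched roughly as follows. This is not the module's actual code: the function names, the Unicode ranges, and the table and column names (`fulltext_search`, `title`, `text`) are assumptions made for illustration.

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs, and Hangul syllables.
# The exact ranges the module checks are not stated in this article.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")

def is_cjk_single_term(query: str) -> bool:
    """True for a query without whitespace that contains a CJK character."""
    q = query.strip()
    return bool(q) and not re.search(r"\s", q) and bool(CJK_CHAR.search(q))

def like_pattern(term: str) -> str:
    """Build a '%term%' pattern, escaping LIKE wildcards with a backslash."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

def build_search(query: str) -> tuple[str, list[str]]:
    """Return (SQL, params). Table and column names are hypothetical."""
    if is_cjk_single_term(query):
        # Fallback path: substring match instead of a FULLTEXT lookup.
        return ("SELECT id FROM fulltext_search WHERE text LIKE %s",
                [like_pattern(query.strip())])
    # Default path: ordinary full-text query.
    return ("SELECT id FROM fulltext_search "
            "WHERE MATCH(title, text) AGAINST (%s IN BOOLEAN MODE)",
            [query])

print(build_search("東京"))
print(build_search("tokyo archive"))
```

Passing the pattern as a bound parameter, rather than concatenating it into the SQL string, keeps user input from being interpreted as SQL.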

In the second stage, configuring the Mroonga plugin for MariaDB adds:

  • Precise word search through morphological analysis
  • Fast full-text search (hundreds of times faster than LIKE)
  • Strict control of AND/OR search

What is the MroongaSearch Module?

MroongaSearch is a full-text search enhancement module for Omeka-S.

Main Features

  1. Automatic fallback feature

    • Enables CJK search even without Mroonga configured
    • Automatic switching to LIKE search
    • Ready to use immediately without configuration
  2. Mroonga integration

    • Precise search through morphological analysis
    • TokenMecab support
    • Fast index-based search
  3. Diagnostic page

    • Plugin status verification
    • Table engine display
    • Tokenizer information
    • Manual engine switching
  4. Strict AND/OR search

    • More precise search logic than standard FullTextSearch

Developers

  • Kentaro Fukuchi (initial version)
  • Kazufumi Fukuda (feature extensions)
  • Toshihito Waki (current maintainer)

Setup Procedure

Step 1: Install the MroongaSearch Module

cd /path/to/omeka-s/modules
git clone https://github.com/wakitosh/MroongaSearch.git

Activate the module from the Omeka-S admin panel.

Japanese search will work with just this! (via LIKE search fallback)

Step 2: Configure Mroonga (Optional)

For faster and more precise search, configure the Mroonga plugin for MariaDB.

For Docker Environments

Directory structure:

omeka-s-docker/
    Dockerfile
    docker-compose.yml
    mariadb/
        Dockerfile
        init.sql

mariadb/Dockerfile:

FROM mariadb:latest

# Install Mroonga plugin and MeCab for Japanese tokenization
RUN apt-get update && \
    apt-get install -y \
    mariadb-plugin-mroonga \
    groonga-tokenizer-mecab \
    mecab \
    mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Enable Mroonga plugin on startup
RUN echo "plugin_load_add = ha_mroonga" >> /etc/mysql/mariadb.conf.d/50-server.cnf

mariadb/init.sql:

-- Install Mroonga plugin and UDF functions
INSTALL SONAME 'ha_mroonga';

-- Install Mroonga UDF functions
CREATE FUNCTION IF NOT EXISTS mroonga_snippet RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_command RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_escape RETURNS STRING SONAME 'ha_mroonga.so';

docker-compose.yml (mariadb section):

services:
  mariadb:
    build:
      context: ./mariadb
      dockerfile: Dockerfile
    restart: always
    volumes:
      - mariadb:/var/lib/mysql
      - ./mariadb/init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      MYSQL_ROOT_PASSWORD: your_password
      MYSQL_DATABASE: omeka
      MYSQL_USER: omeka
      MYSQL_PASSWORD: omeka

Rebuilding containers:

docker compose down
docker compose build mariadb
docker compose up -d

Step 3: Verify the Setup

1. Verify the Mroonga Plugin

docker exec <container-name> mariadb -uroot -p<password> \
  -e "SHOW PLUGINS" | grep -i mroonga

Expected output:

Mroonga  ACTIVE  STORAGE ENGINE  ha_mroonga.so  GPL

2. Verify TokenMecab

List the tokenizers available to Mroonga. If TokenMecab is included in the output, you are all set.

3. Check the MroongaSearch Diagnostic Page

The diagnostic page in the Omeka-S admin panel displays the following information:

  • Plugin status: ACTIVE / NOT ACTIVE
  • Table engine: InnoDB / Mroonga
  • Tokenizer: TokenMecab / None
  • Mroonga effective: YES / NO

If “Mroonga effective: NO”:

  • Plugin is ACTIVE but the table engine remains InnoDB
  • Fallback search (LIKE) is used
  • It works but is slow
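The relationship between the displayed values can be sketched as a small decision function. This is an interpretation of the description above, not the module's implementation; the `Diagnostics` class and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Diagnostics:
    plugin_status: str  # "ACTIVE" or "NOT ACTIVE", as shown on the page
    table_engine: str   # "InnoDB" or "Mroonga", as shown on the page

def mroonga_effective(d: Diagnostics) -> bool:
    """Assumed rule: the plugin must be active AND the table must use Mroonga."""
    return d.plugin_status == "ACTIVE" and d.table_engine.lower() == "mroonga"

def search_mode(d: Diagnostics) -> str:
    """Which search path is expected to be used for CJK queries."""
    return "mroonga-index" if mroonga_effective(d) else "like-fallback"

# Plugin installed, but the table was never switched from InnoDB.
print(search_mode(Diagnostics("ACTIVE", "InnoDB")))
# Plugin installed and the table engine switched.
print(search_mode(Diagnostics("ACTIVE", "Mroonga")))
```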

To set “Mroonga effective: YES”:

  • Manually switch the engine to Mroonga from the diagnostic page

  • Or change the table engine directly via SQL

4. Re-index

Run re-indexing from the diagnostic page or the Omeka-S admin panel.

How Search Works

Without Mroonga Configured (Fallback)

A CJK single-word query is detected automatically and executed as a LIKE '%term%' search, so results are returned even though no Mroonga index is used.

With Mroonga + TokenMecab Configured

The query runs against the Mroonga full-text index, and TokenMecab's morphological analysis provides precise word-level matching.

Substring Search Also Works

Mroonga supports not only morphological analysis but also substring search.

This allows users to get results even when they do not know the exact words.

Morphological Analysis with TokenMecab

What is Morphological Analysis?

Since Japanese does not have space-delimited words like English, sentences need to be segmented into words.

This enables word-level searching for terms like “Tokyo” and “university.”
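To make the idea of segmentation concrete, here is a toy longest-match segmenter in Python. It is only a sketch: MeCab does not work this way (it uses a full dictionary with a cost-based model), and the three-word dictionary and the sample strings are invented for the example.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a word dictionary.

    Characters that start no known word are emitted one at a time.
    """
    tokens: list[str] = []
    max_len = max(map(len, dictionary), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

toy_dictionary = {"東京", "大学", "図書館"}  # Tokyo, university, library

print(segment("東京大学図書館", toy_dictionary))
print(segment("東京スカイツリー", toy_dictionary))
```

The first call yields three dictionary words, while the second keeps 東京 and breaks the unknown remainder into single characters, which hints at why words missing from the dictionary are hard to search for.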

Limitations of Morphological Analysis

TokenMecab is powerful, but may not work as expected in the following cases:

1. Proper Nouns (New Words Not in the Dictionary)

Names that are missing from the dictionary may be split into unexpected fragments, so a search for the full name can fail.

2. Compound Words and Technical Terms

A compound may be indexed as one token or as several parts, so a search for only a part, or for the whole, may not match.

3. Coined Words and Neologisms

Recently coined words are usually absent from the dictionary and may be segmented unpredictably.

4. Multiple Segmentation Patterns

Some strings can be segmented in more than one valid way, and the segmentation chosen at indexing time may differ from what the searcher expects.

Solutions

  • User dictionary: Add custom words to the MeCab dictionary
  • TokenBigram: Supplement partial matches with 2-character N-grams
  • Fallback: MroongaSearch automatically uses LIKE search as well

Available Tokenizers

Tokenizer      Description                     Use Case
TokenMecab     Morphological analysis          Japanese search (recommended)
TokenBigram    Split into 2-character units    Emphasis on partial matching
TokenUnigram   Split into single characters    Exact matching only
TokenDelimit   Split by delimiters             English, etc.
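The character-based tokenizers in the table can be approximated with a generic N-gram function. This Python sketch shows only the basic splitting idea; the real Groonga tokenizers apply additional rules (for example for alphanumeric text), and the function name is made up for this example.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Split text into overlapping n-character units."""
    if not text:
        return []
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]

word = "東京大学"
print(ngrams(word, 2))  # 2-character units, as TokenBigram conceptually does
print(ngrams(word, 1))  # single characters, as TokenUnigram conceptually does
```

Bigrams make any substring of two or more characters findable without a dictionary, at the cost of matches that cross word boundaries: the middle unit 京大 is indexed even though it is not a word in 東京大学.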

Performance Comparison

LIKE Search (Fallback)

  • Full table scan
  • Delay proportional to data volume
  • However, search results are returned (zero without the module)

Mroonga Search

  • Uses indexes
  • Fast search (hundreds of times faster than LIKE)
  • Scalable
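The difference between scanning every row and consulting an index can be sketched in a few lines of Python. This is a conceptual model only, using a bigram inverted index held in memory; it says nothing about how Mroonga or Groonga store their indexes, and the sample documents are invented.

```python
from collections import defaultdict

def bigrams(text: str) -> set[str]:
    return {text[i:i + 2] for i in range(len(text) - 1)}

def scan_search(docs: dict[int, str], term: str) -> list[int]:
    """LIKE-style search: inspect every document."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text)

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each bigram to the set of documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in bigrams(text):
            index[gram].add(doc_id)
    return index

def index_search(index: dict[str, set[int]], docs: dict[int, str], term: str) -> list[int]:
    """Index-based search: intersect posting lists, then verify candidates."""
    grams = bigrams(term)
    if not grams:  # terms shorter than two characters cannot use the index
        return scan_search(docs, term)
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(d for d in candidates if term in docs[d])

docs = {1: "東京大学の資料", 2: "京都大学の資料", 3: "東京都の地図"}
index = build_index(docs)

print(scan_search(docs, "東京"), index_search(index, docs, "東京"))
print(scan_search(docs, "大学の"), index_search(index, docs, "大学の"))
```

Both functions return the same documents, but the indexed version only touches documents that share bigrams with the query, which is why its cost grows far more slowly as the collection gets larger.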

Summary

Importance of the MroongaSearch Module

  1. Essential: The MroongaSearch module is essential for Japanese full-text search in Omeka-S
  2. Immediate effect: Search is possible immediately after installation via the fallback feature
  3. Incremental improvement: Further speed improvement with Mroonga configuration

Level        Configuration                                Search Behavior                 Performance
Minimum      MroongaSearch module only                    LIKE search fallback            Slow (but works)
Recommended  MroongaSearch module + Mroonga + TokenMecab  Morphological analysis search   Fast

Benefits of Implementation

  1. Japanese search becomes possible: Immediately functional via fallback
  2. Improved accuracy: Word-level search with TokenMecab
  3. Speed improvement: Optimization with the Groonga engine
  4. Flexibility: Both morphological search and partial matching

Conclusion: When handling Japanese content in Omeka-S, the MroongaSearch module is essential.

Testing Environment

  • Omeka-S: 4.1.1
  • MroongaSearch: latest
  • MariaDB: latest (11.x)
  • Docker Compose
  • macOS (Darwin 24.6.0)

If you found this article helpful, please star the GitHub repository!