Overview

“Advanced Search adapter for Solr” is an Omeka S module that provides an advanced search adapter for Apache Solr. This enables you to leverage the full power of a search engine within Omeka. It provides features such as relevance-based (score) search, instant search, facets, autocomplete, and suggestions for both general users and administrators.

https://github.com/Daniel-KM/Omeka-S-module-SearchSolr

Setting Up Apache Solr

Apache Solr can be installed on a server different from the one where Omeka S is installed.

Set up Apache Solr in an environment where Java can be installed. For Ubuntu, the following site was helpful.

https://tecadmin.net/how-to-install-apache-solr-on-ubuntu-22-04/

Start Apache Solr, then create a core called mycol1.

Installing the Module

From here, work on the server where Omeka S is installed.

Download and install the module from the following page.

https://github.com/Daniel-KM/Omeka-S-module-SearchSolr/releases

During installation, you may see an alert indicating that AdvancedSearch is required, as shown below.

In that case, install and enable the following module first, then try installing Advanced Search adapter for Solr again.

https://omeka.org/s/modules/AdvancedSearch/

Connecting to Apache Solr

From the left side of the admin panel, navigate to Modules > Search manager to access the following screen.

</admin/search-manager>

Click the pencil button for the default core in “Solr cores” to open the following page. On this page, enter the IP address or hostname of the server where Apache Solr is installed in the “IP or hostname” field, and enter the core you created earlier (here, mycol1) in “Solr core”.

</admin/search-manager/solr/core/1/edit>

If configured correctly, the “Status” will show OK as shown in the following screen. This confirms that Omeka S and Apache Solr are connected.
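The same connection can also be checked from outside Omeka S. The following is a minimal Python sketch, assuming Solr listens on its default port 8983 and that the core exposes Solr's standard ping handler at /admin/ping. The function names ping_url and core_is_ok are hypothetical, and this is not necessarily the check the module itself performs.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def ping_url(host: str, core: str, port: int = 8983) -> str:
    """Build the URL of the ping handler for one Solr core."""
    return f"http://{host}:{port}/solr/{quote(core)}/admin/ping?wt=json"


def core_is_ok(host: str, core: str, port: int = 8983, timeout: float = 5.0) -> bool:
    """Return True when the ping handler answers with status "OK"."""
    try:
        with urlopen(ping_url(host, core, port), timeout=timeout) as response:
            return json.load(response).get("status") == "OK"
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a reply that is not JSON.
        return False
```

For example, core_is_ok("localhost", "mycol1") should return True once the core is reachable; replace "localhost" (a placeholder) with the value entered in “IP or hostname”.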

Creating Indexes and Pages

Here, we will create indexes and pages, and set up the search screen display in Omeka S.

In Search Admin

Access the following screen again.

</admin/search-manager>

Add Search Engine

On the above screen, click the “Add new search engine” button in the upper right. On the following screen, enter an appropriate name and select “Solr” for the “Adapter” field.

Then, on the following screen, click the “Reindex” icon. A reindexing block appears on the right side of the screen; click the “Confirm reindex” button there.

This synchronizes Omeka S with Apache Solr. After reindexing is complete, the Apache Solr admin panel shows that the documents (3 in this case) have been registered.
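The document count can also be read without the admin panel. Here is a minimal Python sketch, assuming the default port 8983 and Solr's standard /select handler with JSON output. The helper names count_url, num_found, and count_documents are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(host: str, core: str, port: int = 8983) -> str:
    """Build a match-all query that returns no rows, only the total."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/select?{params}"


def num_found(payload: dict) -> int:
    """Extract the total number of matching documents from a Solr JSON reply."""
    return int(payload["response"]["numFound"])


def count_documents(host: str, core: str, port: int = 8983, timeout: float = 5.0) -> int:
    """Ask Solr how many documents the core currently holds."""
    with urlopen(count_url(host, core, port), timeout=timeout) as response:
        return num_found(json.load(response))
```

With the data used in this article, count_documents("localhost", "mycol1") would be expected to return 3 after reindexing ("localhost" is a placeholder for your Solr host).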

Create a Page

Next, create a page. Access the Search manager admin screen again and click the “Add new page” button in the upper right.

After navigating to the following screen, fill in the required fields. Here is an example:

Field	Value
Name	page1
Path	find
Search engine	engine1 (the name of the engine you created earlier)
Form	Main

Also, for “Availability on sites”, select “Make available in all sites” for now.

In Admin or Site Settings

Next, add the created page to the Omeka S site. Select a site from the list of sites and choose “Navigation”. On the following screen, select “Advanced search page” under “Add custom link” on the right side, and select the name of the page you created (here, page1).

As a result, you can access the search page that displays Apache Solr query results at the following path (the Path you set earlier).

https://omekas.aws.ldas.jp/omeka4/s/default/find

However, facets and other options have not been configured at this point, so we set them up below.

Configuration

First, I will explain how to configure facets.

Facet

From the Search manager page, click the pencil icon for the page you created (here, page1).

On the following screen, select the “Configure” tab at the top of the screen.

</admin/search-manager/config/1/configure>

Then, edit the items labeled “Facets” as follows.

From the rows displayed in “Available facets”, copy the rows you need and paste them into the List of facets. Here, we add the following facet:

dcterms_subject_ss = Subject =

As a result, facets are displayed on the Omeka S site page you created earlier, enabling filtering based on values.

Filters

Next, here is how to configure filters for specifying search conditions.

Similar to the facets above, copy and paste the needed rows from “Available filters” to Filters. Here, we add the following filter:

dcterms_subject_ss = Subject = =

</admin/search-manager/config/1/configure>

As a result, a “Keyword/Subject” form has been added as shown below.

Advanced Filters

“Advanced filters” are forms that allow users to dynamically change filter conditions. For example, let’s add “Subject” and “Date” to “Advanced filters”.

The following form is displayed on the site’s search page.

Here is an example of searching for Date contains 11. Only items with 2020-11-24 as their Date were retrieved.

Other

Let’s try registering in the following format.

dcterms_subject_ss = Subject = Select = baby |medical

This makes it available as a select box as shown below.

Sort

https://omekas.aws.ldas.jp/omeka4/s/default/find

Japanese Language Support

Introduction

In the settings above, three fields were available for the title alone. They differ in how the data is indexed in Apache Solr.

dcterms_title_s = Title
dcterms_title_txt = Title
dcterms_title_txt_ja = Title

For example, using the string “横から見たオムツ姿の赤ちゃんのイラスト” (Illustration of a baby in a diaper seen from the side), the indexing results are as follows.

Field	Type	Terms
dcterms_title_s	string	横から見たオムツ姿の赤ちゃんのイラスト
dcterms_title_txt	text_general	横,か,ら,見,た,オムツ,姿,の,赤,ち,ゃ,ん,の,イラスト
dcterms_title_txt_ja	text_ja	横,見る,オムツ,姿,赤ちゃん,イラスト
dcterms_title_txt_cjk	text_cjk	横か,から,ら見,見た,たオ,オム,ムツ,ツ姿,姿の,の赤,赤ち,ちゃ,ゃん,んの,のイ,イラ,ラス,スト

*_txt appears to apply the StandardTokenizerFactory tokenizer, indexing consecutive katakana as one term and other characters individually. (I am not entirely confident about this.)
*_txt_ja applies the JapaneseTokenizerFactory tokenizer, indexing by morphemes.
*_txt_cjk applies the CJKBigramFilterFactory filter, indexing in 2-character bigrams.
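To make the bigram behavior concrete, here is a minimal Python sketch that imitates only the bigram step on a string made up entirely of CJK characters. The function name cjk_bigrams is hypothetical, and this is a simplification rather than Solr's implementation: the actual analysis chain also tokenizes the text first and handles mixed scripts.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Return the overlapping 2-character terms of a CJK-only string."""
    if len(text) < 2:
        # A single character (or an empty string) cannot form a bigram.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


title = "横から見たオムツ姿の赤ちゃんのイラスト"
print(",".join(cjk_bigrams(title)))
```

The printed terms are the same 18 bigrams listed in the dcterms_title_txt_cjk row of the table above, from 横か through スト.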

Due to these differences, you need to configure how Omeka S fields are handled in Apache Solr according to your purpose.

CJK Filter

Solr Map

For example, let’s add a *_txt_cjk field, which indexes title values as 2-character bigrams.

On the Search manager screen, select “Map Omeka metadata and Solr fields” for the Solr core, select “Resource (or Item)”, and click the “Add new map” button in the upper right.

On the following screen, select *_txt_cjk for “Solr field”.

Then perform reindexing.

Filters

Then, referring to the earlier Filters configuration, add dcterms_title_txt_cjk as follows.

</admin/search-manager/config/1/configure>

Site

This results in the following differences.

With dcterms_title_s, a string such as 横から見たオムツ姿の赤ちゃんのイラスト is indexed as a single term. Searching with dcterms_title_s=イラスト therefore matches no item exactly, and the result is 0 items.

With dcterms_title_txt_cjk, on the other hand, the same string is indexed as bigrams such as イラ,ラス,スト, and the search term イラスト is split the same way. Searching with dcterms_title_txt_cjk=イラスト therefore returns the 2 items that contain the string イラスト.

References

Let’s examine how _txt_cjk and _txt_ja are indexed.

_txt_cjk

Target Strings

横から見たオムツ姿の赤ちゃんのイラスト
【タイトルを更新】赤ちゃんの胸囲の測定のイラスト
iiif presentation api v3のマニフェスト

Results

</solr/#/mycol1/schema?field=dcterms_title_txt_cjk>

Term Frequency	Term
3	スト
2	イラ
2	ゃん
2	んの
2	のイ
2	ラス
2	赤ち
2	ちゃ
1	ら見
1	の赤
1	たオ
1	フェ
1	ムツ
1	を更
1	presentation
1	の胸
1	オム
1	の測
1	から
1	ェス
1	タイ
1	ツ姿
1	トル
1	ニフ
1	測定
1	マニ
1	のマ
1	更新
1	ルを
1	囲の
1	定の
1	姿の
1	イト
1	横か
1	v3
1	胸囲
1	見た
1	iiif
1	api

You can confirm that, except for the English words, the strings are indexed in 2-character segments.

For example, スト has a frequency of 3 because it is contained in イラスト (two titles) and in マニフェスト (one title).

_txt_ja

Target Strings

横から見たオムツ姿の赤ちゃんのイラスト
【タイトルを更新】赤ちゃんの胸囲の測定のイラスト
iiif presentation api v3のマニフェスト

Results

</solr/#/mycol1/schema?field=dcterms_title_txt_ja>

Term Frequency	Term
2	赤ちゃん
2	イラスト
1	iiif
1	v
1	横
1	オムツ
1	マニフェスト
1	タイトル
1	姿
1	更新
1	presentation
1	測定
1	胸囲
1	見る
1	api
1	3

Particles are excluded, and indexing is performed by morphemes such as 赤ちゃん and イラスト.

Comparison

dcterms_title_txt_ja=スト returns 0 results because the titles are indexed by morphemes such as イラスト and マニフェスト, none of which is the term スト on its own.

On the other hand, dcterms_title_txt_cjk=スト returns 3 results. However, dcterms_title_txt_cjk=イラスト returns only 2 results. This is because only data containing all three bigrams イラ, ラス, and スト in the index will match. Data containing the string イラスト will match, while data containing マニフェスト will not.
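The counts above can be reproduced with a small Python simulation. This is only a sketch under stated assumptions: the helper names terms, matches, and hits are hypothetical, the tokenizer is a rough approximation of the *_txt_cjk analysis (bigrams for runs of kana and kanji, whole words for ASCII), and a query is treated as matching when all of its terms occur in the title, ignoring term positions.

```python
import re
from collections import Counter

# Group 1: runs of hiragana, katakana, or common kanji. Group 2: ASCII words.
TOKEN = re.compile(r"([\u3040-\u30ff\u4e00-\u9fff]+)|([0-9a-z]+)")

TITLES = [
    "横から見たオムツ姿の赤ちゃんのイラスト",
    "【タイトルを更新】赤ちゃんの胸囲の測定のイラスト",
    "iiif presentation api v3のマニフェスト",
]


def terms(text: str) -> list[str]:
    """Approximate *_txt_cjk terms: bigrams for CJK runs, whole ASCII words."""
    result = []
    for match in TOKEN.finditer(text.lower()):
        cjk, word = match.groups()
        if word is not None:
            result.append(word)
        elif len(cjk) == 1:
            result.append(cjk)
        else:
            result.extend(cjk[i:i + 2] for i in range(len(cjk) - 1))
    return result


def matches(query: str, title: str) -> bool:
    """True when every term of the query is also a term of the title."""
    indexed = set(terms(title))
    return all(term in indexed for term in terms(query))


def hits(query: str) -> int:
    """Count the titles that match the query."""
    return sum(matches(query, title) for title in TITLES)


# Number of titles containing each term, like the "Term Frequency" column.
doc_freq = Counter(term for title in TITLES for term in set(terms(title)))

print(doc_freq["スト"], doc_freq["イラ"])  # 3 2
print(hits("スト"), hits("イラスト"))      # 3 2
```

The query スト consists of a single bigram that all three titles contain, whereas イラスト requires イラ and ラス as well, which the マニフェスト title lacks.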

Summary

I introduced how to connect Omeka S with Apache Solr. If you want to perform advanced searches including morphological analysis in Omeka S, this could be a useful option.

Some content may be inaccurate, but I hope it serves as a helpful reference.
