Investigation date: 2026-02-24
Target: GakuNin RDM (GRDM) Search API
Source code: RCOSDP/RDM-osf.io (website/search/ directory)
Developer guide: RCOSDP/RDM-developer-guide
Note: Official documentation for the Search API could not be found. This article is an investigation record based on both the actual API behavior and the source code.


Overview

GakuNin RDM is a fork of OSF (Open Science Framework), and its source code is available on GitHub (RCOSDP/RDM-osf.io). The search functionality implementation is in the website/search/ directory and consists mainly of the following files.

File | Role
elastic_search.py | Index mapping definitions, document registration/update
views.py | API endpoint handlers
util.py | Query construction including build_private_search_query()
search.py | High-level interface

POST https://rdm.nii.ac.jp/api/v1/search/
Authorization: Bearer <personal access token>

In Japanese environments, Elasticsearch’s kuromoji_analyzer is used (confirmed in the source code).


Request Format

{
  "api_version": {"vendor": "grdm", "version": 2},
  "elasticsearch_dsl": {
    "query": {
      "filtered": {
        "query": {
          "query_string": {
            "default_field": "_all",
            "query": "search keyword"
          }
        }
      }
    },
    "from": 0,
    "size": 10
  },
  "highlight": "title:30,name:30,user:30,text:124,comments.*:124",
  "sort": "modified_desc"
}

Parameter | Description | Notes
api_version | vendor: "grdm", version: 2 | version supports 1 and 2 (confirmed in source code)
elasticsearch_dsl.query | Elasticsearch Query DSL | Uses filtered format (ES 2.x syntax)
from / size | Pagination | Confirmed to work up to size=100. match_all + size>50 results in a 500 error
highlight | field_name:character_count format | GRDM-specific format. Wildcards (comments.*) can also be used
sort | Sort order | Described below
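
As a rough illustration of the request format above, the following Python sketch assembles the JSON body and wraps it in a POST request. The function names build_search_body and build_request are hypothetical, the host is the one recorded in this article, and the Content-Type header is an assumption; none of this comes from official documentation.

```python
import json
import urllib.request

# Endpoint as recorded in this article; adjust for your own GRDM instance.
SEARCH_URL = "https://rdm.nii.ac.jp/api/v1/search/"

# Highlight spec in the GRDM-specific field_name:character_count format.
DEFAULT_HIGHLIGHT = "title:30,name:30,user:30,text:124,comments.*:124"


def build_search_body(keyword, start=0, size=10, sort="modified_desc",
                      highlight=DEFAULT_HIGHLIGHT):
    """Assemble the request body (hypothetical helper, not part of GRDM)."""
    return {
        "api_version": {"vendor": "grdm", "version": 2},
        "elasticsearch_dsl": {
            "query": {
                "filtered": {
                    "query": {
                        "query_string": {
                            "default_field": "_all",
                            "query": keyword,
                        }
                    }
                }
            },
            "from": start,
            "size": size,
        },
        "highlight": highlight,
        "sort": sort,
    }


def build_request(body, token):
    """Wrap the body in a POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass the result of build_request() to urllib.request.urlopen() with a valid personal access token.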

Sort Options

According to the source code (build_private_search_query in util.py), the following sort targets are defined.

Sort value | Verified | Description
modified_desc / modified_asc | Verified | By modification date
created_desc / created_asc | Verified | By creation date
project_desc / project_asc | Unverified | By project name (defined in source)
file_desc / file_asc | Unverified | By file name (defined in source)
wiki_desc / wiki_asc | Unverified | By wiki name (defined in source)
user_desc / user_asc | Unverified | By user name (defined in source)
institution_desc / institution_asc | Unverified | By institution name (defined in source)

relevance, title_asc, etc. are not defined in the source and were confirmed to result in 400 errors.
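
Because undefined sort values come back as 400 errors, it can be convenient to check a value on the client before sending it. The sketch below is only an illustration: check_sort is a hypothetical helper, and the target list is copied from the table in this section rather than read from the server.

```python
# Sort targets listed in this section (from build_private_search_query in util.py).
SORT_TARGETS = ("modified", "created", "project", "file", "wiki", "user", "institution")

# Targets whose behavior was actually confirmed by experiment in this article.
VERIFIED_TARGETS = {"modified", "created"}


def check_sort(value):
    """Return "verified" or "unverified" for a defined sort value, else raise ValueError."""
    # Split on the last underscore: "modified_desc" -> ("modified", "desc").
    target, _, direction = value.rpartition("_")
    if target not in SORT_TARGETS or direction not in ("asc", "desc"):
        raise ValueError(f"sort value not defined in the source: {value!r}")
    return "verified" if target in VERIFIED_TARGETS else "unverified"
```

For example, check_sort("modified_desc") returns "verified", check_sort("wiki_asc") returns "unverified", and check_sort("relevance") raises ValueError.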


Fields Indexed (From Source Code)

The update_file(), update_node(), and update_user() functions in elastic_search.py show which fields are indexed for each category.

file (update_file())

Field | Source | _all search | Notes
name | file_.name | Hits | Highlight confirmed
normalized_name | unicode_normalize(file_.name) | | For kuromoji
node_title | target.title | Hits | Parent project name
creator_name | Obtained from user info | Hits |
modifier_name | Obtained from user info | Unverified |
tags | File tags | Hits | Highlight confirmed
normalized_tags | Normalized tags | Hits | Highlight confirmed
extra_search_terms | clean_splitters(file_.name) | Unverified | File name tokenized
comments | comments_to_doc() | Unverified | No test data available
node_public | target.is_public | | For filtering
node_contributors | Contributor ID list | | For permission filtering
deep_url | File URL | | For display
date_created / date_modified | Datetime | | For sorting
category | Fixed as "file" | |

folder_name, parent_title, and parent_url are not included in the index. format_results() adds them dynamically when the response is built (confirmed in source code).

project (update_node())

Field | Source | _all search | Notes
title | node.title | Hits |
normalized_title | Normalized title | | For kuromoji
description | node.description | Hits | Highlight confirmed
normalized_description | Normalized description | | For kuromoji
tags | Project tags | Hits |
normalized_tags | Normalized tags | |
contributors | Contributor info | |
creator_name | Creator name | Hits |
comments | Comments | Unverified |
wikis | Wiki content | Unverified | Mapped via dynamic template
license | License info | | For display
affiliated_institutions | Affiliated institutions | |
boost | Boost value | |

user (update_user())

Field | Source | Notes
user | user.fullname |
normalized_user / normalized_names | Normalized names |
job / job_title | Job info |
ongoing_job / ongoing_job_department / ongoing_job_title | Current workplace |
school / ongoing_school* | Education history |
emails | Email addresses |
social | SNS links |
boost | Fixed at 2 | Set to boost user search scores

Other Categories (Defined in Source Code)

According to the source code, the following categories exist in addition to file, project, and user.

  • component – Project sub-components
  • registration – Registrations (snapshots)
  • preprint – Preprints
  • wiki – Wiki pages (text field contains the body)
  • comment – Comments
  • institution – Institutions
  • collectionsubmission – Collection submissions

The text highlight field is most likely for Wiki page content (not file body text).


_all Field Search Targets

Elasticsearch Mapping (From Source Code)

The _all field is analyzed with kuromoji_analyzer. In create_index() of the source code, analyzers are configured for each field, and fields with analyzers configured are included in _all.

Search Targets Confirmed by Experiment

Search term | Matched field | Category | Confirmed via highlight
"2507" | name (file name) | file | highlight[name]
"dmp-project-aaa" | node_title (project name) | file |
"Nakamura" | creator_name (creator name) | file |
"blockchain" | tags (tags) | file | highlight[tags]
"arxiv" | tags / normalized_tags | file | highlight[tags], highlight[normalized_tags]
"アーカイブズ学" | description (project description) | project | highlight[description]
"digital-preservation" | tags (project tags) | project |

Confirmed Not Searchable

Search term | Target field | Category | Reason
"NII Storage" | folder_name | file | Not included in index (dynamically added by format_results())
"digital preservation framework" | file_description | file | DataCite metadata is not searchable
"Clio-X" | .txt file body | file | File body text is not indexed
"Victoria Lemieux" | .txt file body | file | Same as above

Filters

Filter format | Result
{"term": {"category": "file"}} | Works
{"and": [{"term": ...}, {"term": ...}]} | Works
{"bool": {"should": [...]}} | Works
{"bool": {"must": [...]}} | Results in 500 error

The reason bool + must fails is unknown. Because build_private_search_query() in the source code itself uses bool + must, the failure is presumed to be a limitation on the API wrapper side rather than in Elasticsearch. As a workaround, use an "and" filter or the AND operator in query_string syntax.
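
The two workarounds can be sketched in Python as follows. Both helpers (and_filter_query and query_string_and) are hypothetical names introduced for illustration, and placing the filter under the "filter" key of the filtered query follows standard ES 2.x syntax rather than anything GRDM documents. The second helper assumes each value is a single token with no spaces or reserved characters.

```python
def and_filter_query(keyword, terms):
    """Build a filtered query that combines term filters with "and" instead of bool + must.

    terms maps a field name to the exact value it must have,
    e.g. {"category": "file"}.
    """
    filters = [{"term": {field: value}} for field, value in terms.items()]
    return {
        "filtered": {
            "query": {
                "query_string": {"default_field": "_all", "query": keyword}
            },
            "filter": {"and": filters},
        }
    }


def query_string_and(keyword, terms):
    """Express the same conditions with the AND operator of query_string syntax."""
    clauses = [f"({keyword})"]
    clauses += [f"{field}:{value}" for field, value in terms.items()]
    return " AND ".join(clauses)
```

The dictionary returned by and_filter_query() is meant to be placed under elasticsearch_dsl.query in the request body, while the string returned by query_string_and() replaces the query value inside query_string.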


Highlights

Confirmed

Field | Category | Search term used for confirmation
name | file | "ip2"
tags | file | "blockchain", "arxiv"
normalized_tags | file | "arxiv"
description | project | "アーカイブズ学"

Unverified

  • text – According to the source code, Wiki page body text goes into this field. It should be testable with a project that has a Wiki.
  • comments.* – Dynamic field for comments. Unverified because there were no comments in the test data.
  • title – Project title. Can be tested with search terms that match the title.
  • user – User name.
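
When testing the unverified fields above, the highlight parameter has to be written in the field_name:character_count format. A minimal sketch of a builder for that string follows; highlight_spec is a hypothetical helper, and the only rule it enforces (a positive integer count) is an assumption about what the server accepts.

```python
def highlight_spec(fields):
    """Join {field_name: character_count} pairs into "name:30,comments.*:124" form."""
    parts = []
    for name, length in fields.items():
        # Assumption: counts must be positive integers (bool is rejected explicitly).
        if isinstance(length, bool) or not isinstance(length, int) or length <= 0:
            raise ValueError(f"character count for {name!r} must be a positive integer")
        parts.append(f"{name}:{length}")
    return ",".join(parts)
```

For instance, highlight_spec({"text": 124, "comments.*": 124, "title": 30, "user": 30}) produces a spec that covers all four unverified fields in one request.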

DataCite Metadata

Editable Fields

Managed via /v2/files/{id}/metadata_records/. However, the fields that OSF allows editing are limited to the following four (confirmed in the validation schema).

Field | Type | Description
resource_type | enum | Audio/Video, Dataset, Image, Model, Software, Book, Funding Submission, Journal Article, Lesson, Poster, Preprint, Presentation, Research Tool, Thesis, Other
file_description | string | File description text
related_publication_doi | string | DOI of related publication (10.xxxx/yyyy format)
funders | array | [{"funding_agency": "...", "grant_number": "..."}]

All fields of the DataCite v4.0 schema (titles, creators, subjects, descriptions, etc.) are defined, but because of OSF's input validation, any field other than the above four is rejected with an "Additional properties are not allowed" error.
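
To avoid that validation error, a client can separate the editable fields from the rest before sending an update. The sketch below only mirrors the four-field rule described here; split_metadata is a hypothetical helper and does not reproduce OSF's actual validation schema (for example, it does not check the resource_type enum or the DOI format).

```python
# The four fields this article found to be editable via metadata_records.
EDITABLE_FIELDS = {
    "resource_type",
    "file_description",
    "related_publication_doi",
    "funders",
}


def split_metadata(record):
    """Return (allowed, rejected): the editable subset and the names that would be refused."""
    allowed = {key: value for key, value in record.items() if key in EDITABLE_FIELDS}
    rejected = sorted(key for key in record if key not in EDITABLE_FIELDS)
    return allowed, rejected
```

Given a record that contains file_description, titles, and creators, the helper keeps only file_description and reports creators and titles as rejected.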

I set a value in file_description and searched for it, but it did not hit. update_file() in the source code also contains no processing that reads DataCite metadata, so DataCite metadata is not included in the search index.


Summary

What Was Confirmed

API behavior (experiments):

  • /api/v1/search/ is an Elasticsearch Query DSL-based search API
  • It can perform cross-category searches across file, project, and user
  • _all full-text search hits on name, node_title, creator_name, tags, and description
  • Field-specific queries (name:, tags:, category:), wildcards, and AND/OR operators can be used
  • term / and / bool+should filters work
  • sort was confirmed to work with 4 types: modified_desc/asc and created_desc/asc
  • size was confirmed to work up to 100
  • Highlights were confirmed to work for name, tags, normalized_tags, and description
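
Given the size observations above (confirmed up to 100, with match_all failing above 50), paging through a larger result set means issuing several requests with different from / size values. The following sketch generates those pairs; page_windows is a hypothetical helper, and the default of 50 and the cap of 100 simply reflect what was observed in this investigation, not documented limits.

```python
def page_windows(total, page_size=50, max_size=100):
    """Yield (from, size) pairs that cover `total` hits without exceeding max_size."""
    if not 1 <= page_size <= max_size:
        raise ValueError(f"page_size must be between 1 and {max_size}")
    for start in range(0, total, page_size):
        # The last window is shortened so it does not run past the total.
        yield start, min(page_size, total - start)
```

For 120 hits with the default page size, this yields (0, 50), (50, 50), and (100, 20).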

Confirmed from source code:

  • kuromoji_analyzer is used in Japanese environments
  • folder_name, parent_title, and parent_url are not included in the index and are dynamically added during response
  • The text highlight field is for Wiki page content
  • sort has definitions for each direction of project, file, wiki, user, institution, created, and modified
  • api_version supports version 1 and 2, and vendor supports "grdm"
  • In addition to file, project, and user, the categories component, registration, preprint, wiki, comment, institution, and collectionsubmission exist

Confirmed Not Searchable

  • Text file body (text within .txt files) – Confirmed by both experiment and source code
  • DataCite metadata (file_description, etc.) – Confirmed by both experiment and source code
  • folder_name (storage provider name) – Confirmed by both experiment and source code

Unverified

  • Whether comments hit in _all search
  • Wiki content (text field) search behavior
  • Behavior of sort values like project_desc
  • Exact upper limit of size
  • Cause of 500 error with bool + must filter