Exporting Web Annotations via the Hypothes.is API and Converting to TEI/XML

Introduction

Hypothes.is is an open-source annotation tool for adding highlights and comments to web pages. It is easy to use through a browser extension or by embedding its JavaScript client in a page, but you may also want to back up the annotations you have accumulated or reuse them in other formats such as TEI/XML.

This article shows how to export annotations through the Hypothes.is API and convert them to TEI/XML.

Obtaining an API Key

1. Log in to Hypothes.is.
2. Go to the Developer settings.
3. Click “Generate your API token” to create an API key.

Save the obtained key in a .env file.

Exporting Annotations

API Basics

The base URL for the Hypothes.is API is https://api.hypothes.is/api. Every request is authenticated with an Authorization: Bearer <API_KEY> header.

Key endpoints (paths are relative to https://api.hypothes.is):

Endpoint	Purpose
`GET /api/profile`	Get authenticated user’s profile
`GET /api/search`	Search annotations
`GET /api/annotations/{id}`	Get individual annotation

Script

Everything from export to TEI/XML conversion is consolidated in a single script, hypothes_export.py.

https://github.com/nakamura196/hypothes-export/blob/main/hypothes_export.py

The main processing steps are excerpted and explained below.

Loading .env and API Calls
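
As a rough illustration of this step, the sketch below reads the key from a .env file and builds an authenticated request. It is not the code from hypothes_export.py: the variable name HYPOTHESIS_API_KEY and the helpers load_env, build_request, and api_get are assumptions made for this example, and it uses only the Python standard library in place of whatever packages the script itself depends on.

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.hypothes.is/api"


def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env file into a dict (minimal parser)."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values


def build_request(path, api_key, params=None):
    """Build an authenticated GET request for an API path such as '/profile'."""
    url = API_BASE + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


def api_get(path, api_key, params=None):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, api_key, params)) as resp:
        return json.load(resp)
```

With a .env file containing a line such as HYPOTHESIS_API_KEY=..., calling api_get("/profile", load_env()["HYPOTHESIS_API_KEY"]) should return the profile of the authenticated user as a dictionary.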

Fetching All Annotations (with Pagination)

The Search API returns at most 200 results per request, so the script fetches all annotations by incrementing the offset parameter page by page.
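
The offset loop can be sketched as follows. This is a minimal illustration rather than the script's actual code: fetch_all and its fetch_page argument are hypothetical names, and fetch_page stands for any function that performs GET /api/search with the given query parameters and returns the decoded JSON, which carries the matches in rows and the overall count in total.

```python
def fetch_all(fetch_page, user, limit=200):
    """Collect every annotation for `user` by advancing `offset` page by page.

    `fetch_page(params)` is a caller-supplied function (an assumption of this
    sketch) that calls the search endpoint and returns the decoded response.
    """
    annotations = []
    offset = 0
    while True:
        page = fetch_page({"user": user, "limit": limit, "offset": offset})
        rows = page.get("rows", [])
        annotations.extend(rows)
        offset += len(rows)
        # Stop on an empty page or once the reported total has been reached.
        if not rows or offset >= page.get("total", 0):
            break
    return annotations
```

The user value is the full account identifier, for example acct:your_username@hypothes.is. For very large collections, the API documentation also describes a search_after parameter as an alternative to offset; that variant is not covered here.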

Execution

Annotation Data Structure

Each annotation in the exported JSON has a structure based on the W3C Web Annotation Data Model.
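
To make the structure concrete, here is a trimmed example written as a Python dictionary. The field names follow the API response, but every value is an invented placeholder, real annotations carry additional fields that are omitted here, and the helper selector_types is a hypothetical name used only for this illustration.

```python
# A trimmed, made-up annotation. Only the fields used in this article are shown.
example_annotation = {
    "id": "EXAMPLE_ID",
    "created": "2024-01-01T00:00:00.000000+00:00",
    "updated": "2024-01-01T00:00:00.000000+00:00",
    "user": "acct:example_user@hypothes.is",
    "uri": "https://example.org/article",
    "text": "A comment written by the annotator.",
    "tags": ["example-tag"],
    "document": {"title": ["Example Article"]},
    "target": [
        {
            "source": "https://example.org/article",
            "selector": [
                {
                    "type": "RangeSelector",
                    "startContainer": "/div[1]/p[2]",
                    "startOffset": 10,
                    "endContainer": "/div[1]/p[2]",
                    "endOffset": 26,
                },
                {"type": "TextPositionSelector", "start": 1234, "end": 1250},
                {
                    "type": "TextQuoteSelector",
                    "exact": "highlighted text",
                    "prefix": "text before the ",
                    "suffix": " and text after",
                },
            ],
        }
    ],
}


def selector_types(annotation):
    """List the selector types recorded on every target of an annotation."""
    return [
        selector["type"]
        for target in annotation.get("target", [])
        for selector in target.get("selector", [])
    ]
```

Page-level notes that are not attached to a text selection have a target without any selector list, which is why the helper falls back to an empty list.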

Three Types of Selectors

Hypothes.is records the text position of annotation targets using three types of selectors.

Selector	Mechanism	Robustness
RangeSelector	Specifies position using XPath on the DOM	Fair - Vulnerable to HTML structure changes
TextPositionSelector	Specifies by character offset position	Fair - Shifts with text additions/deletions
TextQuoteSelector	Specifies by target text + surrounding context	Excellent - Can re-anchor via fuzzy match

When the source document changes, Hypothes.is tries these selectors in sequence, falling back to the next one when a selector no longer matches. TextQuoteSelector is the most robust because it fuzzy-matches the quoted text together with its prefix and suffix, but if the target text itself is deleted or significantly modified, the annotation becomes “orphaned.”
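
The fallback idea can be illustrated with a deliberately simplified sketch. It is not the algorithm used by the Hypothes.is client: it works on plain text, ignores RangeSelector, and uses exact substring search where the real client uses fuzzy matching. The function name reanchor is hypothetical.

```python
def reanchor(document_text, selectors):
    """Return (start, end) of the quote in `document_text`, or None if orphaned."""
    by_type = {s["type"]: s for s in selectors}
    quote = by_type.get("TextQuoteSelector")
    if quote is None:
        return None
    exact = quote["exact"]

    # 1. Trust the character offsets if they still point at the quoted text.
    pos = by_type.get("TextPositionSelector")
    if pos and document_text[pos["start"]:pos["end"]] == exact:
        return pos["start"], pos["end"]

    # 2. Otherwise search for the quote together with its surrounding context.
    prefix = quote.get("prefix", "")
    suffix = quote.get("suffix", "")
    hit = document_text.find(prefix + exact + suffix)
    if hit != -1:
        start = hit + len(prefix)
        return start, start + len(exact)

    # 3. Fall back to the quote alone; give up if it no longer occurs.
    hit = document_text.find(exact)
    return (hit, hit + len(exact)) if hit != -1 else None
```

When text is inserted before the highlight, step 1 fails because the offsets have shifted, but step 2 still finds the quote through its context. When the quoted text itself is gone, every step fails and the function returns None, which corresponds to an orphaned annotation.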

Conversion to TEI/XML

The exported JSON is converted to TEI/XML format.

Mapping Strategy

Hypothes.is	TEI/XML
Target document (URI, title)	`<sourceDesc><bibl>`
Group by document	`<div>`
Each annotation	`<ab>`
Highlighted text (`TextQuoteSelector.exact`)	`<quote>`
Comment body	`<note type="annotation">`
Tags	`<note type="tag">`

Conversion Logic

Quote text is extracted from TextQuoteSelector and mapped to TEI elements.

Annotations are grouped by URI and output in the structure <div> -> <ab> -> <quote> / <note>. See the source code for details.
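
For readers who want a feel for the conversion without opening the repository, the grouping and mapping can be sketched with the standard library's xml.etree.ElementTree. This is an assumption-laden illustration, not the code in hypothes_export.py: the function names quote_of and to_tei_body are hypothetical, the teiHeader is omitted, and putting the URI in a source attribute on each <div> is a choice made only for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements in the default namespace instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(name):
    """Return the namespace-qualified tag name for a TEI element."""
    return f"{{{TEI_NS}}}{name}"


def quote_of(annotation):
    """Return the TextQuoteSelector `exact` text, or '' for page-level notes."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact", "")
    return ""


def to_tei_body(annotations):
    """Build a TEI <body> with one <div> per URI and one <ab> per annotation."""
    by_uri = defaultdict(list)
    for annotation in annotations:
        by_uri[annotation["uri"]].append(annotation)

    body = ET.Element(tei("body"))
    for uri, group in by_uri.items():
        # The `source` attribute is this sketch's way of labelling the group.
        div = ET.SubElement(body, tei("div"), {"source": uri})
        for annotation in group:
            ab = ET.SubElement(div, tei("ab"))
            quote = quote_of(annotation)
            if quote:
                ET.SubElement(ab, tei("quote")).text = quote
            if annotation.get("text"):
                note = ET.SubElement(ab, tei("note"), {"type": "annotation"})
                note.text = annotation["text"]
            for tag in annotation.get("tags", []):
                ET.SubElement(ab, tei("note"), {"type": "tag"}).text = tag
    return body
```

Calling ET.tostring(to_tei_body(annotations), encoding="unicode") serializes the result. Building the tree with ElementTree rather than string concatenation means that characters such as & and < in comments are escaped automatically.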

Output Example

Source Document Changes and Annotation Consistency

Hypothes.is uses a “standoff annotation” approach: annotations are stored separately from the source document. As a result, when the source document changes, annotation positions may shift.

Minor changes: Often re-anchored via TextQuoteSelector fuzzy matching
Major changes: Annotations become “orphaned” and are no longer linked to their target locations

Exporting to TEI/XML records the highlighted text in <quote> elements, so the correspondence with the source document is preserved at least as a record.

Summary

The Hypothes.is API allows programmatic retrieval of your annotations
The exact, prefix, and suffix fields of TextQuoteSelector are the most important for identifying the annotated text
Converting to TEI/XML enables storage and utilization in a format widely used in humanities research
However, be aware of anchoring shifts due to source document changes

The source code is published on GitHub.
