Introduction

Hypothes.is is an open-source annotation tool that allows you to add highlights and comments on web pages. It can be easily used through browser extensions or JavaScript embedding, but there are cases where you may want to back up accumulated annotations or utilize them in other formats such as TEI/XML.

This article shows how to export annotations with the Hypothes.is API and convert them to TEI/XML.

Obtaining an API Key

  1. Log in to Hypothes.is
  2. Go to Developer settings
  3. Generate an API key with “Generate your API token”

Save the obtained key in a .env file.

cp .env.example .env
# Edit .env to set the API key
HYPOTHESIS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Exporting Annotations

API Basics

The base URL for the Hypothes.is API is https://api.hypothes.is/api. Authentication is done via the Authorization: Bearer <API_KEY> header.

Key endpoints:

Endpoint                   Purpose
GET /api/profile           Get the authenticated user’s profile
GET /api/search            Search annotations
GET /api/annotations/{id}  Get an individual annotation

Script

The whole workflow, from export to TEI/XML conversion, is implemented in a single script, hypothes_export.py.

https://github.com/nakamura196/hypothes-export/blob/main/hypothes_export.py

Below, the main processing is excerpted and explained.

Loading .env and API Calls

The script reads HYPOTHESIS_API_KEY from the .env file and sends every request through a helper that adds the Authorization: Bearer header. The listing itself is in hypothes_export.py at the URL above.
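A minimal sketch of this step, using only the standard library. It is not the script's actual code: the names load_env, build_request, and api_request, the hand-rolled .env parsing, and the params argument are assumptions made for illustration.

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.hypothes.is/api"


def load_env(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ, then return the API key."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                # Skip blank lines, comments, and lines without "=".
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                # Values already present in the environment take precedence.
                os.environ.setdefault(key.strip(), value.strip())
    return os.environ.get("HYPOTHESIS_API_KEY", "")


def build_request(path, api_key, params=None):
    """Build (but do not send) an authenticated GET request."""
    url = API_BASE + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def api_request(path, api_key, params=None):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, api_key, params)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a valid key, api_request("/profile", load_env()) would return the profile of the authenticated user. Splitting out build_request keeps the header and URL logic checkable without touching the network.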

Fetching All Annotations (with Pagination)

The Search API returns a maximum of 200 results per request, so all annotations are fetched by incrementing the offset.

The script requests /api/search with limit set to 200 and offset starting at 0, appends the returned rows, and keeps raising the offset by the limit until every annotation has been collected.
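The pagination loop can be sketched as follows. This is an illustration under assumptions, not the script's own listing: fetch_all_annotations and its search parameter are hypothetical names, and search stands for any callable that takes a query-parameter dict and returns the decoded /api/search response (a dict with "total" and "rows").

```python
def fetch_all_annotations(search, user, limit=200):
    """Collect every annotation for `user` by paging through the results with offset."""
    annotations = []
    offset = 0
    while True:
        page = search({"user": user, "limit": limit, "offset": offset})
        rows = page.get("rows", [])
        annotations.extend(rows)
        # Stop on an empty page or once the reported total has been reached.
        if not rows or len(annotations) >= page.get("total", 0):
            break
        offset += limit
    return annotations


# Usage with an in-memory stand-in for the Search API (450 fake annotations).
def fake_search(params):
    data = [{"id": str(i)} for i in range(450)]
    start = params["offset"]
    return {"total": len(data), "rows": data[start:start + params["limit"]]}


all_rows = fetch_all_annotations(fake_search, "acct:your_username@hypothes.is")
```

Passing the search function in as an argument keeps the loop independent of how the HTTP call is made, so the paging logic can be exercised without network access.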

Execution

# Output JSON + TEI/XML
python hypothes_export.py

# Output JSON only
python hypothes_export.py --json-only

# Convert from existing JSON to TEI/XML only
python hypothes_export.py --tei-only

Output:

User: acct:your_username@hypothes.is
Total: 6 annotations
Saved JSON: output/annotations.json (6 annotations)
Saved TEI/XML: output/annotations.xml

Annotation Data Structure

Each annotation in the exported JSON has a structure based on the W3C Web Annotation Data Model.

{
  "id": "...",
  "created": "...",
  "user": "...",
  "uri": "...",
  "text": "...",
  "tags": ["..."],
  "target": [
    {
      "source": "...",
      "selector": [
        {"type": "RangeSelector", "startContainer": "...", "endContainer": "...", "startOffset": ..., "endOffset": ...},
        {"type": "TextPositionSelector", "start": ..., "end": ...},
        {"type": "TextQuoteSelector", "exact": "...", "prefix": "...", "suffix": "..."}
      ]
    }
  ]
}

Three Types of Selectors

Hypothes.is records the text position of annotation targets using three types of selectors.

Selector              Mechanism                                       Robustness
RangeSelector         Specifies position using XPath on the DOM       Fair: vulnerable to HTML structure changes
TextPositionSelector  Specifies position by character offset          Fair: shifts when text is added or deleted
TextQuoteSelector     Specifies target text plus surrounding context  Excellent: can re-anchor via fuzzy matching

When the source document changes, Hypothes.is attempts these selectors as fallbacks in sequence. TextQuoteSelector performs fuzzy matching including prefix/suffix, making it the most robust, but if the target text itself is deleted or significantly modified, the annotation becomes “orphaned.”
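To make the fallback idea concrete, here is a minimal sketch that works on plain text instead of a DOM. It is not the algorithm used by the Hypothes.is client: the function name reanchor and the difflib-based context scoring are assumptions, RangeSelector is ignored because there is no DOM, and only the surrounding context is matched fuzzily (the quoted text itself must still occur verbatim).

```python
from difflib import SequenceMatcher


def reanchor(text, selectors):
    """Return (start, end) of the annotated span in `text`, or None if orphaned."""
    by_type = {s["type"]: s for s in selectors}
    quote = by_type.get("TextQuoteSelector")
    pos = by_type.get("TextPositionSelector")

    # 1. Trust the character offsets only if they still point at the quoted text.
    if pos and quote and text[pos["start"]:pos["end"]] == quote["exact"]:
        return pos["start"], pos["end"]
    if quote is None:
        return None

    # 2. Otherwise check every occurrence of the exact text and keep the one
    #    whose surrounding context best matches the stored prefix and suffix.
    exact = quote["exact"]
    prefix, suffix = quote.get("prefix", ""), quote.get("suffix", "")
    best, best_score = None, -1.0
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        before = text[max(0, start - len(prefix)):start]
        after = text[end:end + len(suffix)]
        score = (SequenceMatcher(None, prefix, before).ratio()
                 + SequenceMatcher(None, suffix, after).ratio())
        if score > best_score:
            best, best_score = (start, end), score
        start = text.find(exact, start + 1)
    # None here means the quote no longer occurs: the annotation is orphaned.
    return best
```

If text is inserted before the annotated passage, the offsets go stale, but the quote plus its prefix and suffix still pick out the right occurrence. If the quoted text is gone, the function returns None, which corresponds to the orphaned case.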

Conversion to TEI/XML

The exported JSON is converted to TEI/XML format.

Mapping Strategy

Hypothes.is                                 TEI/XML
Target document (URI, title)                <sourceDesc><bibl>
Group by document                           <div>
Each annotation                             <ab>
Highlighted text (TextQuoteSelector.exact)  <quote>
Comment body                                <note type="annotation">
Tags                                        <note type="tag">

Conversion Logic

Quote text is extracted from TextQuoteSelector and mapped to TEI elements.

The script walks each annotation's target and selector lists and takes exact, prefix, and suffix from the entry whose type is TextQuoteSelector.
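A sketch of that lookup, assuming the annotation layout described earlier. The function name get_text_quote, the returned dict shape, and the sample annotation are illustrative choices, not taken from the script.

```python
def get_text_quote(annotation):
    """Return exact/prefix/suffix from the first TextQuoteSelector, or None."""
    for target in annotation.get("target", []):
        # Page-level annotations have no "selector" key, hence the default.
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return {
                    "exact": selector.get("exact", ""),
                    "prefix": selector.get("prefix", ""),
                    "suffix": selector.get("suffix", ""),
                }
    return None


# Hypothetical annotation with made-up values, for illustration only.
sample = {
    "uri": "https://example.com/page",
    "target": [{
        "source": "https://example.com/page",
        "selector": [
            {"type": "TextPositionSelector", "start": 10, "end": 19},
            {"type": "TextQuoteSelector", "exact": "some text",
             "prefix": "before ", "suffix": " after"},
        ],
    }],
}
quote = get_text_quote(sample)
```

Returning None rather than an empty string lets the caller distinguish an annotation without a highlighted passage from one whose quote is empty.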

Annotations are grouped by URI and output in the structure <div> -> <ab> -> <quote> / <note>. See the source code for details.
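The grouping and element mapping can be sketched with xml.etree.ElementTree. This is a simplified illustration, not the script's implementation: the function names, the header text, and the decision to record only the URI (not the title) in <bibl> are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"


def exact_quote(annotation):
    """Return the TextQuoteSelector's exact text, or None if there is none."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact")
    return None


def annotations_to_tei(annotations):
    """Serialize annotations as TEI: one <div> per URI, one <ab> per annotation."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

    def el(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}", attrs)
        node.text = text
        return node

    root = ET.Element(f"{{{TEI_NS}}}TEI")
    file_desc = el(el(root, "teiHeader"), "fileDesc")
    el(el(file_desc, "titleStmt"), "title", "Hypothes.is annotations")
    el(el(file_desc, "publicationStmt"), "p", "Exported via the Hypothes.is API")
    source_desc = el(file_desc, "sourceDesc")
    body = el(el(root, "text"), "body")

    # Group annotations by the URI of the annotated document.
    by_uri = defaultdict(list)
    for annotation in annotations:
        by_uri[annotation.get("uri", "")].append(annotation)

    for uri, group in by_uri.items():
        el(el(source_desc, "bibl"), "ref", uri, target=uri)
        div = el(body, "div")
        for annotation in group:
            ab = el(div, "ab")
            quote = exact_quote(annotation)
            if quote:
                el(ab, "quote", quote)
            if annotation.get("text"):
                el(ab, "note", annotation["text"], type="annotation")
            for tag in annotation.get("tags", []):
                el(ab, "note", tag, type="tag")

    return ET.tostring(root, encoding="unicode")
```

Calling annotations_to_tei(rows) on the exported list returns the document as a string; writing it to a file and adding an XML declaration are left out to keep the sketch short.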

Output Example

The generated file has the following shape (text content elided as ...):

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>...</titleStmt>
      <publicationStmt>...</publicationStmt>
      <sourceDesc>
        <bibl>
          <title>...</title>
          <ref target="https://example.com/page">https://example.com/page</ref>
        </bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <ab>
          <quote>...</quote>
          <note type="annotation">...</note>
          <note type="tag">...</note>
        </ab>
      </div>
    </body>
  </text>
</TEI>

Source Document Changes and Annotation Consistency

Hypothes.is annotations use a “standoff annotation” approach, stored separately from the source document. Therefore, when the source document changes, annotation positions may shift.

  • Minor changes: Often re-anchored via TextQuoteSelector fuzzy matching
  • Major changes: Annotations become “orphaned” and are no longer linked to their target locations

By exporting to TEI/XML, the highlighted target text is recorded in <quote> elements, so the correspondence with the source document is at least preserved as a record.

Summary

  • The Hypothes.is API allows programmatic retrieval of your annotations
  • The exact, prefix, and suffix fields of TextQuoteSelector are the most important data for identifying the annotated text
  • Converting to TEI/XML enables storage and utilization in a format widely used in humanities research
  • However, be aware of anchoring shifts due to source document changes

The source code is published on GitHub.