This article explains how to migrate data from Amazon Elasticsearch Service to a self-hosted OpenSearch cluster. It introduces a simple, reliable migration method that reads documents with the Scroll API and writes them with the Bulk API.

Background

Data migration between Elasticsearch/OpenSearch clusters typically comes up when moving between cloud services or optimizing costs. In this case, the migration was between the following environments.

  • Source: Amazon Elasticsearch Service (AWS)
  • Destination: Self-hosted OpenSearch

Migration Flow

  1. Check indices on source and destination
  2. Retrieve and adjust mapping information
  3. Create indices on the destination
  4. Migrate data with Scroll API + Bulk API
  5. Verify migration results

Preparation: Checking Indices

First, check the index lists on both the source and destination.

    # Source index list
    curl -u "user:password" "https://source-cluster/_cat/indices?v&s=index"

    # Destination index list
    curl -u "user:password" "https://dest-cluster/_cat/indices?v&s=index"

Step 1: Retrieve Mapping Information

Retrieve the mapping information from the source.

    curl -s -u "user:password" \
      "https://source-cluster/index_name/_mapping" \
      > mappings.json

Step 2: Adjust Mappings

Depending on the destination environment, custom analyzers may not be available. For example, if the kuromoji plugin for Japanese morphological analysis is not installed, you need to remove the analyzer settings.

    def remove_analyzers(obj):
        """Recursively remove analyzer settings from mappings"""
        if isinstance(obj, dict):
            if 'analyzer' in obj:
                del obj['analyzer']
            for value in obj.values():
                remove_analyzers(value)
        elif isinstance(obj, list):
            for item in obj:
                remove_analyzers(item)
        return obj

Step 3: Create Indices on the Destination

Create indices using the adjusted mappings.

    import json
    import requests
    from requests.auth import HTTPBasicAuth

    def create_index(dest_url, dest_auth, index_name, mappings):
        """Create an index on the destination"""
        body = {
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 1
            },
            "mappings": mappings
        }
        response = requests.put(
            f"{dest_url}/{index_name}",
            auth=dest_auth,
            json=body
        )
        return response.status_code in [200, 201]

    # Load the mappings saved in Step 1 and strip the analyzer settings
    with open('mappings.json') as f:
        source_mappings = json.load(f)
    mappings = remove_analyzers(source_mappings['index_name']['mappings'])

    dest_auth = HTTPBasicAuth("user", "password")
    create_index("https://dest-cluster", dest_auth, "index_name", mappings)

Step 4: Data Migration Script

Use the Scroll API to retrieve data from the source and insert it into the destination using the Bulk API.

    import json
    import sys
    import requests
    from requests.auth import HTTPBasicAuth

    SOURCE_URL = "https://source-cluster"
    SOURCE_AUTH = HTTPBasicAuth("user", "password")
    DEST_URL = "https://dest-cluster"
    DEST_AUTH = HTTPBasicAuth("user", "password")

    def migrate_index(source_index, dest_index, batch_size=1000):
        """Migrate data using the Scroll API and Bulk API"""
        # Start the scroll on the source
        response = requests.post(
            f"{SOURCE_URL}/{source_index}/_search?scroll=5m",
            auth=SOURCE_AUTH,
            json={"size": batch_size, "query": {"match_all": {}}}
        )
        response.raise_for_status()
        result = response.json()
        scroll_id = result["_scroll_id"]
        hits = result["hits"]["hits"]
        total = 0

        while hits:
            # Build the bulk body: one action line and one source line per document
            bulk_lines = []
            for hit in hits:
                bulk_lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
                bulk_lines.append(json.dumps(hit["_source"]))
            bulk_body = "\n".join(bulk_lines) + "\n"

            # Insert the batch into the destination
            bulk_response = requests.post(
                f"{DEST_URL}/_bulk",
                auth=DEST_AUTH,
                headers={"Content-Type": "application/x-ndjson"},
                data=bulk_body.encode("utf-8")
            )
            bulk_response.raise_for_status()
            total += len(hits)
            print(f"{source_index} -> {dest_index}: {total} documents migrated")

            # Fetch the next batch
            response = requests.post(
                f"{SOURCE_URL}/_search/scroll",
                auth=SOURCE_AUTH,
                json={"scroll": "5m", "scroll_id": scroll_id}
            )
            response.raise_for_status()
            result = response.json()
            scroll_id = result["_scroll_id"]
            hits = result["hits"]["hits"]

        # Release the scroll context on the source
        requests.delete(
            f"{SOURCE_URL}/_search/scroll",
            auth=SOURCE_AUTH,
            json={"scroll_id": scroll_id}
        )
        return total

    if __name__ == "__main__":
        migrate_index(sys.argv[1], sys.argv[2])

Step 5: Execute Migration and Verify

    # Execute migration
    migrate_index('items1', 'cj-items1')
    migrate_index('items2', 'cj-items2')
    migrate_index('images', 'cj-images')

    # Verify migration results
    curl -u "user:password" "https://dest-cluster/_cat/indices?v&s=index"

Notes and Best Practices

1. Handling Custom Analyzers

If plugins such as kuromoji are not installed on the destination, you have the following options:

  • Migrate without analyzers: Japanese search accuracy will decrease, but data is preserved
  • Install plugins: Pre-install necessary plugins on the destination
  • Use alternative analyzers: Substitute with the standard analyzer, etc.
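
The third option can be sketched as follows: instead of deleting analyzer keys, a small helper rewrites every analyzer reference to the built-in standard analyzer. The function name replace_analyzers, the list of keys it rewrites, and the sample mapping are all assumptions made for illustration; check which analyzer keys your own mappings actually use.

```python
# Assumption: these are the mapping keys that reference analyzers by name.
ANALYZER_KEYS = ("analyzer", "search_analyzer")

def replace_analyzers(obj, replacement="standard"):
    """Return a copy of obj with every analyzer reference set to `replacement`."""
    if isinstance(obj, dict):
        return {
            # Only rewrite string values, so a field that happens to be
            # named "analyzer" (whose value is a dict) is left alone.
            key: replacement
            if key in ANALYZER_KEYS and isinstance(value, str)
            else replace_analyzers(value, replacement)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [replace_analyzers(item, replacement) for item in obj]
    return obj

# Hypothetical mapping fragment using the kuromoji analyzer.
sample = {
    "properties": {
        "title": {"type": "text", "analyzer": "kuromoji"},
        "price": {"type": "integer"},
    }
}
adjusted = replace_analyzers(sample)
print(adjusted["properties"]["title"]["analyzer"])  # standard
```

Unlike an in-place delete, this version returns a new structure, so the original mapping stays available for comparison.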

2. Adjusting Batch Size

    # Small documents: use larger batch size
    migrate_index('small_docs', 'dest_small', batch_size=5000)

    # Large documents: use smaller batch size
    migrate_index('large_docs', 'dest_large', batch_size=500)

3. Parallel Execution

You can reduce overall migration time by migrating multiple indices in parallel.

    # Run in parallel in the background
    python migrate.py items1 cj-items1 &
    python migrate.py items2 cj-items2 &
    python migrate.py images cj-images &
    wait
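
The same fan-out can also be done inside one Python process. The sketch below uses a thread pool, which suits this workload because each migration spends most of its time waiting on HTTP calls. Here migrate_index is only a stand-in that returns a string, and migrate_all is a hypothetical helper name; swap in the real migration function when using this.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Index pairs used in this article: (source index, destination index).
INDEX_PAIRS = [
    ("items1", "cj-items1"),
    ("items2", "cj-items2"),
    ("images", "cj-images"),
]

def migrate_index(source_index, dest_index):
    """Stand-in for the real migration function (assumption for this sketch)."""
    return f"{source_index} -> {dest_index}"

def migrate_all(pairs, max_workers=3):
    """Run one migration per index pair concurrently; collect results or errors."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_index, src, dst): src for src, dst in pairs}
        for future in as_completed(futures):
            src = futures[future]
            try:
                results[src] = future.result()
            except Exception as exc:  # one failed index should not stop the others
                results[src] = f"failed: {exc}"
    return results

results = migrate_all(INDEX_PAIRS)
for name in sorted(results):
    print(name, "=>", results[name])
```

Keeping max_workers small is a deliberate choice: every worker holds an open scroll context on the source and sends bulk requests to the destination, so more parallelism also means more load on both clusters.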

4. Error Handling

Check every Bulk API response, and record or retry any documents that returned errors. A bulk request can return HTTP 200 even when individual documents failed, so inspect the errors field of the response body.

    result = bulk_response.json()
    if result.get('errors'):
        for item in result['items']:
            if 'error' in item['index']:
                # Record the error
                error_doc_id = item['index']['_id']
                error_reason = item['index']['error']['reason']
                print(f"Failed: {error_doc_id} - {error_reason}")
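
To go from recording to retrying, it helps to separate failures that may succeed on a second attempt from failures that never will. The sketch below does that on a parsed bulk response. The function name split_bulk_failures, the choice of status codes treated as retriable, and the sample response are assumptions for illustration, not part of the original script.

```python
# Assumption: overload (429) and unavailability (503) are worth retrying;
# anything else, such as a mapping conflict, is recorded as permanent.
RETRIABLE_STATUSES = {429, 503}

def split_bulk_failures(bulk_result):
    """Split failed bulk items into (retriable, permanent) lists.

    Each entry is (position, doc_id, reason); position is the item's index
    in the response, which matches the order of actions in the request.
    """
    retriable, permanent = [], []
    if not bulk_result.get("errors"):
        return retriable, permanent
    for position, item in enumerate(bulk_result.get("items", [])):
        # Each item has a single key naming the action ("index", "create", ...).
        action = next(iter(item.values()))
        error = action.get("error")
        if not error:
            continue
        reason = error.get("reason") if isinstance(error, dict) else str(error)
        entry = (position, action.get("_id"), reason)
        if action.get("status") in RETRIABLE_STATUSES:
            retriable.append(entry)
        else:
            permanent.append(entry)
    return retriable, permanent

# Hypothetical bulk response: one success, one overload, one mapping error.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "rejected", "reason": "queue full"}}},
        {"index": {"_id": "3", "status": 400,
                   "error": {"type": "mapper_parsing_exception", "reason": "bad field"}}},
    ],
}
retriable, permanent = split_bulk_failures(sample_response)
print("retry:", [doc_id for _, doc_id, _ in retriable])    # retry: ['2']
print("record:", [doc_id for _, doc_id, _ in permanent])   # record: ['3']
```

Because the positions line up with the request order, the retriable entries can be used to pick the matching action and source lines out of the original batch and resend only those.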

5. Verifying Document Counts

After migration, compare the document counts between the source and destination to verify data consistency.

    # Source document count
    curl -s -u "user:password" "https://source/_count?q=*"

    # Destination document count
    curl -s -u "user:password" "https://dest/_count?q=*"
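
With several indices, comparing the counts by eye gets error-prone. A minimal sketch of an automated comparison follows. The helper name compare_counts is hypothetical, and the two lookup callables are placeholders: in practice each would call the _count endpoint for one index and return the count field of the JSON response, while here plain dictionaries with made-up numbers stand in for the clusters.

```python
def compare_counts(index_pairs, source_count, dest_count):
    """Return (source_index, dest_index, source_total, dest_total) for each mismatch."""
    mismatches = []
    for source_index, dest_index in index_pairs:
        src_total = source_count(source_index)
        dst_total = dest_count(dest_index)
        if src_total != dst_total:
            mismatches.append((source_index, dest_index, src_total, dst_total))
    return mismatches

# Hypothetical numbers standing in for real _count responses.
fake_source = {"items1": 1200, "items2": 800}
fake_dest = {"cj-items1": 1200, "cj-items2": 790}
pairs = [("items1", "cj-items1"), ("items2", "cj-items2")]

mismatches = compare_counts(pairs, fake_source.__getitem__, fake_dest.__getitem__)
print(mismatches)  # [('items2', 'cj-items2', 800, 790)]
```

If the destination count is lower immediately after the last bulk request, it may only mean that the newest documents are not yet visible to search; checking again after the index has refreshed avoids a false alarm.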

Performance Estimates

Migration speed varies with the environment and document size, but as a rough guide you can expect:

  • Small to medium documents: 500-1,000 docs/sec
  • Large documents: 100-500 docs/sec
  • Network bandwidth: The distance and bandwidth between clusters significantly affect speed
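
These rates translate directly into a time budget. The short calculation below assumes a hypothetical index of 10 million small-to-medium documents and a steady rate, which real migrations rarely sustain, so treat the result as a lower bound for planning.

```python
def estimate_hours(doc_count, docs_per_sec):
    """Estimated wall-clock hours to migrate doc_count documents at a steady rate."""
    return doc_count / docs_per_sec / 3600

total_docs = 10_000_000   # hypothetical index size
slow, fast = 500, 1000    # docs/sec range for small to medium documents

print(f"{estimate_hours(total_docs, fast):.1f} to {estimate_hours(total_docs, slow):.1f} hours")
# 2.8 to 5.6 hours
```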

Summary

By combining the Scroll API and Bulk API, even large-scale data can be migrated reliably. Before you start, check for differences between the two environments, such as whether the custom analyzers used by the source are available on the destination.