Documentation

Version-controlled schema migrations for Elasticsearch and OpenSearch — Flyway for search engines.

Getting Started

Installation

npm install -g scaledsearch

This installs two equivalent commands — use whichever you prefer:

scaledsearch migrate apply    # full name
ss migrate apply              # shorthand

Requirements: Node.js >= 18. No cluster connection is needed for status, diff, validate, or apply --dry-run — those work fully offline.

Quick start — new project

# 1. Initialize ScaledSearch in your project
scaledsearch migrate init

# 2. Create a migration
scaledsearch migrate create "add-products-index"

# 3. Edit the generated YAML
#    migrations/V001__add-products-index.yaml

# 4. Preview the changes (offline, no cluster needed)
scaledsearch migrate apply --dry-run

# 5. Apply them
scaledsearch migrate apply

# 6. Check what's applied vs pending
scaledsearch migrate status

init creates a .scaledsearch/config.yaml and a migrations/ directory.

Quick start — existing cluster

Already have indices in production? Capture them as a baseline first, then version forward from there:

# 1. Initialize
scaledsearch migrate init

# 2. Import current cluster state as V000
scaledsearch migrate import

# 3. Start versioning from here
scaledsearch migrate create "add-vector-field"
scaledsearch migrate apply

import snapshots indices, mappings, settings, aliases, index templates, and ingest pipelines into V000__baseline.yaml and marks it as already applied so it never re-executes. System-owned objects are excluded automatically — see the importing guide.

Commands

All commands are subcommands of scaledsearch migrate (or ss migrate).

| Command | Description |
| --- | --- |
| migrate init | Initialize ScaledSearch in the current directory |
| migrate create <name> | Create a new versioned migration file |
| migrate status | Show applied vs pending migrations |
| migrate apply | Apply pending migrations to the cluster |
| migrate diff | Show detailed pending changes |
| migrate validate | Validate files and simulate their end-state (offline) |
| migrate import | Import an existing cluster as a V000 baseline |
| migrate rollback | Undo the last applied migration |

Commands that work offline (no cluster connection): init, create, status, diff, validate, and apply --dry-run.

migrate apply

scaledsearch migrate apply                 # apply all pending
scaledsearch migrate apply --dry-run       # preview only, offline
scaledsearch migrate apply --target V003   # apply up to and including V003

  • --dry-run prints what would run without touching the cluster, and honors --target.
  • --target <version> stops after the given version; an unknown or already-applied target produces a friendly error.
  • A migration that fails is not recorded as applied, so re-running resumes correctly.

Each migration is checksum-validated against the recorded history before running, and a lock prevents concurrent runs.
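
The selection rules above can be sketched in a few lines of Python. This is an illustrative model only, not ScaledSearch's implementation: the function names, the use of plain integers for versions, and the error messages are all assumptions made for the example.

```python
def select_pending(all_versions, applied, target=None):
    """Return the versions to apply, ascending, honoring an optional target.

    Hypothetical helper: versions are modeled as integers (V003 -> 3).
    """
    applied_set = set(applied)
    ordered = sorted(all_versions)
    if target is not None:
        if target not in ordered:
            raise ValueError(f"unknown target version: {target}")
        if target in applied_set:
            raise ValueError(f"target version already applied: {target}")
    pending = [v for v in ordered if v not in applied_set]
    if target is not None:
        # "Up to and including" the target.
        pending = pending[: pending.index(target) + 1]
    return pending


def apply_in_order(pending, run, applied):
    """Run each pending version; record it only after it succeeds."""
    for version in pending:
        run(version)             # raises on failure, so nothing is recorded
        applied.append(version)  # reached only when run() returned normally
```

Because a failing migration never reaches the recording step, calling `select_pending` again after a failure returns the failed version first, which is the "re-running resumes correctly" behavior described above.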

migrate rollback

Undoes the last applied migration by running its rollback: section. The command refuses to run when nothing has been applied, or when the last applied migration has no rollback: section.

scaledsearch migrate rollback

Operation Types

Every entry under a migration's operations: (or rollback:) list has a type. ScaledSearch supports 15 operation types across 7 categories.

| Category | Operations |
| --- | --- |
| Index | create_index, delete_index, close_index, open_index |
| Schema | put_mapping, put_settings |
| Data | reindex (async with progress) |
| Alias | add_alias, remove_alias, swap_alias |
| Template | put_template, delete_template |
| Pipeline | put_pipeline, delete_pipeline |
| Generic | api_call (any REST API) |

Index & Schema

- type: create_index
  index: products
  settings: { number_of_shards: 2, number_of_replicas: 1 }
  mappings:
    properties:
      title: { type: text }

- type: put_mapping
  index: products
  body:
    properties:
      in_stock: { type: boolean }

Data — reindex async

Reindex runs asynchronously with real-time progress tracking. No configuration needed. If the CLI disconnects, the reindex keeps running on the cluster.

- type: reindex
  source: products_v1
  dest: products_v2

Sample progress output:

Applying V003 Migrate to products_v2... 45% (4,500,000/10,000,000 docs) done (42m)
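
For orientation, progress figures of this kind can be derived from the task status that Elasticsearch and OpenSearch report for a reindex started with wait_for_completion=false. The sketch below assumes that mechanism; the docs above do not say how ScaledSearch tracks progress, and the helper names are hypothetical. It works on the JSON returned by GET /_tasks/<task_id>, where the sum of created, updated, and deleted approaches total as the reindex proceeds.

```python
def reindex_progress(task_response):
    """Return (processed, total, percent) from a tasks-API response dict."""
    status = task_response["task"]["status"]
    total = status["total"]
    processed = status["created"] + status["updated"] + status["deleted"]
    percent = 0 if total == 0 else 100 * processed // total
    return processed, total, percent


def format_progress(task_response):
    """Render progress in the style of the sample output above."""
    processed, total, percent = reindex_progress(task_response)
    return f"{percent}% ({processed:,}/{total:,} docs)"


# Hand-written sample shaped like a GET /_tasks/<task_id> response.
sample = {
    "completed": False,
    "task": {
        "status": {"total": 10_000_000, "created": 4_500_000,
                   "updated": 0, "deleted": 0}
    },
}
print(format_progress(sample))  # 45% (4,500,000/10,000,000 docs)
```

Since the task lives on the cluster, polling can stop and restart without affecting the reindex itself, which is consistent with the note that the reindex keeps running if the CLI disconnects.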

Alias

# Add an alias
- type: add_alias
  index: products_v2
  alias: products

# Atomic swap (remove + add in a single cluster call)
- type: swap_alias
  alias: products
  from: products_v1
  to: products_v2

See the zero-downtime guide for the full alias-swap pattern.

Generic — api_call

An escape hatch for any Elasticsearch/OpenSearch REST API not covered by a dedicated operation:

- type: api_call
  method: PUT
  path: /_cluster/settings
  body:
    persistent:
      cluster.routing.allocation.disk.watermark.high: "90%"

Works with any API: ILM policies, cluster settings, component templates, and more.

Migration File Format

Migrations are YAML files in your migrations/ directory, named with an auto-incrementing version prefix:

migrations/
├── V000__baseline.yaml      # optional, created by `import`
├── V001__add-products.yaml
└── V002__add-vector-field.yaml
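
As a rough illustration of the naming scheme, the following Python sketch splits a filename into its numeric version and name and sorts files by version. The pattern V<digits>__<name>.yaml is inferred from the examples in this page, and the function names are hypothetical; ScaledSearch's own parser may accept more or less than this.

```python
import re

# Inferred from names like V001__add-products.yaml (an assumption).
MIGRATION_NAME = re.compile(r"^V(\d+)__(.+)\.yaml$")


def parse_migration_filename(filename):
    """Return (version, name), e.g. (1, 'add-products')."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a migration filename: {filename}")
    return int(match.group(1)), match.group(2)


def sort_migrations(filenames):
    """Order filenames by numeric version rather than by raw string."""
    return sorted(filenames, key=lambda f: parse_migration_filename(f)[0])
```

Sorting on the parsed integer keeps the order correct even if the zero padding is ever outgrown (V1000 sorts after V999, which a plain string sort would get wrong).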

Structure

description: "Create products index with vector search"
engine: elasticsearch
target_version: ">=8.0"
operations:
  - type: create_index
    index: products
    mappings:
      properties:
        embedding: { type: dense_vector, dims: 768 }
rollback:
  - type: delete_index
    index: products

| Field | Required | Description |
| --- | --- | --- |
| description | recommended | Human-readable summary of the migration |
| engine | optional | elasticsearch or opensearch |
| target_version | optional | Version constraint (e.g. ">=8.0") checked at apply time |
| operations | yes | Ordered list of operations to apply |
| rollback | optional | Ordered list of operations to undo this migration |

Versioning & checksums

Files are applied in ascending version order (V001, V002, …). When a migration is applied, ScaledSearch records a checksum of the file in the history index. On every subsequent run it re-checks that checksum: if an already-applied file has been modified, the run fails loudly rather than silently diverging from what was actually applied to the cluster.
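
The check can be pictured with a small Python sketch. It is a model of the idea, not the tool's code: SHA-256 is an assumption (this page does not name the hash), and the function names and the dictionaries keyed by version are hypothetical.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Hash a migration file's bytes (SHA-256 is assumed for illustration)."""
    return hashlib.sha256(content).hexdigest()


def find_modified(current_files, history):
    """Return applied versions whose file no longer matches its recorded hash.

    current_files: version -> file bytes on disk
    history:       version -> checksum recorded at apply time
    """
    return sorted(
        version
        for version, recorded in history.items()
        if version in current_files and checksum(current_files[version]) != recorded
    )
```

A non-empty result corresponds to the "fails loudly" case: the file on disk no longer describes what was actually run against the cluster, so the safer move is a new migration rather than an edit to an applied one.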

Configuration

migrate init writes .scaledsearch/config.yaml in your project. Commit it to git.

# .scaledsearch/config.yaml
engine: elasticsearch
connection:
  host: http://localhost:9200
migrations:
  location: ./migrations
history:
  index: .scaledsearch_history

| Key | Description |
| --- | --- |
| engine | elasticsearch or opensearch |
| connection.host | Cluster URL |
| connection.auth | Optional auth block (see Authentication below) |
| migrations.location | Directory holding migration files |
| history.index | Internal index that records applied migrations |

History index

ScaledSearch tracks what has been applied in an internal index (default .scaledsearch_history). init derives a per-project history index name, so multiple projects pointing at the same cluster keep separate histories. It stores, per migration: the version, a checksum, and the applied timestamp. Failed migrations are not recorded as applied.

Authentication

# Basic auth
connection:
  host: https://my-cluster:9200
  auth:
    type: basic
    username: elastic
    password: changeme

# API key
connection:
  auth:
    type: apikey
    apiKey: your-base64-api-key

Avoid committing plaintext credentials. Because .scaledsearch/config.yaml is meant to be committed to git, keep secrets out of it and prefer environment-specific config or a secrets manager for production clusters.

Engines

| Engine | Versions | Status |
| --- | --- | --- |
| Elasticsearch | 7.x, 8.x, 9.x | Verified |
| OpenSearch | 1.x, 2.x, 3.x | Verified |
| Solr | 8.x, 9.x | Coming soon |

Tested against: ES 7.17, ES 8.17, ES 9.0, OpenSearch 2.19, OpenSearch 3.0. Elasticsearch and OpenSearch both use the official @elastic/elasticsearch client, which is wire-compatible across ES 7–9 and OpenSearch.

Version constraints

A migration may declare a target_version constraint that is checked at apply time. If the connected cluster doesn't satisfy it, the migration won't be applied — useful for version-gated features like dense_vector.

target_version: ">=8.0"
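
To make the check concrete, here is a minimal Python sketch of evaluating such a constraint against a cluster version string. Only the ">=" form appears in this page; the other operators, the zero-padding of short versions, and the function names are assumptions for illustration.

```python
import operator
import re

# ">=" is the documented form; the rest are assumed for the sketch.
_OPS = {">=": operator.ge, "<=": operator.le,
        ">": operator.gt, "<": operator.lt, "=": operator.eq}
_CONSTRAINT = re.compile(r"^\s*(>=|<=|>|<|=)\s*(\d+(?:\.\d+)*)\s*$")


def _parse(version):
    return tuple(int(part) for part in version.split("."))


def satisfies(cluster_version, constraint):
    """True if e.g. '8.17.0' satisfies '>=8.0'."""
    match = _CONSTRAINT.match(constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, wanted = match.groups()
    have, want = _parse(cluster_version), _parse(wanted)
    # Pad the shorter tuple with zeros so 8.0 compares as 8.0.0.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return _OPS[op](have, want)
```

Comparing integer tuples rather than strings matters here: as strings, "8.9" would sort above "8.17".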

Guide — Zero-Downtime Migrations

Changing a mapping in place is often impossible — many mapping changes require a new index. The standard zero-downtime pattern is: build a new index, reindex into it, then atomically swap an alias so reads/writes never point at a half-built index.

description: "Migrate to products_v2 with zero downtime"
operations:
  # 1. Create the new index with the updated mapping
  - type: create_index
    index: products_v2
    mappings:
      properties:
        embedding: { type: dense_vector, dims: 768 }

  # 2. Reindex existing data (runs async with progress)
  - type: reindex
    source: products_v1
    dest: products_v2

  # 3. Atomically point the `products` alias at the new index
  - type: swap_alias
    alias: products
    from: products_v1
    to: products_v2

# Safe rollback: just swap the alias back. Both indices still exist.
rollback:
  - type: swap_alias
    alias: products
    from: products_v2
    to: products_v1

Why this is safe

  • swap_alias is atomic — it removes the old alias target and adds the new one in a single cluster call, so there is no moment where products resolves to nothing.
  • Rollback is instant and lossless — because the old index is left in place, the rollback is just the reverse swap. No data is deleted by the migration itself.
  • Reads/writes use the alias, never the concrete index name, so clients are unaffected by the swap.

Keep the old index around until you've verified the new one in production; delete it in a later migration once you're confident.
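
The atomicity comes from the cluster's aliases API, which applies a list of actions as one unit. The Python sketch below builds the request body for POST /_aliases; that swap_alias maps to exactly this request is an assumption, and the helper name is hypothetical.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Build a POST /_aliases body that moves `alias` in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_body("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Calling the same helper with the index arguments reversed yields the rollback request, which mirrors the rollback: section in the migration above.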

Guide — Importing an Existing Cluster

If you already have indices in production, you don't have to recreate them as migrations by hand. migrate import snapshots the live cluster into a baseline migration and marks it as already applied — so you can start version-controlling from where you are today.

scaledsearch migrate init
scaledsearch migrate import

This writes migrations/V000__baseline.yaml and records it in the history index as applied (so apply never tries to re-run it).

What gets captured

  • Indices, with their mappings and settings
  • Aliases, including alias options
  • Closed-index state (closed indices are captured as closed)
  • Index templates and ingest pipelines

What gets excluded

import deliberately skips engine-owned objects so your baseline is your schema, not the cluster's internal plumbing:

  • Leading-dot system indices/templates: universally system-owned in ES and OpenSearch
  • Elasticsearch built-ins: APM, Fleet, ML, monitoring, ILM/SLM history, watcher, connectors, behavioral analytics, and @template / @pipeline convention names
  • OpenSearch plugin state: .opensearch-*, .opendistro-*, .plugins-*, top_queries-*, .tasks
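
A simplified Python sketch of this kind of filter is shown below. It covers only the leading-dot rule and the OpenSearch patterns listed above; the Elasticsearch built-in names are not modeled, and the pattern list and function names are assumptions rather than the tool's actual rules.

```python
from fnmatch import fnmatchcase

# Patterns taken from the list above. ".*" already covers the dotted
# OpenSearch names; they are spelled out only for readability.
EXCLUDED_PATTERNS = [
    ".*",
    ".opensearch-*",
    ".opendistro-*",
    ".plugins-*",
    "top_queries-*",
    ".tasks",
]


def is_system_owned(name):
    """True if the index or template name matches an excluded pattern."""
    return any(fnmatchcase(name, pattern) for pattern in EXCLUDED_PATTERNS)


def user_owned(names):
    """Keep only the names that would belong in a baseline."""
    return [name for name in names if not is_system_owned(name)]
```

For example, `user_owned(["products", ".tasks", "top_queries-2024.01.01", "orders_v2"])` keeps only `products` and `orders_v2`.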

import refuses to overwrite an existing V000__baseline.yaml.

Guide — Validating Offline

migrate validate does two things, entirely offline (no cluster connection):

  • Checks file integrity — that every migration file parses, has the required fields, and uses known operation types.
  • Simulates the end-state — it replays your migrations in order against an in-memory model of the cluster, catching ordering and reference problems before you touch a real cluster.

What the simulator catches

  • An operation that targets an index which won't exist yet at that point in the sequence
  • A reindex whose destination is never created
  • Wildcard targets that don't resolve to anything in the simulated state
  • An alias swap referencing an index that was already deleted
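
The idea behind the simulator can be illustrated with a toy Python model that replays operations against a set of index names. It is a deliberately small sketch: it starts from an empty cluster, handles only a few operation types, ignores wildcards, and uses hypothetical function names and messages. The real simulator is presumably more thorough.

```python
def simulate(operations):
    """Replay operations in order and return a list of problems found."""
    indices = set()
    problems = []

    def require(step, role, name):
        if name not in indices:
            problems.append(f"step {step}: {role} index '{name}' does not exist")

    for step, op in enumerate(operations, start=1):
        kind = op["type"]
        if kind == "create_index":
            indices.add(op["index"])
        elif kind == "delete_index":
            require(step, "target", op["index"])
            indices.discard(op["index"])
        elif kind in ("put_mapping", "put_settings", "add_alias"):
            require(step, "target", op["index"])
        elif kind == "reindex":
            require(step, "source", op["source"])
            require(step, "dest", op["dest"])
        elif kind == "swap_alias":
            require(step, "from", op["from"])
            require(step, "to", op["to"])
    return problems


ops = [
    {"type": "create_index", "index": "products_v1"},
    {"type": "reindex", "source": "products_v1", "dest": "products_v2"},
]
print(simulate(ops))  # ["step 2: dest index 'products_v2' does not exist"]
```

Because the model is just in-memory state, a check like this needs no cluster connection, which is what makes it practical to run on every pull request.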

validate is ideal in CI: fast, cluster-free confidence that a pull request's migrations are internally consistent before they're ever applied.

scaledsearch migrate validate          # is the whole set consistent?
scaledsearch migrate apply --dry-run   # what would the next apply do?