Documentation

Version-controlled schema migrations for Elasticsearch and OpenSearch — Flyway for search engines.

Getting Started

Installation

npm install -g scaledsearch

This installs two equivalent commands — use whichever you prefer:

scaledsearch migrate apply    # full name
ss migrate apply              # shorthand

Requirements: Node.js >= 18. No cluster connection is needed for status, diff, validate, or apply --dry-run — those work fully offline.

Quick start — new project

# 1. Initialize ScaledSearch in your project
scaledsearch migrate init

# 2. Create a migration
scaledsearch migrate create "add-products-index"

# 3. Edit the generated YAML
#    migrations/V001__add-products-index.yaml

# 4. Preview the changes (offline, no cluster needed)
scaledsearch migrate apply --dry-run

# 5. Apply them
scaledsearch migrate apply

# 6. Check what's applied vs pending
scaledsearch migrate status

init creates a .scaledsearch/config.yaml and a migrations/ directory.

Quick start — existing cluster

Already have indices in production? Capture them as a baseline first, then version forward from there:

# 1. Initialize
scaledsearch migrate init

# 2. Import current cluster state as V000
scaledsearch migrate import

# 3. Start versioning from here
scaledsearch migrate create "add-vector-field"
scaledsearch migrate apply

import snapshots indices, mappings, settings, aliases, index templates, and ingest pipelines into V000__baseline.yaml and marks it as already applied so it never re-executes. System-owned objects are excluded automatically — see the importing guide.

Commands

All commands are subcommands of scaledsearch migrate (or ss migrate).

Command	Description
`migrate init`	Initialize ScaledSearch in the current directory
`migrate create <name>`	Create a new versioned migration file
`migrate status`	Show applied vs pending migrations
`migrate apply`	Apply pending migrations to the cluster
`migrate diff`	Show detailed pending changes
`migrate validate`	Validate files and simulate their end-state (offline)
`migrate import`	Import an existing cluster as a `V000` baseline
`migrate rollback`	Undo the last applied migration

Commands that work offline (no cluster connection): init, create, status, diff, validate, and apply --dry-run.

migrate apply

scaledsearch migrate apply                 # apply all pending
scaledsearch migrate apply --dry-run       # preview only, offline
scaledsearch migrate apply --target V003   # apply up to and including V003

--dry-run prints what would run without touching the cluster, and honors --target.
--target <version> stops after the given version; an unknown or already-applied target produces a friendly error.
A migration that fails is not recorded as applied, so re-running resumes correctly.

Each migration is checksum-validated against the recorded history before running, and a lock prevents concurrent runs.
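The --target rules above can be pictured with a small sketch. This is an illustration only, not ScaledSearch's actual code: the function name `select_pending` and its arguments are hypothetical, and it assumes version prefixes are zero-padded to the same width so that plain string sorting matches version order.

```python
def select_pending(all_versions, applied, target=None):
    """Return the versions an apply would run, in ascending order.

    Hypothetical helper: `all_versions` are the versions found on disk,
    `applied` is the set already recorded in the history index.
    """
    ordered = sorted(all_versions)  # assumes equal-width prefixes: V001 < V002
    pending = [v for v in ordered if v not in applied]
    if target is None:
        return pending
    if target not in ordered:
        raise ValueError(f"unknown target version: {target}")
    if target in applied:
        raise ValueError(f"target {target} is already applied")
    # Stop after the target: "up to and including".
    return pending[: pending.index(target) + 1]


plan = select_pending(["V001", "V002", "V003", "V004"], applied={"V001"}, target="V003")
print(plan)  # ['V002', 'V003']
```

Because a failed migration never enters `applied`, it stays at the front of the pending list, which is why re-running picks up at the failed migration.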

migrate rollback

Undoes the last applied migration by running its rollback: section. Refuses to run when nothing is applied, or when the last migration has no rollback: section defined.

scaledsearch migrate rollback
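The two refusal cases can be sketched as a guard that runs before any rollback operation executes. This is a minimal illustration under assumptions: `plan_rollback`, the ordered `applied` list, and the `migrations` dictionary of parsed files are hypothetical names, not part of the ScaledSearch API.

```python
def plan_rollback(applied, migrations):
    """Pick the rollback operations for the most recently applied migration.

    `applied` is an ordered list of applied versions (oldest first);
    `migrations` maps each version to its parsed YAML document.
    Both shapes are assumptions made for this sketch.
    """
    if not applied:
        raise RuntimeError("nothing to roll back: no migrations are applied")
    last = applied[-1]
    operations = migrations[last].get("rollback") or []
    if not operations:
        raise RuntimeError(f"{last} has no rollback: section")
    return last, operations


migrations = {
    "V001": {
        "operations": [{"type": "create_index", "index": "products"}],
        "rollback": [{"type": "delete_index", "index": "products"}],
    }
}
print(plan_rollback(["V001"], migrations))
```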

Operation Types

Every entry under a migration's operations: (or rollback:) list has a type. ScaledSearch supports 15 operation types across 7 categories.

Category	Operations
Index	`create_index`, `delete_index`, `close_index`, `open_index`
Schema	`put_mapping`, `put_settings`
Data	`reindex` (async with progress)
Alias	`add_alias`, `remove_alias`, `swap_alias`
Template	`put_template`, `delete_template`
Pipeline	`put_pipeline`, `delete_pipeline`
Generic	`api_call` (any REST API)

Index & Schema

- type: create_index
  index: products
  settings: { number_of_shards: 2, number_of_replicas: 1 }
  mappings:
    properties:
      title: { type: text }

- type: put_mapping
  index: products
  body:
    properties:
      in_stock: { type: boolean }

Data — reindex async

Reindex runs asynchronously with real-time progress tracking. No configuration needed. If the CLI disconnects, the reindex keeps running on the cluster.

- type: reindex
  source: products_v1
  dest: products_v2

Applying V003 Migrate to products_v2... 45% (4,500,000/10,000,000 docs) done (42m)
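For context on how such a progress figure can be derived: both Elasticsearch and OpenSearch accept `POST _reindex?wait_for_completion=false`, which returns a task ID, and `GET _tasks/<task_id>` then reports a status object with `total`, `created`, `updated`, and `deleted` counters. The sketch below formats that status object. Whether ScaledSearch computes its progress line exactly this way is an assumption, and `reindex_progress` is a hypothetical name.

```python
def reindex_progress(status):
    """Format a progress string from the `status` object of a reindex task."""
    total = status["total"]
    # Documents the task has already handled, whatever the outcome.
    done = (
        status["created"]
        + status["updated"]
        + status["deleted"]
        + status.get("noops", 0)
    )
    percent = 100 * done // total if total else 100
    return f"{percent}% ({done:,}/{total:,} docs)"


sample = {"total": 10_000_000, "created": 4_500_000, "updated": 0, "deleted": 0, "noops": 0}
print(reindex_progress(sample))  # 45% (4,500,000/10,000,000 docs)
```

Because the task lives on the cluster, polling can resume from a new process as long as the task ID is known.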

Alias

# Add an alias
- type: add_alias
  index: products_v2
  alias: products

# Atomic swap (remove + add in a single cluster call)
- type: swap_alias
  alias: products
  from: products_v1
  to: products_v2

See the zero-downtime guide for the full alias-swap pattern.
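To make "a single cluster call" concrete: the `_aliases` API in Elasticsearch and OpenSearch takes a list of actions and applies them atomically. The sketch below builds the request body that corresponds to the swap_alias example above. It is an illustration of the underlying API, and it assumes (the text does not confirm this) that swap_alias maps onto exactly this request; `swap_alias_body` is a hypothetical helper.

```python
import json


def swap_alias_body(alias, from_index, to_index):
    """Build a `POST /_aliases` body that moves `alias` in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": from_index, "alias": alias}},
            {"add": {"index": to_index, "alias": alias}},
        ]
    }


body = swap_alias_body("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Both actions succeed or fail together, so the alias never resolves to zero indices or to both at once.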

Generic — api_call

An escape hatch for any Elasticsearch/OpenSearch REST API not covered by a dedicated operation:

- type: api_call
  method: PUT
  path: /_cluster/settings
  body:
    persistent:
      cluster.routing.allocation.disk.watermark.high: "90%"

Works with any API: ILM policies, cluster settings, component templates, and more.

Migration File Format

Migrations are YAML files in your migrations/ directory, named with an auto-incrementing version prefix:

migrations/
├── V000__baseline.yaml      # optional, created by `import`
├── V001__add-products.yaml
└── V002__add-vector-field.yaml
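The naming convention can be parsed mechanically. The sketch below is a hypothetical illustration, not the tool's own parser: it assumes the pattern is `V<digits>__<name>.yaml`, as in the listing above, and sorts numerically so that ordering does not depend on zero-padding.

```python
import re

# Assumed pattern, inferred from the example filenames above.
MIGRATION_NAME = re.compile(r"^V(\d+)__(.+)\.yaml$")


def parse_migration_name(filename):
    """Split a migration filename into (version number, name)."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a migration filename: {filename!r}")
    return int(match.group(1)), match.group(2)


files = ["V002__add-vector-field.yaml", "V000__baseline.yaml", "V001__add-products.yaml"]
ordered = sorted(files, key=lambda name: parse_migration_name(name)[0])
print(ordered)
```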

Structure

description: "Create products index with vector search"
engine: elasticsearch
target_version: ">=8.0"
operations:
  - type: create_index
    index: products
    mappings:
      properties:
        embedding: { type: dense_vector, dims: 768 }
rollback:
  - type: delete_index
    index: products

Field	Required	Description
`description`	recommended	Human-readable summary of the migration
`engine`	optional	`elasticsearch` or `opensearch`
`target_version`	optional	Version constraint (e.g. `">=8.0"`) checked at apply time
`operations`	yes	Ordered list of operations to apply
`rollback`	optional	Ordered list of operations to undo this migration

Versioning & checksums

Files are applied in ascending version order (V001, V002, …). When a migration is applied, ScaledSearch records a checksum of the file in the history index. On every subsequent run it re-checks that checksum: if an already-applied file has been modified, the run fails loudly rather than silently diverging from what was actually applied to the cluster.
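The checksum guard can be sketched as follows. The hash algorithm is not stated in this documentation, so SHA-256 is an assumption here, and `file_checksum` and `find_modified` are hypothetical names used only for illustration.

```python
import hashlib


def file_checksum(content: bytes) -> str:
    """Checksum of a migration file's bytes (SHA-256 is assumed, not confirmed)."""
    return hashlib.sha256(content).hexdigest()


def find_modified(files, history):
    """Return applied versions whose file no longer matches the recorded checksum.

    `files` maps version -> current file bytes; `history` maps
    version -> checksum recorded when the migration was applied.
    A missing file is reported as modified.
    """
    modified = []
    for version, recorded in sorted(history.items()):
        content = files.get(version)
        if content is None or file_checksum(content) != recorded:
            modified.append(version)
    return modified


original = b"operations:\n  - type: create_index\n    index: products\n"
history = {"V001": file_checksum(original)}
print(find_modified({"V001": original}, history))            # []
print(find_modified({"V001": original + b"# edit"}, history))  # ['V001']
```

The practical consequence is that a change to an applied migration belongs in a new migration file, not in an edit to the old one.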

Configuration

migrate init writes .scaledsearch/config.yaml in your project. Commit it to git, but keep credentials out of the committed file (see Authentication).

# .scaledsearch/config.yaml
engine: elasticsearch
connection:
  host: http://localhost:9200
migrations:
  location: ./migrations
history:
  index: .scaledsearch_history

Key	Description
`engine`	`elasticsearch` or `opensearch`
`connection.host`	Cluster URL
`connection.auth`	Optional auth block — see below
`migrations.location`	Directory holding migration files
`history.index`	Internal index that records applied migrations

History index

ScaledSearch tracks what has been applied in an internal index (default .scaledsearch_history). init derives a per-project history index name, so multiple projects pointing at the same cluster keep separate histories. For each migration, the history index stores the version, a checksum, and the applied timestamp. Failed migrations are not recorded as applied.

Authentication

# Basic auth
connection:
  host: https://my-cluster:9200
  auth:
    type: basic
    username: elastic
    password: changeme

# API key
connection:
  auth:
    type: apikey
    apiKey: your-base64-api-key

Avoid committing plaintext credentials. Prefer environment-specific config or a secrets manager for production clusters.
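One way to keep secrets out of the repository is to generate the connection block at deploy time from environment variables. The sketch below is a hedged illustration only: the variable names (`SS_HOST`, `SS_API_KEY`, `SS_USERNAME`, `SS_PASSWORD`) are hypothetical, and nothing here implies that ScaledSearch reads environment variables itself. Values are emitted as double-quoted strings so that characters such as `:` or `#` in a password stay valid YAML.

```python
import json
import os


def render_connection(env=None):
    """Render a `connection:` YAML block from environment variables.

    Variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    lines = ["connection:", f"  host: {json.dumps(env['SS_HOST'])}"]
    if "SS_API_KEY" in env:
        lines += [
            "  auth:",
            "    type: apikey",
            f"    apiKey: {json.dumps(env['SS_API_KEY'])}",
        ]
    elif "SS_USERNAME" in env:
        lines += [
            "  auth:",
            "    type: basic",
            f"    username: {json.dumps(env['SS_USERNAME'])}",
            f"    password: {json.dumps(env['SS_PASSWORD'])}",
        ]
    return "\n".join(lines) + "\n"


print(render_connection({"SS_HOST": "https://my-cluster:9200", "SS_API_KEY": "your-base64-api-key"}))
```

The rendered block would then be merged into an untracked, environment-specific copy of the config rather than the committed one.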

Engines

Engine	Versions	Status
Elasticsearch	7.x, 8.x, 9.x	✓ Verified
OpenSearch	1.x, 2.x, 3.x	✓ Verified
Solr	8.x, 9.x	Coming soon

Tested against: ES 7.17, ES 8.17, ES 9.0, OpenSearch 2.19, OpenSearch 3.0. ScaledSearch connects to both Elasticsearch and OpenSearch through the official @elastic/elasticsearch client, which is wire-compatible across ES 7–9 and OpenSearch.

Version constraints

A migration may declare a target_version constraint that is checked at apply time. If the connected cluster doesn't satisfy it, the migration won't be applied — useful for version-gated features like dense_vector.

target_version: ">=8.0"
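A constraint check of this kind can be sketched in a few lines. The supported operator set and the comparison rules below are assumptions, since only `>=` appears in this documentation, and `satisfies` is a hypothetical name. The sketch handles purely numeric versions and would need extra handling for suffixes such as `-SNAPSHOT`.

```python
import operator
import re

# Assumed operator set; only ">=" is shown in the documentation.
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse(version):
    return tuple(int(part) for part in version.split("."))


def satisfies(cluster_version, constraint):
    """Return True if a numeric cluster version meets a constraint like '>=8.0'."""
    match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)\s*", constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, wanted = match.groups()
    have, want = _parse(cluster_version), _parse(wanted)
    # Pad with zeros so that "8" and "8.0.0" compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return _OPERATORS[op](have, want)


print(satisfies("8.17.0", ">=8.0"))  # True
print(satisfies("7.17.0", ">=8.0"))  # False
```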

Guide — Zero-Downtime Migrations

Many mapping changes cannot be made in place: changing an existing field's type, for example, requires a new index. The standard zero-downtime pattern is to build a new index, reindex into it, then atomically swap an alias so reads and writes never point at a half-built index.

description: "Migrate to products_v2 with zero downtime"
operations:
  # 1. Create the new index with the updated mapping
  - type: create_index
    index: products_v2
    mappings:
      properties:
        embedding: { type: dense_vector, dims: 768 }

  # 2. Reindex existing data (runs async with progress)
  - type: reindex
    source: products_v1
    dest: products_v2

  # 3. Atomically point the `products` alias at the new index
  - type: swap_alias
    alias: products
    from: products_v1
    to: products_v2

# Safe rollback: just swap the alias back. Both indices still exist.
rollback:
  - type: swap_alias
    alias: products
    from: products_v2
    to: products_v1

Why this is safe

swap_alias is atomic — it removes the old alias target and adds the new one in a single cluster call, so there is no moment where products resolves to nothing.
Rollback is instant and lossless — because the old index is left in place, the rollback is just the reverse swap. No data is deleted by the migration itself.
Reads/writes use the alias, never the concrete index name, so clients are unaffected by the swap.

Keep the old index around until you've verified the new one in production; delete it in a later migration once you're confident.

Guide — Importing an Existing Cluster

If you already have indices in production, you don't have to recreate them as migrations by hand. migrate import snapshots the live cluster into a baseline migration and marks it as already applied — so you can start version-controlling from where you are today.

scaledsearch migrate init
scaledsearch migrate import

This writes migrations/V000__baseline.yaml and records it in the history index as applied (so apply never tries to re-run it).

What gets captured

Indices, with their mappings and settings
Aliases, including alias options
Closed-index state (closed indices are captured as closed)
Index templates and ingest pipelines

What gets excluded

import deliberately skips engine-owned objects so your baseline is your schema, not the cluster's internal plumbing:

Leading-dot system indices/templates — universally system-owned in ES and OpenSearch
Elasticsearch built-ins — APM, Fleet, ML, monitoring, ILM/SLM history, watcher, connectors, behavioral analytics, and @template / @pipeline convention names
OpenSearch plugin state — .opensearch-*, .opendistro-*, .plugins-*, top_queries-*, .tasks

import refuses to overwrite an existing V000__baseline.yaml.
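The exclusion rules above amount to a name filter. The sketch below covers only the leading-dot rule and the OpenSearch patterns listed above; the Elasticsearch built-in names are not enumerated in this documentation, so they are deliberately left out. `is_system_owned` is a hypothetical helper, not the tool's implementation.

```python
from fnmatch import fnmatchcase

# Patterns copied from the OpenSearch plugin-state list above.
OPENSEARCH_PATTERNS = [
    ".opensearch-*",
    ".opendistro-*",
    ".plugins-*",
    "top_queries-*",
    ".tasks",
]


def is_system_owned(name):
    """Return True if an index or template name should be skipped by an import."""
    if name.startswith("."):
        return True  # leading-dot names are treated as system-owned
    return any(fnmatchcase(name, pattern) for pattern in OPENSEARCH_PATTERNS)


names = [".kibana_1", "products", "top_queries-2024.01.01", ".tasks", "orders_v2"]
print([name for name in names if not is_system_owned(name)])  # ['products', 'orders_v2']
```

Note that most of the listed patterns already start with a dot, so in this sketch `top_queries-*` is the only pattern that adds coverage beyond the leading-dot rule.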

Guide — Validating Offline

migrate validate does two things, entirely offline (no cluster connection):

Checks file integrity — that every migration file parses, has the required fields, and uses known operation types.
Simulates the end-state — it replays your migrations in order against an in-memory model of the cluster, catching ordering and reference problems before you touch a real cluster.
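The file-integrity half can be sketched as a check over an already-parsed migration document. This is an illustration under assumptions: `check_migration` is a hypothetical name, the input is a plain dictionary (YAML parsing is left out to keep the example dependency-free), and the error wording is invented. The type list is taken from the Operation Types table.

```python
KNOWN_TYPES = {
    "create_index", "delete_index", "close_index", "open_index",
    "put_mapping", "put_settings",
    "reindex",
    "add_alias", "remove_alias", "swap_alias",
    "put_template", "delete_template",
    "put_pipeline", "delete_pipeline",
    "api_call",
}


def check_migration(doc):
    """Return a list of integrity problems for one parsed migration file."""
    errors = []
    if not isinstance(doc.get("operations"), list):
        errors.append("operations: required field is missing or not a list")
    for section in ("operations", "rollback"):
        entries = doc.get(section)
        if not isinstance(entries, list):
            continue
        for position, entry in enumerate(entries):
            op_type = entry.get("type") if isinstance(entry, dict) else None
            if op_type not in KNOWN_TYPES:
                errors.append(f"{section}[{position}]: unknown operation type {op_type!r}")
    return errors


doc = {
    "description": "Create products index",
    "operations": [{"type": "create_index", "index": "products"}],
    "rollback": [{"type": "drop_index", "index": "products"}],
}
print(check_migration(doc))  # ["rollback[0]: unknown operation type 'drop_index'"]
```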

What the simulator catches

An operation that targets an index which won't exist yet at that point in the sequence
A reindex whose destination is never created
Wildcard targets that don't resolve to anything in the simulated state
An alias swap referencing an index that was already deleted
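The idea behind the simulator can be shown with a deliberately small model that tracks only which indices exist. This is a sketch, not ScaledSearch's simulator: `simulate` and `INDEX_FIELDS` are hypothetical names, wildcards and aliases are not modeled, and the field names are the ones used in the YAML examples in this document.

```python
# Which fields of each operation must name an index that already exists.
INDEX_FIELDS = {
    "put_mapping": ("index",),
    "delete_index": ("index",),
    "add_alias": ("index",),
    "reindex": ("source", "dest"),
    "swap_alias": ("from", "to"),
}


def simulate(operations):
    """Replay operations against an in-memory set of indices and collect problems."""
    indices = set()
    problems = []
    for step, op in enumerate(operations, start=1):
        kind = op["type"]
        for field in INDEX_FIELDS.get(kind, ()):
            if op[field] not in indices:
                problems.append(
                    f"step {step}: {kind} {field}={op[field]!r} does not exist at this point"
                )
        # Update the model after checking references.
        if kind == "create_index":
            indices.add(op["index"])
        elif kind == "delete_index":
            indices.discard(op["index"])
    return problems


operations = [
    {"type": "create_index", "index": "products_v1"},
    {"type": "reindex", "source": "products_v1", "dest": "products_v2"},
    {"type": "delete_index", "index": "products_v1"},
    {"type": "swap_alias", "alias": "products", "from": "products_v1", "to": "products_v2"},
]
for problem in simulate(operations):
    print(problem)
```

In this run the reindex is flagged because products_v2 is never created, and the alias swap is flagged for both the deleted source index and the missing destination.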

validate is ideal in CI: fast, cluster-free confidence that a pull request's migrations are internally consistent before they're ever applied.

scaledsearch migrate validate          # is the whole set consistent?
scaledsearch migrate apply --dry-run   # what would the next apply do?