Migrate an Index: Vector Quantization, Resume, Backup, and Wizard#
Warning
The index migrator is an experimental feature. APIs, CLI commands, and on-disk formats (plans, backups) may change in future releases. Review migration plans carefully before applying them to production indexes.
This guide walks through a vector quantization migration
(float32 -> float16) end to end using the programmatic API. You will
learn how to:
Build a schema patch that describes the change
Generate and review a migration plan (read-only)
Apply the migration with a mandatory on-disk backup
Find where the backup lives and inspect its progress header
Understand crash-safe resume and safely re-run a migration
Reload original vectors from the backup (rollback)
Build and apply the same migration through the wizard
For conceptual background see Index Migrations and the Migrate an Index how-to.
Prerequisites: a running Redis 8.0+ (or Redis Stack) at
redis://localhost:6379 and redisvl installed.
import os
import glob
import numpy as np
import yaml
from redisvl.index import SearchIndex
from redisvl.redis.utils import array_to_buffer
from redisvl.query import VectorQuery
REDIS_URL = "redis://localhost:6379"
INDEX_NAME = "products"
DIMS = 8
N_DOCS = 600
np.random.seed(42)
def delete_matching(client, pattern, batch_size=500):
deleted = 0
batch = []
for key in client.scan_iter(match=pattern, count=batch_size):
batch.append(key)
if len(batch) >= batch_size:
deleted += client.delete(*batch)
batch = []
if batch:
deleted += client.delete(*batch)
return deleted
1. Create a source index with float32 vectors#
We start with a small Hash index whose embedding field stores full
precision float32 vectors, then load some random data.
schema = {
"index": {
"name": INDEX_NAME,
"prefix": "product",
"storage_type": "hash",
},
"fields": [
{"name": "name", "type": "text"},
{"name": "category", "type": "tag"},
{
"name": "embedding",
"type": "vector",
"attrs": {
"algorithm": "flat",
"dims": DIMS,
"distance_metric": "cosine",
"datatype": "float32",
},
},
],
}
index = SearchIndex.from_dict(schema, redis_url=REDIS_URL)
if index.exists():
index.delete(drop=True)
stale_keys = delete_matching(index.client, "product:*")
if stale_keys:
print(f"Removed {stale_keys} stale demo key(s)")
index.create()
vectors = np.random.rand(N_DOCS, DIMS).astype(np.float32)
data = [
{
"name": f"product {i}",
"category": "electronics" if i % 2 == 0 else "books",
"embedding": array_to_buffer(vectors[i], dtype="float32"),
}
for i in range(N_DOCS)
]
keys = index.load(data)
print(f"Loaded {len(keys)} documents into '{INDEX_NAME}'")
Loaded 600 documents into 'products'
!rvl index listall --url redis://localhost:6379
Indices:
1. products
2. Describe the change with a schema patch#
A schema patch lists only what changes. Here we update the
embedding field’s datatype from float32 to float16 (a 2x memory
reduction). We write it to a YAML file the planner can read.
patch = {
"version": 1,
"changes": {
"update_fields": [
{"name": "embedding", "attrs": {"datatype": "float16"}},
]
},
}
with open("schema_patch.yaml", "w") as f:
yaml.safe_dump(patch, f, sort_keys=False)
print(open("schema_patch.yaml").read())
version: 1
changes:
update_fields:
- name: embedding
attrs:
datatype: float16
3. Create a migration plan (read-only)#
create_plan snapshots the live index, diffs it against the patch, and
returns a MigrationPlan. No data is modified. Review the warnings
and the classified changes before applying.
from redisvl.migration import MigrationPlanner, MigrationExecutor
from redisvl.migration.utils import write_yaml
planner = MigrationPlanner()
plan = planner.create_plan(
index_name=INDEX_NAME,
schema_patch_path="schema_patch.yaml",
redis_url=REDIS_URL,
)
print("Index: ", plan.source.index_name)
print("Mode: ", plan.mode)
print("Requested: ", plan.requested_changes)
print("Warnings: ")
for w in plan.warnings:
print(" -", w)
# Plans can be persisted to YAML and reloaded later (or via the CLI)
write_yaml(plan.model_dump(), "migration_plan.yaml")
print("\nSaved plan to migration_plan.yaml")
Index: products
Mode: drop_recreate
Requested: {'version': 1, 'changes': {'add_fields': [], 'remove_fields': [], 'update_fields': [{'name': 'embedding', 'attrs': {'datatype': 'float16'}, 'options': {}}], 'rename_fields': [], 'index': {}}}
Warnings:
- Index downtime is required
Saved plan to migration_plan.yaml
4. Apply the migration with a mandatory backup#
The executor requires backup_dir before applying any migration. For
quantization, it writes original vectors to disk before mutating them. If
you omit backup_dir, or pass a path that cannot be created or written,
the migration fails before touching the index. The returned report records
the resolved backup directory and any backup file prefixes used.
We also pass a progress_callback to watch each phase.
Note
This drops and recreates the index definition. Documents are preserved; only the index structure and vector encoding change. Pause writes during the migration window.
BACKUP_DIR = "./migration_backups"
# The migration does not start without a usable backup directory.
executor = MigrationExecutor()
try:
executor.apply(plan, redis_url=REDIS_URL, backup_dir=None)
except ValueError as exc:
print("Missing backup_dir is rejected before migration:", exc)
BAD_BACKUP_DIR = "./not_a_backup_dir"
if os.path.isdir(BAD_BACKUP_DIR):
os.rmdir(BAD_BACKUP_DIR)
with open(BAD_BACKUP_DIR, "w") as f:
f.write("this file intentionally blocks directory creation")
try:
executor.apply(plan, redis_url=REDIS_URL, backup_dir=BAD_BACKUP_DIR)
except ValueError as exc:
print("Invalid backup_dir is rejected before migration:", exc)
finally:
if os.path.exists(BAD_BACKUP_DIR):
os.remove(BAD_BACKUP_DIR)
def on_progress(step, detail=None):
print(f"[{step}] {detail or ''}")
report = executor.apply(
plan,
redis_url=REDIS_URL,
backup_dir=BACKUP_DIR,
batch_size=100,
num_workers=1,
progress_callback=on_progress,
)
print("\nResult: ", report.result)
print("Total duration: ", report.timings.total_migration_duration_seconds, "s")
print("Quantize duration:", report.timings.quantize_duration_seconds, "s")
print("Schema match: ", report.validation.schema_match)
print("Doc count match: ", report.validation.doc_count_match)
print("Backup dir: ", report.backup.backup_dir)
print("Backup prefixes: ", report.backup.backup_paths)
BACKUP_PREFIX = report.backup.backup_paths[0]
Missing backup_dir is rejected before migration: A backup directory is required to apply migrations. Provide --backup-dir or backup_dir=...; migrations are not started without a backup directory.
Invalid backup_dir is rejected before migration: Could not create or access backup directory './not_a_backup_dir': [Errno 17] File exists: 'not_a_backup_dir'. A writable backup directory is required to safely migrate.
[enumerate] Enumerating indexed documents...
[enumerate] found 600 documents (0.003s)
[dump] Backing up original vectors...
[dump] 100/600 docs
[dump] 200/600 docs
[dump] 300/600 docs
[dump] 400/600 docs
[dump] 500/600 docs
[dump] 600/600 docs
[dump] done (0.009s)
[drop] Dropping index definition...
[drop] done (0.001s)
[quantize] Re-encoding vectors from backup...
[quantize] 100/600 docs
[quantize] 200/600 docs
[quantize] 300/600 docs
[quantize] 400/600 docs
[quantize] 500/600 docs
[quantize] 600/600 docs
[quantize] done (600 docs in 0.009s)
[create] Creating index with new schema...
[create] done (0.004s)
[index] Waiting for re-indexing...
[index] 22/115 docs (19%)
[index] 600/600 docs (100%)
[index] done (0.508s)
[validate] Validating migration...
[validate] done (0.01s)
Result: succeeded
Total duration: 0.554 s
Quantize duration: 0.009 s
Schema match: True
Doc count match: True
Backup dir: /Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups
Backup prefixes: ['/Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups/migration_backup_products_0a3e27b8']
5. Where is the backup, and what’s in it?#
Backups are written under report.backup.backup_dir. For a single-worker
Hash quantization migration there are two files per index:
migration_backup_<index>_<hash>.header– JSON: phase + progress countersmigration_backup_<index>_<hash>.data– binary: original vectors, batched
The <hash> suffix is a short digest of the index name, which avoids
collisions. report.backup.backup_paths stores the path prefix without
.header or .data. Multi-worker migrations record one prefix per
worker.
Backups are retained after success so you can audit or roll back; delete them manually when no longer needed.
for path in sorted(glob.glob(os.path.join(report.backup.backup_dir, '*'))):
size = os.path.getsize(path)
print(f"{path} ({size:,} bytes)")
/Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups/migration_backup_products_0a3e27b8.data (47,730 bytes)
/Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups/migration_backup_products_0a3e27b8.header (209 bytes)
from redisvl.migration.backup import VectorBackup
# load() takes the path prefix, without .header or .data.
backup = VectorBackup.load(BACKUP_PREFIX)
h = backup.header
print("backup prefix: ", BACKUP_PREFIX)
print("index_name: ", h.index_name)
print("phase: ", h.phase)
print("batch_size: ", h.batch_size)
print("dump_completed_batches: ", h.dump_completed_batches)
print("quantize_completed_batches:", h.quantize_completed_batches)
backup prefix: /Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups/migration_backup_products_0a3e27b8
index_name: products
phase: completed
batch_size: 100
dump_completed_batches: 6
quantize_completed_batches: 6
6. Crash-safe resume and checkpointing#
If a migration is interrupted by a crash, network drop, or Ctrl+C, just
re-run the same command with the same backup_dir. The executor loads the
backup header, validates that it belongs to the planned source index, and
continues from the next unfinished batch.
The header is the checkpoint for single-index migrations:
phaseshows where the previous run stopped:dump,ready,active, orcompleteddump_completed_batchescounts original-vector batches safely written to.dataquantize_completed_batchescounts batches already re-encoded and written back to Redis
# Re-running the same CLI command resumes automatically:
rvl migrate apply --plan migration_plan.yaml \
--backup-dir ./migration_backups --url redis://localhost:6379
Batch migrations use the same per-index backup headers, plus a batch state
YAML file that records the current index, completed indexes, failed
indexes, and the backup_dir. Resume rejects a different backup directory
so the checkpoint and backup files stay together.
When phase is completed, re-running is safe if the live index already
matches the target schema: the executor detects the finished backup, skips
completed work, and leaves the already-created index in place. If you have
rolled back and the live index is back on the source schema, the old
completed backup is stale for a new migration run; the executor discards
that checkpoint and writes a fresh backup.
skip = backup.header.quantize_completed_batches
print(f"Checkpoint says {skip} batch(es) were already quantized.")
print(f"Current phase: {backup.header.phase}")
print("\nRe-running apply with the same backup_dir to exercise resume detection...")
resume_report = executor.apply(
plan,
redis_url=REDIS_URL,
backup_dir=BACKUP_DIR,
batch_size=100,
num_workers=1,
progress_callback=on_progress,
)
print("Resume result: ", resume_report.result)
print("Resume backup dir: ", resume_report.backup.backup_dir)
print("Resume prefixes: ", resume_report.backup.backup_paths)
Checkpoint says 6 batch(es) were already quantized.
Current phase: completed
Re-running apply with the same backup_dir to exercise resume detection...
[enumerate] skipped (resume from backup)
[drop] skipped (already dropped)
[quantize] skipped (already completed)
[create] Creating index with new schema...
[create] done (0.004s)
[index] Waiting for re-indexing...
[index] 600/600 docs (100%)
[index] done (0.001s)
[validate] Validating migration...
[validate] done (0.008s)
Resume result: succeeded
Resume backup dir: /Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups
Resume prefixes: ['/Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/migration_backups/migration_backup_products_0a3e27b8']
7. Verify the quantized index#
The documents were preserved and the embedding field is now float16.
We reconnect to the live index and run a vector query (encoding the query
vector to match the new datatype).
restored = SearchIndex.from_existing(INDEX_NAME, redis_url=REDIS_URL)
emb = next(f for f in restored.schema.to_dict()['fields'] if f['name'] == 'embedding')
print("embedding datatype now:", emb['attrs']['datatype'])
q = VectorQuery(
vector=vectors[0].tolist(),
vector_field_name="embedding",
return_fields=["name", "category"],
dtype="float16",
num_results=3,
)
for r in restored.query(q):
print(r["name"], "| category:", r["category"], "| dist:", r["vector_distance"])
embedding datatype now: float16
product 0 | category: electronics | dist: 0
product 223 | category: books | dist: 0.0458984375
product 23 | category: books | dist: 0.04736328125
8. Recover original vectors from the backup (rollback)#
Because the backup holds the original float32 bytes, you can recover the
pre-migration vector data. The CLI provides a one-liner:
rvl migrate rollback --backup-dir ./migration_backups \
--index products --url redis://localhost:6379
Below is the equivalent Python API: iterate the backup batches and
write the original bytes back with HSET. Rollback restores data only;
afterwards recreate the original index definition so the index encoding
matches the restored vectors again.
client = restored.client
restored_count = 0
for batch_keys, originals in backup.iter_batches():
pipe = client.pipeline(transaction=False)
for key in batch_keys:
if key in originals:
for field_name, original_bytes in originals[key].items():
pipe.hset(key, field_name, original_bytes)
restored_count += 1
pipe.execute()
print(f"Restored original bytes for {restored_count} vector field(s)")
# Recreate the ORIGINAL float32 index definition over the restored data
original_index = SearchIndex.from_dict(schema, redis_url=REDIS_URL)
original_index.create(overwrite=True, drop=False)
check = SearchIndex.from_existing(INDEX_NAME, redis_url=REDIS_URL)
emb = next(f for f in check.schema.to_dict()['fields'] if f['name'] == 'embedding')
print("embedding datatype after rollback:", emb['attrs']['datatype'])
Restored original bytes for 600 vector field(s)
embedding datatype after rollback: float32
9. Build and apply a migration with the wizard#
For exploratory work, MigrationWizard can build the same schema patch and
migration plan interactively. In a notebook, we script the answers so the
cell can execute without blocking. The sequence below means: update a
field, choose embedding, keep the current algorithm, change datatype to
float16, keep the distance metric, then finish.
The wizard still only creates the patch and plan. Applying the plan remains
a separate reviewed step, and backup_dir is still required.
import builtins
import copy
from contextlib import contextmanager
from redisvl.migration import MigrationWizard
from redisvl.migration.utils import wait_for_index_ready
WIZARD_INDEX_NAME = "wizard_products"
WIZARD_PREFIX = "wizard_product"
WIZARD_PATCH_PATH = "wizard_schema_patch.yaml"
WIZARD_PLAN_PATH = "wizard_migration_plan.yaml"
WIZARD_TARGET_SCHEMA_PATH = "wizard_target_schema.yaml"
WIZARD_BACKUP_DIR = "./wizard_migration_backups"
wizard_schema = copy.deepcopy(schema)
wizard_schema["index"]["name"] = WIZARD_INDEX_NAME
wizard_schema["index"]["prefix"] = WIZARD_PREFIX
# Start from a clean wizard demo index and keyspace.
try:
existing_wizard_index = SearchIndex.from_existing(
WIZARD_INDEX_NAME, redis_url=REDIS_URL
)
existing_wizard_index.delete(drop=True)
except Exception:
pass
delete_matching(client, f"{WIZARD_PREFIX}:*")
wizard_index = SearchIndex.from_dict(wizard_schema, redis_url=REDIS_URL)
wizard_index.create()
wizard_index.load(data, id_field=None)
wait_for_index_ready(wizard_index)
print(f"Loaded {N_DOCS} documents into '{WIZARD_INDEX_NAME}'")
@contextmanager
def scripted_inputs(answers):
original_input = builtins.input
iterator = iter(answers)
def fake_input(prompt=""):
answer = next(iterator)
print(f"{prompt}{answer}")
return answer
builtins.input = fake_input
try:
yield
finally:
builtins.input = original_input
wizard_answers = [
"2", # Update field
"embedding", # Select the vector field
"", # Keep algorithm
"float16", # Quantize datatype
"", # Keep distance metric
"8", # Finish
]
with scripted_inputs(wizard_answers):
wizard_plan = MigrationWizard().run(
index_name=WIZARD_INDEX_NAME,
redis_url=REDIS_URL,
plan_out=WIZARD_PLAN_PATH,
patch_out=WIZARD_PATCH_PATH,
target_schema_out=WIZARD_TARGET_SCHEMA_PATH,
)
print("\nWizard patch:")
print(open(WIZARD_PATCH_PATH).read())
print("Wizard plan mode:", wizard_plan.mode)
print("Wizard warnings:", wizard_plan.warnings)
Loaded 600 documents into 'wizard_products'
Building a migration plan for index 'wizard_products'
Current schema:
- Index name: wizard_products
- Storage type: hash
- name (text)
- category (tag)
- embedding (vector)
Choose an action:
1. Add field (text, tag, numeric, geo)
2. Update field (sortable, weight, separator, vector config)
3. Remove field
4. Rename field (rename field in all documents)
5. Rename index (change index name)
6. Change prefix (rename all keys)
7. Preview patch (show pending changes as YAML)
8. Finish
Enter a number: 2
Updatable fields:
1. name (text)
2. category (tag)
3. embedding (vector)
Select a field to update by number or name: embedding
Current vector config for 'embedding':
algorithm: FLAT
datatype: float32
distance_metric: cosine
dims: 8 (cannot be changed)
Leave blank to keep current value.
Algorithm: vector search method (FLAT=brute force, HNSW=graph, SVS-VAMANA=compressed graph)
Algorithm [current: FLAT]:
Datatype: float16, float32, bfloat16, float64, int8, uint8
(float16 reduces memory ~50%, int8/uint8 reduce ~75%)
Datatype [current: float32]: float16
Distance metric: how similarity is measured (cosine, l2, ip)
Distance metric [current: cosine]:
Choose an action:
1. Add field (text, tag, numeric, geo)
2. Update field (sortable, weight, separator, vector config)
3. Remove field
4. Rename field (rename field in all documents)
5. Rename index (change index name)
6. Change prefix (rename all keys)
7. Preview patch (show pending changes as YAML)
8. Finish
Enter a number: 8
Wizard patch:
version: 1
changes:
add_fields: []
remove_fields: []
update_fields:
- name: embedding
attrs:
datatype: float16
options: {}
rename_fields: []
index: {}
Wizard plan mode: drop_recreate
Wizard warnings: ['Index downtime is required']
wizard_report = MigrationExecutor().apply(
wizard_plan,
redis_url=REDIS_URL,
backup_dir=WIZARD_BACKUP_DIR,
batch_size=100,
num_workers=1,
)
wizard_live = SearchIndex.from_existing(WIZARD_INDEX_NAME, redis_url=REDIS_URL)
wizard_embedding = next(
f for f in wizard_live.schema.to_dict()["fields"] if f["name"] == "embedding"
)
print("Wizard migration result:", wizard_report.result)
print("Wizard schema match: ", wizard_report.validation.schema_match)
print("Wizard doc count match: ", wizard_report.validation.doc_count_match)
print("Wizard backup dir: ", wizard_report.backup.backup_dir)
print("Wizard backup prefixes: ", wizard_report.backup.backup_paths)
print("Wizard embedding dtype: ", wizard_embedding["attrs"]["datatype"])
Wizard migration result: succeeded
Wizard schema match: True
Wizard doc count match: True
Wizard backup dir: /Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/wizard_migration_backups
Wizard backup prefixes: ['/Users/nitin.kanukolanu/workspace/redis-vl-python/docs/user_guide/wizard_migration_backups/migration_backup_wizard_products_def8cdf8']
Wizard embedding dtype: float16
10. Cleanup (optional)#
Remove the demo indexes and the artifacts this notebook created. In production, delete backups only once you are certain rollback is no longer needed.
delete_matching(client, "product:*")
if check.exists():
check.delete(drop=False)
delete_matching(client, f"{WIZARD_PREFIX}:*")
if wizard_live.exists():
wizard_live.delete(drop=False)
for backup_dir in (report.backup.backup_dir, wizard_report.backup.backup_dir):
for f in glob.glob(os.path.join(backup_dir, '*')):
os.remove(f)
if os.path.isdir(backup_dir):
os.rmdir(backup_dir)
for f in (
"schema_patch.yaml",
"migration_plan.yaml",
"not_a_backup_dir",
WIZARD_PATCH_PATH,
WIZARD_PLAN_PATH,
WIZARD_TARGET_SCHEMA_PATH,
):
if os.path.exists(f):
os.remove(f)
print("Cleaned up demo indexes, demo keys, backups, and YAML files")
Cleaned up demo indexes, demo keys, backups, and YAML files
Learn more#
Migrate an Index (how-to) – full CLI workflow, batch migration, performance tuning, and troubleshooting
Index Migrations (concepts) – modes, supported vs blocked changes, backup internals, sync vs async
For very large datasets, use
num_workers > 1and the async executor (AsyncMigrationExecutor) to parallelize re-encoding.