LLM Cache

LLM Cache#

SemanticCache#

class SemanticCache(name='llmcache', distance_threshold=0.1, ttl=None, vectorizer=None, filterable_fields=None, redis_client=None, redis_url='redis://localhost:6379', connection_kwargs={}, overwrite=False, **kwargs)[source]#

Bases: BaseLLMCache

Semantic Cache for Large Language Models.

Parameters:

name (str, optional) – The name of the semantic cache search index. Defaults to “llmcache”.
distance_threshold (float, optional) – Semantic distance threshold for the cache in Redis COSINE units [0-2], where lower values indicate stricter matching. Defaults to 0.1.
ttl (Optional[int], optional) – The time-to-live for records cached in Redis. Defaults to None.
vectorizer (Optional[BaseVectorizer], optional) – The vectorizer for the cache. Defaults to HFTextVectorizer.
filterable_fields (Optional[List[Dict[str, Any]]]) – An optional list of RedisVL fields that can be used to customize cache retrieval with filters.
redis_client (Optional[Redis], optional) – A redis client connection instance. Defaults to None.
redis_url (str, optional) – The redis url. Defaults to redis://localhost:6379.
connection_kwargs (Dict[str, Any]) – The connection arguments for the redis client. Defaults to empty {}.
overwrite (bool) – Whether or not to force overwrite the schema for the semantic cache index. Defaults to false.

Raises:

TypeError – If an invalid vectorizer is provided.
TypeError – If the TTL value is not an int.
ValueError – If the threshold is not between 0 and 2 (Redis COSINE distance).
ValueError – If existing schema does not match new schema and overwrite is False.

async acheck(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None)[source]#

Async check the semantic cache for results similar to the specified prompt or vector.

This method searches the cache using vector similarity with either a raw text prompt (converted to a vector) or a provided vector as input. It checks for semantically similar prompts and fetches the cached LLM responses.

Parameters:

prompt (Optional[str], optional) – The text prompt to search for in the cache.
vector (Optional[List[float]], optional) – The vector representation of the prompt to search for in the cache.
num_results (int, optional) – The number of cached results to return. Defaults to 1.
return_fields (Optional[List[str]], optional) – The fields to include in each returned result. If None, defaults to all available fields in the cached entry.
filter_expression (Optional[FilterExpression]) – Optional filter expression that can be used to filter cache results. Defaults to None and the full cache will be searched.
distance_threshold (Optional[float]) – The threshold for semantic vector distance.

Returns:

A list of dicts containing the requested: return fields for each similar cached response.

Return type:

List[Dict[str, Any]]

Raises:

ValueError – If neither a prompt nor a vector is specified.
ValueError – if ‘vector’ has incorrect dimensions.
TypeError – If return_fields is not a list when provided.

response = await cache.acheck(
    prompt="What is the capital city of France?"
)

async aclear()#

Async clear the cache of all keys.

Return type:: None

async adelete()[source]#

Async delete the cache and its index entirely.

Return type:: None

async adisconnect()[source]#

Asynchronously disconnect from Redis and search index.

Closes all Redis connections and index connections.

async adrop(ids=None, keys=None)[source]#

Async drop specific entries from the cache by ID or Redis key.

Parameters:

ids (Optional[List[str]]) – List of entry IDs to remove from the cache. Entry IDs are the unique identifiers without the cache prefix.
keys (Optional[List[str]]) – List of full Redis keys to remove from the cache. Keys are the complete Redis keys including the cache prefix.

Return type:

None

Note

At least one of ids or keys must be provided.

Raises:

ValueError – If neither ids nor keys is provided.

Parameters:

ids (list[str] | None)
keys (list[str] | None)

Return type:

None

async aexpire(key, ttl=None)#

Asynchronously set or refresh the expiration time for a key in the cache.

Parameters:

key (str) – The Redis key to set the expiration on.
ttl (Optional[int], optional) – The time-to-live in seconds. If None, uses the default TTL configured for this cache instance. Defaults to None.

Return type:

None

Note

If neither the provided TTL nor the default TTL is set (both are None), this method will have no effect.

async astore(prompt, response, vector=None, metadata=None, filters=None, ttl=None)[source]#

Async stores the specified key-value pair in the cache along with metadata.

Parameters:

prompt (str) – The user prompt to cache.
response (str) – The LLM response to cache.
vector (Optional[List[float]], optional) – The prompt vector to cache. Defaults to None, and the prompt vector is generated on demand.
metadata (Optional[Dict[str, Any]], optional) – The optional metadata to cache alongside the prompt and response. Defaults to None.
filters (Optional[Dict[str, Any]]) – The optional tag to assign to the cache entry. Defaults to None.
ttl (Optional[int]) – The optional TTL override to use on this individual cache entry. Defaults to the global TTL setting.

Returns:

The Redis key for the entries added to the semantic cache.

Return type:

str

Raises:

ValueError – If neither prompt nor vector is specified.
ValueError – if vector has incorrect dimensions.
TypeError – If provided metadata is not a dictionary.

key = await cache.astore(
    prompt="What is the capital city of France?",
    response="Paris",
    metadata={"city": "Paris", "country": "France"}
)

async aupdate(key, **kwargs)[source]#

Async update specific fields within an existing cache entry. If no fields are passed, then only the document TTL is refreshed.

Parameters:

key (str) – the key of the document to update using kwargs.

Raises:

ValueError if an incorrect mapping is provided as a kwarg. –
TypeError if metadata is provided and not of type dict. –

Return type:

None

key = await cache.astore('this is a prompt', 'this is a response')
await cache.aupdate(
    key,
    metadata={"hit_count": 1, "model_name": "Llama-2-7b"}
)

check(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None)[source]#

Checks the semantic cache for results similar to the specified prompt or vector.

Parameters:

prompt (Optional[str], optional) – The text prompt to search for in the cache.
vector (Optional[List[float]], optional) – The vector representation of the prompt to search for in the cache.
num_results (int, optional) – The number of cached results to return. Defaults to 1.
return_fields (Optional[List[str]], optional) – The fields to include in each returned result. If None, defaults to all available fields in the cached entry.
filter_expression (Optional[FilterExpression]) – Optional filter expression that can be used to filter cache results. Defaults to None and the full cache will be searched.
distance_threshold (Optional[float]) – The threshold for semantic vector distance.

Returns:

A list of dicts containing the requested: return fields for each similar cached response.

Return type:

List[Dict[str, Any]]

Raises:

ValueError – If neither a prompt nor a vector is specified.
ValueError – if ‘vector’ has incorrect dimensions.
TypeError – If return_fields is not a list when provided.

response = cache.check(
    prompt="What is the capital city of France?"
)

clear()#

Clear the cache of all keys.

Return type:: None

delete()[source]#

Delete the cache and its index entirely.

Return type:: None

disconnect()[source]#

Disconnect from Redis and search index.

Closes all Redis connections and index connections.

drop(ids=None, keys=None)[source]#

Drop specific entries from the cache by ID or Redis key.

Parameters:

ids (Optional[List[str]]) – List of entry IDs to remove from the cache. Entry IDs are the unique identifiers without the cache prefix.
keys (Optional[List[str]]) – List of full Redis keys to remove from the cache. Keys are the complete Redis keys including the cache prefix.

Return type:

None

Note

At least one of ids or keys must be provided.

Raises:

ValueError – If neither ids nor keys is provided.

Parameters:

ids (list[str] | None)
keys (list[str] | None)

Return type:

None

expire(key, ttl=None)#

Set or refresh the expiration time for a key in the cache.

Parameters:

key (str) – The Redis key to set the expiration on.
ttl (Optional[int], optional) – The time-to-live in seconds. If None, uses the default TTL configured for this cache instance. Defaults to None.

Return type:

None

Note

If neither the provided TTL nor the default TTL is set (both are None), this method will have no effect.

set_threshold(distance_threshold)[source]#

Sets the semantic distance threshold for the cache.

Parameters:: distance_threshold (float) – The semantic distance threshold for the cache.
Raises:: ValueError – If the threshold is not between 0 and 2 (Redis COSINE distance).
Return type:: None

set_ttl(ttl=None)#

Set the default TTL, in seconds, for entries in the cache.

Parameters:: ttl (Optional[int], optional) – The optional time-to-live expiration for the cache, in seconds.
Raises:: ValueError – If the time-to-live value is not an integer.
Return type:: None

store(prompt, response, vector=None, metadata=None, filters=None, ttl=None)[source]#

Stores the specified key-value pair in the cache along with metadata.

Parameters:

prompt (str) – The user prompt to cache.
response (str) – The LLM response to cache.
vector (Optional[List[float]], optional) – The prompt vector to cache. Defaults to None, and the prompt vector is generated on demand.
metadata (Optional[Dict[str, Any]], optional) – The optional metadata to cache alongside the prompt and response. Defaults to None.
filters (Optional[Dict[str, Any]]) – The optional tag to assign to the cache entry. Defaults to None.
ttl (Optional[int]) – The optional TTL override to use on this individual cache entry. Defaults to the global TTL setting.

Returns:

The Redis key for the entries added to the semantic cache.

Return type:

str

Raises:

ValueError – If neither prompt nor vector is specified.
ValueError – if vector has incorrect dimensions.
TypeError – If provided metadata is not a dictionary.

key = cache.store(
    prompt="What is the capital city of France?",
    response="Paris",
    metadata={"city": "Paris", "country": "France"}
)

update(key, **kwargs)[source]#

Update specific fields within an existing cache entry. If no fields are passed, then only the document TTL is refreshed.

Parameters:

key (str) – the key of the document to update using kwargs.

Raises:

ValueError if an incorrect mapping is provided as a kwarg. –
TypeError if metadata is provided and not of type dict. –

Return type:

None

key = cache.store('this is a prompt', 'this is a response')
cache.update(key, metadata={"hit_count": 1, "model_name": "Llama-2-7b"})

property aindex: AsyncSearchIndex | None#

The underlying AsyncSearchIndex for the cache.

Returns:: The async search index.
Return type:: AsyncSearchIndex

property distance_threshold: float#

The semantic distance threshold for the cache.

Returns:: The semantic distance threshold.
Return type:: float

property index: SearchIndex#

The underlying SearchIndex for the cache.

Returns:: The search index.
Return type:: SearchIndex

property ttl: int | None#: The default TTL, in seconds, for entries in the cache.

LangCacheSemanticCache#

class LangCacheSemanticCache(name='langcache', server_url='https://aws-us-east-1.langcache.redis.io', cache_id='', api_key='', ttl=None, use_exact_search=True, use_semantic_search=True, distance_scale='normalized', **kwargs)[source]#

Bases: BaseLLMCache

LLM Cache implementation using the LangCache managed service.

This cache uses the LangCache API service for semantic caching of LLM responses. It requires a LangCache account and API key.

Example

from redisvl.extensions.cache.llm import LangCacheSemanticCache

cache = LangCacheSemanticCache(
    name="my_cache",
    server_url="https://api.langcache.com",
    cache_id="your-cache-id",
    api_key="your-api-key",
    ttl=3600
)

# Store a response
cache.store(
    prompt="What is the capital of France?",
    response="Paris"
)

# Check for cached responses
results = cache.check(prompt="What is the capital of France?")

Initialize a LangCache semantic cache.

Parameters:

name (str) – The name of the cache. Defaults to “langcache”.
server_url (str) – The LangCache server URL.
cache_id (str) – The LangCache cache ID.
api_key (str) – The LangCache API key.
ttl (Optional[int]) – Time-to-live for cache entries in seconds.
use_exact_search (bool) – Whether to use exact matching. Defaults to True.
use_semantic_search (bool) – Whether to use semantic search. Defaults to True.
distance_scale (str) – Threshold scale for distance_threshold: - “normalized”: 0–1 semantic distance (lower is better) - “redis”: Redis COSINE distance 0–2 (lower is better)

Raises:

ImportError – If the langcache package is not installed.
ValueError – If cache_id or api_key is not provided.

async acheck(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None, attributes=None)[source]#

Async check the cache for semantically similar prompts.

Parameters:

prompt (Optional[str]) – The text prompt to search for.
vector (Optional[List[float]]) – Not supported by LangCache API.
num_results (int) – Number of results to return. Defaults to 1.
return_fields (Optional[List[str]]) – Not used (for compatibility).
filter_expression (Optional[FilterExpression]) – Not supported.
distance_threshold (Optional[float]) – Maximum distance threshold. Converted to similarity_threshold according to distance_scale: If “redis”, uses norm_cosine_distance(distance_threshold) ([0,2] -> [0,1]). If “normalized”, uses (1.0 - distance_threshold) ([0,1] -> [0,1]).
attributes (Optional[Dict[str, Any]]) – LangCache attributes to filter by. Note: Attributes must be pre-configured in your LangCache instance.

Returns:

List of matching cache entries.

Return type:

List[Dict[str, Any]]

Raises:

ValueError – If prompt is not provided.

async aclear()[source]#

Async clear the cache of all entries.

This is an alias for adelete() to match the BaseCache interface.

Return type:: None

async adelete()[source]#

Async delete the entire cache.

This deletes all entries in the cache by calling the flush API.

Return type:: None

async adelete_by_attributes(attributes)[source]#

Async delete cache entries matching the given attributes.

Parameters:: attributes (Dict[str, Any]) – Attributes to match for deletion. Cannot be empty.
Returns:: Result of the deletion operation.
Return type:: Dict[str, Any]
Raises:: ValueError – If attributes is an empty dictionary.

async adelete_by_id(entry_id)[source]#

Async delete a single cache entry by ID.

Parameters:: entry_id (str) – The ID of the entry to delete.
Return type:: None

async adisconnect()#

Async disconnect from Redis.

Return type:: None

async aexpire(key, ttl=None)#

Asynchronously set or refresh the expiration time for a key in the cache.

Parameters:

key (str) – The Redis key to set the expiration on.
ttl (Optional[int], optional) – The time-to-live in seconds. If None, uses the default TTL configured for this cache instance. Defaults to None.

Return type:

None

Note

If neither the provided TTL nor the default TTL is set (both are None), this method will have no effect.

async astore(prompt, response, vector=None, metadata=None, filters=None, ttl=None)[source]#

Async store a prompt-response pair in the cache.

Parameters:

prompt (str) – The user prompt to cache.
response (str) – The LLM response to cache.
vector (Optional[List[float]]) – Not supported by LangCache API.
metadata (Optional[Dict[str, Any]]) – Optional metadata (stored as attributes).
filters (Optional[Dict[str, Any]]) – Not supported.
ttl (Optional[int]) – Optional TTL override in seconds.

Returns:

The entry ID for the cached entry.

Return type:

str

Raises:

ValueError – If prompt or response is empty.

async aupdate(key, **kwargs)[source]#

Async update specific fields within an existing cache entry.

Note: LangCache API does not support updating individual entries. This method will raise NotImplementedError.

Parameters:

key (str) – The key of the document to update.
**kwargs – Field-value pairs to update.

Raises:

NotImplementedError – LangCache does not support entry updates.

Return type:

None

check(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None, attributes=None)[source]#

Check the cache for semantically similar prompts.

Parameters:

prompt (Optional[str]) – The text prompt to search for.
vector (Optional[List[float]]) – Not supported by LangCache API.
num_results (int) – Number of results to return. Defaults to 1.
return_fields (Optional[List[str]]) – Not used (for compatibility).
filter_expression (Optional[FilterExpression]) – Not supported.
distance_threshold (Optional[float]) – Maximum distance threshold. Converted to similarity_threshold according to distance_scale: If “redis”, uses norm_cosine_distance(distance_threshold) ([0,2] -> [0,1]). If “normalized”, uses (1.0 - distance_threshold) ([0,1] -> [0,1]).
attributes (Optional[Dict[str, Any]]) – LangCache attributes to filter by. Note: Attributes must be pre-configured in your LangCache instance.

Returns:

List of matching cache entries.

Return type:

List[Dict[str, Any]]

Raises:

ValueError – If prompt is not provided.

clear()[source]#

Clear the cache of all entries.

This is an alias for delete() to match the BaseCache interface.

Return type:: None

delete()[source]#

Delete the entire cache.

This deletes all entries in the cache by calling the flush API.

Return type:: None

delete_by_attributes(attributes)[source]#

Delete cache entries matching the given attributes.

Parameters:: attributes (Dict[str, Any]) – Attributes to match for deletion. Cannot be empty.
Returns:: Result of the deletion operation.
Return type:: Dict[str, Any]
Raises:: ValueError – If attributes is an empty dictionary.

delete_by_id(entry_id)[source]#

Delete a single cache entry by ID.

Parameters:: entry_id (str) – The ID of the entry to delete.
Return type:: None

disconnect()#

Disconnect from Redis.

Return type:: None

expire(key, ttl=None)#

Set or refresh the expiration time for a key in the cache.

Parameters:

key (str) – The Redis key to set the expiration on.
ttl (Optional[int], optional) – The time-to-live in seconds. If None, uses the default TTL configured for this cache instance. Defaults to None.

Return type:

None

Note

If neither the provided TTL nor the default TTL is set (both are None), this method will have no effect.

set_ttl(ttl=None)#

Set the default TTL, in seconds, for entries in the cache.

Parameters:: ttl (Optional[int], optional) – The optional time-to-live expiration for the cache, in seconds.
Raises:: ValueError – If the time-to-live value is not an integer.
Return type:: None

store(prompt, response, vector=None, metadata=None, filters=None, ttl=None)[source]#

Store a prompt-response pair in the cache.

Parameters:

prompt (str) – The user prompt to cache.
response (str) – The LLM response to cache.
vector (Optional[List[float]]) – Not supported by LangCache API.
metadata (Optional[Dict[str, Any]]) – Optional metadata (stored as attributes).
filters (Optional[Dict[str, Any]]) – Not supported.
ttl (Optional[int]) – Optional TTL override in seconds.

Returns:

The entry ID for the cached entry.

Return type:

str

Raises:

ValueError – If prompt or response is empty.

update(key, **kwargs)[source]#

Update specific fields within an existing cache entry.

Note: LangCache API does not support updating individual entries. This method will raise NotImplementedError.

Parameters:

key (str) – The key of the document to update.
**kwargs – Field-value pairs to update.

Raises:

NotImplementedError – LangCache does not support entry updates.

Return type:

None

property ttl: int | None#: The default TTL, in seconds, for entries in the cache.

Cache Schema Classes#

CacheEntry#

class CacheEntry(*, entry_id=None, prompt, response, prompt_vector, inserted_at=<factory>, updated_at=<factory>, metadata=None, filters=None)[source]#

Bases: BaseModel

A single cache entry in Redis

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

entry_id (str | None)
prompt (str)
response (str)
prompt_vector (list[float])
inserted_at (float)
updated_at (float)
metadata (dict[str, Any] | None)
filters (dict[str, Any] | None)

entry_id: str | None#: Cache entry identifier

filters: dict[str, Any] | None#: Optional filter data stored on the cache entry for customizing retrieval

inserted_at: float#: Timestamp of when the entry was added to the cache

metadata: dict[str, Any] | None#: Optional metadata stored on the cache entry

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

prompt: str#: Input prompt or question cached in Redis

prompt_vector: list[float]#: Text embedding representation of the prompt

response: str#: Response or answer to the question, cached in Redis

updated_at: float#: Timestamp of when the entry was updated in the cache

CacheHit#

class CacheHit(*, entry_id, prompt, response, vector_distance, inserted_at, updated_at, metadata=None, filters=None, **extra_data)[source]#

Bases: BaseModel

A cache hit based on some input query

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

entry_id (str)
prompt (str)
response (str)
vector_distance (float)
inserted_at (float)
updated_at (float)
metadata (dict[str, Any] | None)
filters (dict[str, Any] | None)
extra_data (Any)

to_dict()[source]#

Convert this model to a dictionary, merging filters into the result.

Return type:: dict[str, Any]

entry_id: str#: Cache entry identifier

filters: dict[str, Any] | None#: Optional filter data stored on the cache entry for customizing retrieval

inserted_at: float#: Timestamp of when the entry was added to the cache

metadata: dict[str, Any] | None#: Optional metadata stored on the cache entry

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

prompt: str#: Input prompt or question cached in Redis

response: str#: Response or answer to the question, cached in Redis

updated_at: float#: Timestamp of when the entry was updated in the cache

vector_distance: float#: The semantic distance between the query vector and the stored prompt vector

Embeddings Cache#

EmbeddingsCache#

class EmbeddingsCache(name='embedcache', ttl=None, redis_client=None, async_redis_client=None, redis_url='redis://localhost:6379', connection_kwargs={})[source]#

Bases: BaseCache

Embeddings Cache for storing embedding vectors with exact key matching.

Initialize an embeddings cache.

Parameters:

name (str) – The name of the cache. Defaults to “embedcache”.
ttl (Optional[int]) – The time-to-live for cached embeddings. Defaults to None.
redis_client (Optional[SyncRedisClient]) – Redis client instance. Defaults to None.
redis_url (str) – Redis URL for connection. Defaults to “redis://localhost:6379”.
connection_kwargs (Dict[str, Any]) – Redis connection arguments. Defaults to {}.
async_redis_client (Redis | RedisCluster | None)

Raises:

ValueError – If vector dimensions are invalid

cache = EmbeddingsCache(
    name="my_embeddings_cache",
    ttl=3600,  # 1 hour
    redis_url="redis://localhost:6379"
)

async aclear()#

Async clear the cache of all keys.

Return type:: None

async adisconnect()#

Async disconnect from Redis.

Return type:: None

async adrop(content, model_name)[source]#

Async remove an embedding from the cache.

Asynchronously removes an embedding from the cache.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.

Return type:

None

await cache.adrop(
    content="What is machine learning?",
    model_name="text-embedding-ada-002"
)

async adrop_by_key(key)[source]#

Async remove an embedding from the cache by its Redis key.

Asynchronously removes an embedding from the cache by its Redis key.

Parameters:: key (str) – The full Redis key for the embedding.
Return type:: None

await cache.adrop_by_key("embedcache:1234567890abcdef")

async aexists(content, model_name)[source]#

Async check if an embedding exists.

Asynchronously checks if an embedding exists for the given content and model.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

True if the embedding exists in the cache, False otherwise.

Return type:

bool

if await cache.aexists("What is machine learning?", "text-embedding-ada-002"):
    print("Embedding is in cache")

async aexists_by_key(key)[source]#

Async check if an embedding exists for the given Redis key.

Asynchronously checks if an embedding exists for the given Redis key.

Parameters:: key (str) – The full Redis key for the embedding.
Returns:: True if the embedding exists in the cache, False otherwise.
Return type:: bool

if await cache.aexists_by_key("embedcache:1234567890abcdef"):
    print("Embedding is in cache")

async aexpire(key, ttl=None)#

Asynchronously set or refresh the expiration time for a key in the cache.

Parameters:

key (str) – The Redis key to set the expiration on.
ttl (Optional[int], optional) – The time-to-live in seconds. If None, uses the default TTL configured for this cache instance. Defaults to None.

Return type:

None

Note

If neither the provided TTL nor the default TTL is set (both are None), this method will have no effect.

async aget(content, model_name)[source]#

Async get embedding by content and model name.

Asynchronously retrieves a cached embedding for the given content and model name. If found, refreshes the TTL of the entry.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

Embedding cache entry or None if not found.

Return type:

Optional[Dict[str, Any]]

embedding_data = await cache.aget(
    content="What is machine learning?",
    model_name="text-embedding-ada-002"
)

async aget_by_key(key)[source]#

Async get embedding by its full Redis key.

Asynchronously retrieves a cached embedding for the given Redis key. If found, refreshes the TTL of the entry.

Parameters:: key (str) – The full Redis key for the embedding.
Returns:: Embedding cache entry or None if not found.
Return type:: Optional[Dict[str, Any]]

embedding_data = await cache.aget_by_key("embedcache:1234567890abcdef")

async amdrop(contents, model_name)[source]#

Async remove multiple embeddings from the cache by their contents and model name.

Asynchronously removes multiple embeddings in a single operation.

Parameters:

contents (Iterable[bytes | str]) – Iterable of content that was embedded.
model_name (str) – The name of the embedding model.

Return type:

None

# Remove multiple embeddings asynchronously
await cache.amdrop(
    contents=["What is machine learning?", "What is deep learning?"],
    model_name="text-embedding-ada-002"
)

async amdrop_by_keys(keys)[source]#

Async remove multiple embeddings from the cache by their Redis keys.

Asynchronously removes multiple embeddings in a single operation.

Parameters:: keys (List[str]) – List of Redis keys to remove.
Return type:: None

# Remove multiple embeddings asynchronously
await cache.amdrop_by_keys(["embedcache:key1", "embedcache:key2"])

async amexists(contents, model_name)[source]#

Async check if multiple embeddings exist by their contents and model name.

Asynchronously checks existence of multiple embeddings in a single operation.

Parameters:

contents (Iterable[bytes | str]) – Iterable of content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

List of boolean values indicating whether each embedding exists.

Return type:

List[bool]

# Check if multiple embeddings exist asynchronously
exists_results = await cache.amexists(
    contents=["What is machine learning?", "What is deep learning?"],
    model_name="text-embedding-ada-002"
)

async amexists_by_keys(keys)[source]#

Async check if multiple embeddings exist by their Redis keys.

Asynchronously checks existence of multiple keys in a single operation.

Parameters:: keys (List[str]) – List of Redis keys to check.
Returns:: List of boolean values indicating whether each key exists. The order matches the input keys order.
Return type:: List[bool]

# Check if multiple keys exist asynchronously
exists_results = await cache.amexists_by_keys(["embedcache:key1", "embedcache:key2"])

async amget(contents, model_name)[source]#

Async get multiple embeddings by their contents and model name.

Asynchronously retrieves multiple cached embeddings in a single operation. If found, refreshes the TTL of each entry.

Parameters:

contents (Iterable[bytes | str]) – Iterable of content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

List of embedding cache entries or None for contents not found.

Return type:

List[Optional[Dict[str, Any]]]

# Get multiple embeddings asynchronously
embedding_data = await cache.amget(
    contents=["What is machine learning?", "What is deep learning?"],
    model_name="text-embedding-ada-002"
)

async amget_by_keys(keys)[source]#

Async get multiple embeddings by their Redis keys.

Asynchronously retrieves multiple cached embeddings in a single network roundtrip. If found, refreshes the TTL of each entry.

Parameters:: keys (List[str]) – List of Redis keys to retrieve.
Returns:: List of embedding cache entries or None for keys not found. The order matches the input keys order.
Return type:: List[Optional[Dict[str, Any]]]

# Get multiple embeddings asynchronously
embedding_data = await cache.amget_by_keys([
    "embedcache:key1",
    "embedcache:key2"
])

async amset(items, ttl=None)[source]#

Async store multiple embeddings in a batch operation.

Each item in the input list should be a dictionary with the following fields: - ‘content’: The content that was embedded - ‘model_name’: The name of the embedding model - ‘embedding’: The embedding vector - ‘metadata’: Optional metadata to store with the embedding

Parameters:

items (list[dict[str, Any]]) – List of dictionaries, each containing content, model_name, embedding, and optional metadata.
ttl (int | None) – Optional TTL override for these entries.

Returns:

List of Redis keys where the embeddings were stored.

Return type:

List[str]

# Store multiple embeddings asynchronously
keys = await cache.amset([
    {
        "content": "What is ML?",
        "model_name": "text-embedding-ada-002",
        "embedding": [0.1, 0.2, 0.3],
        "metadata": {"source": "user"}
    },
    {
        "content": "What is AI?",
        "model_name": "text-embedding-ada-002",
        "embedding": [0.4, 0.5, 0.6],
        "metadata": {"source": "docs"}
    }
])

async aset(content, model_name, embedding, metadata=None, ttl=None)[source]#

Async store an embedding with its content and model name.

Asynchronously stores an embedding with its content and model name.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.
embedding (List[float]) – The embedding vector to store.
metadata (Optional[Dict[str, Any]]) – Optional metadata to store with the embedding.
ttl (Optional[int]) – Optional TTL override for this specific entry.

Returns:

The Redis key where the embedding was stored.

Return type:

str

key = await cache.aset(
    content="What is machine learning?",
    model_name="text-embedding-ada-002",
    embedding=[0.1, 0.2, 0.3, ...],
    metadata={"source": "user_query"}
)

clear()#

Clear the cache of all keys.

Return type:: None

disconnect()#

Disconnect from Redis.

Return type:: None

drop(content, model_name)[source]#

Remove an embedding from the cache.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.

Return type:

None

cache.drop(
    content="What is machine learning?",
    model_name="text-embedding-ada-002"
)

drop_by_key(key)[source]#

Remove an embedding from the cache by its Redis key.

Parameters:: key (str) – The full Redis key for the embedding.
Return type:: None

cache.drop_by_key("embedcache:1234567890abcdef")

exists(content, model_name)[source]#

Check if an embedding exists for the given content and model.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

True if the embedding exists in the cache, False otherwise.

Return type:

bool

if cache.exists("What is machine learning?", "text-embedding-ada-002"):
    print("Embedding is in cache")

exists_by_key(key)[source]#

Check if an embedding exists for the given Redis key.

Parameters:: key (str) – The full Redis key for the embedding.
Returns:: True if the embedding exists in the cache, False otherwise.
Return type:: bool

if cache.exists_by_key("embedcache:1234567890abcdef"):
    print("Embedding is in cache")

expire(key, ttl=None)#

Set or refresh the expiration time for a key in the cache.

Parameters:

key (str) – The Redis key to set the expiration on.
ttl (Optional[int], optional) – The time-to-live in seconds. If None, uses the default TTL configured for this cache instance. Defaults to None.

Return type:

None

Note

If neither the provided TTL nor the default TTL is set (both are None), this method will have no effect.

get(content, model_name)[source]#

Get embedding by content and model name.

Retrieves a cached embedding for the given content and model name. If found, refreshes the TTL of the entry.

Parameters:

content (bytes | str) – The content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

Embedding cache entry or None if not found.

Return type:

Optional[Dict[str, Any]]

embedding_data = cache.get(
    content="What is machine learning?",
    model_name="text-embedding-ada-002"
)

get_by_key(key)[source]#

Get embedding by its full Redis key.

Retrieves a cached embedding for the given Redis key. If found, refreshes the TTL of the entry.

Parameters:: key (str) – The full Redis key for the embedding.
Returns:: Embedding cache entry or None if not found.
Return type:: Optional[Dict[str, Any]]

embedding_data = cache.get_by_key("embedcache:1234567890abcdef")

mdrop(contents, model_name)[source]#

Remove multiple embeddings from the cache by their contents and model name.

Efficiently removes multiple embeddings in a single operation.

Parameters:

contents (Iterable[bytes | str]) – Iterable of content that was embedded.
model_name (str) – The name of the embedding model.

Return type:

None

# Remove multiple embeddings
cache.mdrop(
    contents=["What is machine learning?", "What is deep learning?"],
    model_name="text-embedding-ada-002"
)

mdrop_by_keys(keys)[source]#

Remove multiple embeddings from the cache by their Redis keys.

Efficiently removes multiple embeddings in a single operation.

Parameters:: keys (List[str]) – List of Redis keys to remove.
Return type:: None

# Remove multiple embeddings
cache.mdrop_by_keys(["embedcache:key1", "embedcache:key2"])

mexists(contents, model_name)[source]#

Check if multiple embeddings exist by their contents and model name.

Efficiently checks existence of multiple embeddings in a single operation.

Parameters:

contents (Iterable[bytes | str]) – Iterable of content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

List of boolean values indicating whether each embedding exists.

Return type:

List[bool]

# Check if multiple embeddings exist
exists_results = cache.mexists(
    contents=["What is machine learning?", "What is deep learning?"],
    model_name="text-embedding-ada-002"
)

mexists_by_keys(keys)[source]#

Check if multiple embeddings exist by their Redis keys.

Efficiently checks existence of multiple keys in a single operation.

Parameters:: keys (List[str]) – List of Redis keys to check.
Returns:: List of boolean values indicating whether each key exists. The order matches the input keys order.
Return type:: List[bool]

# Check if multiple keys exist
exists_results = cache.mexists_by_keys(["embedcache:key1", "embedcache:key2"])

mget(contents, model_name)[source]#

Get multiple embeddings by their content and model name.

Efficiently retrieves multiple cached embeddings in a single operation. If found, refreshes the TTL of each entry.

Parameters:

contents (Iterable[bytes | str]) – Iterable of content that was embedded.
model_name (str) – The name of the embedding model.

Returns:

List of embedding cache entries or None for contents not found.

Return type:

List[Optional[Dict[str, Any]]]

# Get multiple embeddings
embedding_data = cache.mget(
    contents=["What is machine learning?", "What is deep learning?"],
    model_name="text-embedding-ada-002"
)

mget_by_keys(keys)[source]#

Get multiple embeddings by their Redis keys.

Efficiently retrieves multiple cached embeddings in a single network roundtrip. If found, refreshes the TTL of each entry.

Parameters:: keys (List[str]) – List of Redis keys to retrieve.
Returns:: List of embedding cache entries or None for keys not found. The order matches the input keys order.
Return type:: List[Optional[Dict[str, Any]]]

# Get multiple embeddings
embedding_data = cache.mget_by_keys([
    "embedcache:key1",
    "embedcache:key2"
])

mset(items, ttl=None)[source]#

Store multiple embeddings in a batch operation.

Each item in the input list should be a dictionary with the following fields: - ‘content’: The input that was embedded - ‘model_name’: The name of the embedding model - ‘embedding’: The embedding vector - ‘metadata’: Optional metadata to store with the embedding

Parameters:

items (list[dict[str, Any]]) – List of dictionaries, each containing content, model_name, embedding, and optional metadata.
ttl (int | None) – Optional TTL override for these entries.

Returns:

List of Redis keys where the embeddings were stored.

Return type:

List[str]

# Store multiple embeddings
keys = cache.mset([
    {
        "content": "What is ML?",
        "model_name": "text-embedding-ada-002",
        "embedding": [0.1, 0.2, 0.3],
        "metadata": {"source": "user"}
    },
    {
        "content": "What is AI?",
        "model_name": "text-embedding-ada-002",
        "embedding": [0.4, 0.5, 0.6],
        "metadata": {"source": "docs"}
    }
])

set(content, model_name, embedding, metadata=None, ttl=None)[source]#

Store an embedding with its content and model name.

Parameters:

content (Union[bytes, str]) – The content to be embedded.
model_name (str) – The name of the embedding model.
embedding (List[float]) – The embedding vector to store.
metadata (Optional[Dict[str, Any]]) – Optional metadata to store with the embedding.
ttl (Optional[int]) – Optional TTL override for this specific entry.

Returns:

The Redis key where the embedding was stored.

Return type:

str

key = cache.set(
    content="What is machine learning?",
    model_name="text-embedding-ada-002",
    embedding=[0.1, 0.2, 0.3, ...],
    metadata={"source": "user_query"}
)

set_ttl(ttl=None)#

Set the default TTL, in seconds, for entries in the cache.

Parameters:: ttl (Optional[int], optional) – The optional time-to-live expiration for the cache, in seconds.
Raises:: ValueError – If the time-to-live value is not an integer.
Return type:: None

property ttl: int | None#: The default TTL, in seconds, for entries in the cache.

LLM Cache

Contents

LLM Cache#

SemanticCache#

LangCacheSemanticCache#

Cache Schema Classes#

CacheEntry#

CacheHit#

Embeddings Cache#

EmbeddingsCache#