API Reference
Create Cache
Create a persistent cache for low-latency completions.
Args:
name: Unique identifier for the cache.
model: The model name to use when generating completions.
gguf_file: Path to the gguf
weights file to load.
filters: Optional metadata filters used to select documents.
docs: Explicit list of document IDs to include in the cache.
auth: Authentication context used for permission checks.
Returns: A dictionary describing the created cache.
POST
Headers
Body
application/json
Response
200
application/json
Successful Response
The response is of type object
.