curl --request POST \
--url https://api.zeroentropy.dev/v1/models/embed \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '
{
"model": "<string>",
"input_type": "query",
"input": "<string>",
"dimensions": 123,
"encoding_format": "float",
"latency": "fast"
}
'

{
"results": [
{
"embedding": [
123
]
}
],
"usage": {
"total_bytes": 123,
"total_tokens": 123
}
}

Embeds the provided input text with ZeroEntropy embedding models.
Results are returned in the same order as the inputs provided. The embeddings are constructed so that a query has high cosine similarity with the documents that are relevant to it.
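As a rough illustration of how the similarity property above is typically used, the sketch below ranks document embeddings against a query embedding by cosine similarity. The helper names `cosine_similarity` and `rank_documents` are hypothetical and not part of the ZeroEntropy API; the vectors are assumed to be plain lists of floats, as in the `float` encoding format.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_documents(query_embedding, document_embeddings):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_embedding, d) for d in document_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In practice the query would be embedded with input_type set to query and the candidates with input_type set to document, and the returned index order would be used to select the top matches.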
By default, each organization has a rate limit of 500,000 bytes per minute. Rate limits are refreshed every 15 seconds. Requests that exceed this limit are throttled into latency: "slow" mode, up to 5,000,000 bytes per minute. Requests that exceed even that limit receive a 429 error.
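The tiering described above can be sketched as a small decision function. This is only a client-side approximation under stated assumptions: the default limits are taken from the text, the function name `expected_outcome` is hypothetical, and the exact accounting on the server (how usage is windowed, and whether the boundaries are inclusive) is not specified here. The handling of an explicit "fast" value follows the description of the latency parameter.

```python
# Default limits from the text; higher tiers would change these values.
FAST_LIMIT_BYTES_PER_MINUTE = 500_000
SLOW_LIMIT_BYTES_PER_MINUTE = 5_000_000

def expected_outcome(bytes_this_minute, latency=None):
    """Estimate how a request is served: "fast", "slow", or "429".

    bytes_this_minute is the organization's usage including this request
    (an assumption about how usage is counted). latency mirrors the
    request field: None, "fast", or "slow".
    """
    if latency not in (None, "fast", "slow"):
        raise ValueError("latency must be None, 'fast', or 'slow'")
    # Fast mode is used unless the caller explicitly asked for slow.
    if latency != "slow" and bytes_this_minute <= FAST_LIMIT_BYTES_PER_MINUTE:
        return "fast"
    # Fall back to slow mode unless the caller insisted on fast.
    if latency != "fast" and bytes_this_minute <= SLOW_LIMIT_BYTES_PER_MINUTE:
        return "slow"
    return "429"
```

A client could use an estimate like this to decide whether to delay a batch until the next refresh rather than send it and risk a 429.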
The "bytes" used by a request are calculated as sum(150 + len(s.encode('utf-8')) for s in input). Note the baseline overhead of 150 bytes for each input string. The maximum per-request payload size is 5,000,000 bytes.
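The byte accounting above, read as the UTF-8 encoded length of each input string plus 150 bytes of overhead, can be estimated on the client before sending a request. The function names below are hypothetical, and treating the 5,000,000-byte payload limit as a limit on this same count is an assumption.

```python
BASELINE_OVERHEAD_BYTES = 150      # per input string, from the text
MAX_REQUEST_BYTES = 5_000_000      # per-request limit, from the text

def request_bytes(inputs):
    """Estimate the bytes charged for a request's input.

    Accepts a single string or a list of strings, matching the
    shapes the input field allows.
    """
    if isinstance(inputs, str):
        inputs = [inputs]
    return sum(BASELINE_OVERHEAD_BYTES + len(s.encode("utf-8")) for s in inputs)

def fits_in_one_request(inputs):
    """Assumption: the payload limit applies to the same byte count."""
    return request_bytes(inputs) <= MAX_REQUEST_BYTES
```

For example, the single input "hello" counts as 155 bytes: 5 bytes of UTF-8 text plus the 150-byte overhead. Non-ASCII characters count by their encoded size, so "héllo" counts as 156.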
To increase your rate limits, subscribe to a higher tier on the ZeroEntropy dashboard. Any payments made for subscriptions in a calendar month will be deducted from your usage charges for that month.
To request even higher rate limits, please contact [email protected] or message us on Discord or Slack!
model: The model ID to use for embedding. Options are: ["zembed-1"].
input_type: The input type. For retrieval tasks, either query or document. Available options: query, document.
input: The string, or list of strings, to embed.
dimensions: The output dimensionality of the embedding model. For zembed-1, the available options are: [2560, 1280, 640, 320, 160, 80, 40].
encoding_format: The output format of the embedding. If float, an array of floats is returned for each embedding. If base64, an f32 little-endian byte array is returned, encoded as a base64 string. base64 is significantly more efficient than float. The default is float. Available options: float, base64.
latency: Whether the call is inferenced "fast" or "slow". Rate limits for slow API calls are orders of magnitude higher, but you can expect 2-20 second latency. Fast inferences are guaranteed subsecond, but their rate limits are lower. If not specified, a "fast" call is attempted first; if you have exceeded your fast rate limit, a slow call is executed instead. If explicitly set to "fast", a 429 is returned when the call cannot be executed fast. Available options: fast, slow.
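When the base64 encoding format is requested, each embedding arrives as a base64 string wrapping little-endian 32-bit floats, so the client has to decode it. A minimal sketch using only the Python standard library follows; the function name `decode_base64_embedding` is hypothetical, and it is assumed that the base64 string appears in the same embedding field that holds the float array in the response example.

```python
import base64
import struct

def decode_base64_embedding(encoded):
    """Decode a base64 string of little-endian f32 values into a list of floats."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4 (one f32 per 4 bytes)")
    count = len(raw) // 4
    # "<" selects little-endian byte order; "f" is a 32-bit float.
    return list(struct.unpack(f"<{count}f", raw))
```

With dimensions set to 2560, a decoded embedding would be expected to contain 2560 values, which gives a cheap sanity check on the response.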