## Overview
vLLM provides high-throughput embedding inference for self-hosted deployments.

## Quick Start
### Setup

- Start the vLLM server with an embedding model:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model intfloat/e5-mistral-7b-instruct \
    --task embed
```

- Set the environment variable for the API base:

```bash
export HOSTED_VLLM_API_BASE="http://localhost:8000"
```
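With both steps done, a quick smoke test confirms the server answers before any client code is involved. vLLM's OpenAI-compatible server serves the standard embeddings route; this sketch assumes the usual OpenAI response shape (`data[0].embedding`), which is worth verifying against your vLLM version:

```python
import requests

# Smoke test: POST to the OpenAI-compatible embeddings route
# served by the vLLM process started above.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "intfloat/e5-mistral-7b-instruct",
        "input": "Hello world",
    },
    timeout=30,
)
resp.raise_for_status()
# OpenAI-style schema (assumed): data[0].embedding holds the vector.
vector = resp.json()["data"][0]["embedding"]
print(f"Server returned a {len(vector)}-dimensional embedding")
```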
### Python Usage

```python
# Generate embeddings using a self-hosted vLLM server
from praisonaiagents import embedding

result = embedding(
    input="Hello world",
    model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
    api_base="http://localhost:8000"
)
print(f"Dimensions: {len(result.embeddings[0])}")
```
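Since `result.embeddings[0]` is a plain vector, downstream tasks like semantic similarity need no extra machinery. A minimal sketch using only the standard library; the `embed` helper and `cosine` function are illustrative, not part of the package, and simply reuse the call shape shown above:

```python
import math

from praisonaiagents import embedding

def embed(text: str) -> list[float]:
    # Local helper reusing the call shape from the example above.
    result = embedding(
        input=text,
        model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
        api_base="http://localhost:8000",
    )
    return result.embeddings[0]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(f"Similarity: {cosine(embed('Hello world'), embed('Hi there')):.3f}")
```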
### CLI Usage

```bash
praisonai embed "Hello world" --model hosted_vllm/intfloat/e5-mistral-7b-instruct
```
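Note that the CLI example above passes no server address: it relies on the `HOSTED_VLLM_API_BASE` variable exported in Setup. Assuming the Python helper resolves the same variable, a LiteLLM-style convention that is not confirmed here, `api_base` could likely be omitted as well:

```python
import os

from praisonaiagents import embedding

# Assumption: the wrapper reads HOSTED_VLLM_API_BASE when api_base is
# not given, mirroring the export in the Setup step.
os.environ.setdefault("HOSTED_VLLM_API_BASE", "http://localhost:8000")

result = embedding(
    input="Hello world",
    model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
)
print(f"Dimensions: {len(result.embeddings[0])}")
```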