Overview

vLLM provides high-throughput embedding inference for self-hosted deployments. PraisonAI talks to it through vLLM's OpenAI-compatible server, using the hosted_vllm/ model prefix to route requests.

Quick Start

from praisonaiagents import embedding

# The hosted_vllm/ prefix routes the request to a self-hosted vLLM server
result = embedding(
    input="Hello world",
    model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
    api_base="http://localhost:8000"  # address of the running vLLM server
)
print(f"Dimensions: {len(result.embeddings[0])}")
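
The same helper may also embed several texts in one call. A minimal sketch, assuming input accepts a list of strings and that result.embeddings then holds one vector per text, neither of which this page confirms:

from praisonaiagents import embedding

# Hypothetical batch call: assumes `input` accepts a list of strings
texts = ["Hello world", "Goodbye world"]
result = embedding(
    input=texts,
    model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
    api_base="http://localhost:8000"
)
# One vector per input text, in order
for text, vector in zip(texts, result.embeddings):
    print(f"{text!r}: {len(vector)} dimensions")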

CLI Usage

praisonai embed "Hello world" --model hosted_vllm/intfloat/e5-mistral-7b-instruct
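
Assuming the CLI honors the same HOSTED_VLLM_API_BASE environment variable as the Python API (see Setup below), a full invocation looks like:

export HOSTED_VLLM_API_BASE="http://localhost:8000"
praisonai embed "Hello world" --model hosted_vllm/intfloat/e5-mistral-7b-instruct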

Setup

  1. Start a vLLM server with an embedding model:
python -m vllm.entrypoints.openai.api_server \
    --model intfloat/e5-mistral-7b-instruct \
    --task embed
  2. Set the environment variable so clients can find the server:
export HOSTED_VLLM_API_BASE="http://localhost:8000"
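
To verify the server independently of PraisonAI, you can call vLLM's OpenAI-compatible embeddings route directly. A minimal sketch, assuming the openai Python package is installed ("EMPTY" is the conventional placeholder API key for local vLLM servers):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server's /v1 routes
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input="Hello world",
)
print(len(resp.data[0].embedding))  # embedding dimensionality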