# vLLM Embeddings

> Generate embeddings using a self-hosted vLLM server

## Overview

vLLM serves embedding models behind an OpenAI-compatible HTTP API, providing high-throughput embedding inference for self-hosted deployments.

## Quick Start

```python
from praisonaiagents import embedding

result = embedding(
    input="Hello world",
    model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
    api_base="http://localhost:8000"
)
print(f"Dimensions: {len(result.embeddings[0])}")
```
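
A common next step is comparing two texts by the cosine similarity of their vectors. The sketch below assumes, as in the Quick Start, that `result.embeddings[0]` is a flat list of floats. `cosine_similarity`, `embed_one`, and `compare` are hypothetical helper names used for illustration and are not part of PraisonAI.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (pure Python)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embed_one(text, api_base="http://localhost:8000"):
    """Embed a single string with the same call as the Quick Start."""
    # Imported lazily so cosine_similarity works without the package installed.
    from praisonaiagents import embedding

    result = embedding(
        input=text,
        model="hosted_vllm/intfloat/e5-mistral-7b-instruct",
        api_base=api_base,
    )
    # Assumption: the first entry is the vector for the single input string.
    return result.embeddings[0]


def compare(text_a, text_b):
    """Embed both texts and return their cosine similarity."""
    return cosine_similarity(embed_one(text_a), embed_one(text_b))
```

With the vLLM server running, `compare("Hello world", "Hi there")` returns a score between -1 and 1, where higher values indicate more similar texts.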

## CLI Usage

```bash
praisonai embed "Hello world" --model hosted_vllm/intfloat/e5-mistral-7b-instruct
```

## Setup

1. Start the vLLM server with an embedding model:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model intfloat/e5-mistral-7b-instruct \
    --task embed
```

2. Set the `HOSTED_VLLM_API_BASE` environment variable to the server address:

```bash
export HOSTED_VLLM_API_BASE="http://localhost:8000"
```
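
Before pointing PraisonAI at the server, it can help to confirm that it actually returns embeddings. The sketch below posts one request to vLLM's OpenAI-compatible `/v1/embeddings` route using only the standard library. It assumes `HOSTED_VLLM_API_BASE` holds the server root (no `/v1` suffix), as in step 2, and that no API key is required. All function names are hypothetical helpers, not part of PraisonAI or vLLM.

```python
import json
import os
import urllib.request


def resolve_api_base(default="http://localhost:8000"):
    """Read HOSTED_VLLM_API_BASE, falling back to the local default."""
    return os.environ.get("HOSTED_VLLM_API_BASE", default).rstrip("/")


def build_embeddings_request(api_base, model, text):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dimensions(response_body):
    """Return the vector length from an OpenAI-style embeddings response."""
    return len(response_body["data"][0]["embedding"])


def check_server(model="intfloat/e5-mistral-7b-instruct", timeout=30):
    """Send one request to the server and return the embedding dimension."""
    # The model name here is the one passed to --model, without the
    # hosted_vllm/ prefix used on the PraisonAI side.
    request = build_embeddings_request(resolve_api_base(), model, "Hello world")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return embedding_dimensions(json.load(response))
```

Calling `check_server()` raises a `urllib.error.URLError` if the server is unreachable; otherwise it returns the embedding dimension reported by the model.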

## Related

* [Embedding Providers Overview](/docs/embeddings/index)
* [Infinity Embeddings](/docs/embeddings/providers/infinity)
