Self-Hosted Deployment
Guide to self-hosted deployment of Polar with Docker Compose: services, GPU requirements, environment variables, and monitoring.
Overview
Polar can be deployed on your own infrastructure using Docker Compose. This guide covers the full setup: every required service, GPU requirements, environment variables, and monitoring.
Requirements
Hardware
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA H20 (96 GB) | 2x NVIDIA H100 (80 GB) |
| CPU | 16 cores | 32+ cores |
| RAM | 64 GB | 128 GB |
| Storage | 500 GB SSD | 2 TB NVMe |
| Network | 1 Gbps | 10 Gbps |
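To compare a host against the hardware table, the installed GPUs can be listed with `nvidia-smi`. The following is a minimal sketch, not part of Polar: the helper name `total_gpu_mem_mib` is hypothetical, and it only sums the memory that `nvidia-smi` reports in MiB, without checking the GPU model.

```shell
# Sum GPU memory in MiB from lines such as "81559" or "NVIDIA H100, 81559".
# Input comes from stdin so the parsing can be tried without a GPU present.
total_gpu_mem_mib() {
  # The memory figure is always the last comma-separated field.
  awk -F', *' '{ sum += $NF } END { print sum + 0 }'
}

# Example usage on a GPU host:
# nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | total_gpu_mem_mib
```

The `nounits` option makes `nvidia-smi` print bare numbers, which keeps the arithmetic simple.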
Software
- Docker 24.0+
- Docker Compose v2.20+
- NVIDIA Container Toolkit
- NVIDIA Driver 535+
- Ubuntu 22.04 LTS (recommended)
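The minimum versions listed above can be checked from a script before installing anything else. This is a hedged sketch: `version_ge` and `check_min_version` are hypothetical helper names, and the comparison assumes `sort -V` from GNU coreutils, which ships with Ubuntu.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings; if the minimum sorts first, $1 is new enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print one OK/FAIL line. Arguments: label, found version, minimum version.
check_min_version() {
  if version_ge "$2" "$3"; then
    echo "OK   $1 $2 (>= $3)"
  else
    echo "FAIL $1 $2 (< $3)"
    return 1
  fi
}

# Example usage on the host (assumes Docker and the NVIDIA driver are installed):
# check_min_version "Docker" "$(docker version --format '{{.Server.Version}}')" 24.0
# check_min_version "Compose" "$(docker compose version --short | sed 's/^v//')" 2.20
# check_min_version "NVIDIA driver" "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)" 535
```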
Docker Compose
# docker-compose.yml
version: "3.8"

services:
  # Main API
  polar-api:
    image: ghcr.io/polar-ai/polar-api:latest
    ports:
      - "8000:8000"
    environment:
      - POLAR_MODEL_PATH=/models
      - POLAR_REDIS_URL=redis://redis:6379
      # The hostname must match the service name defined below (opensearch)
      - POLAR_ES_URL=http://opensearch:9200
      - POLAR_QDRANT_URL=http://qdrant:6333
      - POLAR_API_KEYS=${POLAR_API_KEYS}
      - POLAR_JWT_SECRET=${POLAR_JWT_SECRET}
      - POLAR_SUPABASE_URL=${POLAR_SUPABASE_URL}
      - POLAR_SUPABASE_KEY=${POLAR_SUPABASE_KEY}
      - POLAR_LOG_LEVEL=info
      # Read from .env when set there; defaults to the first two GPUs
      - CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0,1}
    volumes:
      - ./models:/models
      - ./data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - redis
      - opensearch
      - qdrant
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Redis (cache and queues)
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --maxmemory 4gb --maxmemory-policy allkeys-lru
    restart: unless-stopped

  # OpenSearch (search and RAG)
  opensearch:
    image: opensearchproject/opensearch:2.19.1
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - plugins.security.disabled=true
      - "OPENSEARCH_JAVA_OPTS=-Xms4g -Xmx4g"
    volumes:
      - es_data:/usr/share/opensearch/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

  # Qdrant (vector store)
  qdrant:
    image: qdrant/qdrant:v1.8.0
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      - QDRANT__SERVICE__GRPC_PORT=6334
    restart: unless-stopped

  # Nginx (reverse proxy)
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/ssl:/etc/nginx/ssl
    depends_on:
      - polar-api
    restart: unless-stopped

  # Prometheus (monitoring)
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    restart: unless-stopped

  # Grafana (dashboards)
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  redis_data:
  es_data:
  qdrant_data:
  prometheus_data:
  grafana_data:
Environment Variables
Create a .env file in the project root:
# .env
# API
POLAR_API_KEYS=pk-key1,pk-key2
POLAR_JWT_SECRET=your-secure-jwt-secret-here
# Supabase (authentication)
POLAR_SUPABASE_URL=https://seu-projeto.supabase.co
POLAR_SUPABASE_KEY=your-supabase-key
# Grafana
GRAFANA_PASSWORD=your-grafana-password
# GPU (optional)
CUDA_VISIBLE_DEVICES=0,1
Nginx Configuration
# nginx/nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream polar_api {
        server polar-api:8000;
    }

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;

    server {
        listen 80;
        server_name api.seu-dominio.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name api.seu-dominio.com;
        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;

        # Security headers
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        add_header X-XSS-Protection "1; mode=block";

        location /v1/ {
            limit_req zone=api burst=20 nodelay;
            proxy_pass http://polar_api;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Streaming support
            proxy_buffering off;
            proxy_cache off;
            chunked_transfer_encoding on;
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Timeouts
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }

        # WebSocket (Urso Eco)
        location /v1/personaplex/stream {
            proxy_pass http://polar_api;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_read_timeout 3600s;
        }

        location /health {
            proxy_pass http://polar_api;
        }
    }
}
Monitoring
Prometheus
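# monitoring/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "polar-api"
    static_configs:
      - targets: ["polar-api:8000"]
    metrics_path: /metrics

  # Note: Redis does not serve Prometheus metrics on port 6379 itself. Scraping it
  # usually requires an exporter, which is not part of the Compose file in this guide.
  - job_name: "redis"
    static_configs:
      - targets: ["redis:6379"]

  # Note: the target must match the Compose service name (opensearch). OpenSearch
  # has no /metrics endpoint by default and usually needs an exporter plugin.
  - job_name: "opensearch"
    static_configs:
      - targets: ["opensearch:9200"]
Available Metrics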
| Metric | Type | Description |
|---|---|---|
| polar_requests_total | Counter | Total number of requests |
| polar_request_duration_seconds | Histogram | Request latency |
| polar_tokens_processed_total | Counter | Total number of tokens processed |
| polar_gpu_utilization | Gauge | GPU utilization (%) |
| polar_gpu_memory_used_bytes | Gauge | GPU memory in use |
| polar_active_connections | Gauge | Active connections |
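A metric can be spot-checked without Grafana by reading the API's `/metrics` endpoint directly. The sketch below assumes the endpoint returns the standard Prometheus text format without trailing timestamps; `metric_sum` is a hypothetical helper name, and it adds up every labelled sample of one metric.

```shell
# Sum all samples of one metric from Prometheus text exposition on stdin.
# Sample lines look like `name value` or `name{label="x"} value`.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)     # drop the label set, if any
      if (metric == name) { sum += $NF; found = 1 }
    }
    END { if (found) print sum; else exit 1 }
  '
}

# Example usage against a running stack:
# curl -s http://localhost:8000/metrics | metric_sum polar_requests_total
```

The function exits with status 1 when the metric is absent, so it can also serve as a quick check that the API is exporting what the table lists.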
Deployment
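The Compose file interpolates several `${...}` values, and an unset one is passed to the container as an empty string. Before starting the stack, it can help to confirm that `.env` defines each of them. This is a minimal sketch; `missing_env_vars` is a hypothetical helper name, and it only checks for non-empty `NAME=value` lines.

```shell
# Print each variable name that is missing or empty in the given env file.
# Arguments: env file path, then the variable names to look for.
# Returns 1 when at least one variable is missing.
missing_env_vars() {
  local env_file name status
  env_file=$1
  status=0
  shift
  for name in "$@"; do
    # A variable counts as set when a line reads NAME=<at least one character>.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Example usage from the project root:
# missing_env_vars .env POLAR_API_KEYS POLAR_JWT_SECRET POLAR_SUPABASE_URL \
#   POLAR_SUPABASE_KEY GRAFANA_PASSWORD
```

A random value for `POLAR_JWT_SECRET` can be produced with `openssl rand -hex 32`.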
Start
# Download models (first run only)
./scripts/download_models.sh
# Start all services
docker compose up -d
# Check status
docker compose ps
# Follow logs
docker compose logs -f polar-api
Check Health
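Right after startup the API may still be loading models, so a single health request can fail even when the deployment is fine. The following is a hedged sketch of a retry helper; `retry_until_ok` is a hypothetical name, and the attempt count and delay are assumptions to adjust for your model size.

```shell
# Run a command until it succeeds or the attempts run out.
# Arguments: max attempts, delay in seconds, then the command to run.
retry_until_ok() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only between attempts, not after the last failure.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage: poll the health endpoint up to 20 times, 5 seconds apart.
# retry_until_ok 20 5 curl -fsS -o /dev/null http://localhost:8000/health
```

Once the helper returns successfully, the manual checks in this section should succeed as well.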
# Health check
curl http://localhost:8000/health
# Test the API
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer pk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "urso-mabe", "messages": [{"role": "user", "content": "Hello!"}]}'
Update
docker compose pull
docker compose up -d --force-recreate
Troubleshooting
| Problem | Solution |
|---|---|
| GPU not detected | Check the NVIDIA Container Toolkit and driver installation |
| Out of memory (GPU) | Reduce the batch size or use a smaller model |
| OpenSearch does not start | Check vm.max_map_count (sysctl -w vm.max_map_count=262144) |
| Request timeouts | Increase proxy_read_timeout in Nginx |
| High latency | Check GPU utilization and network throughput |
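The `vm.max_map_count` row can be verified before starting the stack. This is a minimal sketch; `check_max_map_count` is a hypothetical helper name, and the optional argument exists only so the check can be pointed at a different file.

```shell
# Compare the kernel's vm.max_map_count with the 262144 value from the table.
# Optional argument: file to read instead of /proc/sys/vm/max_map_count.
check_max_map_count() {
  local required current
  required=262144
  current=$(cat "${1:-/proc/sys/vm/max_map_count}")
  if [ "$current" -ge "$required" ]; then
    echo "vm.max_map_count=$current is sufficient"
  else
    echo "vm.max_map_count=$current is too low; run: sudo sysctl -w vm.max_map_count=$required"
    return 1
  fi
}

# Example usage on the Docker host:
# check_max_map_count
```

A value set with `sysctl -w` is lost on reboot; adding `vm.max_map_count=262144` to `/etc/sysctl.conf` keeps it. For the GPU row, `docker run --rm --gpus all ubuntu nvidia-smi` is a quick way to confirm that containers can see the GPUs.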