
Deploy AI Models on DigitalOcean

Complete guide to deploying AI models on DigitalOcean with Kubernetes and App Platform

DigitalOcean provides simple, cost-effective infrastructure for deploying AI models.

Prerequisites

  • DigitalOcean account
  • doctl CLI installed
  • Basic Kubernetes knowledge
  • Docker installed

Deployment Options

1. App Platform

Simplest deployment method:

# .do/app.yaml
name: ai-model-app
services:
  - name: api
    github:
      repo: your-username/ai-model
      branch: main
    build_command: pip install -r requirements.txt
    run_command: uvicorn app:app --host 0.0.0.0 --port 8080
    envs:
      - key: HUGGING_FACE_TOKEN
        scope: RUN_TIME
        type: SECRET
    instance_count: 2
    instance_size_slug: professional-s
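
The spec above assumes an `app.py` that uvicorn can serve. As a minimal stdlib-only stand-in (the endpoint name is an assumption; a real deployment would use the FastAPI app referenced by `run_command`), a health endpoint listening on port 8080 might look like:

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({
                "status": "ok",
                # HUGGING_FACE_TOKEN is injected by App Platform at run time
                "hf_token_set": "HUGGING_FACE_TOKEN" in os.environ,
            }).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet

def serve(port: int = 8080) -> None:
    # App Platform routes traffic to the port used in run_command (8080 above)
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Once the spec is committed, the app can be created with `doctl apps create --spec .do/app.yaml`.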

2. Droplets (VMs)

For full control:

# Create droplet
doctl compute droplet create ai-model \
  --image ubuntu-22-04-x64 \
  --size g-2vcpu-8gb \
  --region nyc1 \
  --ssh-keys your-ssh-key-id

# SSH into droplet
doctl compute ssh ai-model

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Run model (the vllm/vllm-openai image expects an NVIDIA GPU; use a GPU Droplet for this step)
docker run -d --gpus all -p 8000:8000 \
  -e HUGGING_FACE_HUB_TOKEN=your-token \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct
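
The vLLM container exposes an OpenAI-compatible API on port 8000. A sketch of building the request body you would POST to `http://<droplet-ip>:8000/v1/chat/completions` (the droplet IP is a placeholder):

```python
import json

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
    """Serialize an OpenAI-compatible chat completion request for vLLM."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,     # cap on generated tokens
        "temperature": 0.7,    # sampling temperature
    })
```

Send it with any HTTP client, e.g. `curl -H "Content-Type: application/json" -d "$BODY" http://<droplet-ip>:8000/v1/chat/completions`.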

3. Kubernetes (DOKS)

For production workloads:

# Create cluster
doctl kubernetes cluster create ai-cluster \
  --region nyc1 \
  --node-pool "name=worker-pool;size=s-2vcpu-4gb;count=3"

# Get kubeconfig
doctl kubernetes cluster kubeconfig save ai-cluster

# Deploy
kubectl apply -f deployment.yaml

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-model
  template:
    metadata:
      labels:
        app: ai-model
    spec:
      containers:
      - name: model
        image: vllm/vllm-openai:latest
        args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "8Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "2"
---
apiVersion: v1
kind: Service
metadata:
  name: ai-model-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: ai-model

Spaces (Object Storage)

Store models in Spaces:

import boto3

s3 = boto3.client('s3',
    endpoint_url='https://nyc3.digitaloceanspaces.com',
    aws_access_key_id='your-key',
    aws_secret_access_key='your-secret'
)

# Download model
s3.download_file('my-space', 'models/llama.bin', '/app/model.bin')
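
Model files are large, so re-downloading on every restart adds up. A minimal local-cache wrapper (a sketch; the `download` callable is an assumption standing in for the `s3.download_file` call above):

```python
import os

def fetch_model(key: str, local_path: str, download) -> str:
    """Fetch a model from Spaces only if it is not already on local disk.

    `download(key, local_path)` is any callable that performs the transfer,
    e.g. lambda k, p: s3.download_file('my-space', k, p) using the boto3
    client shown above.
    """
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        download(key, local_path)  # only hit Spaces on a cache miss
    return local_path
```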

Monitoring

Built-in Monitoring

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network traffic

Custom Alerts

doctl monitoring alert create \
  --type v1/insights/droplet/cpu \
  --compare GreaterThan \
  --value 80 \
  --window 5m \
  --entities droplet-id

Cost Optimization

  • Use appropriate droplet sizes
  • Enable auto-scaling on DOKS
  • Use Spaces for model storage
  • Implement caching
  • Use reserved instances for predictable workloads

Backup Strategy

# Create snapshot (droplet-action takes the droplet ID)
doctl compute droplet-action snapshot droplet-id \
  --snapshot-name ai-model-backup-$(date +%Y%m%d)

# Backup volumes
doctl compute volume-action snapshot volume-id \
  --snapshot-name volume-backup-$(date +%Y%m%d)
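
Dated snapshot names like the ones above make pruning old backups straightforward. A sketch of the rotation logic (the `keep` policy is an assumption; deletion itself would go through `doctl`):

```python
from datetime import datetime

def snapshots_to_delete(names, keep=7, prefix="ai-model-backup-"):
    """Given snapshot names like ai-model-backup-20240101, return the
    ones to delete so that only the `keep` most recent remain."""
    dated = []
    for name in names:
        if name.startswith(prefix):
            try:
                dated.append((datetime.strptime(name[len(prefix):], "%Y%m%d"), name))
            except ValueError:
                continue  # skip names that don't match the date pattern
    dated.sort(reverse=True)           # newest first
    return [name for _, name in dated[keep:]]
```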

Production Checklist

  • [ ] Set up load balancer
  • [ ] Configure firewall rules
  • [ ] Add monitoring and alerts
  • [ ] Set up automated backups
  • [ ] Configure auto-scaling
  • [ ] Add custom domain
  • [ ] Enable SSL/TLS
  • [ ] Set up logging
  • [ ] Implement rate limiting
  • [ ] Document deployment
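
For the rate-limiting item, a token-bucket sketch (in-process only; production setups typically rate-limit at the load balancer or API gateway):

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Wire `allow()` into the request handler and return HTTP 429 when it comes back `False`.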