
Trainwave API Reference

The Trainwave REST API enables you to programmatically control and monitor your machine learning jobs. This reference provides detailed information about available endpoints, authentication, and example usage.

Quick Start

# List jobs, authenticating with an API key
curl -H "Accept: application/json" \
     -H "X-API-KEY: your-api-key" \
     https://backend.trainwave.ai/api/v1/jobs/
 
# Create a new job
curl -X POST \
     -H "Accept: application/json" \
     -H "X-API-KEY: your-api-key" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "mnist-training",
       "project": "p-abc123",
       "gpu_type": "RTX A5000",
       "gpus": 1
     }' \
     https://backend.trainwave.ai/api/v1/jobs/

Base URL

All API requests should be made to: https://backend.trainwave.ai/api/v1/

Authentication

See the Authentication Guide for detailed information about securing your API requests.

Response Format

All responses follow this standard format:

{
    "success": true,
    "data": {
        // Response data here
    },
    "meta": {
        "request_id": "req_abc123",
        "timestamp": "2024-03-21T12:00:00Z"
    }
}

Error responses:

{
    "success": false,
    "error": {
        "code": "error_code",
        "message": "Human-readable error message",
        "details": {
            // Additional error details
        }
    },
    "meta": {
        "request_id": "req_xyz789",
        "timestamp": "2024-03-21T12:00:00Z"
    }
}
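
Because both shapes share the success flag and the meta block, a client can unwrap every response in one place. The following is a minimal sketch, not an official client: `TrainwaveAPIError` and `unwrap` are hypothetical names introduced here for illustration, and the field names are taken from the two examples above.

```python
from __future__ import annotations

from typing import Any, Optional


class TrainwaveAPIError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, code: str, message: str, details: dict,
                 request_id: Optional[str]) -> None:
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.details = details
        self.request_id = request_id


def unwrap(payload: dict) -> Any:
    """Return payload["data"] on success, otherwise raise TrainwaveAPIError."""
    if payload.get("success"):
        return payload.get("data")
    error = payload.get("error") or {}
    meta = payload.get("meta") or {}
    raise TrainwaveAPIError(
        code=error.get("code", "unknown"),
        message=error.get("message", ""),
        details=error.get("details") or {},
        request_id=meta.get("request_id"),
    )
```

Calling `unwrap(response.json())` then yields the data object directly, and the raised exception keeps the request_id available for logging.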

Rate Limits

  • Free tier: 100 requests per minute
  • Pro tier: 1000 requests per minute
  • Enterprise tier: Custom limits

Rate limit headers are included in all responses:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1616876400
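
A client can use these headers to decide whether to pause before the next request. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but this reference does not state; `seconds_until_reset` is a hypothetical helper name.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0.0 if quota remains).

    Assumes X-RateLimit-Reset holds a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```

With the requests library, `response.headers` can be passed in directly, since it looks header names up case-insensitively; a plain dict must use the exact header spelling shown above.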

Available Endpoints

Jobs

List Jobs

GET /api/v1/jobs/

Query parameters:

  • org (string): Organization ID
  • project (string): Project ID
  • status (string): Filter by status (running, completed, failed)
  • limit (integer, default: 20): Number of results per page
  • offset (integer): Pagination offset

Example request:

curl -H "X-API-KEY: your-api-key" \
     "https://backend.trainwave.ai/api/v1/jobs/?project=p-abc123&status=running"

Example response:

{
    "success": true,
    "data": {
        "count": 25,
        "next": "https://backend.trainwave.ai/api/v1/jobs/?offset=20",
        "previous": null,
        "results": [
            {
                "id": "j-789xyz",
                "rid": "training-job-1",
                "created_at": "2024-03-21T09:00:00Z",
                "state": "RUNNING",
                "project": "p-abc123",
                "owner": {
                    "id": "u-abc123",
                    "email": "user@example.com",
                    "username": "johndoe"
                },
                "total_cost": 25.5,
                "gpu_hours": 2.5,
                "metrics": {
                    "gpu_utilization": 95.2,
                    "memory_usage": 14.3
                }
            }
        ]
    }
}
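
The next field makes it possible to walk through all pages without computing offsets by hand. A minimal sketch, assuming next is null on the last page as in typical offset pagination; `iter_jobs` and the `fetch_page` callable are hypothetical names, with `fetch_page` standing in for whatever function performs the authenticated GET and returns the parsed JSON body.

```python
from typing import Callable, Iterator, Optional


def iter_jobs(fetch_page: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every job, following data["next"] until it is null."""
    url: Optional[str] = first_url
    while url:
        payload = fetch_page(url)
        data = payload["data"]
        yield from data["results"]
        # "next" is an absolute URL for the following page, or None at the end.
        url = data.get("next")
```

With the requests library, `fetch_page` could be `lambda url: requests.get(url, headers=headers, timeout=30).json()`, where `headers` carries the X-API-KEY header.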

Create Job

POST /api/v1/jobs/

Request body:

{
    "name": "mnist-training",
    "project": "p-abc123",
    "description": "Training MNIST classifier",
    "gpu_type": "RTX A5000",
    "gpus": 1,
    "cpu_cores": 4,
    "memory_gb": 16,
    "hdd_size_mb": 51200,
    "image": "trainwave/pytorch:2.3.1",
    "setup_command": "pip install -r requirements.txt",
    "run_command": "python train.py",
    "env_vars": {
        "WANDB_API_KEY": "xxx",
        "PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:512"
    },
    "expires": "4h",
    "compliance_soc2": true
}

Example using Python:

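import requests

api_key = "your-api-key"  # placeholder; load the real key from an environment variable
headers = {
    "Accept": "application/json",
    "X-API-KEY": api_key,
    "Content-Type": "application/json"
}

job_config = {
    "name": "mnist-training",
    "project": "p-abc123",
    "gpu_type": "RTX A5000",
    "gpus": 1,
    # ... other configuration
}

response = requests.post(
    "https://backend.trainwave.ai/api/v1/jobs/",
    headers=headers,
    json=job_config,
    timeout=30,  # avoid hanging indefinitely on network problems
)

if response.status_code == 201:
    job = response.json()["data"]
    print(f"Created job: {job['id']}")
else:
    # Surface failures instead of ignoring them; the body includes a request_id for debugging.
    print(f"Job creation failed ({response.status_code}): {response.text}")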

Get Job Details

GET /api/v1/jobs/{job_id}/

Example response:

{
    "success": true,
    "data": {
        "id": "j-789xyz",
        "name": "mnist-training",
        "state": "RUNNING",
        "created_at": "2024-03-21T09:00:00Z",
        "started_at": "2024-03-21T09:01:00Z",
        "finished_at": null,
        "project": "p-abc123",
        "gpu_type": "RTX A5000",
        "gpus": 1,
        "cpu_cores": 4,
        "memory_gb": 16,
        "cost_per_hour": 2.5,
        "total_cost": 5.0,
        "metrics": {
            "gpu_utilization": 95.2,
            "memory_usage": 14.3,
            "network_rx_bytes": 1024000,
            "network_tx_bytes": 512000
        },
        "artifacts": {
            "model": "s3://bucket/model.pt",
            "logs": "s3://bucket/logs.txt"
        }
    }
}
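
A common use of this endpoint is polling until a job finishes. The sketch below makes two assumptions that this reference does not confirm: that "COMPLETED" (seen in the webhook example) and "FAILED" (inferred from the failed status filter) are the terminal states, and that a 30-second polling interval is acceptable. `wait_for_job` and `get_job` are hypothetical names; `get_job` stands in for a function that fetches the job and returns its data object.

```python
import time
from typing import Callable

# Assumed terminal states; adjust to the states your jobs actually report.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED"})


def wait_for_job(get_job: Callable[[], dict],
                 poll_seconds: float = 30.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_job() until the job is in a terminal state or polls run out."""
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    for attempt in range(max_polls):
        job = get_job()
        if job["state"] in TERMINAL_STATES:
            return job
        # Skip the final sleep: there is no further poll after it.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"job still {job['state']} after {max_polls} polls")
```

Where webhooks are configured, they avoid polling altogether; this loop is mainly useful in scripts and notebooks.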

Stop Job

POST /api/v1/jobs/{job_id}/stop/

Example using curl:

curl -X POST \
     -H "X-API-KEY: your-api-key" \
     https://backend.trainwave.ai/api/v1/jobs/j-789xyz/stop/
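
The same request can be built in Python with only the standard library. This is a sketch mirroring the curl call above; `build_stop_request` is a hypothetical helper name, and the response is assumed to follow the standard response format.

```python
import urllib.request

BASE_URL = "https://backend.trainwave.ai/api/v1"


def build_stop_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that stops a job."""
    return urllib.request.Request(
        url=f"{BASE_URL}/jobs/{job_id}/stop/",
        method="POST",
        headers={"Accept": "application/json", "X-API-KEY": api_key},
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=30)` and parse the body with `json.load`.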

Projects

List Projects

GET /api/v1/projects/

Query parameters:

  • org (string): Organization ID
  • limit (integer, default: 20): Results per page
  • offset (integer): Pagination offset

Example response:

{
    "success": true,
    "data": {
        "count": 2,
        "results": [
            {
                "id": "p-abc123",
                "name": "MNIST Classification",
                "description": "Image classification research",
                "created_at": "2024-03-01T12:00:00Z",
                "organization": "org-xyz789",
                "active_job_count": 2,
                "total_job_count": 15,
                "total_cost": 150.25
            }
        ]
    }
}

Organizations

List Organizations

GET /api/v1/organizations/

Example response:

{
    "success": true,
    "data": {
        "count": 2,
        "results": [
            {
                "id": "org-xyz789",
                "name": "Research Team",
                "created_at": "2024-01-15T08:00:00Z",
                "credit_balance": 1000.5,
                "member_count": 5,
                "project_count": 3
            }
        ]
    }
}

Metrics

Get Job Metrics

GET /api/v1/metrics/{metric_name}/?job_id={job_id}

Available metrics:

  • cpu: CPU utilization
  • memory: Memory usage
  • network: Network I/O
  • gpu_utilization: GPU utilization
  • gpu_memory: GPU memory usage
  • disk: Disk I/O

Example request:

curl -H "X-API-KEY: your-api-key" \
     "https://backend.trainwave.ai/api/v1/metrics/gpu_utilization/?job_id=j-789xyz"

Example response:

{
    "success": true,
    "data": {
        "metric": "gpu_utilization",
        "job_id": "j-789xyz",
        "values": [
            [1711027200, 95.2],
            [1711027260, 94.8],
            [1711027320, 96.1]
        ],
        "unit": "percent",
        "interval": "60s"
    }
}
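
Each entry in values is a [timestamp, value] pair. The sketch below summarizes such a series, assuming the timestamps are Unix seconds in UTC, which the example values suggest but this reference does not state; `summarize_metric` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from statistics import mean


def summarize_metric(data: dict) -> dict:
    """Summarize a non-empty metric series from the "data" object.

    Assumes each timestamp is a Unix time in seconds (UTC).
    """
    points = [
        (datetime.fromtimestamp(ts, tz=timezone.utc), value)
        for ts, value in data["values"]
    ]
    readings = [value for _, value in points]
    return {
        "metric": data["metric"],
        "unit": data["unit"],
        "start": points[0][0].isoformat(),
        "end": points[-1][0].isoformat(),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }
```

For the example response above, this reports a series starting at 2024-03-21T13:20:00+00:00 with a minimum of 94.8 and a maximum of 96.1.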

SDKs and Libraries

Official SDKs:

Webhooks

Trainwave can send webhooks for important events, such as job completion. Configure webhooks in your organization settings.

Example webhook payload:

{
    "event": "job.completed",
    "job": {
        "id": "j-789xyz",
        "state": "COMPLETED",
        "exit_code": 0,
        "duration": 3600,
        "cost": 25.5
    },
    "timestamp": "2024-03-21T10:00:00Z",
    "webhook_id": "wh_abc123"
}
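
A receiver can parse this payload into a typed object before acting on it. The sketch below covers parsing only, since this reference does not describe how deliveries are authenticated. `JobEvent` and `parse_webhook` are hypothetical names, and treating exit_code, duration, and cost as optional is an assumption in case other event types omit them.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class JobEvent:
    """Fields taken from the example webhook payload."""
    event: str
    job_id: str
    state: str
    exit_code: Optional[int]
    duration: Optional[float]
    cost: Optional[float]
    timestamp: str
    webhook_id: str


def parse_webhook(body: Union[str, bytes]) -> JobEvent:
    """Parse a raw webhook request body into a JobEvent."""
    payload = json.loads(body)
    job = payload["job"]
    return JobEvent(
        event=payload["event"],
        job_id=job["id"],
        state=job["state"],
        # Assumed optional: events other than job.completed may omit these.
        exit_code=job.get("exit_code"),
        duration=job.get("duration"),
        cost=job.get("cost"),
        timestamp=payload["timestamp"],
        webhook_id=payload["webhook_id"],
    )
```

A handler might then branch on `event.event == "job.completed"` and check `event.exit_code` before downloading artifacts.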

Best Practices

  1. Rate Limiting

    • Implement exponential backoff
    • Cache responses when appropriate
    • Use bulk operations when possible
  2. Error Handling

    • Always check the success field
    • Log the request_id for debugging
    • Handle rate limits gracefully
  3. Security

    • Never expose API keys in client-side code
    • Rotate API keys regularly
    • Use environment variables for sensitive data
  4. Monitoring

    • Track rate limit headers
    • Monitor webhook delivery status
    • Log API response times
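
The exponential backoff recommendation can be sketched as a small retry wrapper. This is one possible approach rather than a prescribed one: `RateLimited` and `call_with_backoff` are hypothetical names, and how a rate-limited response is detected (commonly an HTTP 429 status, which this reference does not confirm) is left to the caller's function.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal raised by the caller when a request is rate limited."""


def call_with_backoff(func: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[], float] = random.random) -> T:
    """Call func, retrying on RateLimited with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # The delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter scales the delay to 50-100% so clients do not retry in lockstep.
            sleep(delay * (0.5 + 0.5 * jitter()))
    raise AssertionError("unreachable")
```

The `sleep` and `jitter` parameters exist so the wrapper can be exercised in tests without real waiting.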

Support