
Get Started: Your First Trainwave Job

This guide walks you through launching your first machine learning job on Trainwave. Let’s get started!

Step 1: Create an Organization

An organization helps you manage all your projects under a single umbrella. Each organization has its own separate billing, making it easy to track and manage costs.

When you first create a Trainwave account, you’ll need to set up an organization. You can easily do this from the top left corner of the Trainwave web UI.


Step 2: Create a Project

Once you have an organization, you can create a project to house your training jobs. To create a new project, navigate to: https://trainwave.ai/projects

Make a note of your project ID, as you’ll need it later.

Step 3: (Optional) Invite team members

If you’re working with a team, you can invite your teammates to your organization. All members share the organization’s billing and can collaborate on its projects.

You can invite them by going to this page: https://trainwave.ai/orgs/members

Step 4: Fund your account with some credits

To run jobs on Trainwave, you’ll need to add credits to your account. Visit the billing page to add funds: https://trainwave.ai/orgs/billing


NOTE: If you run out of funds, your jobs will be terminated and you will not be able to launch another job until you add more credits to your account.

Step 5: Install the CLI

The Trainwave CLI gives you powerful command-line control over your training jobs. Install it using pip:

pip install trainwave-cli

Make sure you have your preferred Python environment set up before installing.
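
If you are unsure which Python environment pip will install into, a quick check like the following can help. This is a minimal sketch, not part of Trainwave: the helper name in_virtualenv is hypothetical, and the check only detects venv-style virtual environments (a conda environment, for example, is not detected this way).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Show which interpreter is active and whether it is isolated.
print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

Run it with the same interpreter you plan to use for pip install trainwave-cli.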

See the CLI Reference for more information on the CLI, and the Installation section for a more in-depth installation guide.

Step 6: Authenticate the CLI

Log in to Trainwave through your CLI for secure access:

wave auth login

This will open a browser window for authentication.

Alternatively, you can create an API key in the web UI and configure it with the CLI:

wave auth set-token <API_KEY>

Verify your login by running:

wave auth whoami
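
If you want to run the same verification from a script (for example before an automated launch), the command can be wrapped as below. This is a sketch under stated assumptions: it assumes the wave executable is on your PATH after installation, and it assumes a non-zero exit code means the check failed. The helper name whoami is hypothetical, and the output format of the command is not interpreted.

```python
import shutil
import subprocess
from typing import Optional


def whoami(executable: str = "wave") -> Optional[str]:
    """Return the output of `<executable> auth whoami`, or None if it cannot be obtained."""
    # The CLI is not installed, or not on PATH for this environment.
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "auth", "whoami"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: a non-zero exit code signals that the check failed.
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

A None result means either the CLI was not found or the command did not succeed; in both cases, running wave auth login again is a reasonable next step.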

Step 7: Create your first job

To run a job, you need a trainwave.toml configuration file in your project.

There are two ways to create this file:

  1. Through the web UI using the wave config command
  2. Manually creating the file

Option 1: Using the wave config command

wave config

This will open a browser window where you can configure your job. When you’re done, the configuration is saved to trainwave.toml in your project directory, or to a different path if you specify one. You can edit the saved file manually if needed.

We recommend declaring any variables that your job requires as environment variables in the configuration file.

Option 2: Manually creating the file

For this simple setup, we will assume you only need one configuration file. Here is a sample:

name = "Finetune LLAMA3"           # The name of the job
project = "p-eqhplsmc"             # The project ID from step 2

expires = "1h"                     # Optional: kills the job after 1h

gpu_type = "RTX A5000"             # The type of GPU to use
gpus = 2                           # The number of GPUs to use
hdd_size_mb = 51200                # Size of the disk you need, in MB
setup_command = "bash setup.sh"    # Command that runs first to set up your environment
run_command = "bash run.sh"        # Command that starts your training
compliance_soc2 = true             # Optional: if you care about compliance
image = "trainwave/pytorch:2.3.1"  # You can find the list of images under the "Images" documentation

env_vars.WANDB_API_KEY = "${WANDB_API_KEY}"  # Takes your current env value for "WANDB_API_KEY"
env_vars.HUGGINGFACE_TOKEN = "${HF_TOKEN}"   # Same, using your current value for "HF_TOKEN"

Copy and paste this configuration into a file called trainwave.toml in your project directory and customize it to your needs.

Step 8: Launch!

Once you’ve configured your job, launch it with:

wave jobs launch

This will upload your code and run it on a machine in the cloud!
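
Since the sample configuration fills values such as "${WANDB_API_KEY}" from your current environment, it can be worth checking that those variables are set before you launch. The following is a minimal preflight sketch, not a Trainwave feature: it assumes placeholders always use the ${NAME} form shown in the sample, and the helper name unset_placeholders is hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} placeholders such as ${WANDB_API_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unset_placeholders(
    config_text: str, environ: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return placeholder names in config_text that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    names = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in names if not env.get(name)]


sample = (
    'env_vars.WANDB_API_KEY = "${WANDB_API_KEY}"\n'
    'env_vars.HUGGINGFACE_TOKEN = "${HF_TOKEN}"\n'
)
# Only HF_TOKEN is provided here, so WANDB_API_KEY is reported as unset.
print(unset_placeholders(sample, {"HF_TOKEN": "example-value"}))
```

To check a real file, read trainwave.toml as text and call unset_placeholders on its contents without the second argument, so that your actual environment is used.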

Additional documentation

How to manage secrets and environment variables: Variables

To see the full documentation for the configuration file please see: Configuration docs

To see the full documentation for the CLI please see: CLI docs