# GCP GKE Setup
## Prerequisites
- A GPU machine with at least 40 GB of GPU memory is required, for example the a2-highgpu machine types. For reference, see GCP GPU machine types.
- A high-performance network drive, such as a Network File System (NFS), is necessary for storing models, datasets, and fine-tuned parameters; this setup enables sharing across different pods. For instance, you can use Filestore NFS on GCP.
## Setup
- Install gcloud CLI

Install the Google Cloud CLI. During installation, respond to the prompts, for example:

```
Modify profile to update your $PATH and enable shell command completion? Do you want to continue (Y/n)? Y

Enter a path to an rc file to update, or leave blank to use /Users/{userName}/.bash_profile: /Users/{userName}/.bash_profile
```

Start a new shell for the changes to take effect, then run gcloud init to initialize the SDK.
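For example:

```bash
# Initialize the gcloud CLI: log in and choose a default project
gcloud init
```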
- Install GKE plugins
Install kubectl if it is not already installed. Follow the instructions here to install the GKE plugins.
- Install the gke-gcloud-auth-plugin.
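The plugin is distributed as a gcloud component, for example:

```bash
# Install the kubectl auth plugin for GKE
gcloud components install gke-gcloud-auth-plugin
```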
- Log in via gcloud auth:

```bash
gcloud auth login
```
## Setup GKE Cluster
- Create a GKE cluster, for example:
```bash
# Create a GKE cluster with NVIDIA L4 GPUs and specified configurations
gcloud container clusters create l4-gpu-cluster \
  --zone=us-west4-c \
  --machine-type=g2-standard-96 \
  --accelerator=type=nvidia-l4,count=8 \
  --enable-gvnic \
  --enable-image-streaming \
  --enable-shielded-nodes \
  --shielded-secure-boot \
  --shielded-integrity-monitoring \
  --enable-autoscaling \
  --num-nodes=1 \
  --min-nodes=0 \
  --max-nodes=3 \
  --cluster-version=1.32.3-gke.1440000 \
  --node-locations=us-west4-c
```
- Verify the installation
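For example, confirm the cluster was created:

```bash
# The new cluster should appear with status RUNNING
gcloud container clusters list
```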
- Access the GKE cluster
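Fetch credentials so kubectl can reach the cluster, for example:

```bash
# Configure kubectl credentials for the new cluster
gcloud container clusters get-credentials l4-gpu-cluster --zone us-west4-c
# The GPU nodes should be listed as Ready
kubectl get nodes
```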
- Scaling the GKE cluster

Scale down:

```bash
gcloud container clusters resize l4-gpu-cluster --node-pool gpu-pool --num-nodes 0 --zone us-west4-c
```

Scale up:

```bash
gcloud container clusters resize l4-gpu-cluster --node-pool gpu-pool --num-nodes 1 --zone us-west4-c
```
## Setup the NFS
- Enable the Google Cloud Filestore API
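This can be done from the console or via gcloud, for example:

```bash
# Enable the Filestore API for the current project
gcloud services enable file.googleapis.com
```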
- Create the Filestore instance, for example:

```bash
# Create a Filestore instance with 1 TB capacity
gcloud filestore instances create nfs-share \
  --zone=us-west4-c \
  --tier=BASIC_HDD \
  --file-share=name="share1",capacity=1TB \
  --network=name="default"
```
- Install the NFS provisioner

```bash
# Add and update the NFS provisioner Helm repository
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update
```
- Create the Storage Class, for example:

```bash
# Install the NFS provisioner with the Filestore server and share configured
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=10.217.139.170 \
  --set nfs.path=/share1 \
  --set storageClass.name=nfs-client
```
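The nfs.server IP above is only an example; use your own Filestore instance's IP address. One way to look it up (the --format projection shown is just one option; the plain describe output also prints it under networks.ipAddresses):

```bash
# Print the Filestore instance's IP address to use as nfs.server
gcloud filestore instances describe nfs-share --zone=us-west4-c \
  --format="value(networks[0].ipAddresses[0])"
```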
- Create the PVC, for example:

```yaml
# Example: filestore-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lamini-volume
  annotations:
    volume.beta.kubernetes.io/storage-class: nfs-client
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 200Gi
```
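Apply the manifest with kubectl. Creating the PVC in the namespace Lamini will be installed into (here lamini, matching the install step below) is an assumption; adjust to your setup:

```bash
# Create the target namespace if it does not exist yet, then apply the PVC into it
kubectl create namespace lamini --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f filestore-pvc.yaml --namespace lamini
```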
- Verify the installation
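For example:

```bash
# The storage class should exist and the claim should be Bound
kubectl get storageclass nfs-client
kubectl get pvc lamini-volume --namespace lamini
```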
## Install Lamini
Follow this link to install Lamini.

It is highly recommended to start the Lamini installation with the following bare-minimum requirements before adding any customizations.
- Update configs/helm-config.yaml for minimal resources
- Enable the local database in persistent-lamini/values.yaml

Uncomment the following:

```yaml
database:
  enabled: true
  storage:
    pvc_name: lamini-volume # must be identical to the pvc_name in folder-creation
```
- Generate the Helm charts for Lamini
- Install Persistent Lamini

```bash
# Install the persistent Lamini components
NAMESPACE=lamini
helm install persistent-lamini ./persistent-lamini --namespace ${NAMESPACE} --create-namespace --debug
```
- Install Lamini

```bash
# Install the main Lamini components
helm install lamini ./lamini --namespace ${NAMESPACE} --create-namespace --debug
```
- Port-forward the service
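A minimal sketch, assuming the frontend service is named lamini and serves on port 8000; check the actual service name and port with kubectl get svc --namespace lamini:

```bash
# Forward local port 8000 to the Lamini service in the cluster
# (the service name "lamini" is an assumption; substitute the real one)
kubectl port-forward --namespace lamini svc/lamini 8000:8000
```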
- Verify the Lamini frontend

Open http://localhost:8000. The Lamini portal should open and display correctly.
- Verify the tuning

Create a new tuning job with the Tiny Random Mistral model and the Llama 3.1 model, under the Tune tab in the Lamini portal. The jobs should finish with the completed status.
- Verify the inference

Run a simple inference test. The inference should return a prompt response without any errors or timeouts.

```python
import random

import lamini

# Point the client at the port-forwarded Lamini service
lamini.api_url = "http://localhost:8000"
lamini.api_key = "test_token"

# Start with the tiny test model
model_name = "hf-internal-testing/tiny-random-MistralForCausalLM"
llm = lamini.Lamini(model_name=model_name)

prompt = f"Generate a random number between 0 and {random.random()}"
print(llm.generate(prompt=prompt))
```
Replace `hf-internal-testing/tiny-random-MistralForCausalLM` with `meta-llama/Llama-3.1-8B-Instruct`, and try again.