GCP GKE Setup
Prerequisites
The following prerequisites are required:
- A GPU machine with a minimum of 40 GB of GPU memory. For GCP, we recommend the a2-highgpu machine types, such as a2-highgpu-8g. For reference, see the GCP GPU machine types documentation.
- A high-performance network drive, such as a Network File System (NFS), is recommended for storing models, datasets, and fine-tuned parameters so they can be shared across pods. On GCP, we recommend Filestore.
Setup
- Install gcloud CLI
Install the Google Cloud CLI.
When the installer prompts you, accept the PATH update and shell completion:
Modify profile to update your $PATH and enable shell command completion? Do you want to continue (Y/n)? Y
Enter a path to an rc file to update, or leave blank to use /Users/{userName}/.bash_profile: /Users/{userName}/.bash_profile
Start a new shell for the changes to take effect.
Run gcloud init to initialize the SDK
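For example, one supported way to install and initialize on macOS or Linux:
# Download and run the interactive Google Cloud CLI installer
curl https://sdk.cloud.google.com | bash
# Open a new shell so the PATH changes take effect, then initialize
exec -l $SHELL
gcloud init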
- Install GKE plugins
Install kubectl if it is not already installed; one option is shown below.
Follow the Google Cloud instructions to install the GKE plugins.
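One way to install both kubectl and the GKE auth plugin is through gcloud components (any supported kubectl install method also works):
# Install kubectl and the GKE auth plugin via gcloud components
gcloud components install kubectl
gcloud components install gke-gcloud-auth-plugin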
Setup GKE Cluster
- Create a K8s cluster, example:
gcloud container clusters create l4-gpu-cluster \
--zone=us-west4-c \
--machine-type=g2-standard-96 \
--accelerator=type=nvidia-l4,count=8 \
--enable-gvnic \
--enable-image-streaming \
--enable-shielded-nodes \
--shielded-secure-boot \
--shielded-integrity-monitoring \
--enable-autoscaling \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=3 \
--cluster-version=1.32.3-gke.1440000 \
--node-locations=us-west4-c
- Verify the installation
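For example, fetch credentials and check that the GPU node is Ready:
# Point kubectl at the new cluster and list its nodes
gcloud container clusters get-credentials l4-gpu-cluster --zone=us-west4-c
kubectl get nodes -o wide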
Setup the NFS
- Enable the Google Cloud Filestore API
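For example:
# Enable the Filestore API on the current project
gcloud services enable file.googleapis.com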
- Create the Filestore instance, example:
gcloud filestore instances create nfs-share \
--zone=us-west4-c \
--tier=BASIC_HDD \
--file-share=name="share1",capacity=1TB \
--network=name="default"
- Install NFS provisioner
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update
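The provisioner needs the Filestore instance's IP address for nfs.server. One way to look it up for the instance created above:
# Print the Filestore instance's IP address
gcloud filestore instances describe nfs-share --zone=us-west4-c \
  --format="value(networks[0].ipAddresses[0])"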
- Create the Storage Class, replacing nfs.server with your Filestore instance's IP address. Example:
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=10.217.139.170 \
--set nfs.path=/share1 \
--set storageClass.name=nfs-client
- Create the PVC
Use the enclosed filestore-pvc.yaml (a sketch of its likely shape is shown below).
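A minimal sketch, assuming the nfs-client storage class created above and the lamini-volume name referenced later in values.yaml; the enclosed file is authoritative:
# Create a ReadWriteMany PVC on the NFS-backed storage class (size is illustrative)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lamini-volume
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
EOF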
- Verify the installation
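For example:
# The storage class should exist and the PVC should be Bound
kubectl get storageclass nfs-client
kubectl get pvc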
Install Lamini
Follow the Lamini installation guide to install Lamini.
We highly recommend starting the Lamini installation with the bare minimum requirements below before adding any custom requirements or changes.
- Update configs/helm-config.yaml for minimal resources
- Enable the local database in persistent-lamini/values.yaml
Uncomment the following:
database:
enabled: true
storage:
pvc_name: lamini-volume # must be identical to the pvc_name in folder-creation
- Generate the Helm charts for Lamini
- Install the Persistent Lamini
NAMESPACE=lamini
helm install persistent-lamini ./persistent-lamini --namespace ${NAMESPACE} --create-namespace --debug
- Install Lamini
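A sketch mirroring the persistent-lamini command; the chart directory name here is an assumption and should match the charts generated earlier:
# Assumed chart path ./lamini; adjust to the generated chart's actual name
helm install lamini ./lamini --namespace ${NAMESPACE} --debug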
- Port-forward the service
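For example (the frontend service name and port vary, so list the services first):
# Find the frontend service in the lamini namespace, then forward it to localhost:8000
kubectl get svc --namespace ${NAMESPACE}
kubectl port-forward svc/<frontend-service> 8000:<service-port> --namespace ${NAMESPACE}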
- Verify the Lamini frontend
Open http://localhost:8000.
The Lamini portal should open and display correctly.
- Verify the tuning
Create a new tuning job with the Tiny Random Mistral model and the Llama 3.1 model under the Tune tab in the Lamini portal. The jobs should finish with the completed status.
- Verify the inference
Run a simple inference test. The inference should return a response without any errors or timeouts.
import lamini
import random

# Point the client at the port-forwarded service
lamini.api_url = "http://localhost:8000"
lamini.api_key = "test_token"

# Use the tiny test model first to keep the check fast
model_name = "hf-internal-testing/tiny-random-MistralForCausalLM"
llm = lamini.Lamini(model_name=model_name)

prompt = f"Generate a random number between 0 and {random.random()}"
print(llm.generate(prompt=prompt))
Replace hf-internal-testing/tiny-random-MistralForCausalLM with meta-llama/Llama-3.1-8B-Instruct, and try again.