# GCP GKE Setup
## Prerequisites
- A GPU machine with at least 40 GB of GPU memory is required, for example the a2-highgpu machine types. For reference, see GCP GPU machine types.
- A high-performance network drive, such as a Network File System (NFS), is necessary for storing models, datasets, and fine-tuned parameters; this setup enables sharing across different pods. For instance, you can use Filestore NFS on GCP.
## Setup
- Install gcloud CLI

Install the Google Cloud CLI. During installation, respond to the prompts, for example:

```
Modify profile to update your $PATH and enable shell command completion? Do you want to continue (Y/n)? Y

Enter a path to an rc file to update, or leave blank to use /Users/{userName}/.bash_profile: /Users/{userName}/.bash_profile
```

Start a new shell for the changes to take effect, then run gcloud init to initialize the SDK.
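For example:

```bash
# Initialize the gcloud CLI: log in and choose a default project
gcloud init
```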
- Install GKE plugins
Install kubectl if it is not already installed. Follow the instructions here to install the GKE plugins.
- Install the gke-gcloud-auth-plugin.
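The plugin is distributed as a gcloud component, for example:

```bash
# Install the kubectl auth plugin for GKE
gcloud components install gke-gcloud-auth-plugin
```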
- Log in via gcloud auth:

```bash
gcloud auth login
```
## Setup GKE Cluster
- Create a GKE cluster, for example:
```bash
# Create a GKE cluster with NVIDIA L4 GPUs and specified configurations
gcloud container clusters create l4-gpu-cluster \
  --zone=us-west4-c \
  --machine-type=g2-standard-96 \
  --accelerator=type=nvidia-l4,count=8 \
  --enable-gvnic \
  --enable-image-streaming \
  --enable-shielded-nodes \
  --shielded-secure-boot \
  --shielded-integrity-monitoring \
  --enable-autoscaling \
  --num-nodes=1 \
  --min-nodes=0 \
  --max-nodes=3 \
  --cluster-version=1.32.3-gke.1440000 \
  --node-locations=us-west4-c
```
- Verify the installation
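For example, confirm the cluster was created:

```bash
# The new cluster should appear with status RUNNING
gcloud container clusters list
```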
- Access the GKE cluster
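Fetch credentials so kubectl can reach the cluster, for example:

```bash
# Configure kubectl credentials for the new cluster
gcloud container clusters get-credentials l4-gpu-cluster --zone us-west4-c
# The GPU nodes should be listed as Ready
kubectl get nodes
```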
- Scaling the GKE cluster

Scale down:

```bash
gcloud container clusters resize l4-gpu-cluster --node-pool gpu-pool --num-nodes 0 --zone us-west4-c
```

Scale up:

```bash
gcloud container clusters resize l4-gpu-cluster --node-pool gpu-pool --num-nodes 1 --zone us-west4-c
```
## Setup the NFS
- Enable the Google Cloud Filestore API
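This can be done from the console or via gcloud, for example:

```bash
# Enable the Filestore API for the current project
gcloud services enable file.googleapis.com
```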
- Create the Filestore instance, for example:

```bash
# Create a Filestore instance with 1 TB capacity
gcloud filestore instances create nfs-share \
  --zone=us-west4-c \
  --tier=BASIC_HDD \
  --file-share=name="share1",capacity=1TB \
  --network=name="default"
```
- Install the NFS provisioner

```bash
# Add and update the NFS provisioner Helm repository
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update
```
- Create the Storage Class, for example:

```bash
# Install the NFS provisioner with the Filestore server and share configured
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=10.217.139.170 \
  --set nfs.path=/share1 \
  --set storageClass.name=nfs-client
```
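The nfs.server IP above is only an example; use your own Filestore instance's IP address. One way to look it up (the --format projection shown is just one option; the plain describe output also prints it under networks.ipAddresses):

```bash
# Print the Filestore instance's IP address to use as nfs.server
gcloud filestore instances describe nfs-share --zone=us-west4-c \
  --format="value(networks[0].ipAddresses[0])"
```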
- Create the PVC, for example:

```yaml
# Example: filestore-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lamini-volume
  annotations:
    volume.beta.kubernetes.io/storage-class: nfs-client
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 200Gi
```
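Apply the manifest with kubectl. Creating the PVC in the namespace Lamini will be installed into (here lamini, matching the install step below) is an assumption; adjust to your setup:

```bash
# Create the target namespace if it does not exist yet, then apply the PVC into it
kubectl create namespace lamini --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f filestore-pvc.yaml --namespace lamini
```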
- Verify the installation
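For example:

```bash
# The storage class should exist and the claim should be Bound
kubectl get storageclass nfs-client
kubectl get pvc lamini-volume --namespace lamini
```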
## Install Lamini
Follow this link to install Lamini.

It is highly recommended to start the Lamini installation with the following bare-minimum requirements before adding any customizations.
- Update configs/helm-config.yaml for minimal resources
- Enable the local database in persistent-lamini/values.yaml

Uncomment the following:

```yaml
database:
  enabled: true
  storage:
    pvc_name: lamini-volume # must be identical to the pvc_name in folder-creation
```
- Generate the Helm charts for Lamini
- Install Persistent Lamini

```bash
# Install the persistent Lamini components
NAMESPACE=lamini
helm install persistent-lamini ./persistent-lamini --namespace ${NAMESPACE} --create-namespace --debug
```
- Install Lamini

```bash
# Install the main Lamini components
helm install lamini ./lamini --namespace ${NAMESPACE} --create-namespace --debug
```
- Port-forward the service
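A minimal sketch, assuming the frontend service is named lamini and serves on port 8000; check the actual service name and port with kubectl get svc --namespace lamini:

```bash
# Forward local port 8000 to the Lamini service in the cluster
# (the service name "lamini" is an assumption; substitute the real one)
kubectl port-forward --namespace lamini svc/lamini 8000:8000
```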
- Verify the Lamini frontend

Open http://localhost:8000. The Lamini portal should open and display correctly.
- Verify the tuning

Create a new tuning job with the Tiny Random Mistral model and the Llama 3.1 model, under the Tune tab in the Lamini portal. The jobs should finish with the completed status.
- Verify the inference

Run a simple inference test. The inference should return a prompt response without any errors or timeouts.

```python
import random

import lamini

# Point the client at the port-forwarded Lamini service
lamini.api_url = "http://localhost:8000"
lamini.api_key = "test_token"

# Start with the tiny test model
model_name = "hf-internal-testing/tiny-random-MistralForCausalLM"
llm = lamini.Lamini(model_name=model_name)

prompt = f"Generate a random number between 0 and {random.random()}"
print(llm.generate(prompt=prompt))
```
Replace `hf-internal-testing/tiny-random-MistralForCausalLM` with `meta-llama/Llama-3.1-8B-Instruct`, and try again.