
FAQ

Core Development Questions

How do I set up authentication?

See the Authentication guide for getting and configuring your Lamini API key.
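
A minimal sketch of configuring the key in Python, assuming the lamini client reads it from lamini.api_key or the LAMINI_API_KEY environment variable (the Authentication guide is the authoritative reference):

import lamini

# Assumption: the key can also be supplied via the LAMINI_API_KEY environment variable.
lamini.api_key = "<YOUR_LAMINI_API_KEY>"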

How does model loading work?

  • Model weights are loaded to GPU memory once and persist between requests
  • Loading happens only at initial startup or after unexpected events (for example, a server restart)
  • Loading time scales with model size

What systems can I develop on with Lamini?

  • Recommended: Ubuntu 22.04+ with Python 3.10-3.12
  • Not officially supported on Windows (use Docker with a Linux container instead)

Training & Tuning

What models can I use?

Check the Models page for the full list of supported models.
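
An illustrative sketch of selecting a supported model by name (the Lamini class, the generate() call, and the specific model name are assumptions; pick a name from the Models page):

from lamini import Lamini

# Hypothetical model name for illustration; use one listed on the Models page.
llm = Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")
print(llm.generate("What is Lamini?"))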

How long can training jobs run?

  • Default timeout: 4 hours
  • Jobs automatically checkpoint and resume if the timeout occurs
  • For longer runs:
      • Request more GPUs via gpu_config (see the sketch below)
      • Contact us for dedicated instances
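
A rough sketch of requesting more GPUs when starting a tuning job (the tune() method, the data_or_dataset_id parameter, and the gpu_config keys shown are assumptions; check the tuning docs for the supported fields):

from lamini import Lamini

llm = Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")
llm.tune(
    data_or_dataset_id="<YOUR_DATASET_ID>",
    gpu_config={"gpus": 4},  # assumed key name; see the tuning docs
)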

Can I disable memory tuning (MoME)?

Yes. For use cases like summarization, where qualitative, free-form output matters more than exact factual recall, use these settings:

# finetune_args that disable memory tuning (MoME)
finetune_args = {
    "batch_size": 1,
    "index_ivf_nlist": 1,           # single IVF list
    "index_method": "IndexFlatL2",  # flat L2 index
    "index_max_size": 1,            # minimal index size
}
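
Continuing the sketch from the training-duration answer above, these settings are passed in when the tuning job is started (tune() and its parameters are the same assumptions as in that example):

llm.tune(
    data_or_dataset_id="<YOUR_DATASET_ID>",
    finetune_args=finetune_args,
)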

How does Lamini optimize model training?

  • Uses LoRAs (low-rank adapters) automatically
  • 266x fewer parameters than full model finetuning
  • 1.09B times faster model switching
  • No manual configuration needed

Infrastructure

Why might my job be queued?

The On-Demand plan uses shared resources. For dedicated compute:

  • Consider Lamini Reserved plans
  • Contact us about running on your own infrastructure

What GPUs can Lamini run on?

  • Lamini can run on AMD and NVIDIA GPUs

How do I get started with Lamini private servers or enterprise plans?

  • Contact us to learn more about our reserved plans
  • Run your own jobs on dedicated compute