# FAQ
## Core Development Questions
### How do I set up authentication?
See the Authentication guide for getting and configuring your Lamini API key.
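As a quick orientation, a minimal sketch is shown below. It assumes the `lamini` Python package exposes a module-level `api_key` attribute as described in that guide; the key value is a placeholder.

```python
import lamini

# Placeholder key; see the Authentication guide for where to find yours
# and for config-file or environment-variable alternatives.
lamini.api_key = "<YOUR_LAMINI_API_KEY>"
```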
### How does model loading work?
- Model weights are loaded to GPU memory once and persist between requests
- Loading only happens on initial startup or after unexpected events
- Loading time scales with model size
### What systems can I develop on with Lamini?
- Recommended: Ubuntu 22.04+ with Python 3.10-3.12
- Not officially supported on Windows (use Docker with Linux container instead)
## Training & Tuning
### What models can I use?
Check the Models page for the full list of supported models.
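Once you have chosen a model from that page, you typically pass its name when constructing the client. A minimal sketch, assuming the `Lamini` client class and its `generate` method from the Python SDK (the model name below is illustrative):

```python
from lamini import Lamini

# Any model listed on the Models page can be used here; this name is illustrative.
llm = Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")

print(llm.generate("Summarize what memory tuning does."))
```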
### How long can training jobs run?
- Default timeout: 4 hours
- Jobs automatically checkpoint and resume if timeout occurs
- For longer runs:
    - Request more GPUs via `gpu_config` (see the sketch after this list)
    - Contact us for dedicated instances
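A tuning call that requests additional GPUs might look like the sketch below. The `tune` method, its argument names, and the `gpu_config` fields shown are assumptions based on the SDK's tuning interface; check the tuning reference and your plan limits before relying on them.

```python
from lamini import Lamini

llm = Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")

# Placeholder dataset: a small list of input/output pairs.
data = [
    {"input": "What plans offer dedicated compute?", "output": "Lamini Reserved plans."},
]

# gpu_config values are illustrative; request only what your plan allows.
llm.tune(
    data_or_dataset_id=data,
    gpu_config={"gpus": 4, "nodes": 1},
)
```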
### Can I disable memory tuning (MoME)?
Yes. Use these settings for tasks like summarization, where fluent qualitative output matters more than exact recall:
```python
finetune_args={
    "batch_size": 1,
    "index_ivf_nlist": 1,
    "index_method": "IndexFlatL2",
    "index_max_size": 1,
}
```
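To apply these settings, pass the dictionary into your tuning call. A sketch, reusing the hypothetical `llm` client and `data` list from the example above and assuming `tune` accepts a `finetune_args` keyword:

```python
# The settings above slot directly into the tuning call.
llm.tune(
    data_or_dataset_id=data,  # your tuning dataset (placeholder)
    finetune_args={
        "batch_size": 1,
        "index_ivf_nlist": 1,
        "index_method": "IndexFlatL2",
        "index_max_size": 1,
    },
)
```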
### How does Lamini optimize model training?
- Uses LoRAs (low-rank adapters) automatically
- 266x fewer parameters than full model finetuning
- 1.09B times faster model switching
- No manual configuration needed
## Infrastructure
### Why might my job be queued?
The On-Demand plan uses shared resources. For dedicated compute:

- Consider Lamini Reserved plans
- Contact us about running on your own infrastructure
### What GPUs can Lamini run on?
- Lamini can run on AMD and NVIDIA GPUs
### How do I get started with Lamini private servers or enterprise plans?
- Contact us to learn more about our reserved plans
- Run your own jobs on dedicated compute