Memory RAG
Memory RAG is a simple approach to high-accuracy RAG: it boosts answer accuracy from roughly 50% (a vanilla GPT-4 baseline) to 90-95%. It builds contextual embeddings that capture meaning and relationships across your documents, letting smaller open-source models reach high accuracy without a complex RAG pipeline or the overhead of fine-tuning.
Quickstart
First, make sure your API key is set (get yours at app.lamini.ai):
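For example, export the key in your shell and read it in Python (a minimal sketch; `LAMINI_API_KEY` is the same environment variable referenced by the curl example at the end of this section):

```python
import os

# Set the key in your shell first, e.g.:
#   export LAMINI_API_KEY="<your key from app.lamini.ai>"
LAMINI_API_KEY = os.environ["LAMINI_API_KEY"]
```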
To use Memory RAG, build an index by uploading documents and selecting a base open-source LLM. This is the model you'll use to query over your documents.
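The exact upload call isn't shown here; as a sketch with `requests`, assuming a `train` endpoint that is a sibling of the `completions` endpoint used below (the endpoint path, the `files` and `model_name` fields, the example file name, and the response shape are all assumptions, not confirmed API details):

```python
import requests

# Assumed training endpoint, modeled on the /alpha/memory-rag/completions
# endpoint shown at the end of this section.
TRAIN_URL = "https://api.lamini.ai/alpha/memory-rag/train"

with open("lamini_overview.pdf", "rb") as f:  # hypothetical document
    response = requests.post(
        TRAIN_URL,
        headers={"Authorization": f"Bearer {LAMINI_API_KEY}"},
        files={"files": f},  # assumed field name
        data={"model_name": "meta-llama/Llama-3.1-8B-Instruct"},  # assumed parameter
    )

# The completions call below takes a job_id, so training presumably returns one.
job_id = response.json()["job_id"]
```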
Next, wait for training to complete by polling for status.
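A simple polling loop might look like the following (the `status` endpoint path, response field, and status values are assumptions; adapt them to the actual API):

```python
import time

STATUS_URL = "https://api.lamini.ai/alpha/memory-rag/status"  # assumed path

while True:
    status = requests.post(
        STATUS_URL,
        headers={"Authorization": f"Bearer {LAMINI_API_KEY}"},
        json={"job_id": job_id},
    ).json()
    # Field name and terminal states are assumptions.
    if status.get("status") in ("COMPLETED", "FAILED"):
        break
    time.sleep(30)  # check again in 30 seconds
```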
Finally, query the model.
Create a prompt:
user_prompt = "How is lamini related to llamas?"
prompt_template = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n {prompt} <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
prompt = prompt_template.format(prompt=user_prompt)
Pass the prompt to the Memory RAG model, setting `job_id` to the ID returned when you built the index:
```bash
curl --location 'https://api.lamini.ai/alpha/memory-rag/completions' \
--header "Authorization: Bearer $LAMINI_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n How is lamini related to llamas? <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "job_id": 1
}'
```
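Equivalently, from Python with `requests`, reusing the `prompt` built above and the `job_id` returned by training:

```python
import requests

COMPLETIONS_URL = "https://api.lamini.ai/alpha/memory-rag/completions"

response = requests.post(
    COMPLETIONS_URL,
    headers={"Authorization": f"Bearer {LAMINI_API_KEY}"},
    json={"prompt": prompt, "job_id": job_id},
)
print(response.json())
```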