
JSON Output

Enforcing structured JSON schema output makes it easier for downstream systems and APIs in your application to handle LLM outputs.

For a technical deep dive into how we implemented this feature, see our blog post.

You can enforce a JSON schema via the Lamini class, which is the base class for all runners. Lamini wraps our REST API endpoint.

First, return a string with the Python SDK:

from lamini import Lamini

llm = Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")
output = llm.generate(
    "How are you?",
    output_type={"answer": "str"}
)
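
The SDK returns the parsed JSON as a Python dict, so (assuming the output variable from the call above) you can read fields directly:

print(output["answer"])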

The same request over the REST API with curl:

curl --location "https://api.lamini.ai/v1/completions" \
--header "Authorization: Bearer $LAMINI_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "How are you?",
    "output_type": {
        "answer": "str"
    }
}'
Expected Output
{
    "answer":"I'm doing well, thanks for asking! How about you"
}

Values other than strings

You can set the output type to a type other than string. The type is strictly enforced. We currently support str, int, float, and bool, as well as enums expressed as lists of str or int values. For example, "answer": ["A", "B", "C", "D"] will always return one of A, B, C, or D for the answer field, and "answer": [1, 2, 3] will always return one of 1, 2, or 3.

Please let us know if there are additional types you'd like to see supported.

Examples

llm.generate(
    "How old are you?",
    output_type={"age": "int"}
)
llm.generate(
    "Pick a color.",
    output_type={"name": ["red", "white", "blue"]}
)
llm.generate(
    "Pick an odd digit",
    output_type={"name": [1, 3, 5, 7, 9]}
)
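
The float and bool types mentioned above work the same way; a minimal sketch, assuming the llm instance from above (the prompts are illustrative):

llm.generate(
    "What is pi to two decimal places?",
    output_type={"value": "float"}
)
llm.generate(
    "Is the sky blue?",
    output_type={"answer": "bool"}
)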
curl --location "https://api.lamini.ai/v1/completions" \
--header "Authorization: Bearer $LAMINI_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "How old are you?",
    "output_type": {
        "age": "int"
    }
}'
Expected Output
{
    "age": 25
}

Multiple outputs in JSON schema

You can also request multiple fields in one call. The output is a JSON object that conforms to the schema, which is strictly enforced.

llm.generate(
    "How old are you?",
    output_type={"age": "int", "units": "str"}
)
curl --location "https://api.lamini.ai/v1/completions" \
--header "Authorization: Bearer $LAMINI_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "How old are you?",
    "output_type": {
        "age": "int",
        "units": "str"
    }
}'
Expected Output
{
    "age": 25,
    "units": "years"
}
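
Because the schema is strictly enforced, each field comes back as the requested Python type and can be used downstream without extra parsing. A minimal sketch, assuming the llm instance from above:

result = llm.generate(
    "How old are you?",
    output_type={"age": "int", "units": "str"}
)
assert isinstance(result["age"], int)    # int field is a real int, not a string
assert isinstance(result["units"], str)
print(result["age"], result["units"])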

Great! You've successfully run an LLM with structured JSON schema outputs.

Known issue: JSON output truncation

Truncation may occur when the JSON output contains double quotation marks ("). This is a known limitation of output_type: a quotation mark that ends the output cannot be distinguished from a quotation mark that is part of the output.

Workarounds

  • Use prompt tuning instead of output_type if the response may contain double quotation marks, e.g. "Only return the relevant quote, do not include any other text."
  • Use prompt tuning to avoid generating double quotation marks, e.g. "Only use single quotes when there is dialogue" (see the sketch after this list).
  • Contact us to discuss alternative solutions or workarounds for your use case.
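
A minimal sketch of the second workaround, assuming the llm instance from above (the prompt wording is only an example):

llm.generate(
    "Describe the conversation. Only use single quotes when there is dialogue.",
    output_type={"summary": "str"}
)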

Future support

We are evaluating the feasibility of improving our system to handle double quotes in JSON output in the future. If we decide to support this feature, we will update our documentation and notify users.

Known issue: Long JSON output times out

To ensure that requests complete in a reasonable amount of time, there is a time limit on all requests, including JSON requests. If your request exceeds the time limit, try guiding the model to generate a shorter JSON object, e.g. ask for a description in 3 sentences or fewer. Timed-out requests may result in failed, incomplete, or missing output.

Workarounds

  • Reduce the size of the output by limiting the number of fields or shortening the prompt.
  • Break the JSON output into separate, smaller requests (see the sketch after this list).
  • Contact us to discuss alternative solutions or workarounds for your use case.
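
A minimal sketch of splitting one large JSON object into smaller requests, assuming the llm instance from above (the field names and prompts are illustrative):

# Request each large field separately so no single generation hits the time limit
profile = llm.generate(
    "Describe the character's appearance in 3 sentences or fewer.",
    output_type={"appearance": "str"}
)
backstory = llm.generate(
    "Describe the character's backstory in 3 sentences or fewer.",
    output_type={"backstory": "str"}
)
result = {**profile, **backstory}  # merge the partial objects into one dict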

Future support

We are evaluating the feasibility of improving our system to handle large JSON output in the future. If we decide to support this feature, we will update our documentation and notify users.

Feel free to contact us with any questions or concerns.