from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    # init_device="meta",
    init_device="cuda",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

# The documentation explicitly states that this is the tokenizer that was used
# You can load the tokenizer with the name "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
Recently Microsoft released the guidance library, which allows you to template the output of a language model. The jsonformer library has also been created, which restricts the output to a JSON schema. How do these two techniques compare?
In this post I am going to run through the examples for each of these libraries and then apply them to an extractive task.
Model
Throughout this post I will be using a large language model. To make the comparison fair I want to use the same model for both libraries. The Open LLM Leaderboard shows that the mosaicml/mpt-7b model performs well for its size, so let’s try that.
We can quickly try the model out on a simple query. I’m trying it with the prompt that FastChat uses for the vicuna model, which has performed well for me before.
One thing to note is that I have to add the USER token id to the stopping tokens, as the model appears to keep generating after the end of its answer. Being able to restrict this generation using the two frameworks would be great.
Code
from transformers import GenerationConfig
import torch
with torch.inference_mode():
    inputs = tokenizer(
        """
A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: What is the capital of France?
ASSISTANT:""".strip(),
        return_tensors="pt",
    )
    inputs = inputs.to(model.device)

    config = GenerationConfig(
        early_stopping=True,
        max_new_tokens=20,
        do_sample=True,
        temperature=0.7,
        top_p=1,
        eos_token_id=[
            tokenizer.eos_token_id,
            tokenizer("USER").input_ids[0],
        ],
        pad_token_id=tokenizer.eos_token_id,
    )
    output = model.generate(
        **inputs,
        generation_config=config,
    )
print(tokenizer.decode(output[0]))
A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: What is the capital of France?
ASSISTANT: Paris is the capital of France.
USER
It’s easy to use this model and this tiny test produces good results.
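As an aside, if you only want to see the newly generated text you can slice the prompt tokens off before decoding. This is a small convenience sketch using the variables from the cell above, not something the rest of the post depends on:

# decode only the newly generated tokens by skipping the prompt tokens
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0, prompt_length:], skip_special_tokens=True))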
Microsoft Guidance Example
Now we can try out the JSON example from the Microsoft Guidance GitHub page:
import guidance
# we use LLaMA here, but any GPT-style model will do
guidance_model = guidance.llms.Transformers(
    model=model,
    tokenizer=tokenizer,
    device="cuda",
)

# we can pre-define valid option sets
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

# define the prompt
character_maker = guidance(
    """The following is a character profile for an RPG game in JSON format.
```json
{
    "id": "{{id}}",
    "description": "{{description}}",
    "name": "{{gen 'name' stop='"'}}",
    "age": {{gen 'age' pattern='[0-9]+' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class' stop='"'}}",
    "mantra": "{{gen 'mantra' temperature=0.7 stop='"'}}",
    "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
    "items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7 stop='"'}}"{{/geneach}}]
}```"""
)

# generate a character
character_maker(
    id="e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
    description="A quick and nimble fighter.",
    valid_weapons=valid_weapons,
    llm=guidance_model,
    stream=False,
)
The following is a character profile for an RPG game in JSON format. ```json { "id": "e1f491f7-7ab8-4dac-8c20-c92b5e7d883d", "description": "A quick and nimble fighter.", "name": "Fighter", "age": 20, "armor": "leatherchainmailplateleather", "weapon": "sword", "class": "fighter", "mantra": "Death be to my enemies.", "strength": 10, "items": ["sword", "shield", "shortbow", "arrows", "leather armor"] }```
I’ve had to adjust this to add a stopping token to every open-ended utterance. I think this relates to the model’s inability to terminate generation correctly. It may be worth using the mosaicml/mpt-7b-instruct variant of the model.
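As a sketch of what that swap would look like, the instruct variant should load with the same arguments as the base model (I haven’t evaluated it here, so treat this as an assumption that it drops in cleanly):

# hypothetical drop-in: the instruct-tuned variant of the same model,
# loaded with the same arguments as mosaicml/mpt-7b above
instruct_model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    init_device="cuda",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)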
Jsonformer Example
Now we can try the same with Jsonformer.
from jsonformer import Jsonformer
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

prompt = "Generate a person's information based on the following schema:"
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.11/lib/python3.11/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
RuntimeError: The expanded size of the tensor (50432) must match the existing size (50277) at non-singleton dimension 1. Target sizes: [1, 50432]. Tensor sizes: [50277]
The example has failed with a tensor size mismatch. I wonder why that is.
We can start by checking the size of the output that the model returns:
import torch
with torch.inference_mode():
    input_ids = tokenizer(
        "hello world",
        return_tensors="pt",
        return_attention_mask=False,
    ).input_ids
    input_ids = input_ids.to(model.device)
    output = model(input_ids)

output.logits.shape
torch.Size([1, 2, 50432])
We can see here that the model does return 50,432 separate values. I’ve reviewed the code of the OutputNumbersTokens processor that was being used at the time of the failure, and it gets the vocab size by taking the length of the tokenizer. The core assumption is that this vocab size matches the size of the final dimension, so this should return 50,432:
len(tokenizer)
50277
The tokenizer does not match the model, which is very strange.
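We can confirm the mismatch from the model configuration as well. A quick check, assuming the config exposes vocab_size (Hugging Face model configs generally do):

# the model is configured with a larger vocabulary than the tokenizer provides
print(model.config.vocab_size)  # 50432 for mosaicml/mpt-7b
print(len(tokenizer))           # 50277 for the gpt-neox-20b tokenizer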
I can get this working by adding junk tokens to the tokenizer until the size matches:
tokenizer.add_tokens([
    f"||xxxxxx-special-{id}-xxxxxx||"
    for id in range(50432 - 50277)
])
len(tokenizer)
50432
Now it should be possible to get the example working:
from jsonformer import Jsonformer
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

prompt = "Generate a person's information based on the following schema:"
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
{'name': 'John', 'age': 25.0, 'is_student': True, 'courses': ['Math']}
I think that the fault here lies with the model. The model can emit token ids that the tokenizer for the model cannot even decode!
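A defensive check like the following would catch the mismatch up front, before handing the model to a token-masking library. This is just a suggestion, not part of either library:

# fail fast if the tokenizer vocabulary does not cover the model's output dimension
assert len(tokenizer) == model.config.vocab_size, (
    f"tokenizer has {len(tokenizer)} tokens "
    f"but the model produces {model.config.vocab_size} logits"
)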
Extractive Task
We have got both tools working, but how do they compare? One way to compare them is to perform an extractive task, as that has a true value to check against.
To start with we can try extracting the speaker details from a passage.
= """
prompt I will provide you with a passage from a book. You will extract details
of the speaker from that passage and return them to me encoded as json
data.
PASSAGE:
Call me Ishmael. Some years ago—never mind how long precisely—having
little or no money in my purse, and nothing particular to interest me on
shore, I thought I would sail about a little and see the watery part of
the world.
SPEAKER:
{"name": "Ishmael", "occupation": "sailor"}
PASSAGE:
My family have been prominent, well-to-do people in this Middle Western
city for three generations. The Carraways are something of a clan, and we
have a tradition that we’re descended from the Dukes of Buccleuch, but the
actual founder of my line was my grandfather’s brother, who came here in
fifty-one, sent a substitute to the Civil War, and started the wholesale
hardware business that my father carries on today.
SPEAKER:"""
import guidance
# we use LLaMA here, but any GPT-style model will do
guidance_model = guidance.llms.Transformers(
    model=model,
    tokenizer=tokenizer,
    device="cuda",
)

# define the prompt
speaker_describer = guidance(
    """{{prompt}}
{"name": "{{gen 'name' stop='"'}}", "occupation": "{{gen 'occupation' stop='"'}}"}"""
)

# extract the speaker details
speaker_describer(
    prompt=prompt,
    llm=guidance_model,
    stream=False,
)
I will provide you with a passage from a book. You will extract details of the speaker from that passage and return them to me encoded as json data. PASSAGE: Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. SPEAKER: {"name": "Ishmael", "occupation": "sailor"} PASSAGE: My family have been prominent, well-to-do people in this Middle Western city for three generations. The Carraways are something of a clan, and we have a tradition that we’re descended from the Dukes of Buccleuch, but the actual founder of my line was my grandfather’s brother, who came here in fifty-one, sent a substitute to the Civil War, and started the wholesale hardware business that my father carries on today. SPEAKER: {"name": "Mr. Carraway", "occupation": "hardware dealer"}
This is a great performance by guidance. It has correctly named the individual and made a reasonable guess at their occupation based on the information available. The passage comes from The Great Gatsby so we know that the name of the speaker is Nick Carraway, who was in the military and intends to study a new profession.
Let’s see how jsonformer performs.
from jsonformer import Jsonformer
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "occupation": {"type": "string"},
    }
}

jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
{'name': 'Ishmael', 'occupation': 'sailor'}
This has failed. Rather unfortunately it has chosen to repeat the example that was provided.
We can try another utterance where we want to extract an unknown number of things.
= """
prompt I will provide you with a passage from a book. You will list all locations
mentioned in that passage and return them to me encoded as json data.
PASSAGE:
Call me Ishmael. Some years ago—never mind how long precisely—having
little or no money in my purse, and nothing particular to interest me on
shore, I thought I would sail about a little and see the watery part of
the world.
SPEAKER:
{"locations": ["the shore", "the sea"]}
PASSAGE:
My family have been prominent, well-to-do people in this Middle Western
city for three generations. The Carraways are something of a clan, and we
have a tradition that we’re descended from the Dukes of Buccleuch, but the
actual founder of my line was my grandfather’s brother, who came here in
fifty-one, sent a substitute to the Civil War, and started the wholesale
hardware business that my father carries on today.
SPEAKER:"""
import guidance
# we use LLaMA here, but any GPT-style model will do
guidance_model = guidance.llms.Transformers(
    model=model,
    tokenizer=tokenizer,
    device="cuda",
)

# define the prompt
speaker_describer = guidance(
    """{{prompt}}
{"locations": [{{#geneach 'items' join=', '}}"{{gen 'this' stop='"'}}"{{/geneach}}]}"""
)

# extract the locations
speaker_describer(
    prompt=prompt,
    llm=guidance_model,
    stream=False,
)
I will provide you with a passage from a book. You will list all locations mentioned in that passage and return them to me encoded as json data. PASSAGE: Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. SPEAKER: {"locations": ["the shore", "the sea"]} PASSAGE: My family have been prominent, well-to-do people in this Middle Western city for three generations. The Carraways are something of a clan, and we have a tradition that we’re descended from the Dukes of Buccleuch, but the actual founder of my line was my grandfather’s brother, who came here in fifty-one, sent a substitute to the Civil War, and started the wholesale hardware business that my father carries on today. SPEAKER: {"locations": ["the city", "the Civil War", "the wholesale hardware business", "my father", "my grandfather", "my line", "my brother", "my grandfather’s brother", "fifty-one", "the Dukes of Buccleuch", "the Middle Western city", "the Dukes of Buccleuch", "the wholesale hardware business", "my father", "my grandfather", "my line", "my brother", "my grandfather’s brother", "fifty-one", "the Middle Western city", "the Civil War", "the wholesale hardware business", "my father", "my grandfather", "my line", "my brother", "my grandfather’s brother", "fifty-one", "the Middle Western city", "the Civil War", "the wholesale hardware business", "my father", "
KeyboardInterrupt:
I’ve had to stop this as it is endlessly generating the same responses. There is clearly a problem with getting the list generation to stop, so coming up with a variable number of entries is hard. We can try to get it to generate the underlying list itself using regex and stopping on the closing bracket.
import guidance
# we use LLaMA here, but any GPT-style model will do
guidance_model = guidance.llms.Transformers(
    model=model,
    tokenizer=tokenizer,
    device="cuda",
)

# define the prompt
speaker_describer = guidance(
    """{{prompt}}
{"locations": ["{{gen 'this' stop=']' pattern='[A-Za-z0-9 ]+(", "[A-Za-z0-9 ]+)*"]'}}]}"""
)

# extract the locations
speaker_describer(
    prompt=prompt,
    llm=guidance_model,
    stream=False,
)
I will provide you with a passage from a book. You will list all locations mentioned in that passage and return them to me encoded as json data. PASSAGE: Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. SPEAKER: {"locations": ["the shore", "the sea"]} PASSAGE: My family have been prominent, well-to-do people in this Middle Western city for three generations. The Carraways are something of a clan, and we have a tradition that we’re descended from the Dukes of Buccleuch, but the actual founder of my line was my grandfather’s brother, who came here in fifty-one, sent a substitute to the Civil War, and started the wholesale hardware business that my father carries on today. SPEAKER: {"locations": ["the city", "the Civil War"]}
This has done fairly well. It would’ve been nice if it could describe the city as the Middle Western city. Also the Civil War is not a location.
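For reference, the pattern in the template above only permits a comma-separated list of quoted alphanumeric strings followed by the closing bracket. We can sanity-check the same regular expression in plain Python (purely illustrative):

import re

# the pattern used in the guidance template above
pattern = re.compile(r'[A-Za-z0-9 ]+(", "[A-Za-z0-9 ]+)*"]')
print(bool(pattern.fullmatch('the city", "the Civil War"]')))  # True
print(bool(pattern.fullmatch('not "valid" json text')))        # False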
The use of the regular expression and stopping token worked quite well. Now it’s time to see how jsonformer fares.
from jsonformer import Jsonformer
json_schema = {
    "type": "object",
    "properties": {
        "locations": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
{'locations': ['the shore', 'the sea']}
Once again this has repeated the example. I wonder if the jsonformer code itself is just extracting the example from the prompt and imagining that it was generated.
We could test this by capturing the raw output of generate and inspecting it afterwards:
original_generate = model.generate
generated_tokens = []

def counting_generate(*args, **kwargs):
    global generated_tokens
    response = original_generate(*args, **kwargs)
    generated_tokens.append(response)
    return response

model.generate = counting_generate
from jsonformer import Jsonformer
json_schema = {
    "type": "object",
    "properties": {
        "locations": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()

print("number of generations: ", len(generated_tokens))
for tokens in generated_tokens:
    text = tokenizer.decode(tokens[0])
    text = text.splitlines()[-1]
    print(text)
number of generations: 2
Result: {"locations": ["the shore",
Result: {"locations": ["the shore", "the sea"]
I’ve cut down the output to only show the final part. We can see that it is really generating this text, which is very odd. Remember that the model will produce the correct output if the prompt is right.
Is something being added that has broken the generation? Let’s see the full output for the last generation call:
Code
print(tokenizer.decode(generated_tokens[-1][0]))
I will provide you with a passage from a book. You will list all locations
mentioned in that passage and return them to me encoded as json data.
PASSAGE:
Call me Ishmael. Some years ago—never mind how long precisely—having
little or no money in my purse, and nothing particular to interest me on
shore, I thought I would sail about a little and see the watery part of
the world.
SPEAKER:
{"locations": ["the shore", "the sea"]}
PASSAGE:
My family have been prominent, well-to-do people in this Middle Western
city for three generations. The Carraways are something of a clan, and we
have a tradition that we’re descended from the Dukes of Buccleuch, but the
actual founder of my line was my grandfather’s brother, who came here in
fifty-one, sent a substitute to the Civil War, and started the wholesale
hardware business that my father carries on today.
SPEAKER:
Output result in the following JSON schema format:
{"type": "object", "properties": {"locations": {"type": "array", "items": {"type": "string"}}}}
Result: {"locations": ["the shore", "the sea"]
Yes, the generator is adding the prompt:

Output result in the following JSON schema format:
{"type": "object", "properties": {"locations": {"type": "array", "items": {"type": "string"}}}}
Result:
It turns out that the Jsonformer class itself has added this, in a method called get_prompt:
Code
Jsonformer.get_prompt??
Signature: Jsonformer.get_prompt(self)
Docstring: <no docstring>
Source:
    def get_prompt(self):
        template = """{prompt}\nOutput result in the following JSON schema format:\n{schema}\nResult: {progress}"""
        progress = json.dumps(self.value)
        gen_marker_index = progress.find(f'"{self.generation_marker}"')
        if gen_marker_index != -1:
            progress = progress[:gen_marker_index]
        else:
            raise ValueError("Failed to find generation marker")
        prompt = template.format(
            prompt=self.prompt,
            schema=json.dumps(self.json_schema),
            progress=progress,
        )
        return prompt
File:      ~/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.11/lib/python3.11/site-packages/jsonformer/main.py
Type:      function
To fix this we can subclass Jsonformer itself and alter the get_prompt method to return only the original prompt and the current generation progress:
from jsonformer import Jsonformer
import json
class UnpromptedJsonformer(Jsonformer):
    def get_prompt(self):
        template = """{prompt}{progress}"""
        progress = json.dumps(self.value)
        gen_marker_index = progress.find(f'"{self.generation_marker}"')
        if gen_marker_index != -1:
            progress = progress[:gen_marker_index]
        else:
            raise ValueError("Failed to find generation marker")
        prompt = template.format(
            prompt=self.prompt,
            progress=progress,
        )
        return prompt
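One bit of housekeeping before rerunning: model.generate is still wrapped by counting_generate from the earlier experiment, so it is worth putting the original back. A small sketch using the names defined above:

# undo the monkey-patch from the counting experiment
model.generate = original_generate
generated_tokens.clear()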
With this we can try the two tasks again. Let’s start with The Great Gatsby speaker details:
Code
= """
prompt I will provide you with a passage from a book. You will extract details
of the speaker from that passage and return them to me encoded as json
data.
PASSAGE:
Call me Ishmael. Some years ago—never mind how long precisely—having
little or no money in my purse, and nothing particular to interest me on
shore, I thought I would sail about a little and see the watery part of
the world.
SPEAKER:
{"name": "Ishmael", "occupation": "sailor"}
PASSAGE:
My family have been prominent, well-to-do people in this Middle Western
city for three generations. The Carraways are something of a clan, and we
have a tradition that we’re descended from the Dukes of Buccleuch, but the
actual founder of my line was my grandfather’s brother, who came here in
fifty-one, sent a substitute to the Civil War, and started the wholesale
hardware business that my father carries on today.
SPEAKER:
"""
= {
json_schema "type": "object",
"properties": {
"name": {"type": "string"},
"occupation": {"type": "string"},
}
}
= UnpromptedJsonformer(model, tokenizer, json_schema, prompt)
jsonformer = jsonformer()
generated_data
print(generated_data)
{'name': 'Mr. Carraway', 'occupation': 'hardware dealer'}
This now matches the output of guidance. The automatically added prompt was the problem!
Now it’s time to try location extraction again. This is the task that I thought Jsonformer would do well on, as one of its examples generates an array of values. The guidance examples use a fixed iteration count and, as you can see, I had to put some work in to get it to generate a variable-length list.
Code
= """
prompt I will provide you with a passage from a book. You will list all locations
mentioned in that passage and return them to me encoded as json data.
PASSAGE:
Call me Ishmael. Some years ago—never mind how long precisely—having
little or no money in my purse, and nothing particular to interest me on
shore, I thought I would sail about a little and see the watery part of
the world.
SPEAKER:
{"locations": ["the shore", "the sea"]}
PASSAGE:
My family have been prominent, well-to-do people in this Middle Western
city for three generations. The Carraways are something of a clan, and we
have a tradition that we’re descended from the Dukes of Buccleuch, but the
actual founder of my line was my grandfather’s brother, who came here in
fifty-one, sent a substitute to the Civil War, and started the wholesale
hardware business that my father carries on today.
SPEAKER:
"""
= {
json_schema "type": "object",
"properties": {
"locations": {
"type": "array",
"items": {"type": "string"}
}
}
}
= UnpromptedJsonformer(model, tokenizer, json_schema, prompt)
jsonformer = jsonformer()
generated_data
print(generated_data)
{'locations': ['the city']}
This time it’s worked out slightly better, as it hasn’t extracted the Civil War as a location. Overall I would say that guidance has the edge over jsonformer, just because the code seems more resilient (even though the underlying problem lies with the model). Jsonformer would be better if you had more direct control over the prompt.
I think that prompting is so critical to the success of the task that automatically adding prompts is almost always a bad idea. It gets done because it makes a nice demo, but when it comes to applying these tools it can get in the way.