Conversational Interface

Creating a conversational interface to an API
Published

February 28, 2024

A competitor to the company was recently acquired by Microsoft. They now have access to ChatGPT and have used it to create a conversational interface to their social media monitoring system.

I previously had a go at doing this. The Mistral model seems high quality and quite cheap to run, so I wonder if it is up to the task. There is already an API available, so can I create a conversational interface for it?

Loading the Model

The Mistral model (Jiang et al. 2023) has 7B parameters, which would take about 28 GB of GPU RAM to load at full 32-bit precision. I don’t have that much, so I have to load the model in a quantized form. Luckily the integration of accelerate and bitsandbytes with transformers allows loading the model in 4-bit quantized mode, which brings it down to around 3.5 GB.

Jiang, Albert Q., Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, et al. 2023. “Mistral 7B.” https://arxiv.org/abs/2310.06825.
Code
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    load_in_4bit=True,
)
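
As a rough sanity check of those numbers: 7 billion parameters at 4 bytes each is about 28 GB, and at half a byte (4 bits) each is about 3.5 GB. Once the model is loaded, transformers can report the actual footprint, which will sit slightly above the 3.5 GB estimate as some layers are kept in higher precision:

print(f"model memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")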

Using the API Documentation

The API is quite large and I need to handle a lot of different calls. Ideally I would be able to pass details about the different calls from the API documentation itself (which is available here). That would save a lot of time developing this agent at the cost of dramatically increasing the prompt size.

I can try that now.

Retrieving Projects

To start with, we can try generating some code to retrieve a list of the current projects. For a conversational interface the retrieval augmented generation (RAG) approach seems a good fit.

The complete tool will need to generate code or API calls that produce the required data. Once that data has been retrieved, it can formulate the response to the user.
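
A minimal sketch of that flow, where generate_code, execute_generated_code and generate_answer are hypothetical helpers standing in for the pieces built up over the rest of this post:

def answer(user_query: str) -> str:
    # 1. ask the model for code or API calls that retrieve the required data
    generated_code = generate_code(user_query)
    # 2. run that code (ideally in a sandbox) to fetch the data
    data = execute_generated_code(generated_code)
    # 3. ask the model to formulate a response to the user from the data
    return generate_answer(user_query, data)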

To start with I just want it to generate the code that would return the project names. We will start with the example user task:

Can you tell me the names of the projects that I have?

I have quite a lot of API documentation available and have copied the most relevant pages into local markdown files, which are loaded below.

It would be great if I could provide this documentation to the model and then get it to solve the task. The Mistral model uses the Longformer-style sliding window attention mechanism (Beltagy, Peters, and Cohan 2020), which lets it handle long sequences at a reduced attention cost.

Beltagy, Iz, Matthew E. Peters, and Arman Cohan. 2020. “Longformer: The Long-Document Transformer.” https://arxiv.org/abs/2004.05150.
Code
from pathlib import Path

API_DOCUMENTATION_FOLDER = Path(".").resolve() / "api-documentation"
aggregates_documentation = (API_DOCUMENTATION_FOLDER / "aggregates.txt").read_text().strip()
data_retrieval_basic_charts_documentation = (API_DOCUMENTATION_FOLDER / "data-retrieval-basic-charts.txt").read_text().strip()
data_retrieval_topics_documentation = (API_DOCUMENTATION_FOLDER / "data-retrieval-topics.txt").read_text().strip()
data_retrieval_total_mentions_documentation = (API_DOCUMENTATION_FOLDER / "data-retrieval-total-mentions.txt").read_text().strip()
dimensions_documentation = (API_DOCUMENTATION_FOLDER / "dimensions.txt").read_text().strip()
retrieving_projects_documentation = (API_DOCUMENTATION_FOLDER / "retrieving-projects.txt").read_text().strip()
retrieving_queries_documentation = (API_DOCUMENTATION_FOLDER / "retrieving-queries.txt").read_text().strip()

tool_documentation = "\n\n".join(map(str.strip, [
    retrieving_projects_documentation,
    retrieving_queries_documentation,
    data_retrieval_total_mentions_documentation,
    data_retrieval_basic_charts_documentation,
    data_retrieval_topics_documentation,
    aggregates_documentation,
    dimensions_documentation,
]))
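
This concatenated documentation is what makes the prompt so large. A quick way to gauge that is to count the tokens it produces (the exact number depends on the tokenizer):

documentation_tokens = tokenizer(tool_documentation).input_ids
print(f"{len(documentation_tokens):,} tokens of tool documentation")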

The prompting approach for large language models follows this overall structure, which I quite like:

[Figure: ChatGPT prompting structure]

(source)

With this I need a persona and a task description. The tool descriptions will come in the context.

It is also good practice to version your prompts: getting the prompt right is tricky and time consuming, so testing different prompts and saving each version helps a lot. I’ve written the current prompt to this file and it follows a structure similar to that of the Hugging Face agent prompt.

Code
from pathlib import Path

PROMPT_FOLDER = Path(".").resolve() / "prompts"
rag_task_description = (PROMPT_FOLDER / "01-conversational.txt").read_text().strip()

With the model, API documentation and prompt introduction in place, I then need a way to run the model. This is a conversational setup, so it’s a chance to use the Hugging Face chat templating system. As we are going to try this out a few times, wrapping the code up in a function seems appropriate:

Code
# from src/main/python/blog/conversational_interface/generator/chat.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


@torch.inference_mode()
def generate_chat(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    prompt: str,
    max_new_tokens: int = 100,
    do_sample: bool = False,
) -> str:
    chat_input = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        return_tensors="pt",
        padding="longest",
    )
    chat_input = chat_input.to(model.device)
    generated_ids = model.generate(
        chat_input,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        pad_token_id=tokenizer.pad_token_id,
    )
    output = tokenizer.decode(
        generated_ids[0, chat_input.shape[1] :],
        skip_special_tokens=True,
    )
    return output.strip()
Code
from transformers import AutoTokenizer, AutoModelForCausalLM
from IPython.display import Markdown

def respond(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    prompt: str,
    max_new_tokens: int = 1_000,
    do_sample: bool = False,
    **kwargs,
) -> Markdown:
    
    response = generate_chat(
        model=model,
        tokenizer=tokenizer,
        prompt=prompt,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        **kwargs,
    )
    
    return Markdown("---\n\n" + response + "\n\n---")

Now with all of this together we can try it out! The simple query for the model is

Can you tell me the names of the projects that I have?

We want to see if the model can answer this. The correct answer is to make a request to the /projects/summary endpoint, as documented in the Retrieving Projects page:

The following call will list your Projects:

curl -X GET https://api.brandwatch.com/projects/summary

It is also possible to call the /projects endpoint which provides more detailed information. Both of these calls are documented and will be provided to the model.
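
For reference, the kind of code I am hoping the model will produce is roughly the following (authentication is omitted here; the real API requires a token, which is exactly the detail that gets hidden behind a helper function later on):

import requests

response = requests.get("https://api.brandwatch.com/projects/summary")
project_names = [project["name"] for project in response.json()["results"]]
print(project_names)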

%%time
from IPython.display import Markdown

user_query = "Can you tell me the names of the projects that I have?"

respond(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
{rag_task_description}

# Tools

{tool_documentation}

# User Request

{user_query}
""".strip(),
    max_new_tokens=1_000,
    do_sample=True,
)
CPU times: user 16.7 s, sys: 2.76 s, total: 19.5 s
Wall time: 19.5 s

To retrieve the names of the projects you have access to, you can use the following API endpoint:

projects_names = requests.get("/projects/summary")["results"]["results"]
project_names = [project["name"] for project in projects_names]
print(project_names)

This code makes a GET request to the /projects/summary endpoint using the requests library, then extracts the project names from the resulting list of projects. The list of project names is printed for your reference.


I’ve rendered the response as markdown, which has removed the code block delimiters. Wherever the text appears monospaced and highlighted, it was marked up as code in the original response.

It has made a valid request to the correct endpoint and has attempted to extract the project names. Unfortunately the ["results"]["results"] lookup on the response is incorrect; it should only be ["results"].

So this code is largely good: it has translated the curl requests from the documentation into Python well, using the requests library. The prompt is very large though, and the model has made a mistake when interpreting the response structure.

This code is a good start; however, it needs to stay closer to the exact documentation that was provided. To do this I am changing the prompt so that the model uses functions that I have defined instead of requests. The function is called get and I’ve stated that it will not accept the API key or any token (new prompt here).
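
The helper itself is not shown here, but conceptually it is a thin wrapper along these lines; the base URL and the environment variable holding the token are assumptions about my setup, not something the model ever sees:

import os
import requests

BASE_URL = "https://api.brandwatch.com"

def get(path: str, params: dict | None = None) -> dict:
    # the model only ever writes get(path, params); authentication stays out of the prompt
    response = requests.get(
        BASE_URL + path,
        params=params,
        headers={"Authorization": f"Bearer {os.environ['BRANDWATCH_API_TOKEN']}"},
    )
    response.raise_for_status()
    return response.json()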

Finally, the initial response we just saw was generated with sampling enabled. If I turn sampling off, how does it compare?

Code
%%time
from pathlib import Path

PROMPT_FOLDER = Path(".").resolve() / "prompts"
rag_task_description = (PROMPT_FOLDER / "02-conversational.txt").read_text().strip()

user_query = "Can you tell me the names of the projects that I have?"

respond(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
{rag_task_description}

# Tools

{tool_documentation}

# User Request

{user_query}
""".strip(),
    max_new_tokens=1_000,
    do_sample=False,
)
CPU times: user 14.3 s, sys: 2.71 s, total: 17 s
Wall time: 17 s

To get a list of the names of the projects you have access to, you can use the following API call:

projects = get("/projects/summary")
project_names = [project["name"] for project in projects["results"]]
print(project_names)

This will return a list of the names of all the projects you have access to.


This has generated the correct code and used it in the correct way. That’s great.

The next thing to do is to test it on something more complex. A realistic user query would be to compare how people feel about their brand this week versus the week prior. We can do that with the query:

Do people feel more positively about coca-cola this week compared to last week?

This is a significantly more complex question and I’m interested to see what the model produces.

Code
%%time
from pathlib import Path

PROMPT_FOLDER = Path(".").resolve() / "prompts"
rag_task_description = (PROMPT_FOLDER / "02-conversational.txt").read_text().strip()

user_query = "Do people feel more positively about coca-cola this week compared to last week?"

respond(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
{rag_task_description}

# Tools

{tool_documentation}

# User Request

{user_query}
""".strip(),
    max_new_tokens=1_000,
    do_sample=False,
)
CPU times: user 59.8 s, sys: 2.75 s, total: 1min 2s
Wall time: 1min 2s

To answer the question “Do people feel more positively about Coca-Cola this week compared to last week?”, we can use the Brandwatch Consumer Research API to retrieve the net sentiment score for mentions of Coca-Cola during the current week and the previous week, and then compare the two values.

Here’s a Python function that performs this task:

import get

# Set up the API endpoint and query parameters
endpoint = "/projects/summary"
query_params = {}

# Get the list of projects and find the project ID for the one containing the Coca-Cola query
projects = get.get(endpoint)
coca_cola_project_id = None
for project in projects["results"]:
 if project["name"].lower().__contains__("coca-cola"):
 cocca_cola_project_id = project["id"]
 break

# If the project ID was not found, return an error message
if cocca_cola_project_id is None:
 print("Error: Cannot find project containing Coca-Cola query.")
 exit()

# Set up the API endpoint and query parameters for retrieving sentiment data
sentiment_endpoint = "/projects/{project_id}/data/sentiment/days"
start_date = "2023-03-19"  # Start of current week
end_date = "2023-03-25"  # End of current week
last_week_start_date = "2023-03-12"  # Start of previous week
last_week_end_date = "2023-03-18"  # End of previous week
query_params = {
 "startDate": start_date,
 "endDate": end_date,
 "projectId": cocca_cola_project_id,
}
current_week_sentiment = get.get(sentiment_endpoint, query_params)["results"][0]["sentimentScore"]

query_params = {
 "startDate": last_week_start_date,
 "endDate": last_week_end_date,
 "projectId": cocca_cola_project_id,
}
last_week_sentiment = get.get(sentiment_endpoint, query_params)["results"][0]["sentimentScore"]

# Compare the sentiment scores and print the result
if current_week_sentiment > last_week_sentiment:
 print("People feel more positively about Coca-Cola this week compared to last week.")
else:
 print("People feel less positively about Coca-Cola this week compared to last week.")

This function first retrieves the list of projects using the /projects/summary endpoint and searches for the project containing the Coca-Cola query. Once the project ID is found, it sets up the /projects/{project_id}/data/sentiment/days endpoint with the start and end dates for the current week and the previous week to retrieve the net sentiment scores for each week. Finally, it compares the two sentiment scores and prints the result.


I’m extremely impressed right now.

This code is broken in a few ways; however, the model was capable of taking a very broad request and turning it into concrete code. I can see the following problems with the code:

  • query_params is unused when getting the project list

  • It uses get.get to make requests, so it appears to think that the get function is a renamed requests package. I should’ve provided an example of using this function.

  • There are both coca_cola_project_id and cocca_cola_project_id variables (note the double c in the second). The code consistently uses the misspelt version after the loop; however, it is only assigned inside the loop, so if no project matches, the misspelt name will never be initialized and the is None check will raise a NameError.

  • It uses if project["name"].lower().__contains__("coca-cola"): instead of if "coca-cola" in project["name"].lower(), which is unconventional but not an error (the in operator calls __contains__ under the hood). The body of the if statement also lacks indentation, which would cause a syntax error.

  • It calls exit(), the builtin intended for interactive sessions, rather than sys.exit(). Either way, terminating the process like this would be problematic if the code were evaluated as part of a larger program, as it is rather abrupt.

  • It believes that the current week ends on 2023-03-25 and has calculated the dates based on this belief. This is forgivable as at no point was the model informed of the current date.

  • It calls the /projects/{project_id}/data/sentiment/days endpoint instead of the /projects/{project_id}/data/volume/sentiment/days endpoint. It’s missing the volume aggregate. This is quite a subtle bug.

  • It inspects the results incorrectly: get.get(sentiment_endpoint, query_params)["results"][0]["sentimentScore"]. Part of this is due to the wrong path being passed in; furthermore, in the documented examples there is no sentimentScore value. It’s actually quite complex to process this correctly, as there are positive, neutral and negative quantities for each day in the time period, so aggregation and comparison would be required (a hedged sketch of what that might look like follows this list).
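
To illustrate that last point, here is a hedged sketch of the aggregation; the field names are assumptions about the response shape, based on the documentation describing per-day positive, neutral and negative volumes:

def net_sentiment(response: dict) -> float:
    # assumed shape: one entry per day with positive / neutral / negative counts
    positive = sum(day["positive"] for day in response["results"])
    neutral = sum(day["neutral"] for day in response["results"])
    negative = sum(day["negative"] for day in response["results"])
    total = positive + neutral + negative
    return (positive - negative) / total if total else 0.0

# comparing the two weeks would then be:
# net_sentiment(current_week_response) > net_sentiment(last_week_response)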

All of this said, I am still really impressed. The shape of the correct answer is here; the model just needs more guidance on how to produce it.

I think that this approach has failed in the best possible way. The code it produces for complex requests is not usable due to bugs in the invocation of the API. The tool descriptions have resulted in a prompt that is over 10,000 tokens long, and the model is not reliably using all of that context to produce correct code.

Fixing this will require simplifying the problem. Clearly the model is capable of reasoning through it; it just needs less scope for error.

Using Custom Tools

The model has failed when working directly from the developer documentation. That approach gives the model a lot of ways to fail: it can call the wrong endpoints, it can inspect the output incorrectly, and it can generate unrelated code. Malicious prompting could even cause the tool to generate exploitative code.

Given all of this, a more restricted code generator is very desirable. I can write a new persona and a description of each tool that the model can use, so that it only has to focus on choosing a tool and adjusting its invocation. This can start with a really simple description that becomes more complex when required. For example, getting the project list is just:

get-projects: gets the project names available to the user
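
Behind the scenes each tool name would map onto a small amount of trusted code that I control, rather than code that the model writes. A minimal sketch of that mapping, reusing the hypothetical get helper from earlier:

def get_projects() -> list[str]:
    # trusted implementation of the get-projects tool
    return [project["name"] for project in get("/projects/summary")["results"]]

TOOLS = {
    "get-projects": get_projects,
}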

Conversational Custom Tool

With a very simple set of tools how does the conversational approach work?

Code
%%time
from pathlib import Path

PROMPT_FOLDER = Path(".").resolve() / "prompts"
persona = (PROMPT_FOLDER / "02-continuation-persona.txt").read_text().strip()
tools = (PROMPT_FOLDER / "03-continuation-tools.txt").read_text().strip()

user_query = "Can you tell me the names of the projects that I have?"

respond(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
{persona}

# Tools

{tools}

# Task

{user_query}
""".strip(),
    max_new_tokens=1_000,
    do_sample=False,
)
CPU times: user 2.47 s, sys: 156 ms, total: 2.62 s
Wall time: 2.63 s

To get the names of the projects you have, you can use the following tool:

get-projects()

This tool will return a list of project names that you can use for further queries.


The output does want to invoke the tool that I indicated, but it wraps the invocation in conversational text and invents a calling syntax (the parentheses). Providing examples of tool use would fix this.

Even so, I would have to inspect the output and extract the tool invocation from the conversational text (something like the sketch below). This makes me think that continuation mode would be a better fit, as it allows the output to be restricted to only the tool use.
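
Extracting the invocation from free-form chat output would mean something brittle like this sketch, which simply matches against the known tool names:

import re

def extract_invocation(text: str, tool_names: list[str]) -> str | None:
    # return the first known tool name mentioned, tolerating trailing parentheses
    pattern = "|".join(re.escape(name) for name in tool_names)
    match = re.search(rf"({pattern})(\(\))?", text)
    return match.group(1) if match else None

# extract_invocation(chat_output, ["get-projects"]) would return "get-projects"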

Custom Tool Continuation

The continuation mode harnesses the training of the large language model more directly. Continuation is the process of predicting the next token in the text sequence. This means that the text is not split into separate user and model conversation utterances.

The change in the prompt that triggers this behaviour is providing examples of tool use. Each example will follow a fixed pattern and will show both the user query and the output. Then the actual task is presented as an example and the model generates the tool usage as the continuation. We can then extract this and parse it more reliably.

After producing the solution to our task the model will keep predicting further tokens. Since it has seen repeated examples of tasks and solutions, it will go on to generate further tasks. We can prevent this endless generation by looking for the term that introduces the next task and stopping generation at that point. This is done using the Hugging Face transformers StoppingCriteria.

We start with the end: here is how to define the StoppingCriteria and use it to limit the generated output:

Code
# from src/main/python/blog/conversational_interface/generator/continuation.py
from __future__ import annotations

from typing import Optional

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)


class TokenSequenceStoppingCriteria(StoppingCriteria):
    @staticmethod
    def make(
        tokenizer: AutoTokenizer,
        sequence: str,
        device: Optional[str | torch.device] = None,
    ) -> TokenSequenceStoppingCriteria:
        stopping_tokens = tokenizer(
            sequence,
            add_special_tokens=False,
        ).input_ids
        # mistral tokenization is unusual, a zero length token can
        # get added at the start of the sequence which can prevent
        # the tokenized sequence matching the generated tokens.
        # this filter drops any zero length tokens.
        stopping_tokens = [
            token for token in stopping_tokens if len(tokenizer.decode(token)) > 0
        ]
        return TokenSequenceStoppingCriteria(stopping_tokens, device=device)

    def __init__(
        self,
        sequence: list[int] | torch.Tensor,
        device: Optional[str | torch.device] = None,
    ) -> None:
        super().__init__()
        if isinstance(sequence, list):
            sequence = torch.Tensor(sequence)
        if device is not None:
            sequence = sequence.to(device)
        self.sequence = sequence

    def to(self, device: str | torch.device) -> TokenSequenceStoppingCriteria:
        self.sequence = self.sequence.to(device)
        return self

    def as_list(self) -> StoppingCriteriaList:
        return StoppingCriteriaList([self])

    def __call__(
        self,
        input_ids: torch.LongTensor,
        scores: torch.FloatTensor,
        **kwargs,
    ) -> bool:
        # this assumes only a single sequence is being generated
        return self.is_end(input_ids[0])

    def is_end(self, tokens: torch.Tensor) -> bool:
        assert len(tokens.shape) == 1
        end = tokens[-len(self.sequence) :]
        per_token_matches = end == self.sequence
        return bool(per_token_matches.all())

    def truncate(self, tokens: torch.Tensor) -> torch.Tensor:
        if self.is_end(tokens):
            return tokens[: -len(self.sequence)]
        return tokens


@torch.inference_mode()
def generate_continuation(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    prompt: str,
    max_new_tokens: int = 100,
    stopping: str = "\n\n# Task",
) -> str:
    stopping_criteria = TokenSequenceStoppingCriteria.make(
        tokenizer=tokenizer,
        sequence=stopping,
        device=model.device,
    )

    model_input = tokenizer(
        prompt,
        return_tensors="pt",
        padding="longest",
    )
    input_tokens = model_input.input_ids.shape[1]
    model_input = model_input.to(model.device)
    generated_ids = model.generate(
        **model_input,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        stopping_criteria=StoppingCriteriaList([stopping_criteria]),
    )
    filtered_ids = generated_ids[0, input_tokens:]
    filtered_ids = stopping_criteria.truncate(filtered_ids)
    output = tokenizer.decode(
        filtered_ids,
        skip_special_tokens=True,
    )
    return output.strip()

The prompt has to include the persona and tools, just like before. It now has to end with the examples and user task prompt. To do this I’ve written several examples which take the form:

# Task

How many conversations were there about my brand last week?

# Response

get-projects

This shows both the example task and the correct usage of the get-projects tool (i.e. without the parentheses for invocation). Since this response will be more structured, I am going to stop rendering it as markdown, as that would interfere with the overall post structure.

Code
%%time
import string
from pathlib import Path

PROMPT_FOLDER = Path(".").resolve() / "prompts"
persona = (PROMPT_FOLDER / "02-continuation-persona.txt").read_text().strip()
tools = (PROMPT_FOLDER / "03-continuation-tools.txt").read_text().strip()
examples = (PROMPT_FOLDER / "04-continuation-examples.txt").read_text().strip()

tool_selection_prompt_template = string.Template(f"""
{persona}

# Tools

{tools}

{examples}

# Task

${{task}}

# Response
""".strip()
)

user_query = "Can you tell me the names of the projects that I have?"

model_response = generate_continuation(
    model=model,
    tokenizer=tokenizer,
    prompt=tool_selection_prompt_template.substitute(task=user_query),
)

print("For the task:")
print(user_query)
print("I would do:")
print(model_response)
For the task:
Can you tell me the names of the projects that I have?
I would do:
get-projects
CPU times: user 1.1 s, sys: 218 ms, total: 1.32 s
Wall time: 2.67 s

This is working much better. It has produced the correct tool invocation in the correct format and runs in a fraction of the time of the conversational mode.

Taking a couple of seconds to produce one stage of the response means that this would be a sluggish conversationalist (remember, a complete version of this would feed the output of get-projects back into itself, as sketched below). I’m not overly concerned with that at this point. It’s better to get something working and then focus on performance than to waste time while proving the concept.
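
To make that concrete, the eventual loop would look something like the following sketch; the parse_invocation helper, the TOOLS registry and the "# Result" / "# Answer" prompt sections are all assumptions to be worked out in the follow-up post:

def run_turn(user_query: str) -> str:
    prompt = tool_selection_prompt_template.substitute(task=user_query)
    invocation = generate_continuation(model=model, tokenizer=tokenizer, prompt=prompt)
    # map the generated invocation onto trusted code and run it
    tool_name, arguments = parse_invocation(invocation)
    result = TOOLS[tool_name](**arguments)
    # feed the tool output back in so the model can phrase the final answer
    follow_up = f"{prompt}\n{invocation}\n\n# Result\n\n{result}\n\n# Answer\n"
    return generate_continuation(model=model, tokenizer=tokenizer, prompt=follow_up)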

At this point this post has already become quite long. This technique is interesting so I am going to continue it in another post.

I feel the current post would not be complete without trying the more complex question on the new approach. Remember that the number of tools available for the continuation approach is drastically limited!

Code
%%time
import string
from pathlib import Path

PROMPT_FOLDER = Path(".").resolve() / "prompts"
persona = (PROMPT_FOLDER / "02-continuation-persona.txt").read_text().strip()
tools = (PROMPT_FOLDER / "03-continuation-tools.txt").read_text().strip()
examples = (PROMPT_FOLDER / "04-continuation-examples.txt").read_text().strip()

tool_selection_prompt_template = string.Template(f"""
{persona}

# Tools

{tools}

{examples}

# Task

${{task}}

# Response
""".strip()
)

user_query = "Do people feel more positively about coca-cola this week compared to last week?"

model_response = generate_continuation(
    model=model,
    tokenizer=tokenizer,
    prompt=tool_selection_prompt_template.substitute(task=user_query),
)

print("For the task:")
print(user_query)
print("I would do:")
print(model_response)
For the task:
Do people feel more positively about coca-cola this week compared to last week?
I would do:
get-topics project="pepsi", query="coca-cola comparison", start="-1 week", end="now"

get-topics project="pepsi", query="coca-cola comparison", start="now", end="now"

compare the sentiment scores of the two sets of topics.
CPU times: user 3.39 s, sys: 145 ms, total: 3.53 s
Wall time: 3.53 s

This shows that with the simpler tools it can produce more consistent output. It has rambled at the end as there is no way for it to perform the desired comparison. I think there is great potential here.