Conversational Interface Automation

Part 2
Published March 12, 2024

This is my second attempt at building a full conversational interface to the Brandwatch Consumer Research API. In this post I am going to use the continuation approach from the last post: the model generates parseable code, which is executed to update the state, and the model is then run again with the updated state until the task is completed.

To do this I need a model, a parser, and a representation of the state. The model then has to be invoked with a prompt that includes the current state and all the actions that it may perform to solve the task.
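
As a rough sketch, the loop I am aiming for looks something like this (using placeholder names; the concrete pieces are defined later in the post):

Code
# a rough sketch only of the continuation loop this post builds towards;
# State, generate_continuation and parse_output are defined later in the post,
# and build_prompt is a placeholder for the prompt template used there
def solve(task, model, tokenizer):
    state = State.make(task)                       # the state holds the task and context
    while state is not None:                       # an "answer" action ends the loop
        prompt = build_prompt(state)               # persona + tools + examples + task + context
        response = generate_continuation(model=model, tokenizer=tokenizer, prompt=prompt)
        for invocation in parse_output(response):  # each output line becomes a tool invocation
            state = state.invoke(invocation)       # execute the action, producing the next state
            if state is None:
                break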

Lots of Code

The model loading and generation code comes from the previous post. I like to make every post complete and self-contained, so that you could do the same if you wanted to.

Code
from transformers import AutoTokenizer, AutoModelForCausalLM
from pathlib import Path

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    load_in_4bit=True,
)

PROMPT_FOLDER = Path("prompts").resolve()
Code
# from src/main/python/blog/conversational_interface/generator/continuation.py
from __future__ import annotations

from typing import Optional

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)


class TokenSequenceStoppingCriteria(StoppingCriteria):
    @staticmethod
    def make(
        tokenizer: AutoTokenizer,
        sequence: str,
        device: Optional[str | torch.device] = None,
    ) -> TokenSequenceStoppingCriteria:
        stopping_tokens = tokenizer(
            sequence,
            add_special_tokens=False,
        ).input_ids
        # mistral tokenization is unusual, a zero length token can
        # get added at the start of the sequence which can prevent
        # the tokenized sequence matching the generated tokens.
        # this filter drops any zero length tokens.
        stopping_tokens = [
            token for token in stopping_tokens if len(tokenizer.decode(token)) > 0
        ]
        return TokenSequenceStoppingCriteria(stopping_tokens, device=device)

    def __init__(
        self,
        sequence: list[int] | torch.Tensor,
        device: Optional[str | torch.device] = None,
    ) -> None:
        super().__init__()
        if isinstance(sequence, list):
            sequence = torch.Tensor(sequence)
        if device is not None:
            sequence = sequence.to(device)
        self.sequence = sequence

    def to(self, device: str | torch.device) -> TokenSequenceStoppingCriteria:
        self.sequence = self.sequence.to(device)
        return self

    def as_list(self) -> StoppingCriteriaList:
        return StoppingCriteriaList([self])

    def __call__(
        self,
        input_ids: torch.LongTensor,
        scores: torch.FloatTensor,
        **kwargs,
    ) -> bool:
        # this assumes only a single sequence is being generated
        return self.is_end(input_ids[0])

    def is_end(self, tokens: torch.Tensor) -> bool:
        assert len(tokens.shape) == 1
        end = tokens[-len(self.sequence) :]
        per_token_matches = end == self.sequence
        return bool(per_token_matches.all())

    def truncate(self, tokens: torch.Tensor) -> torch.Tensor:
        if self.is_end(tokens):
            return tokens[: -len(self.sequence)]
        return tokens


@torch.inference_mode()
def generate_continuation(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    prompt: str,
    max_new_tokens: int = 100,
    stopping: str = "\n\n# Task",
) -> str:
    stopping_criteria = TokenSequenceStoppingCriteria.make(
        tokenizer=tokenizer,
        sequence=stopping,
        device=model.device,
    )

    model_input = tokenizer(
        prompt,
        return_tensors="pt",
        padding="longest",
    )
    input_tokens = model_input.input_ids.shape[1]
    model_input = model_input.to(model.device)
    generated_ids = model.generate(
        **model_input,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        stopping_criteria=StoppingCriteriaList([stopping_criteria]),
    )
    filtered_ids = generated_ids[0, input_tokens:]
    filtered_ids = stopping_criteria.truncate(filtered_ids)
    output = tokenizer.decode(
        filtered_ids,
        skip_special_tokens=True,
    )
    return output.strip()
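
As a quick usage illustration (the prompt here is a toy one, not one of the prompt files used later): the default stopping sequence of "\n\n# Task" presumably exists because, with the prompt layout used below, the model tends to continue after its response by starting another # Task section, so that is a convenient place to cut generation off.

Code
# toy usage only; the real prompts are assembled from the prompt files later in the post
response = generate_continuation(
    model=model,
    tokenizer=tokenizer,
    prompt="# Task\n\nSay hello to the user.\n\n# Response\n",
    max_new_tokens=20,
)
print(response)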

The new code that we have here is the parser. It’s based heavily on the combinator parsing approach that David Beazley wrote about in this excellent blog post.

The tool output that will be produced by the model will be statements like:

get-document-count start="-2 day", end="-1 day"

This is similar to a regular method invocation with keyword arguments. The parser will convert a statement like this into an object that can easily be used to invoke the desired code.

Code
from blog.parser.combinator import (
    Input,
    char,
    choice,
    filt,
    fmap,
    left,
    one_or_more,
    right,
    seq,
    shift,
    zero_or_more,
)
from blog.conversational_interface.parser import parse_output, parse_date, Invocation
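
For illustration, this is roughly the round trip I expect: the Invocation object exposes a name and a dictionary of keyword arguments, which is how the rest of this post uses it.

Code
# a rough illustration of the expected round trip; the exact repr of Invocation may differ
invocations = parse_output('get-document-count start="-2 day", end="-1 day"')
print(invocations)
# expected: something like
# [Invocation(name="get-document-count", arguments={"start": "-2 day", "end": "-1 day"})]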

Manual Run

I’m going to do a first run by hand to check how well this will work. The user task will be a bit more complex:

What’s the sentiment like for coke this week versus last week?

The aim will be to get the model to follow these steps:

  • get the user to identify the project of interest
  • get the user to identify the query of interest
  • get a chart of sentiment by date
  • show that to the user

To make this a slightly smoother experience the list of projects will be provided to the model.

Code
from itertools import starmap

def format_contextual_data(**kwargs: list[str]) -> str:
    def _format_section(name: str, values: list[str]) -> str:
        values_str = "\n".join([f" * {value}" for value in values])
        return "\n".join([f"{name}:", values_str])

    sections = starmap(_format_section, kwargs.items())
    return "\n\n".join(sections)
Code
from pathlib import Path
import string
from transformers import AutoModelForCausalLM, AutoTokenizer

def manual_run(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    task: str,
    persona: str = "persona-01.txt",
    tools: str = "tools-01.txt",
    examples: str = "examples-01.txt",
    show_prompt: bool = False,
    **context,
) -> None:
    tool_selection_context_prompt_template = string.Template("""
${persona}

# Tools

${tools}

${examples}

# Task

${task}

# Context

${context}

# Response
""".strip()
    )
    context_str = format_contextual_data(**context)
    
    prompt = tool_selection_context_prompt_template.substitute(
        persona=_load_file(persona),
        tools=_load_file(tools),
        examples=_load_file(examples),
        task=task,
        context=context_str,
    )
    if show_prompt:
        print("Prompt:")
        print(prompt)

    model_response = generate_continuation(
        model=model,
        tokenizer=tokenizer,
        prompt=prompt,
    )

    print("For the task:")
    print(task)
    print("With the context:")
    print(context_str)
    print("I would do:")
    print(model_response)

def _load_file(name_or_str: str) -> str:
    file = PROMPT_FOLDER / name_or_str
    if not file.exists():
        return name_or_str
    return file.read_text().strip()
Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="What's the sentiment like for coke this week versus last week?",
    projects=["Live", "Test"],
)
For the task:
What's the sentiment like for coke this week versus last week?
With the context:
projects:
 * Live
 * Test
I would do:
get-projects
get-queries project="Live"
get-document-count project="Live", query="coke", start="last week", end="this week"
get-document-count project="Live", query="coke", start="last week-1", end="last week"
get-queries project="Test"
get-document-count project="Test", query="coke", start="this week", end="now"

This is very interesting. The tools that I have defined are as follows:

  • get-projects: gets the project names available to the user
  • get-queries: gets the query names in the project. You must have a project name
  • get-document-count: gets the number of documents in the named query for the specified date range
  • get-topics: gets the top 10 topics for the named query for the specified date range
  • answer: provide the complete answer to the user

So it has used these different tools to the best of its ability to perform the required task in a single iteration. It’s encouraging but wrong.

I am going to provide the list of projects all the time, so there will be no need to have a get-projects action. I am also going to provide the list of queries when the project is selected. Selecting a project or query can update the context so that commands like get-document-count do not need to name it explicitly.

Let’s see how an updated version goes.

Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="What's the sentiment like this week versus last week?",
    projects=["Live", "Test"],
    persona="persona-02.txt",
    tools="tools-02.txt",
    examples="examples-02.txt",
)
For the task:
What's the sentiment like this week versus last week?
With the context:
projects:
 * Live
 * Test
I would do:
ask question="You have projects Live and Test. Which project are you asking about for sentiment comparison?"
Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="I want to use the live project",
    projects=["Live", "Test"],
    persona="persona-02.txt",
    tools="tools-02.txt",
    examples="examples-02.txt",
)
For the task:
I want to use the live project
With the context:
projects:
 * Live
 * Test
I would do:
set-project project="Live"
Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="What's the sentiment like this week versus last week?",
    project=["Live"],
    queries=["Coca-Cola", "Pepsi", "Irn Bru"],
    persona="persona-02.txt",
    tools="tools-02.txt",
    examples="examples-02.txt",
)
For the task:
What's the sentiment like this week versus last week?
With the context:
project:
 * Live

queries:
 * Coca-Cola
 * Pepsi
 * Irn Bru
I would do:
ask question="Which query are you asking about for sentiment comparison, Coca-Cola, Pepsi or Irn Bru?"
Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="I want to use the coke query",
    project=["Live"],
    queries=["Coca-Cola", "Pepsi", "Irn Bru"],
    persona="persona-02.txt",
    tools="tools-02.txt",
    examples="examples-02.txt",
)
For the task:
I want to use the coke query
With the context:
project:
 * Live

queries:
 * Coca-Cola
 * Pepsi
 * Irn Bru
I would do:
set-query query="Coca-Cola"
Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="What's the sentiment like for coke this week versus last week?",
    project=["Live"],
    query=["Coca-Cola"],
    persona="persona-02.txt",
    tools="tools-02.txt",
    examples="examples-02.txt",
)
For the task:
What's the sentiment like for coke this week versus last week?
With the context:
project:
 * Live

query:
 * Coca-Cola
I would do:
ask question="Do you want to compare sentiment for the current week versus the previous week?"

I think that the problem here is that the prompts or examples are slightly wrong. I’ve adjusted the code so I can easily swap out the different parts of the prompt. Let’s try to come up with a tool example that can let it do the sentiment chart.

Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="What's the sentiment like for coke this week versus last week?",
    project=["Live"],
    query=["Coca-Cola"],
    persona="persona-02.txt",
    tools="tools-03.txt",
    examples="examples-03.txt",
)
For the task:
What's the sentiment like for coke this week versus last week?
With the context:
project:
 * Live

query:
 * Coca-Cola
I would do:
get-chart from="-2 week", to="now", dimension="sentiment", frequency="days"
answer statement="I have created this chart about the sentiment change for Coca-Cola in the last two weeks."

This is just what I wanted. I think that the big changes were:

  • including an example of tool use with each tool description (here), sketched roughly after this list
  • a more complete set of examples (here)
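
The actual prompt files are not reproduced in this post, so the sketch below is illustrative only; it pairs a tool description with an invocation the model can copy, in the style of the output above.

Code
# an illustrative tool entry only; the real tools-03.txt wording is not shown in this post
TOOL_ENTRY = """
 * get-chart: gets a chart of the chosen dimension over the date range for the current query
   usage: get-chart from="-2 week", to="now", dimension="sentiment", frequency="days"
""".strip()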

Automating It All

Now that we have a prompt that can solve the problem end to end, let’s try to make it into an automated system. This will have a state object which can be updated by actions. The actions will invoke the BCR API as well as speak to the user or ask questions.

It’s going to be another chunk of code to handle all of this, and it will use the parser that was introduced before.

I’m going to use the following queries as my test set:

  • What’s the sentiment like for coke this week versus last week?
    desired outcome: A chart showing sentiment breakdown for the last 7 days versus the 7 days before that. Also needs to clarify what the actual dates are, and the query.

I’ve already used this one for the trial run.

  • What’s the sentiment like for coke this month versus last month?
    desired outcome: Same as above but this month vs last month. Also needs to clarify what the actual dates are (is the previous month just 28 days, or all of January? I think either could be justified, but we should make it clear what data we are actually summarising). Clarify the query.

  • What are the main topics this month for coke?
    desired outcome: Word cloud request. Top n topics within conversation for coke this month. Clarify the query.

  • What are the main negative topics this month for coke?
    desired outcome: Word cloud request. Top n topics within negative conversation for coke this month. Clarify the query.

  • What is the page type breakdown for coke this month?
    desired outcome: Chart showing page type breakdown within coke conversation for this month. Clarify what actual dates are, and the query.

Automation Code

I’ve copied over a facade that I previously wrote for working with the BCR API. It just makes it easier to do what I need, and it happens to work well for these tasks.

Each of the runners will require information about the tool itself: its name and its parameters. With that it could almost generate the tool descriptions as well, so adding in the usage and description would help.
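
As a sketch of that idea (this dataclass is not part of the code that follows, it is just an illustration):

Code
# an illustration only: per-tool metadata as described above, which could be used both
# to dispatch parsed invocations and to render the "# Tools" section of the prompt
from dataclasses import dataclass

@dataclass
class ToolDescription:
    name: str               # e.g. "get-document-count"
    parameters: list[str]   # e.g. ["start", "end"]
    usage: str              # e.g. 'get-document-count start="-1 week", end="now"'
    description: str        # the one line shown in the tool list

    def as_prompt_line(self) -> str:
        return f" * {self.name}: {self.description}\n   usage: {self.usage}"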

It’s not enough to define the tools; they need to be able to work over a state. The state will have the task from the user, the context for the LLM, the list of outcomes produced so far, and any shared variables that are required to work with the API.

Code
import blog.bcr.project as bcr
Code
# from src/main/python/blog/conversational_interface/state.py
from __future__ import annotations

from dataclasses import dataclass, field, replace
from datetime import date
from itertools import starmap
from typing import Optional

import pandas as pd

import blog.bcr.project as bcr
from blog.conversational_interface import parser


@dataclass
class Context:
    projects: list[str]
    queries: Optional[list[str]] = None
    document_count: Optional[int] = None
    topics: Optional[list[str]] = None

    def __post_init__(self) -> None:
        assert self.projects, "impossible state, no projects"

    def set_project(self, name: str, queries: list[str]) -> Context:
        assert (
            name in self.projects
        ), f"cannot set project to {name}, not found in {self.projects}"
        assert queries, f"cannot set project to {name}, no queries"
        return replace(
            self,
            projects=[name],
            queries=queries,
        )

    def set_query(self, name: str) -> Context:
        assert (
            name in self.queries
        ), f"cannot set query to {name}, not found in {self.queries}"
        return replace(
            self,
            queries=[name],
        )

    def set_document_count(self, value: int) -> Context:
        return replace(
            self,
            document_count=value,
        )

    def set_topics(self, topics: list[str]) -> Context:
        return replace(
            self,
            topics=topics,
        )

    def as_dict(self) -> dict[str, list[str] | str | int]:
        result = {}
        assert self.projects, "impossible state, no projects"
        if len(self.projects) == 1:
            result["project"] = self.projects
        else:
            result["projects"] = self.projects

        if self.queries:
            if len(self.queries) == 1:
                result["query"] = self.queries
            else:
                result["queries"] = self.queries

        if self.document_count is not None:
            result["document-count"] = [self.document_count]
        if self.topics is not None:
            result["topics"] = self.topics

        return result

    def as_string(self) -> str:
        def _format_section(name: str, values: list[str]) -> str:
            values_str = "\n".join([f" * {value}" for value in values])
            return "\n".join([f"{name}:", values_str])

        context = self.as_dict()
        sections = starmap(_format_section, context.items())
        return "\n\n".join(sections)


@dataclass
class ApiClient:
    user: bcr.User = field(default_factory=bcr.User.make)
    project: Optional[bcr.Project] = None
    query: Optional[bcr.Query] = None

    def get_projects(self) -> list[str]:
        return [project.name for project in self.user.get_projects()]

    def set_project(self, name: str) -> ApiClient:
        project = self.user.get_project(name)
        return replace(self, project=project)

    def get_queries(self) -> list[str]:
        assert (
            self.project
        ), "cannot get the queries without setting the current project"
        return [query.name for query in self.project.get_queries()]

    def set_query(self, name: str) -> ApiClient:
        assert self.project, "cannot set the query without setting the current project"
        query = self.project.get_query(name)
        return replace(
            self,
            query=query,
        )

    def get_document_count(self, start: str, end: str) -> int:
        assert (
            self.project
        ), "cannot get document count without setting the current project"
        assert self.query, "cannot get document count without setting the current query"
        start_date = self._to_date(start)
        end_date = self._to_date(end)
        return self.query.count(
            start_date=start_date,
            end_date=end_date,
        )

    def get_topics(self, start: str, end: str) -> pd.DataFrame:
        assert (
            self.project
        ), "cannot get document count without setting the current project"
        assert self.query, "cannot get document count without setting the current query"
        start_date = self._to_date(start)
        end_date = self._to_date(end)
        output = self.query.aggregate_wordcloud(
            start_date=start_date,
            end_date=end_date,
            limit=10,
        )
        return output

    def get_chart(
        self, start: str, end: str, dimension: str, frequency: str
    ) -> pd.DataFrame:
        assert (
            self.project
        ), "cannot get document count without setting the current project"
        assert self.query, "cannot get document count without setting the current query"
        start_date = self._to_date(start)
        end_date = self._to_date(end)
        output = self.query.aggregate_two(
            aggregate="volume",
            dimension1=dimension,
            dimension2=frequency,
            start_date=start_date,
            end_date=end_date,
        )
        return output

    @staticmethod
    def _to_date(text: str) -> date:
        datetime = parser.parse_date(text)
        return datetime.to_date()


@dataclass
class State:
    task: str
    context: Context
    client: ApiClient
    last: Optional[State] = None

    @staticmethod
    def make(task: str) -> State:
        client = ApiClient()
        projects = client.get_projects()
        if len(projects) == 1:
            client = client.set_project(projects[0])
            queries = client.get_queries()
            if len(queries) == 1:
                client = client.set_query(queries[0])
        else:
            queries = None
        context = Context(projects=projects, queries=queries)
        return State(task=task, context=context, client=client)

    def first_task(self) -> str:
        if self.last is None:
            return self.task
        return self.last.first_task()

    def next(self, **kwargs) -> State:
        return replace(
            self,
            last=self,
            **kwargs,
        )

    def invoke(  # pylint: disable=too-many-return-statements
        self, invocation: parser.Invocation
    ) -> Optional[State]:
        if invocation.name == "ask":
            return self.ask(**invocation.arguments)
        if invocation.name == "answer":
            return self.answer(**invocation.arguments)
        if invocation.name == "set-project":
            return self.set_project(**invocation.arguments)
        if invocation.name == "set-query":
            return self.set_query(**invocation.arguments)
        if invocation.name == "get-document-count":
            return self.get_document_count(**invocation.arguments)
        if invocation.name == "get-topics":
            return self.get_topics(**invocation.arguments)
        if invocation.name == "get-chart":
            return self.get_chart(**invocation.arguments)
        raise AssertionError(f"unknown invocation: {invocation}")

    def ask(self, question: str) -> State:
        answer = input(question + "\n")
        return self.next(task=answer)

    def answer(self, statement: str) -> None:
        print(statement)

    def set_project(self, project: str) -> State:
        client = self.client.set_project(project)
        queries = client.get_queries()
        context = self.context.set_project(project, queries=queries)
        return self.next(
            task=self.first_task(),
            client=client,
            context=context,
        )

    def set_query(self, query: str) -> State:
        client = self.client.set_query(query)
        context = self.context.set_query(query)
        return self.next(
            task=self.first_task(),
            client=client,
            context=context,
        )

    def get_document_count(self, start: str, end: str) -> State:
        count = self.client.get_document_count(
            start=start,
            end=end,
        )
        context = self.context.set_document_count(count)
        return self.next(context=context)

    def get_topics(self, start: str, end: str) -> None:
        output = self.client.get_topics(
            start=start,
            end=end,
        )
        print(output)  # it's a dataframe, need to inspect it to see how to handle it

    def get_chart(self, start: str, end: str, dimension: str, frequency: str) -> None:
        output = self.client.get_chart(
            start=start,
            end=end,
            dimension=dimension,
            frequency=frequency,
        )
        print(output)  # it's a dataframe, need to inspect it to see how to handle it
Code
from pathlib import Path
import string
from transformers import AutoModelForCausalLM, AutoTokenizer

def run_and_parse(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    state: State,
    persona: str = "persona-01.txt",
    tools: str = "tools-01.txt",
    examples: str = "examples-01.txt",
    show_prompt: bool = False,
    show_output: bool = False,
) -> list[Invocation]:
    tool_selection_context_prompt_template = string.Template("""
${persona}

# Tools

${tools}

${examples}

# Task

${task}

# Context

${context}

# Response
""".strip()
    )
    context_str = state.context.as_string()
    
    prompt = tool_selection_context_prompt_template.substitute(
        persona=_load_file(persona),
        tools=_load_file(tools),
        examples=_load_file(examples),
        task=state.task,
        context=context_str,
    )
    if show_prompt:
        print("prompt:")
        print(prompt)
        print()

    model_response = generate_continuation(
        model=model,
        tokenizer=tokenizer,
        prompt=prompt,
        max_new_tokens=1_000,
    )
    invocations = parse_output(model_response)

    if show_output:
        print("For the task:")
        print(state.task)
        print("With the context:")
        print(context_str)
        print("I would do:")
        print(model_response)
        print(invocations)

    return invocations

def _load_file(name_or_str: str) -> str:
    file = PROMPT_FOLDER / name_or_str
    if not file.exists():
        return name_or_str
    return file.read_text().strip()
Code
def run_and_update(
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    state: State,
    persona: str = "persona-01.txt",
    tools: str = "tools-01.txt",
    examples: str = "examples-01.txt",
    show_prompt: bool = False,
    show_output: bool = False,
) -> State:
    actions = run_and_parse(
        model=model,
        tokenizer=tokenizer,
        state=state,
        persona=persona,
        tools=tools,
        examples=examples,
        show_prompt=show_prompt,
        show_output=show_output,
    )
    for action in actions:
        state = state.invoke(action)
    return state
Code
state = State.make("What's the sentiment like for burger king this week versus last week?")
16:54:26 INFO: Configuration: loaded from /home/matthew/.config/bcr_api/auth.json
16:54:27 DEBUG: https://api.brandwatch.com/projects
Code
state = run_and_update(
    model=model,
    tokenizer=tokenizer,
    state=state,
    persona="persona-02.txt",
    tools="tools-03.txt",
    examples="examples-03.txt",
)
You have projects AAAAAAAAA, Age data collection, Age panel testing, Aidan, auto_segmentation, Billiejoe ABSA testing, Billiejoe image analysis, Billiejoe Test, Brandwatch, Chloe Russell-Sharp, Cision, Colin, Crisis Management, Dan test, Demographics test panels, ds_everything_is_fine, ds_konmari, ds sprint test, Hackathon LLM Classifiers, Hamish, Healthcare data collection, intern_intro, Joe, karim_labs_test, karim-test, Katie A, Kevin Gee, LB practice project, Lei, Lucy, Ludovica Test, Lynda, Malibu, Matthew, missed gp appointments, oh-verk, Pablo Funes, Paul, peter_test, Random things, Rina, Some real client brand queries, Steph, test, test_karim, Trend Analysis, vinay_test, Wild Gender & Emotion. Which project are you asking about?
 matthew
Code
state = run_and_update(
    model=model,
    tokenizer=tokenizer,
    state=state,
    persona="persona-02.txt",
    tools="tools-03.txt",
    examples="examples-03.txt",
)
Which project are you asking about?
 use the matthew project
Code
state = run_and_update(
    model=model,
    tokenizer=tokenizer,
    state=state,
    persona="persona-02.txt",
    tools="tools-03.txt",
    examples="examples-03.txt",
)
16:50:35 DEBUG: https://api.brandwatch.com/projects
KeyError: 'No such project matthew'

This has shown a couple of problems. First, the LLM should be able to interpret the response ‘matthew’ as an instruction to use the ‘Matthew’ project. Second, it is taking the name from the user input instead of from the context list.

The first is more difficult to fix, as it involves taking the conversation so far and reformulating the task to include the conversation context. I have been looking into frameworks for this work, and it’s possible to use an LLM to create the task description based on the conversation history. I’m going to try this now.

Task Summarization

One way to fit this into the current approach is to take the conversation history and use it to generate the next task. The objective is to create a new task that includes the context of the conversation so far. Then the model will be able to act in an appropriate way.

I’m going to start by using the conversational summary prompt from deepset:

Condense the following chat transcript by shortening and summarizing the content without losing important information:
{chat_transcript}
Condensed Transcript:

Code
chat_transcript = "\n".join(
    f"{role}: {utterance}"
    for role, utterance in [
        ("user", (
            "What's the sentiment like for burger king "
            "this week versus last week?"
        )),
        ("assistant", (
            "You have projects AAAAAAAAA, Age data collection, "
            "Age panel testing, Aidan, auto_segmentation, "
            "Billiejoe ABSA testing, Billiejoe image analysis, "
            "Billiejoe Test, Brandwatch, Chloe Russell-Sharp, "
            "Cision, Colin, Crisis Management, Dan test, "
            "Demographics test panels, ds_everything_is_fine, "
            "ds_konmari, ds sprint test, Hackathon LLM Classifiers, "
            "Hamish, Healthcare data collection, intern_intro, "
            "Joe, karim_labs_test, karim-test, Katie A, Kevin Gee, "
            "LB practice project, Lei, Lucy, Ludovica Test, Lynda, "
            "Malibu, Matthew, missed gp appointments, oh-verk, "
            "Pablo Funes, Paul, peter_test, Random things, Rina, "
            "Some real client brand queries, Steph, test, test_karim, "
            "Trend Analysis, vinay_test, Wild Gender & Emotion. "
            "Which project are you asking about?"
        )),
        ("user", "matthew"),
    ])

generate_continuation(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
Condense the following chat transcript by shortening and summarizing the content 
without losing important information:\n{chat_transcript}\nCondensed Transcript:
""".strip()
)
"The user asked about Burger King's sentiment this week compared to last, but the assistant provided a list of various ongoing projects instead. The user then asked about a specific project, Matthew."
Code
state = State.make(
    "The user asked about Burger King's sentiment this week compared to last, "
    "but the assistant provided a list of various ongoing projects instead. "
    "The user then asked about a specific project, Matthew."
)

state = run_and_update(
    model=model,
    tokenizer=tokenizer,
    state=state,
    persona="persona-02.txt",
    tools="tools-03.txt",
    examples="examples-03.txt",
)
16:50:43 INFO: Configuration: loaded from /home/matthew/.config/bcr_api/auth.json
16:50:45 DEBUG: https://api.brandwatch.com/projects
You have mentioned various projects, but I see no mention of a project named Matthew. Do you mean one of the other projects?
 exit

There is a problem with this. The model is sure that there is no project named Matthew, yet it exists in the list. While there are a lot of projects on the list, the model is clearly capable of handling much longer prompts than this.

I’m going to try adjusting the summarization, as this is really a task refinement rather than a conversational summary.

Code
chat_transcript = "\n".join(
    f"{role}: {utterance}"
    for role, utterance in [
        ("task", (
            "What's the sentiment like for burger king "
            "this week versus last week?"
        )),
        ("question", (
            "You have projects AAAAAAAAA, Age data collection, "
            "Age panel testing, Aidan, auto_segmentation, "
            "Billiejoe ABSA testing, Billiejoe image analysis, "
            "Billiejoe Test, Brandwatch, Chloe Russell-Sharp, "
            "Cision, Colin, Crisis Management, Dan test, "
            "Demographics test panels, ds_everything_is_fine, "
            "ds_konmari, ds sprint test, Hackathon LLM Classifiers, "
            "Hamish, Healthcare data collection, intern_intro, "
            "Joe, karim_labs_test, karim-test, Katie A, Kevin Gee, "
            "LB practice project, Lei, Lucy, Ludovica Test, Lynda, "
            "Malibu, Matthew, missed gp appointments, oh-verk, "
            "Pablo Funes, Paul, peter_test, Random things, Rina, "
            "Some real client brand queries, Steph, test, test_karim, "
            "Trend Analysis, vinay_test, Wild Gender & Emotion. "
            "Which project are you asking about?"
        )),
        ("response", "matthew"),
    ])

generate_continuation(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
You are an assistant for a user. The user has requested that you perform a task.
You have discussed the task with the user and they have responded. Take the
following conversation and use it to formulate a new task that includes all relevant
context:
{chat_transcript}
Task:
""".strip()
)
"What's the sentiment like for Burger King this week versus last week, specifically regarding the Matthew project?"
Code
state = State.make(
    "What's the sentiment like for Burger King this week "
    "versus last week, specifically regarding the Matthew project?"
)

state = run_and_update(
    model=model,
    tokenizer=tokenizer,
    state=state,
    persona="persona-02.txt",
    tools="tools-03.txt",
    examples="examples-03.txt",
)
16:51:01 INFO: Configuration: loaded from /home/matthew/.config/bcr_api/auth.json
16:51:03 DEBUG: https://api.brandwatch.com/projects
You have projects AAAAAAAAA, Age data collection, Age panel testing, Aidan, auto_segmentation, Billiejoe ABSA testing, Billiejoe image analysis, Billiejoe Test, Brandwatch, Chloe Russell-Sharp, Cision, Colin, Crisis Management, Dan test, Demographics test panels, ds_everything_is_fine, ds_konmari, ds sprint test, Hackathon LLM Classifiers, Hamish, Healthcare data collection, intern_intro, Joe, karim_labs_test, karim-test, Katie A, Kevin Gee, LB practice project, Lei, Lucy, Ludovica Test, Lynda, Malibu, Matthew, missed gp appointments, oh-verk, Pablo Funes, Paul, peter_test, Random things, Rina, Some real client brand queries, Steph, test, test_karim, Trend Analysis, vinay_test, Wild Gender & Emotion. Which project are you asking about?
 exit

This looks like a strong task summarization to me, yet the model does not use this information to set the project. I think that the examples need to be updated to cover a situation like this.

I’ve added more examples and expanded the tool descriptions. It still wants to ask the user to set the project.

If I actually remove the examples then it gets closer to the desired behaviour. This is very annoying. I want to be able to come up with something that can work with all the context consistently.

At this point I might have to accept that setting the project and query is not that important. I’ve been using them as a proxy for conversational understanding and continuation. Instead I am going to move on to actually having a conversation about the data.

Simplified Conversational Interface

If I drop all of the examples and tool documentation around projects and queries then how well does the model perform? Can it answer a simple question?

Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task="What's the sentiment like for Burger King this week versus last week?",
    persona="persona-03.txt",
    tools="tools-05.txt",
    examples="examples-05.txt",
)
For the task:
What's the sentiment like for Burger King this week versus last week?
With the context:

I would do:
get-chart from="-1 week", to="now", dimension="sentiment", frequency="days"
get-chart from="-2 week", to="-1 week", dimension="sentiment", frequency="days"
answer statement="I have created two charts to compare the sentiment for Burger King in the last two weeks."

This has worked ok. Let’s see if the conversation summarizer allows it to continue on from this.

I’m going to imagine that the negative sentiment for the last week is dramatically higher than in the preceding week.

Code
chat_transcript = "\n".join(
    f"{role}: {utterance}"
    for role, utterance in [
        ("task", (
            "What's the sentiment like for burger king "
            "this week versus last week?"
        )),
        ("answer", (
            "I have created two charts to compare the sentiment "
            "for Burger King in the last two weeks."
        )),
        ("response", "Why is sentiment so much more negative?"),
    ])

generate_continuation(
    model=model,
    tokenizer=tokenizer,
    prompt=f"""
You are an assistant for a user. The user has requested that you perform a task.
You have discussed the task with the user and they have responded. Take the
following conversation and use it to formulate a new task that includes all relevant
context. Be as brief as possible:
{chat_transcript}
Task:
""".strip()
)
'Analyze the reason behind the negative sentiment towards Burger King in the last two weeks.'
Code
manual_run(
    model=model,
    tokenizer=tokenizer,
    task=(
        "Analyze the reason behind the negative sentiment "
        "towards Burger King in the last two weeks."
    ),
    persona="persona-03.txt",
    tools="tools-05.txt",
    examples="examples-05.txt",
)
For the task:
Analyze the reason behind the negative sentiment towards Burger King in the last two weeks.
With the context:

I would do:
ask question="What topics are driving the negative sentiment towards Burger King in the last two weeks?"
get-topics from="-2 week", to="now"
answer statement="The top topics driving the negative sentiment towards Burger King in the last two weeks are:"

This is really close to being correct. The model needs to understand that using ask here is incorrect and that the output of get-topics needs to be integrated into an answer.

More broadly, I have quite a bit of code involved in this, and I think it would be good to try transitioning to a framework. The Haystack framework has been recommended to me by a friend, so I’m going to try that out in the next post.