Checking out the new composition of deep learning models by Huggingface
Published
May 13, 2023
Huggingface have recently released Huggingface Agents, which provide a natural language API on top of different tools. The example code invokes models hosted on huggingface itself. I like playing around with this stuff and it would be nice to get it running locally.
Preamble
The example provides several different models that can be used. It also mentions that you need to log in to huggingface. Since that requires your access token I have this SUPER SECRET way of loading my API key without showing it in this blog.
I’ve created a file at ~/.config/huggingface/auth.json which has the access token in it!
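The hidden cell is roughly equivalent to the following sketch (the layout of auth.json, including the "token" key, is just my own convention):

# roughly what the hidden login cell does: read the token from my config file and log in
import json
from pathlib import Path

from huggingface_hub import login

auth = json.loads(Path("~/.config/huggingface/auth.json").expanduser().read_text())
login(auth["token"])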
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid.
Your token has been saved to /home/matthew/.cache/huggingface/token
Login successful
You can see that the token has been saved to a local file. I had logged in earlier so the token was already cached, which meant that the agent requests were already working. I like these blog posts to be self-contained though, so I’ve covered the login step anyway.
The other bit of setup is turning off the logging messages. CLIP is quite noisy about being updated, so a lot of warnings would otherwise be shown.
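One way to do that, assuming the noise all comes through the transformers loggers, is:

# silence everything below error level from the transformers loggers
from transformers.utils import logging

logging.set_verbosity_error()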
Let’s try out their sample query with one of the possible models. They specifically call out quality issues with these models:
StarCoder and OpenAssistant are free to use and perform admirably well on simple tasks. However, the checkpoints don’t hold up when handling more complex prompts. If you’re facing such an issue, we recommend trying out the OpenAI model which, while sadly not open-source, performs better at this given time.
I am not going to use OpenAI. This is an example of using the OpenAssistant model:
from transformers import HfAgent

agent = HfAgent(
    url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"
)
agent.run("Draw me a picture of rivers and lakes.")
ValueError: not enough values to unpack (expected 2, got 1)
The model has apparently not produced output of the correct format. Let’s see if we can use the debugger to see what was actually produced.
%pdb
Automatic pdb calling has been turned ON
from transformers import HfAgent

agent = HfAgent(
    url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"
)
agent.run("Draw me a picture of rivers and lakes.")
168 def clean_code_for_run(result):
169 result = f"I will use the following {result}"
--> 170 explanation, code = result.split("Answer:")
171 explanation = explanation.strip()
172 code = code.strip()
ipdb> result
'I will use the following tools:
`image_segmenter` to segment the image,
`image_qa` to answer the question about the image,
`document_captioner` to generate a caption,
`image_qa` to answer the question about the image,
`image_transform` to transform the image,
`image_segmenter` to segment the image,
`_qa` to answer,
`text_reader` to read the text,
`summarizer` to summarize the text,
`document_qa` to answer the question about the question about the image,
`image_generator` to generate an image according to caption,
`image_qa` to answer,
`text_to_downloader` to download the image,
`imageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimageimage`
ipdb> c
You can see that the model wants to use multiple unrelated tools and then endlessly repeats the word image. This is broken. We can try the starcoder (Li et al. 2023) alternative for this query.
Li, Raymond, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, et al. 2023. “StarCoder: May the Source Be with You!” https://arxiv.org/abs/2305.06161.
from transformers import HfAgent

agent = HfAgent(
    url_endpoint="https://api-inference.huggingface.co/models/bigcode/starcoder"
)
agent.run("Draw me a picture of rivers and lakes.")
==Explanation from the agent==
I will use the following tool: `image_segmenter` to create a segmentation mask of rivers and lakes.
==Code generated by the agent==
mask = image_segmenter(image, label="rivers and lakes")
==Result==
Evaluation of the code stopped at line 0 before the end because of the following error:
The variable `image` is not defined.
Now it has produced output that is the correct shape but the choice made by the model relies on a variable which does not yet exist. Since this is able to produce the correct style of output I feel like it’s worth trying again. We can try one of the other prompts from the examples:
image = agent.run("Draw me a picture of the sea then transform the picture to add an island")
image.save("island.jpg")
==Explanation from the agent==
I will use the following tools: `image_generator` to generate an image, then `image_transformer` to transform the image.
==Code generated by the agent==
image = image_generator(prompt="draw me a picture of the sea")
image = image_transformer(image=image, prompt="add an island")
==Result==
The quality of this is quite varied, which is true for most generated art. This version is ok. I like how the alteration is being done by a hand with a pencil, meta!
More broadly we can see how this has worked (or at least, how it reported it works). The model that we interact with on huggingface is prompted to produce a selection of tools and code that uses them. This is then executed and the result is returned.
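A toy re-creation of that flow might look like this. It is only an illustration of the shape of the mechanism, not the transformers implementation:

# toy illustration of the agent flow: the model writes an explanation and code,
# then the code is evaluated with only the permitted tools in scope
def fake_remote_model(prompt: str) -> str:
    return (
        "I will use the following tool: `image_generator` to generate an image.\n"
        "Answer:\n"
        'image = image_generator(prompt="rivers and lakes")\n'
    )

def image_generator(prompt: str) -> str:
    return f"<image of {prompt}>"

generation = fake_remote_model("Task: Draw me a picture of rivers and lakes.")
explanation, code = generation.split("Answer:")
namespace = {"image_generator": image_generator}
exec(code, namespace)
print(explanation.strip())  # the explanation shown to the user
print(namespace["image"])   # the result of running the generated code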
The next thing is to find the prompt that was provided to the model. We can get that if we just look at the agent:
Code
agent??
Type: HfAgent
String form: <transformers.tools.agents.HfAgent object at 0x7f1d3c556b90>
File: ~/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/transformers/tools/agents.py
Source:
class HfAgent(Agent):
    """
    Agent that uses and inference endpoint to generate code.

    Args:
        url_endpoint (`str`):
            The name of the url endpoint to use.
        token (`str`, *optional*):
            The token to use as HTTP bearer authorization for remote files. If unset, will use the token
            generated when running `huggingface-cli login` (stored in `~/.huggingface`).
        chat_prompt_template (`str`, *optional*):
            Pass along your own prompt if you want to override the default template for the `chat` method.
        run_prompt_template (`str`, *optional*):
            Pass along your own prompt if you want to override the default template for the `run` method.
        additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*):
            Any additional tools to include on top of the default ones. If you pass along a tool with the same
            name as one of the default tools, that default tool will be overridden.

    Example:

    ```py
    from transformers import HfAgent

    agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
    agent.run("Is the following `text` (in Spanish) positive or negative?", text="¡Este es un API muy agradable!")
    ```
    """

    def __init__(
        self, url_endpoint, token=None, chat_prompt_template=None, run_prompt_template=None, additional_tools=None
    ):
        self.url_endpoint = url_endpoint
        if token is None:
            self.token = f"Bearer {HfFolder().get_token()}"
        elif token.startswith("Bearer") or token.startswith("Basic"):
            self.token = token
        else:
            self.token = f"Bearer {token}"
        super().__init__(
            chat_prompt_template=chat_prompt_template,
            run_prompt_template=run_prompt_template,
            additional_tools=additional_tools,
        )

    def generate_one(self, prompt, stop):
        headers = {"Authorization": self.token}
        inputs = {
            "inputs": prompt,
            "parameters": {"max_new_tokens": 200, "return_full_text": False, "stop": stop},
        }

        response = requests.post(self.url_endpoint, json=inputs, headers=headers)
        if response.status_code == 429:
            print("Getting rate-limited, waiting a tiny bit before trying again.")
            time.sleep(1)
            return self._generate_one(prompt)
        elif response.status_code != 200:
            raise ValueError(f"Error {response.status_code}: {response.json()}")

        result = response.json()[0]["generated_text"]
        # Inference API returns the stop sequence
        for stop_seq in stop:
            if result.endswith(stop_seq):
                result = result[: -len(stop_seq)]
        return result
Beyond the constructor, this entire agent is just one method, generate_one, which takes the prompt as an argument. It looks like we could patch that method to capture the prompt.
Code
import textwrap
from pathlib import Path

provided_prompt = None
provided_stop = None

def patched_generate_one(prompt, stop):
    global provided_prompt, provided_stop
    provided_prompt = prompt
    provided_stop = stop
    raise Exception()

agent.generate_one = patched_generate_one
try:
    agent.run(
        "Draw me a picture of the sea then transform the picture to add an island"
    )
except:
    # we just threw to stop generation
    pass

Path("prompt.txt").write_text(
    "\n".join(
        line
        for paragraph in provided_prompt.split("\n")
        for line in textwrap.wrap(paragraph)
    )
) ; None
The prompt is quite long and you can read it here.
It is very clear and split into three sections. The first describes the overall task that the model has to perform (identify tools then use them in python code). A complete list of tools is then provided. Finally there are some examples of correct inputs and outputs.
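As a rough illustration of that structure (this is a paraphrase, not the real wording of the prompt):

# illustrative skeleton of the run prompt (paraphrased, not the actual text)
prompt_skeleton = """\
I will ask you to perform a task, and you should first list the tools you will use
and then write the Python code that uses them.

Tools:
- image_generator: creates an image from a text prompt.
- image_transformer: modifies an image according to a prompt.
...

Task: "Draw me a picture of rivers and lakes"
I will use the following tool: `image_generator` to generate an image.
Answer:
image = image_generator(prompt="rivers and lakes")

Task: "<the user's actual request goes here>"
"""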
What’s interesting to me is that this is a continuation-style prompt, where the model just produces the natural continuation of the text. This is in contrast to instruction-finetuned models, where you can direct them to perform a task without including the start of the task as a primer.
When I look at the model I can see that it explicitly calls out this trait:
The model was trained on GitHub code. As such it is not an instruction model and commands like “Write a function that computes the square root.” do not work well. However, by using the Tech Assistant prompt you can turn it into a capable technical assistant.
The Tech Assistant prompt mentioned is a dataset used to refine the model. It has labelled inputs that form a dialog.
What is more interesting to me is that this is a 15.5B parameter model. That’s large and would be infeasible to run in full precision on my machine. I might be able to quantize it to int8 to get it down to about 15.5GB though.
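As a back-of-the-envelope check of the memory needed at different precisions:

# rough memory footprint of 15.5B parameters at different precisions
params = 15.5e9
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.1f} GB")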
As an aside I do want to call out the license agreement that you have to adhere to. I think it’s excellent, and while it is worth reading in its entirety this final clause stood out:
[You may not use this model] For fully automated decision making in administration of justice, law enforcement, immigration or asylum processes.
Anyway before downloading this model it might be worth looking at both starcoderbase and santacoder (santacoder is trained only on python).
from transformers import HfAgent

agent = HfAgent(
    url_endpoint="https://api-inference.huggingface.co/models/bigcode/santacoder"
)
agent.run("Draw me a picture of rivers and lakes.")
ValueError: Error 422: {'error': 'Input validation error: `inputs` tokens + `max_new_tokens` must be <= 1512. Given: 1549 `inputs` tokens and 200 `max_new_tokens`', 'error_type': 'validation'}
The large prompt we saw before is our downfall here. Unfortunately it’s just too big for this model. The starcoder model accepts up to 8k tokens, which is a lot, while this one only accepts 1512.
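We can sanity check this by tokenizing the prompt captured earlier. The default toolbox is the same for both agents, so the count should be close to what the error reports:

# approximate check of the prompt length against the santacoder limit,
# reusing the provided_prompt captured from the patched agent above
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder")
print(len(tokenizer(provided_prompt).input_ids), "prompt tokens + 200 max_new_tokens")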
One way to reduce this would be to cut some of the tools that it could use. This can work because the tools are held in a dictionary on the agent, and their description is interpolated into the prompt. If I delete some of them then it should make enough space to run the model.
from transformers import HfAgent

agent = HfAgent(
    url_endpoint="https://api-inference.huggingface.co/models/bigcode/santacoder"
)
for tool in [
    "document_qa",
    "image_captioner",
    "image_qa",
    "transcriber",
    "summarizer",
    "text_classifier",
    "text_qa",
    "text_reader",
    "translator",
    "text_downloader",
]:
    del agent._toolbox[tool]
agent.run("Draw me a picture of rivers and lakes.")
==Explanation from the agent==
I will use the following tools: `image_drawer` to draw the image.
==Code generated by the agent==
image = image_drawer(image)
==Result==
Evaluation of the code stopped at line 0 before the end because of the following error:
It is not permitted to evaluate other functions than the provided tools (tried to execute image_drawer).
To get this to execute I had to delete 10 of the 14 available tools, because there is a second restriction: the input length cannot exceed 1024 tokens. Deleting these tools means that the model doesn’t fully understand the task it is being asked to perform. The examples of correct behaviour reference the now-missing tools, and this may be why the agent made up a function to call.
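To see what the model is left with we can list the remaining tools and their descriptions (assuming each tool exposes a description attribute, as the default ones do):

# show the tools that survived the cull, with the descriptions that end up in the prompt
for name, tool in agent._toolbox.items():
    print(f"{name}: {tool.description}")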
from transformers import HfAgent

agent = HfAgent(
    url_endpoint="https://api-inference.huggingface.co/models/bigcode/starcoderbase"
)
image = agent.run("Draw me a picture of rivers and lakes.")
image.save("rivers-and-lakes.jpg")
==Explanation from the agent==
I will use the following tool: `image_generator` to generate an image.
==Code generated by the agent==
image = image_generator(prompt="rivers and lakes")
==Result==
This is a nice image. It seems that the starcoderbase model was able to correctly interpret the first request. Looking at the model page I am not sure how this model differs from the full starcoder model - it’s still 15.5B parameters, and looking at the files suggests that is not a typo.
Local Agent
If I want to run this agent along with all of the tools that it might invoke then I have a problem. I can try to load the starcoder model in int8 precision, as that should work with CPU offloading and has a large enough input window to fit the full prompt.
Code
from transformers import Agent, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

class LocalAgent(Agent):
    def __init__(
        self,
        model_name: str,
        chat_prompt_template=None,
        run_prompt_template=None,
        additional_tools=None,
        device_map: str = "auto",
        load_in_8bit: bool = True,
        llm_int8_enable_fp32_cpu_offload: bool = True,
        torch_dtype: type = torch.float16,
        max_new_tokens: int = 200,
        **generation_kwargs,
    ):
        super().__init__(
            chat_prompt_template=chat_prompt_template,
            run_prompt_template=run_prompt_template,
            additional_tools=additional_tools,
        )
        quantization_config = BitsAndBytesConfig(
            load_in_8bit=load_in_8bit,
            llm_int8_enable_fp32_cpu_offload=llm_int8_enable_fp32_cpu_offload,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            device_map=device_map,
            quantization_config=quantization_config,
            torch_dtype=torch_dtype,
            resume_download=True,
        )
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.generation_kwargs = {
            "max_new_tokens": max_new_tokens,
        } | generation_kwargs

    @torch.inference_mode()
    def generate_one(self, prompt, stop):
        inputs = self.tokenizer(prompt, return_tensors="pt")
        inputs.to(self.model.device)
        output = self.model.generate(
            **inputs,
            **self.generation_kwargs,
        )
        generated_tokens = output[0, inputs.input_ids.shape[1]:]
        result = self.tokenizer.decode(generated_tokens)
        # the model will generate further "tasks" following the conclusion of this one
        # the stop list contains 'Task:' and we can use that to remove the subsequent tasks
        # ideally we would adjust generation to stop after the first stop_seq was issued
        for stop_seq in stop:
            result = result.split(stop_seq)[0]
        return result
agent = LocalAgent(model_name="bigcode/starcoder")
image = agent.run("Draw me a picture of an astronaut")
image.save("local-astronaut.jpg")
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
==Explanation from the agent==
I will use the following tool: `image_generator` to generate an image.
==Code generated by the agent==
image = image_generator(prompt="astronaut")
==Result==
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
This works and it’s a nice point to wrap up. Loading the local model takes a few minutes and I have to be very careful about the available GPU memory. Quantizing the model once and saving it to disk would be a good way to improve the load time. I can see why they created this using a remote API though.
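Something like this is what I have in mind, though whether save_pretrained can serialize the 8-bit weights depends on the transformers and bitsandbytes versions:

# hypothetical: persist the already-quantized weights so future runs skip the conversion step
agent.model.save_pretrained("starcoder-8bit")
agent.tokenizer.save_pretrained("starcoder-8bit")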