Trying out the smolagents framework for this course
Published
February 17, 2025
Hugging Face has released a course on agents; it’s available here. I’ve been working on agents for a task at work, and I evaluated smolagents shortly after it was released. Since the course uses smolagents directly in its first section, I thought it would be fun to follow along with the tutorial and recreate it locally.
I’m not going to go over the structure of agents or how they work directly. Instead this will focus on the tool definitions and the creation and use of an agent.
Tools
The first thing they start with is the format for the tools. Tools are invocable Python functions, and they are presented to the agent by using metaprogramming to extract the name, arguments, return type and description of each one. These function details are then composed into a description of the tool. When the agent wants to use the tool, it responds with a specially formatted message which invokes the tool. Once the tool has been invoked, the return value is presented to the agent and it can then proceed.
The following code defines a dummy tool and a timezone conversion tool:
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool
import datetime
import requests
import pytz
import yaml
from tools.final_answer import FinalAnswerTool


@tool
def my_custom_tool(arg1: str, arg2: int) -> str:  # it's important to specify the return type
    # Keep this format for the tool description / args description but feel free to modify the tool
    """A tool that does nothing yet
    Args:
        arg1: the first argument
        arg2: the second argument
    """
    return "What magic will you build ?"


@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"
We can define our own tools easily enough. Since there is a timezone tool it would be nice if the agent could create a picture of a location that has the correct weather and time of day. This would then involve three tools:
timezone conversion
weather api
image generation
Let’s see if this can be done. The timezone conversion from the definition above can be used already.
Weather Tool
There is the Open-Meteo weather API, which is free and open source. It takes a latitude and longitude, which means we now need a way to resolve a location into coordinates. Nominatim is a free and open source geocoding API which seems quite simple to use.
Let’s start with the geocoding, then move on to weather. To make it easier for the agent I am going to combine these into a single tool. (This might be the wrong thing to do: there are often multiple places that share a name, and with separate tools the agent could disambiguate between them.)
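The tool in this section calls two helpers, `geocode_location` and `weather_at_coordinates`, whose definitions are not shown in this post. Below is a minimal sketch of how they could look; it is an assumption, not the original code. It uses the standard library `urllib` instead of `httpx` so that it is self-contained. The dataclasses, the `parse_*` functions and the `USER_AGENT` value are hypothetical names. The endpoints and field names are the public Nominatim search and Open-Meteo forecast ones.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Nominatim's usage policy asks clients to identify themselves (hypothetical value).
USER_AGENT = "smolagents-weather-demo/0.1"

# Maps the attribute names used by the tool to Open-Meteo "current" variables.
CURRENT_FIELDS = {
    "humidity": "relative_humidity_2m",
    "apparent_temperature": "apparent_temperature",
    "precipitation": "precipitation",
    "snowfall": "snowfall",
    "cloud_cover": "cloud_cover",
    "wind_speed": "wind_speed_10m",
    "wind_gusts": "wind_gusts_10m",
}


@dataclass
class Coordinates:
    name: str
    latitude: float
    longitude: float


@dataclass
class Weather:
    is_day: bool
    humidity: str
    apparent_temperature: str
    precipitation: str
    snowfall: str
    cloud_cover: str
    wind_speed: str
    wind_gusts: str


def _get_json(url: str, params: dict):
    """GET a URL with query parameters and decode the JSON response."""
    query = urllib.parse.urlencode(params)
    request = urllib.request.Request(f"{url}?{query}", headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def parse_geocode(results: list) -> Coordinates:
    """Take the first (best) Nominatim match; raise if there is none."""
    if not results:
        raise ValueError("no matching place found")
    best = results[0]
    return Coordinates(
        name=best["display_name"],
        latitude=float(best["lat"]),
        longitude=float(best["lon"]),
    )


def parse_weather(payload: dict) -> Weather:
    """Format each current value together with the unit Open-Meteo reports for it."""
    current, units = payload["current"], payload["current_units"]
    values = {
        attr: f"{current[key]} {units[key]}" for attr, key in CURRENT_FIELDS.items()
    }
    return Weather(is_day=bool(current["is_day"]), **values)


def geocode_location(location: str) -> Coordinates:
    results = _get_json(
        "https://nominatim.openstreetmap.org/search",
        {"q": location, "format": "jsonv2", "limit": 1},
    )
    return parse_geocode(results)


def weather_at_coordinates(latitude: float, longitude: float) -> Weather:
    payload = _get_json(
        "https://api.open-meteo.com/v1/forecast",
        {
            "latitude": latitude,
            "longitude": longitude,
            "current": ",".join(["is_day", *CURRENT_FIELDS.values()]),
        },
    )
    return parse_weather(payload)
```

Keeping the parsing separate from the HTTP calls means the formatting can be checked against a saved response without touching the network.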
from smolagents import tool
import httpx


@tool
def get_current_weather_in_location(location: str) -> str:
    """A tool that fetches the current weather for a specified location.
    You should use this to make your picture of that location more accurate by including the current weather.
    Args:
        location: A string containing a valid place name (e.g., 'London, UK').
    """
    try:
        coordinates = geocode_location(location)
    except Exception as e:
        return f"Error geocoding location '{location}': {str(e)}"
    try:
        weather = weather_at_coordinates(
            latitude=coordinates.latitude,
            longitude=coordinates.longitude,
        )
    except Exception as e:
        return f"Error fetching weather for location '{location}': {str(e)}"
    facts = [
        "it is daytime" if weather.is_day else "it is night",
        f"humidity of {weather.humidity}",
        f"temperature of {weather.apparent_temperature}",
        f"precipitation of {weather.precipitation}",
        f"snowfall of {weather.snowfall}",
        f"cloud cover of {weather.cloud_cover}",
        f"sustained wind speed of {weather.wind_speed}",
        f"wind gusts up to {weather.wind_gusts}",
    ]
    facts_str = ", ".join(facts)
    return f"Weather for {coordinates.name}: {facts_str}"
get_current_weather_in_location("Toronto")
'Weather for Toronto, Golden Horseshoe, Ontario, Canada: it is daytime, humidity of 63 %, temperature of -15.3 °C, precipitation of 0.0 mm, snowfall of 0.0 cm, cloud cover of 100 %, sustained wind speed of 24.6 km/h, wind gusts up to 50.8 km/h'
It’s looking pretty rough in Toronto right now. This is great though! Since it includes day/night I don’t even need the timezone converter.
Image Generation Tool
The next thing will be to generate an image of the location based on the current weather. Luckily the image generation tool was provided on the course page, so I am just going to use that for now. At some point it would be good to hook it up to a local model.
from smolagents import load_tool

image_generation_tool = load_tool("agents-course/text-to-image", trust_remote_code=True)
Final Answer Tool
This is what generates the response to the user. I can just copy this from the space for now.
from typing import Any, Optional
from smolagents.tools import Tool


class FinalAnswerTool(Tool):
    name = "final_answer"
    description = "Provides a final answer to the given problem."
    inputs = {'answer': {'type': 'any', 'description': 'The final answer to the problem'}}
    output_type = "any"

    def forward(self, answer: Any) -> Any:
        return answer

    def __init__(self, *args, **kwargs):
        self.is_initialized = False
Agent
The agent is then a model with a prompt. The Qwen/Qwen2.5-Coder-32B-Instruct model is chunky and available via the Hugging Face API without a token. That should be good enough for our needs. Then we can just add in the tools that we wrote to create the agent.
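The code that builds the agent is not shown in this post. A minimal sketch, assuming the `CodeAgent` and `HfApiModel` classes imported in the first code block, might look like the following. `build_agent` is a hypothetical helper and `max_steps=6` is an assumed setting; newer smolagents releases may expose the model class under a different name.

```python
def build_agent(tools, model_id="Qwen/Qwen2.5-Coder-32B-Instruct"):
    """Create a CodeAgent that can call the given tools."""
    # Imported here so this sketch can be loaded without smolagents installed.
    from smolagents import CodeAgent, HfApiModel

    model = HfApiModel(model_id=model_id)
    # max_steps caps how many think/act rounds the agent may take (assumed value).
    return CodeAgent(tools=tools, model=model, max_steps=6)


# Usage, with the tools defined earlier in this post:
# agent = build_agent([
#     get_current_weather_in_location,
#     image_generation_tool,
#     FinalAnswerTool(),
# ])
```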
agent.run("Can you draw me a picture of Toronto please?")
╭──────────────────────────────────────────────────── New run ────────────────────────────────────────────────────╮
│                                                                                                                 │
│ Can you draw me a picture of Toronto please?                                                                    │
│                                                                                                                 │
╰─ HfApiModel - Qwen/Qwen2.5-Coder-32B-Instruct ──────────────────────────────────────────────────────────────────╯
─ Executing parsed code: ────────────────────────────────────────────────────────────────────────────────────────
weather = get_current_weather_in_location(location="Toronto, Canada")
print(f"The current weather in Toronto is: {weather}")
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Execution logs:
The current weather in Toronto is: Weather for Toronto, Golden Horseshoe, Ontario, Canada: it is daytime, humidity
of 62 %, temperature of -15.7 °C, precipitation of 0.0 mm, snowfall of 0.0 cm, cloud cover of 100 %, sustained wind
speed of 24.6 km/h, wind gusts up to 51.1 km/h
Out: None
─ Executing parsed code: ────────────────────────────────────────────────────────────────────────────────────────
prompt = f"A high-res, photorealistic image of Toronto, Canada, with the current weather conditions: it is daytime, humidity of 62 %, temperature of -15. 7 °C, precipitation of 0. 0 mm, snowfall of 0. 0 cm, cloud cover of 100 %, sustained wind speed of 24. 6 km/h, wind gusts up to 51. 1 km/h."
image = image_generator(prompt=prompt)
final_answer(image)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Out - Final answer: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x1024 at 0x7180DEFC5E80>
The agent certainly needs to work on summarizing the weather conditions better; however, this image is much better with the inclusion of the weather. There is actually a live camera feed from the CN Tower, so we can even compare this to the real thing (different angle, obviously):
[Image: snowy Toronto]
It is snowy and clear so the generated image matches the current weather. Wonderful!
Looking at this more deeply, it would be good if the prompt for the image was improved. The current prompt is:
A high-res, photorealistic image of Toronto, Canada, with the current weather conditions: it is daytime, humidity of 62 %, temperature of -15. 7 °C, precipitation of 0. 0 mm, snowfall of 0. 0 cm, cloud cover of 100 %, sustained wind speed of 24. 6 km/h, wind gusts up to 51. 1 km/h.
If this were rephrased to something like “an extremely cold and windy overcast day”, then it’s likely that the grey box with an approximation of the weather in it would not get added to the picture. This could be achieved with another tool which just invokes the LLM to summarize the weather. I added this when I made this agent into a Hugging Face Space.
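As a rough sketch of that summarizing step (not the code from the Space), the function below builds the summarization request and hands it to a `complete` callable. `complete` is a hypothetical stand-in for however the LLM is actually called, and `summarize_weather` and `SUMMARY_INSTRUCTIONS` are illustrative names. In the agent this would be wrapped with `@tool`, like the other tools.

```python
from typing import Callable

SUMMARY_INSTRUCTIONS = (
    "Summarize the following weather report as one short descriptive phrase "
    "suitable for an image prompt. Do not include numbers or units."
)


def summarize_weather(report: str, complete: Callable[[str], str]) -> str:
    """Ask an LLM, via the `complete` callable, for a short number-free description."""
    prompt = f"{SUMMARY_INSTRUCTIONS}\n\nWeather report: {report}"
    return complete(prompt).strip()


# Example with a stub in place of a real model call:
def fake_llm(prompt: str) -> str:
    return " an extremely cold and windy overcast day \n"


print(summarize_weather("temperature of -15.7 °C, cloud cover of 100 %", fake_llm))
```

Passing the model call in as a parameter keeps the sketch independent of any particular client library and makes it easy to try with a stub.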