OpenAI function calling

Recently, I stumbled upon Microsoft's Semantic Kernel and was impressed by its plugin option, which
leverages function calling, a native feature of many of the latest LLMs that allows them to invoke (personal) APIs.

I found Microsoft's example plugin (here) particularly interesting.

With "function calling", you can basically tell the LLM, that it can get additional information by calling (local/custom) functions
that you can define in your chat-request.

The LLM then decides whether it needs to call one of those functions to respond to your request.
As described in Microsoft's Semantic Kernel plugin example, we can therefore ask the AI to "turn on the lamp" in a home assistant setting.

The LLM checks whether it has functions (tools) available to accomplish the task and decides which one it needs to call.
It then responds with a tool_calls message, to which we reply with the return value of the requested function.

After that, the LLM generates its final response using the received result.
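
Seen from the API's perspective, the whole exchange is just a growing list of chat messages. Here is a rough Python sketch of the roles involved; the function name, id and values are illustrative and mirror the weather example used later in this post:

conversation = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "How is the current weather in Graz?"},
    # The LLM decides it needs a tool and answers with a tool_calls entry instead of text:
    {"role": "assistant", "content": "", "tool_calls": [{
        "id": "call_123",  # made-up id, just for illustration
        "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": "{\"location\": \"Graz, Austria\", \"format\": \"celsius\"}"}}]},
    # We run get_current_weather ourselves and send its return value back:
    {"role": "tool", "tool_call_id": "call_123",
     "content": "{\"city\": \"Graz\", \"weather\": \"30°C\"}"},
    # The LLM then turns this result into its final natural-language answer.
]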

I wanted to try this out locally, since I do not have an OpenAI subscription, and decided to use Ollama to host the LLM.
I used the following docker-compose.yml to achieve this:

version: '3.8'

services:
  ollama-backend:
    image: ollama/ollama:0.3.4
    ports:
      - "11434:11434"
    volumes:
      - /ollama:/root/.ollama
    restart: always
    container_name: ollama-backend

  ollama-openweb-ui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_API_BASE_URL=http://ollama-backend:11434/api
    depends_on:
      - ollama-backend
    restart: always
    container_name: ollama-openweb-ui

... and decided to use the llama3.1 model, since it supports function calling according to the documentation.

I executed the following command to download the model via the docker container:
docker exec -it ollama-backend ollama run llama3.1

You could use other models too. Ollama offers a search option for finding models with "tools" support.

Unfortunately, Semantic Kernel does not support function calling through Ollama yet (Discussion, Issue #7442).

We can still try function calling with simple curl requests, as shown below, to get an understanding of how it works:

Step 1:
Ask the LLM a question and provide the function it can call to gather the required information.
In this case, we want to know the current weather and provide the get_current_weather function definition.

Note: If you use a local Ollama instance, you can omit the "Authorization" header.

Request:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "llama3.1",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "How is the current weather in Graz?"
      }
    ],
    "parallel_tool_calls": false,
    "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
                "type": "string",
                "description": "The city and country, eg. San Francisco, USA"
            },
            "format": { "type": "string", "enum": ["celsius", "fahrenheit"] }
          },
          "required": ["location", "format"]
        }
      }
    }
   ]
}'

Response:
Llama3.1 responds by requesting the result of the get_current_weather function.
The interesting part here is that it requested the call with the location parameter set to "Graz, Austria":

{
  "id": "chatcmpl-202",
  "object": "chat.completion",
  "created": 1723311089,
  "model": "llama3.1",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_fu6au6uj",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"format\":\"fahrenheit\",\"location\":\"Graz, Austria\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 184,
    "completion_tokens": 28,
    "total_tokens": 212
  }
}
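
The LLM never executes anything itself; we have to run the function locally and feed the result back. Below is a minimal Python sketch of how the tool call from the response above could be parsed and dispatched; get_current_weather is just a hypothetical stub that returns the 30 °C used in this post:

import json

# The relevant part of the JSON body returned in step 1.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_fu6au6uj",
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"format\":\"fahrenheit\",\"location\":\"Graz, Austria\"}",
                },
            }],
        },
    }],
}

def get_current_weather(location: str, format: str) -> str:
    # Hypothetical stub - a real implementation would query a weather API.
    return json.dumps({"city": location.split(",")[0], "weather": "30°C"})

tool_call = response["choices"][0]["message"]["tool_calls"][0]

if tool_call["function"]["name"] == "get_current_weather":
    arguments = json.loads(tool_call["function"]["arguments"])
    result = get_current_weather(**arguments)
    # result and tool_call["id"] are what go into the "tool" message in step 2.
    print(tool_call["id"], result)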

Step 2:
We can now call our local function get_current_weather and return the result (Graz 30 °C) to the LLM by responding like this:

Request:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "llama3.1",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "How is the current weather?"
        },
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": "call_fu6au6uj",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "arguments": "{\"format\":\"fahrenheit\",\"location\":\"Graz, Austria\"}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": "{\"city\":\"Graz\",\"weather\":\"30°C\"}",
            "tool_call_id": "call_fu6au6uj"
        }
    ],
    "parallel_tool_calls": false,
    "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
                "type": "string",
                "description": "The city and country, eg. San Francisco, USA"
            },
            "format": { "type": "string", "enum": ["celsius", "fahrenheit"] }
          },
          "required": ["location", "format"]
        }
      }
    }
   ]
}'

Response:
The LLM now returns an answer that incorporates the result of the get_current_weather function:

{
    "id": "chatcmpl-371",
    "object": "chat.completion",
    "created": 1723315198,
    "model": "llama3.1",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The current temperature in Graz is 30°C (86°F), which assumes it's clear and sunny today."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 81,
        "completion_tokens": 23,
        "total_tokens": 104
    }
}

As you can see, the LLM included the result of the get_current_weather function in its generated response:
The current temperature in Graz is 30°C (86°F), which assumes it's clear and sunny today.

Looks good, right? You can also try this yourself by executing only the request in step 2, since it already contains the whole chat history.
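
If you would rather script the whole round trip instead of copy-pasting curl requests, you can point the official OpenAI Python client at the local Ollama instance, since Ollama exposes an OpenAI-compatible API. The following is only a sketch under the same assumptions as above (Ollama on localhost:11434, the llama3.1 model pulled, and a hypothetical get_current_weather stub):

import json
from openai import OpenAI

# The api_key is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and country, e.g. San Francisco, USA"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}]

def get_current_weather(location: str, format: str) -> str:
    # Hypothetical stub - replace with a real weather lookup.
    return json.dumps({"city": location.split(",")[0], "weather": "30°C"})

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How is the current weather in Graz?"},
]

# Step 1: the model either answers directly or requests a tool call.
first = client.chat.completions.create(model="llama3.1", messages=messages, tools=tools)
message = first.choices[0].message

if message.tool_calls:
    # Step 2: echo the assistant's tool call back into the history, run the function
    # locally, and attach its result as a "tool" message.
    messages.append({
        "role": "assistant",
        "content": message.content or "",
        "tool_calls": [{"id": c.id, "type": "function",
                        "function": {"name": c.function.name, "arguments": c.function.arguments}}
                       for c in message.tool_calls],
    })
    for c in message.tool_calls:
        if c.function.name == "get_current_weather":
            args = json.loads(c.function.arguments)
            messages.append({"role": "tool", "tool_call_id": c.id,
                             "content": get_current_weather(**args)})
    second = client.chat.completions.create(model="llama3.1", messages=messages, tools=tools)
    print(second.choices[0].message.content)
else:
    print(message.content)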

I hope this gave you an idea of how the "function calling" API works and that you found this post useful.

Thanks for reading!


by Philipp Meier, Aug 2024