Google
 

Saturday, September 7, 2024

Unlocking the Power of LLMs Locally with Docker Compose

Although the expectations that came with the advent of Large Language Models (LLMs) could be largely exaggerated, they still have proven to be useful in many scenarios and for a very wide audience. Probably, this is the closest non-technical users have come to interact with AI since its inception. ChatGPT has set the record for the fastest-growing user base.

Being an online tool, it comes with some concerns over privacy, customizability, and the constant need for an internet connection.
However ChatGPT is not the only player in this game. Many models are available online and for offline download. The latter is the focus of this post.

Recently, Meta release its latest model Llama 3.1. And I wanted to give it a try on my laptop, after positive feedback about it.

So, the objective I had was:

  • Find and easy way to setup the tooling required to download LLMs and get responses to my prompts locally.
  • Have a nice user interface that I can use to give prompts, save prompt history, and customize my environment.

 Two tools play very well together:

Ollama: https://ollama.com/
Think of Ollama as the npm or pip for language models. It enables you to download models, execute prompts from the CLI, list models and so on. It also provides APIs that can be called from other applications. These APIs are compatible with OpenAI Chat Completions API.

Open WebUI: https://openwebui.com/
Self-hosted web interface, very similar to what you get from OpenAI's ChatGPT web interface. Open WebUI can interface with Ollama. Think about it as a front end for the backend provided by Ollama.

 

Since I prefer to use Docker whenever possible to experiment with new tools, I opted to use it instead of installing any tools locally. Especially that I'm almost a complete beginner to this space.

Open WebUI provides docker images that include both Open WebUI and Ollama in the same image! Which makes setting up the whole stack locally super easy.

The documentation provides this example command to run the container:

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
This command does the following:
  1. Starts the container, from the image ghcr.io/open-webui/open-webui:ollama. The --restart always parameter restarts the container automatically if it fails or was stopped.
  2. Maps the local port 3000 to Open WebUI port 8080.
  3. Maps docker volumes to paths within the container. This persists data even if the container is deleted. This is important to persist history and downloaded models, which are big.

I prefer to use docker compose, additionally I wanted to have the easy visibility on the data created by Open WebUI and the models downloaded by Ollama, so I chose to bind folders on my local machine instead of using volumes. Here is how the docker-compose.yml looks like:

services:
  OpenWebUI:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    environment:
      - WEBUI_AUTH=False
    volumes:
      - C:\open-webui:/app/backend/data
      - C:\ollama:/root/.ollama
    ports:
      - 3000:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Starting the stack is easy:

docker compose up -d

It may take a short while before it's ready, If you check the container logs (I use Docker Desktop on windows) you should see something similar to:

 

Then you can open your browser on http://localhost:3000/ and start playing.

Downloading models

Remember that this docker image does not include any models yet so probably the first step is to click the plus icon beside the "Select a model" label and write a model name. You'll find a list of model names in https://ollama.com/library. Click a model name, and choose the model size and the model name will be shown. So to download the 8b (8 billion parameters version) of llama3.1, write llama3.1:8b in the Open WebUI interface.

As shown in the screenshot below, I downloaded llama3.1:8b, gemma2:2b (Google's lightweight model). Note that the larger the number of parameters, the higher the specs your computer needs to have.

Testing with some prompts

After downloading models, you can try some prompts:


Note that you can interact with ollama CLI directly. Either use docker compose exec OpenWebUI bash or use docker desktop:


Now if you want to stop the container, run docker compose down

A note on GPUs

Ollama can run models using GPU, or CPU only. As you see in the docker compose file, I'm specifying that the container can use all the available GPUs on my machine. If you're using Docker desktop with WSL support, ensure that you have the latest WSL and Nvidia drivers installed. You can use this command to test that docker GPU access is working fine:

docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark


Closing notes:

It's very exciting to have an LLM running locally. It opens a lot of customization possibilities and keeps your data private.
The machine I tried this experiment on is relatively old, so gemma2:2b was much faster than llama3.1:7b and still performed very well.
Looking forward to experiment more!

Note: The title of this post was recommended by gemma2 :)


Saturday, June 8, 2024

The simplest data pipeline ever

As software engineers what we care most about is getting stuff done. The simplest approach is probably the best, as long as it doesn't cause long term issues.

I was working on a project where the creation and initialization of a few PostgreSQL databases required data migration and transformation from another set of databases. The first step in this process involved loading a few hundreds of millions of records from the source databases into the destination databases.
I wanted to implement this in a way that is fully automated, and at the same time, I wanted to simplify this process as much as possible. So, no fancy tools. No cloud infrastructure. No nothing. Is that even possible?

The approach we followed was simply using Linux pipes! Linux pipes can transfer data from one process to another. So how could this help with this data import process ?

Postgres provides useful command line utilities that can be used to export and import data. For example this command exports data from a table called mydata to the standard output:

psql -h source -U postgres -d test -c "\copy mydata TO STDOUT"

On the other hand you can import data from standard input to a database table using this command:

psql -h destination -U postgres -d test -c "\copy mydata FROM STDIN"

With the power of Linux pipelines it's possible to stitch these commands together so that the output of the first command feeds into the input of the next and data flows from one database to another.

psql -h source -U postgres -d test -c "\copy mydata TO STDOUT" | psql -h destination -U postgres -d test -c "\copy mydata FROM STDIN"

Pretty simple. Isn't it? However since this operation may take a few hours, wouldn't it be nice to have some sort of progress indication ?
This is where another handy Linux utility comes to play: The pv utility. From the man page:

pv - monitor the progress of data through a pipe

By default this tool shows the number of bytes flowing through the pipe, however in the process of data import, the number of imported records is a better representative of the progress of the operation. The good thing is that pv has switches that enable counting the number of lines instead of bytes. So the final solution would look like:

psql -h source -U postgres -d test -c "\copy mydata TO STDOUT" | \ 
pv --line-mode --size 100000000 | \ 
psql -h destination -U postgres -d test -c "\copy mydata FROM STDIN"

Note that the --size parameter assumes knowledge of the total number of records, which can be retrieved using a simple select count(*), or just omitted.

When I run this in terminal, the progress looks like:


Transferring 100m records from one database to another both running as containers on my local machine took about 8 minutes. In case of transferring data over the network, it's expected to be slower.


Needless to say that in the real life implementation other steps were required like retrieving the credentials for the source and destination databases and hooking the scripts into a CI pipeline.

Surely, this isn't the most efficient way to transfer a lot of data. Note that the data is transferred as text which is far less efficient than the binary transfer that proper data import tools would use. As with any decision we make as software engineers it's all about tradeoffs. My priorities were clear: We need the simplest possible repeatable solution.

If you're interested in trying this on your machine, this is how I prepared the above screenshots:
Let's start with a docker compose file which instantiates 3 containers:

  1. A source database
  2. A destination database
  3. And a client where there data import / export process is executed.
version: "3.8"

networks:
  db-network:
    driver: bridge

services:
  source:
    image: postgres:16.1
    environment:
      - POSTGRES_PASSWORD=MYPASS123
    volumes:
      - type: volume
        source: source-data
        target: /var/lib/postgresql/data
    ports:
      - 9432:5432


  destination:
    image: postgres:16.1
    environment:
      - POSTGRES_PASSWORD=MYPASS123
    volumes:
      - type: volume
        source: destination-data
        target: /var/lib/postgresql/data
    ports:
      - 9433:5432

  client:
    container_name: postgres_client
    build: .
    entrypoint: [ "sleep", "infinity" ]


volumes:
  source-data:
    external: true
    name: source-data

  destination-data:
    external: true
    name: destination-data

 

Note that for the client container a Dockerfile is used and that is to ensure that the required utilities for this process -in particular pv- are installed. Additionally to copy the .pgpass file which contains the database passwords. 

FROM postgres:16.1

RUN apt-get update

RUN apt-get install pv

COPY pgpass /root/.pgpass

RUN chmod 0600 /root/.pgpass

source:5432:test:postgres:MYPASS123
destination:5432:test:postgres:MYPASS123

 

Then start this docker compose stack using:

docker compose up -d --build

Connect to the databases using your favorite tool (mine is Azure data studio) and create the test table by executing this query:

CREATE TABLE public.mydata (
	id int NOT NULL,
    firstname varchar NULL,
	lastname varchar NULL,
	email varchar NULL,
	CONSTRAINT mydata_pk PRIMARY KEY (id)
);

The next step would be to populate some test data into the source database:

INSERT INTO public.mydata
(firstname, lastname, email, id)
select concat('firstname', counter), concat('lastname', counter), concat('firstname', counter, '.', 'lastname', counter, '@email.com'), counter
	from pg_catalog.generate_series(1, 100000000) as counter

Then connect to the client container using:

docker compose exec client bash

And execute the script to start the data migration:

psql -h source -U postgres -d test -c "\copy mydata TO STDOUT" | \ 
pv --line-mode --size 100000000 | \ 
psql -h destination -U postgres -d test -c "\copy mydata FROM STDIN"

 

I hope this helps.

Friday, February 16, 2024

Changing log level for .net apps on the fly

Logging is very important to understand the behavior of an application. Logs can be used to analyze application behavior over an extended time period to understand trends or anomalies, but they're also critical to diagnose issues in production environments when the application is not behaving as expected.

How much logs an application should emit is a matter of tradeoffs. Writing too much logs may negatively impact application performance and increase data transfer and storage costs without adding value. Too few logs makes it very difficult to troubleshoot issues. This is why most logging frameworks allow configuring log levels so that the application developers can add as much logging as needed, but only logs with a specific level or below will actually be written to the destination.

The challenge is that you don't need all the logs all the time. You certainly can redeploy or reconfigure the application and restart it to change the log level, but this would be a bit disruptive. The good thig is that .net configuration system allows updating configuration values on the fly. Consider this simple web API:


var builder = WebApplication.CreateBuilder(args);

builder.Logging.AddConsole();

var app = builder.Build();

app.MapGet("/numbers", () =>
{
    app.Logger.LogDebug("Debug");
    app.Logger.LogInformation("Info");
    app.Logger.LogWarning("Warning");
    app.Logger.LogError("Error");

    return Enumerable.Range(0, 10);
});

app.Run();

With logging configuration file:

{
  "Logging": {
    "LogLevel": {
      "Default": "Error",
      "Microsoft.AspNetCore": "Warning"
    }
  }
}
When the /numbers endpoint is called, these logs are written to the console:
fail: ConfigReload[0]
      Error

This is clearly because the configured default log level is "Error". You can add a simple endpoint that changes the log level on the fly, like this:


app.MapGet("/config", (string level) => 
{
    if (app.Services.GetRequiredService<IConfiguration>() is not IConfigurationRoot configRoot)
        return;

    configRoot["Logging:LogLevel:Default"] = level;
    configRoot.Reload();
});

When you issue the GET request /config?level=Information Then invoke the /numbers endpoint again, the log output will look like:

info: ConfigReload[0]
      Info
warn: ConfigReload[0]
      Warning
fail: ConfigReload[0]
      Error

Similarly, to configure the log level to Debug, invoke /config?level=Debug. Very simple.

There are a few gotchas to consider:

  1. This the /config endpoint should be secured, only a privileged user should be able to invoke it as it changes the application behavior. I've intentionally ignored this in my example for simplicity.
  2. In case there are many instances serving the same API the /config invocation will be directed by the load balancer to only one instance of your application which most probably won't be sufficient. In this case you will need another approach to communicate with your application that the log level should be modified. One approach could be a pub-sub system that allows multiple consumers. This may be a subject of another blog post.

Another common approach for reconfiguring.net applications on the fly is by using a configuration source that refreshes automatically every specific time interval or based on config file change detection.
However the time based approach means that you have to wait until a certain time elapses for the application to reconfigure itself which may not be desirable as you want to change the log level as quickly as possible. A file change detection approach is not great for immutable deployments like container based applications or serverless functions.

Logging and monitoring are quality attributes that should be taken into consideration during the application design. In case you're not using a more advanced observability tooling that allow profiling for example then the technique proposed in this blog post may be of help.

Friday, January 12, 2024

Assertions of Equality and Equivalence

I remember that I encountered an interesting bug that was not detected by unit tests because the behaviour of the test framework did not match my expectations.
The test was supposed to verify that the contents of an array (or a list) returned by the code under test match an expected array of elements in the specific order of that expected array. The unit test was passing, however, later the team discovered a bug, and the root cause was that the array was not in the correct order! This is exactly why we write automated tests, but the test failed us.

The test, which uses FluentAssertions library basically looked like:

[Test]
public void FluentAssertions_Unordered_Pass()
{
	var actual = new List<int>  {1, 2, 3}; // SUT invocation here
	var expected = new [] {3, 2, 1};

	actual.Should().BeEquivalentTo(expected);
}
Although the order of the elements of the actual array don't match the expected, the test passes. This is not a bug in FluentAssertions. It's by design, and the solution is simple:
actual.Should().BeEquivalentTo(expected, config => config.WithStrictOrdering());

 

The config parameter enforces a specific order of the collection. It's also possible to configure this globally, when initializing the test assembly for example:

AssertionOptions.AssertEquivalencyUsing(config => config.WithStrictOrdering());

 

The default behavior of this method annoyed me. In my opinion, the test method should be strict by default. That is, it should assume that the collection should be sorted, and can be made more lenient by overriding this behavior. Not the opposite.

Probably I got into the habit of using BeEquivalentTo(), while an Equal() assertion exists, which "Expects the current collection to contain all the same elements in the same order" as it's default behavior. There are other differences between BeEquivalentTo() and Equal() that don't matter in this context. 

Similar behavior applies to Nunit assertions, although there is no way to override the equivalence behavior:

[Test]
public void NUnit_Unordered_Pass()
{
	var actual = new [] {1, 2, 3};
	var expected = List<int>  {3, 2, 1};

	Assert.That(actual, Is.EquivalentTo(expected)); // pass
	CollectionAssert.AreEquivalent(expected, actual); // pass
}
[Test]
public void NUnit_Unordered_Fail()
{
	var actual = new [] {1, 2, 3};
	var expected = new List<int> {3, 2, 1};

	Assert.That(actual, Is.EqualTo(expected)); // fail
	CollectionAssert.AreEqual(expected, actual); // fail
}

 

It's important to understand the behavior of the testing library to avoid similar mistakes. We rely on tests as our safetly net, and they better be reliable!