Table of Contents

1. Introduction

As I've mentioned in my previous posts I've been working through both theory and practice on the deeplearning and llm front. I really am quite excited (belatedly) by this technology, and want to see how I can use it better and also build on it.

As part of this I wanted to experiment with more than the readily available commercial llm based chatbots. I've already been working with Bing Copilot a lot and its been great. I've also used Google's Bard for the last couple of days and it's ok but I'm not able to integrate it well into my daily tasks.

So to explore further I decided to install some open source llm models on my own computer. I asked Bing Copilot for some guidance and I came across the pcmag article that recommended oobabooga's text generation webui.

This post is about my experience setting up and using this tool, and some tips and tricks to avoid few pitfalls.

NOTE: This is a time-taking process, and needs a lot of space on your disk, and uses up a lot of GPU when running.

2. Resources

Here are some of the pre-requisites to get started:

GPU: The more powerful the better.
Lots and lots of space: My installation ended up using ~75GB on disk at the end of this process. Use secondary hard-disk if you have one.
Lots of time: The installation took upwards of an hour to complete.

And a bunch of articles/resources/forums to look at before you decide to move ahead:

3. Actual Installation

Alright so you've decided to proceed. ⚠️ If you haven't looked at the resources section yet, I suggest you give it a quick read and make sure you have everything you need.

3.1. Clone the repo

PS> git clone https://github.com/oobabooga/text-generation-webui.git

PS> cd text-generation-webui

3.2. Start installation on windows

PS> .\start_windows.bat

3.2.1. Answer questions about GPU

You can see that I chose option A, because I have an NVIDIA GPU. Choose 'N' for the second question if you have a newer computer.

What is your GPU?

A) NVIDIA
B) AMD (Linux/MacOS only. Requires ROCm SDK 5.6 on Linux)
C) Apple M Series
D) Intel Arc (IPEX)
N) None (I want to run models in CPU mode)

Input> A

Do you want to use CUDA 11.8 instead of 12.1? Only choose this option if your GPU is very old (Kepler or older).
For RTX and GTX series GPUs, say "N". If unsure, say "N".

Input (Y/N)> N
CUDA: 12.1

3.2.2. Wait a loooong time and look at progress bars

No, seriously, this part takes a long time. And it looks like the software is installing almost every ml/ai/deeplearning package available and more!

3.2.3. Done

*******************************************************************
* WARNING: You haven't downloaded any model yet.
* Once the web UI launches, head over to the "Model" tab and download one.
*******************************************************************

You haven't downloaded a model yet!!!!! You don't say!

Screenshot of the "finished" installation goes here...

alt Installation Complete, Web UI Started

I went to the url shown in the terminal, and I started at the Chat tab of the web page.

alt Chat View First Time

Even though I had seen the warning about "no model" I decided to go ahead and talk to the chatbot, so I said "Hi" and got an error.

alt Chat Error No Model

3.3. Install a Model

Head over to the model page to make sure you don't have any models available/selected.

alt Chat Error No Model

3.3.1. Installing a "Large" Model

So now I had to install a model.

⚠️⛔The first attempt wasn't great and you can skip it and go to the next model installation directly.

The first model I tried to load was from one of the popular threads on the LocalLlama subreddit. It was called WizardLM/WizardCoder-Python-34B-V1.0.

alt WizardCoder Model Download

This model was too big. I missed looking at the file sizes and started the install. It took a long time to download and ended up taking up roughly 62GB of space. Once it was downloaded I wasn't sure how long it would take to load and how slow it would make my computer. So I decided not to use it.

3.3.2. Installing a "Tiny" Model

So I searched on LocalLlama subreddit for small/tiny models and found some. Then I went on huggingface and searched for "tiny". Found the next candidate TinyLlama/TinyLlama-1.1B-Chat-v1.0!

Enter full model name and download.

alt TinyLLama Model Download

This one downloaded fast (only ~2GBs)!

alt TinyLLama Model Download Complete

3.4. Use Model

🥁Drumroll! It was now time to run the model and have a chat with my own AI chatbot.

Now we need to refresh the models list and then choose a model from the model dropdown. Then click the "Load" button.

alt Select TinyLLama Model

And I get the message 🤔:

Successfully loaded TinyLlama_TinyLlama-1.1B-Chat-v1.0.

It seems to be an instruction-following model with template "Custom (obtained from model metadata)". In the chat tab, instruct or chat-instruct modes should be used.

3.5. Try it out

I was excited to try it out so I went back to the Chat tab and started talking to the AI.

I didn't pay attention to that warning message (see previous section) at first and didn't change the mode 😅 and I got some hot garbage when I sent a message to the AI.

alt Model Response on Incorrect Chat Mode

So I changed the mode to chat-instruct (whatever than means), and started a new chat, and also changed the character of the AI.

NOTE: ⚠️ The chat mode selection is right at the bottom of the chat screen, scroll down to get there.

And we are in business 😁!

alt Proper TinyLLama Response

NOTE: I will have to figure out this avatar thing, how to add new ones!