Troubleshooting Your Installation#

Here are some issues folks have run into and what they did to resolve them.

PyTorch & CUDA Woes#

You need PyTorch installed with cudatoolkit (see link).

Installing PyTorch with pip does not seem to include cudatoolkit. You may be able to install cudatoolkit separately, but we haven’t tested that. Our tested procedure is to remove any previous installs of PyTorch first. e.g. if PyTorch is currently installed through pip:

pip uninstall torch

Then, install PyTorch with cudatoolkit through conda (see the PyTorch installation page for the latest install command; CUDA 11.3 seems to be working with bitsandbytes):

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Then clone this repo, navigate to the repository root, and run:

pip install -e .
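Once the install finishes, it's worth a quick sanity check that the conda-installed PyTorch can actually see your GPU (if this prints False, the CUDA toolkit and driver are not being picked up):

```shell
# Print the installed PyTorch version and whether CUDA is usable.
# Expect "True" on a machine with a working GPU setup.
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
```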

Notes on bitsandbytes#

bitsandbytes was forked in a confusing way.

The actual repo is https://github.com/TimDettmers/bitsandbytes, and this version installs with

pip install bitsandbytes

However, a Google search often turns up the older repo https://github.com/facebookresearch/bitsandbytes, and that version installs with

pip install bitsandbytes-cudaXXX

We need the newer version, installed with pip install bitsandbytes.
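If you're unsure which fork you ended up with, you can check your installed packages: the newer fork is named exactly `bitsandbytes`, while the older one shows up as `bitsandbytes-cudaXXX` (with a CUDA version suffix):

```shell
# List any installed bitsandbytes distributions.
# "bitsandbytes" alone = the newer fork we want;
# "bitsandbytes-cudaXXX" = the older fork, which should be uninstalled.
pip list 2>/dev/null | grep -i bitsandbytes
```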

Some Handy Dandy Environment Variables#

Stick these in your shell’s config for posterity:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
export CUDA_VISIBLE_DEVICES=0

The first prevents GPU memory from becoming too fragmented. The default memory allocator was probably not tuned for today’s huge LLMs and tends to struggle when we load massive models.

The second line is not strictly necessary, but it is useful when you aren’t going to use all the GPUs on your machine / node, e.g. when running several experiments in parallel in different processes. (The example only makes CUDA:0 available, but you can of course list more GPUs here.)
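As a sketch of the parallel-experiments use case: because `CUDA_VISIBLE_DEVICES` is a per-process environment variable, you can pin each process to its own GPU (here `train.py` and its flags are placeholders for your own script):

```shell
# Hypothetical example: two runs of the same script, each seeing only one GPU.
# Inside each process, the visible GPU appears as cuda:0.
CUDA_VISIBLE_DEVICES=0 python train.py --seed 1 &
CUDA_VISIBLE_DEVICES=1 python train.py --seed 2 &
wait  # block until both background runs finish
```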