Setting up an AI workstation

Introduction

In this document, I will share the steps required to get an AI workstation ready. I’ll update the content as my configuration evolves.

Base Operating System

My workstation uses Ubuntu 20.04 LTS as the base operating system. I also tried the newer 22.04 LTS release, but it ships with Python 3.10 as the default installation, and I ran into compatibility problems with some packages I was testing at the time, such as Great Expectations.

For now, I don’t need any bleeding-edge version that is only available on 22.04.

Installation media

To make my setup as portable as possible, I will install onto an external SSD. I chose a SanDisk 1 TB NVMe SSD with a USB 3.2 Gen 2 connection. So far, my experience with SanDisk, in terms of reliability, has been good.

During the installation, I formatted the drive using this partitioning scheme:

  • Boot partition: 512 MB
  • Swap partition: 16 GB (same size as my RAM)
  • Root partition: 256 GB (mounted as /)
  • Home partition: approx. 700 GB (mounted as /home)

My goal is to be able to replace the root partition in the future with an upgraded OS, without losing any personal data stored in /home.
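The partition budget above can be sanity-checked with a bit of shell arithmetic. This is just a sketch using the nominal decimal gigabytes that drive vendors advertise (a “1 TB” drive is roughly 1000 GB nominal, less after filesystem overhead):

```shell
# Partition budget for the 1 TB drive, in nominal (decimal) GB
DISK=1000
BOOT=1            # the 512 MB boot partition, rounded up to 1 GB
SWAP=16           # matches the 16 GB of RAM
ROOT=256
HOME_SIZE=$((DISK - BOOT - SWAP - ROOT))
echo "home: ~${HOME_SIZE} GB"
```

This leaves roughly 727 GB for /home, which after filesystem overhead lines up with the “approx. 700 GB” in the list above.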

Post-Install steps

I created an Ubuntu 20.04 LTS boot disk using balenaEtcher, then booted my laptop from it. Configuration options were minimal, and I chose not to load any third-party packages for NVIDIA or Wi-Fi, following the advice from David Adrián’s blog post.

Once the system was installed, I added the following packages:

sudo apt install net-tools
sudo apt install openssh-server

net-tools provides the ifconfig command, and openssh-server is a must for remote administration.

Installing Lambda Stack

After checking several alternatives, I decided to give Lambda Stack a try: a customization layer on top of Ubuntu that installs the most popular deep learning tools with a simple script.

To install it, launch the script from their website:

LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda
sudo reboot

After installing Lambda Stack, it’s time to get GPU-accelerated Docker with this command:

sudo apt-get install docker.io nvidia-container-toolkit

Then you can use the Lambda Stack dockerfiles, following this tutorial.
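Before moving on, it’s worth confirming that containers can actually see the GPU. A minimal smoke test is to run nvidia-smi inside a CUDA container; the image tag below is an assumption, so pick any CUDA image compatible with your driver:

```shell
# Smoke test: nvidia-smi inside a container should list the GPU.
# The image tag (11.8.0-base-ubuntu20.04) is an example, not a requirement.
sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```

If this prints the same GPU table as running nvidia-smi on the host, the container toolkit is wired up correctly.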

Other additional packages

After Lambda Stack, I also decided to install Miniconda3 and set up JupyterHub as a service, following David Adrián’s guide.

JupyterHub as a service
conda create -n jupyter_env
conda activate jupyter_env
conda install python=3.9
conda install -c conda-forge jupyterhub jupyterlab nodejs nb_conda_kernels
sudo nano /etc/systemd/system/jupyterhub.service
sudo systemctl daemon-reload
sudo systemctl start jupyterhub
sudo systemctl enable jupyterhub

Inside jupyterhub.service:

[Unit]
Description=JupyterHub
After=network.target

[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/<your_user>/miniconda3/envs/jupyter_env/bin:/home/<your_user>/miniconda3/bin"
ExecStart=/home/<your_user>/miniconda3/envs/jupyter_env/bin/jupyterhub

[Install]
WantedBy=multi-user.target
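After saving the unit file, a quick way to reload systemd and confirm the service came up (the host placeholder is an assumption; JupyterHub listens on port 8000 by default):

```shell
# Reload systemd after editing the unit file, then restart and check the service
sudo systemctl daemon-reload
sudo systemctl restart jupyterhub
systemctl is-active jupyterhub    # should print "active"
# Then browse to http://<your_host>:8000 and log in with a system user
```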

I still have to review how this integration works. I am particularly puzzled about this paragraph: “The most interesting feature of this Jupyter setup is that it detects kernels in all conda environments, so you can access those kernels from here with no hassle. Just install the corresponding kernel in the desired environment (conda install ipykernel, or conda install irkernel) and restart Jupyter server from JupyterHub control panel”
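My current reading of that paragraph, as a sketch (the environment name ml_env is an assumption, just an example of some other conda environment):

```shell
# Make a kernel from another conda env visible to Jupyter
conda activate ml_env
conda install -y ipykernel        # or r-irkernel for an R kernel
# nb_conda_kernels, installed in jupyter_env, discovers this kernel;
# it appears after restarting the server from the JupyterHub control panel
```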

Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
bash Miniconda3-py38_4.12.0-Linux-x86_64.sh
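If you prefer an unattended install, the Miniconda installer accepts batch-mode flags (these are documented in the installer’s own --help output):

```shell
# -b: batch mode (accepts the license, no prompts); -p: install prefix
bash Miniconda3-py38_4.12.0-Linux-x86_64.sh -b -p "$HOME/miniconda3"
"$HOME/miniconda3/bin/conda" init bash   # then restart the shell
```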
PyCharm Professional
sudo snap install pycharm-professional --classic

Alternative setups

I tried following the Ubuntu tutorial to install the NVIDIA RAPIDS stack, but it did not work for me. I got an error running ./data-science-stack setup-system and opted for the current alternative.

