Hi! Two months ago I started teaching myself to write AI software using free resources online. The sheer amount of content on the topic was overwhelming.

Here's what I wish I had known when I started.

Your first two months writing AI software

Tooling

Despite having many years of experience as a software engineer and sysadmin, I had difficulty setting up a workflow for AI.

I chose Python, conda (a Python package and environment manager), and Jupyter notebooks. I have seen ML implementations in R, MATLAB, and even one in Excel. Jupyter notebooks support Julia, Python, and R (the name is a nod to all three). For some tasks, you can run software directly on your local computer. However, when you're training a model, you need a GPU.
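For reference, getting that stack running looks something like this (a sketch; the environment name and Python version are arbitrary examples):

    # In a terminal: create and activate an isolated conda environment,
    # then install Jupyter into it.
    conda create -n ai-experiments python=3.10
    conda activate ai-experiments
    conda install jupyter
    jupyter notebook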

A GPU can train a model in a few minutes that would take hours or days on a CPU. Typically this is done using a cloud tool such as Google Colab, which has a free tier offering free GPU time. Make sure that you're actually using the GPU you have available; it doesn't always happen automatically. Google Colab has many competitors.
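Here's a quick sanity check, assuming you're using PyTorch in a Colab notebook (the framework is my assumption; the same idea applies elsewhere):

    # Check whether a CUDA GPU is visible to PyTorch.
    import torch
    print(torch.cuda.is_available())          # should print True
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"

    # Or ask the driver directly from a notebook cell:
    # !nvidia-smi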

If you're coding on a machine with a GPU, you can try to run your code on it. Running and maintaining a GPU rig for development and training AI is a difficult sysadmin task. Expect to put some time and effort into this if you try it. Otherwise, take advantage of cloud tools.

You may wonder about TPUs vs. GPUs. In short, they are similar tools: TPUs are better suited to some types of computations and GPUs to others, and both will continue to exist alongside each other. If you're choosing between a GPU and a TPU, use a GPU unless the library you're using recommends a TPU for the problem you're solving.

AI software libraries make extensive use of native extensions to increase speed.

Most libraries that run on GPUs are implemented on top of CUDA, an Nvidia software library, so you need an Nvidia GPU to use these optimized libraries. Even if you don't have an Nvidia GPU, compiling the native extensions of Python libraries that can use CUDA can be complicated, because the build might expect you to have one.
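If you don't have an Nvidia GPU, one way to sidestep this is to install a prebuilt CPU-only package instead of compiling anything. PyTorch, for example, publishes CPU-only wheels (a sketch; check the library's own install instructions first):

    # Install a CPU-only PyTorch build from its dedicated package index,
    # avoiding any CUDA-related compilation.
    !pip install torch --index-url https://download.pytorch.org/whl/cpu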

Additionally, if you're using an M1 MacBook (as I am) there are libraries whose native extensions are surprised by the M1's ARM-based architecture, making them difficult to compile.

The M1 is a blessing and a curse. There are so many people using M1s that some developers have gone to the trouble of optimizing their software to run on it. For example, running Stable Diffusion models on an M1 is possible, and it takes advantage of the M1's built-in GPU.
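For example, with a recent PyTorch build you can pick the best available device at runtime and fall back gracefully (a minimal sketch, assuming PyTorch 1.12+ with the MPS backend):

    import torch

    # Pick the best available device: CUDA GPU, then Apple's MPS backend
    # (the M1's built-in GPU), then plain CPU.
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    # Move a toy model and its inputs onto that device.
    model = torch.nn.Linear(4, 2).to(device)
    x = torch.randn(8, 4, device=device)
    print(device, model(x).shape)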

AI software libraries are less user-friendly than the web development libraries I use in my day-to-day work. Many popular libraries are little more than a proof of concept that happens to be the only existing implementation of some technique, so we're all stuck with it unless we want to write our own from scratch.

More than in other fields I've worked in, I have found that sample code in the AI world rarely runs without errors, often because the libraries are not professionally engineered. Many popular libraries change their APIs in minor version updates without explanation or warning, spam STDOUT with massive amounts of text, and don't declare their requirements in their dependency metadata, forcing users to play dependency manager by hand.
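One cheap defense is pinning exact versions instead of taking whatever pip resolves today. A sketch (the package names and version numbers are purely illustrative):

    # In a notebook cell: pin exact versions so a minor release can't
    # silently change an API underneath you.
    !pip install torch==2.0.1 transformers==4.30.2 datasets==2.13.1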

Jupyter tips

Before starting, make sure you have a GPU turned on if you need one (in Colab: Runtime -> Change runtime type -> Hardware accelerator).

In the first code block of your notebook, put the !pip and !apt-get commands. Run them first, then restart the kernel runtime.

In the second block, put the import statements. On the second run (after restarting), run the first block, then the second block. Then restart again.
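Concretely, the top of the notebook ends up looking something like this (the specific packages are illustrative; use whatever your project needs):

    # Cell 1: install system and Python packages, then restart the runtime.
    !apt-get install -y libsndfile1
    !pip install fastai timm

    # Cell 2: imports only. After each restart, run cell 1, then this one.
    import fastai
    import timm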

Fix the errors that come up one by one, restarting each time.

Now you're ready to start coding and experimenting!

If a model training step is taking more than 20-30 minutes, double-check that you have a GPU turned on and you're utilizing it.

Google Colab is finicky and you can lose your progress if you leave the page for too long. If you have to start over, you might have to do the "install packages, restart runtime, import, restart runtime" routine again to get into a usable state.

If you've put a lot of time into training, figure out how to save an artifact of your work. This can take a few forms: you might pickle your model object, or save the weights from a neural network so that you can reuse them in another computing context.
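Here's what both options look like in PyTorch (a sketch; assuming your model is a torch.nn.Module, with a toy model standing in for a trained one):

    import pickle
    import torch

    model = torch.nn.Linear(4, 2)  # stand-in for your trained model

    # Option 1: save just the learned weights (the usual approach).
    torch.save(model.state_dict(), "model_weights.pt")

    # ...later, in another session, rebuild the architecture and load them:
    model2 = torch.nn.Linear(4, 2)
    model2.load_state_dict(torch.load("model_weights.pt"))

    # Option 2: pickle the whole object (fragile across library versions).
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)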

Jupyter notebooks and git don't mix well. Consider using the "File menu -> Download -> Download .py" option in Google Colab to back up your work in a human-readable python source code file.
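If you're running Jupyter outside Colab, nbconvert does the same thing (the notebook filename is illustrative):

    # Convert a notebook into a plain .py script you can commit to git.
    !jupyter nbconvert --to script my_notebook.ipynb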

Learning

The amount of AI books, tutorials and courses available is completely overwhelming. Here's what worked for me.

  • Do the Fast.AI course Practical Deep Learning for Coders. Watch the videos, step through the book chapters (which are Jupyter notebooks) executing the code, do the exercises, and read and share your work on the forums. I especially recommend the lesson where you train your own image classifier and deploy it to Hugging Face. I trained a Westworld host detector to save humanity from killer robots. Doing is better than reading and watching.
  • While you're doing the Fast.AI course, watch Making Friends with Machine Learning by Cassie Kozyrkov, Google's Chief Decision Scientist. It's a six-hour course that teaches what ML can realistically do in practice and how to avoid common pitfalls.
  • Have a project of your own that you're working on. It focuses the mind while working through the courses, since anything you learn might help with your project. Mine is training agents to play the board game Blokus at a superhuman level.
  • Look around on Kaggle. I haven't seriously attempted any competitions yet, but I often find myself reading a Kaggle notebook linked to from Fast.AI, or researching something on their Learn page.

Next steps

I'm not finished with the Fast.AI course, so I'll keep working through that. My Blokus AI project is a lot of fun, although I'm worried that training the agents might cost more than I can afford. What if I make a mistake and spend a ton of money running buggy code, and it's all useless? I want to do some beginner Kaggle competitions and maybe work my way up to competing in the active contests.

I want to continue to make educational content like this until I get to the point that I'm training useful models!

Any thoughts? Tweet at me