Random ramblings of an anonymous software engineer. Contains occasional profanity. Personal opinions, not related to employer.

Barnes-Hut t-SNE for CUDA (Linux, Mac)

I recently had to do some large-scale analysis that was better suited to running on a GPU. t-SNE is an excellent tool for visualizing data clusters, and those who found this post via Google probably already know what it is, so I'm not going to go into much detail. (The official page is here, with all the details you will need.)

Luckily, I found some ready-made Python bindings, but they came with extremely limited documentation and were missing key pieces needed to get them working.

It turns out the repository completely omits the part that builds the binary, so here are pre-built binaries that I made for my own use.

Obviously, this comes with no warranty. If this works for you, good; if it doesn't, you are on your own.

Ubuntu Linux 16.04, CUDA 8.0:

macOS 10.10, CUDA 7.5:

Place this into the bin directory of your checkout, and the library should work as expected. (The unfortunate bit is that, because this relies on an external binary, there is a lot of I/O and lexical casting involved, which I plan to look into someday.)

If the perplexity is too large for your data set, it will trip an assertion with very little useful information for debugging the problem. If it doesn't work, try reducing the perplexity. (More on what that is on the official page, linked above.)
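
To make the "try reducing the perplexity" advice a bit less trial-and-error, here is a minimal sketch of a helper that caps the perplexity based on the number of samples. It assumes the CUDA port keeps the check from the reference Barnes-Hut t-SNE code (number of samples minus one must be at least three times the perplexity), which I have not verified for this binary. The name safe_perplexity is hypothetical and not part of the bindings.

```python
def safe_perplexity(n_samples, requested=50.0):
    """Cap the requested perplexity so that 3 * perplexity <= n_samples - 1.

    Assumption: the CUDA binary enforces the same limit as the reference
    Barnes-Hut t-SNE implementation. Treat this as a starting point only.
    """
    if n_samples < 4:
        raise ValueError('need at least 4 samples, got %d' % n_samples)
    upper = (n_samples - 1) / 3.0
    return min(float(requested), upper)
```

You would call it as safe_perplexity(len(data_for_tsne), 50.0) and pass the result as the perplexity argument.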

One thing to note: there is an interesting bug in the implementation's VRAM calculation routine, which I didn't bother looking into. Setting gpu_mem to a value above 0.9 triggers a VRAM OOM. (Based on the actually allocated VRAM and the ratio I used, it seems to think a Titan X has 13.2 GB of memory.)
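
A cheap way to stay clear of this is to clamp the value before it ever reaches the bindings. The sketch below is only a guard on the caller's side; clamp_gpu_mem is a hypothetical helper, and the 0.9 ceiling is simply the value that worked on my card, not a documented limit.

```python
def clamp_gpu_mem(requested, ceiling=0.9):
    """Return a gpu_mem fraction that does not exceed the given ceiling.

    The ceiling of 0.9 is an empirical value from one card, not something
    the implementation documents.
    """
    if not 0.0 < requested <= 1.0:
        raise ValueError('gpu_mem must be in (0, 1], got %r' % (requested,))
    return min(requested, ceiling)
```

For example, gpu_mem = clamp_gpu_mem(0.95) gives 0.9, while 0.8 is passed through unchanged.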

I use the binary outside of the Python bindings as well, so in my case I replaced the _find_exe_dir() helper in my site-packages installation with the following:

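def _find_exe_dir():
    # Despite the name, this returns the full path to the executable.
    # path_join is assumed to be the module's existing alias for os.path.join.
    exe_dir = '/usr/local/bin'
    exe_file = 't_sne_bhcuda'

    return path_join(exe_dir, exe_file)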
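
Because the path is hard-coded, a missing or non-executable binary only shows up once a run is started. Here is a rough sketch of a sanity check you could run first. The function name check_tsne_binary is hypothetical, and the default path just mirrors the patch above.

```python
import os


def check_tsne_binary(path='/usr/local/bin/t_sne_bhcuda'):
    """Return path if it points at an executable file, otherwise raise IOError."""
    if not os.path.isfile(path):
        raise IOError('binary not found: %s' % path)
    if not os.access(path, os.X_OK):
        raise IOError('binary is not executable: %s' % path)
    return path
```

Calling check_tsne_binary() once at the top of a script gives a readable error instead of a failure somewhere inside the bindings.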

Both binaries are for 64-bit systems.

Example code: (Taken from the original source)

import t_sne_bhcuda.bhtsne_cuda as tsne_bhcuda
import matplotlib.pyplot as plt
import numpy as np

perplexity = 50.0
theta = 0.5
learning_rate = 200.0
iterations = 2000
gpu_mem = 0.8

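# data_for_tsne is your own samples array; it is not defined in this snippet.
t_sne_result = tsne_bhcuda.t_sne(
    samples=data_for_tsne, files_dir='/home/user/wherever', no_dims=2,
    perplexity=perplexity, eta=learning_rate, theta=theta,
    iterations=iterations, gpu_mem=gpu_mem, randseed=-1, verbose=2)
# Transpose so that row 0 holds the x coordinates and row 1 the y coordinates.
t_sne_result = np.transpose(t_sne_result)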

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(t_sne_result[0], t_sne_result[1])
# Needed when running as a script; notebooks render the figure on their own.
plt.show()

Step-by-step instructions:

  1. Git clone the Python bindings repository from GitHub.
  2. Make a bin directory in the checkout, download the binary for your platform, and drop it in there with the filename t_sne_bhcuda.
  3. Alternatively, apply the patch above and drop the binary into /usr/local/bin under the same filename.
  4. Use the example code above.
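
Step 2 can also be scripted. This is a minimal sketch that assumes you have already cloned the repository and downloaded the binary yourself. install_binary is a hypothetical helper, and both paths in the usage note are placeholders for wherever your files actually live.

```python
import os
import shutil
import stat


def install_binary(downloaded, checkout_dir, name='t_sne_bhcuda'):
    """Copy the downloaded binary into <checkout_dir>/bin and mark it executable."""
    bin_dir = os.path.join(checkout_dir, 'bin')
    if not os.path.isdir(bin_dir):
        os.makedirs(bin_dir)
    target = os.path.join(bin_dir, name)
    shutil.copyfile(downloaded, target)
    # copyfile does not preserve permission bits, so add the execute bits here.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Usage would look like install_binary('/path/to/downloaded/binary', '/path/to/checkout'), with both arguments replaced by your own locations.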