
Theano is a very interesting numeric library for Python that I covered briefly a few years ago. Coming from the machine learning group at Université de Montréal – i.e. Yoshua Bengio et al. – it is well adapted to the kinds of numerical tasks that frequently occur in machine learning problems, in particular deep neural nets. I recently tried it again and found that its debugging and error diagnostic features had been sufficiently improved to make it practical in real-world applications.
It combines several paradigms for numerical computations into a coherent whole, namely:
- matrix algebra operations, in the style of Matlab and Numpy
- symbolic variable and function definitions, in the style of Mathematica or Maple
- optimizing, Just-In-Time compilation to CPU or GPU machine code
Mixing symbolic and numeric concepts is a very powerful construct indeed. To give you a flavour of what Theano is like, let me give you an example of building a graph that computes the error in a logistic regression:
```python
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import sigmoid

# Define symbolic variables
X = T.matrix(name='X')
y = T.vector(name='y')
w = T.vector(name='w')

# Forward pass: negative log-likelihood of a logistic regression
eta = X.dot(w)
mu = sigmoid(eta)
E = -(y * T.log(mu) + (1 - y) * T.log(1 - mu)).sum()
```
So far the code doesn't look too dissimilar to something you would write in Matlab or NumPy. However, the variables X, y and w are symbolic variables, so all the variables defined from them are also symbolic. That means that E, for example, is not a scalar value: you need to compile a function and fill in the values to actually get a result. For example:
```python
# Compile the expression graph into a callable function
Efun = theano.function([X, w, y], E, allow_input_downcast=True)

# Evaluate on actual data
X_ = np.random.randn(1000, 100)
w_ = np.random.randn(100)
y_ = np.random.rand(1000) > .5
error_val = Efun(X_, w_, y_)
print(error_val)
That might seem like an unnecessary extra step, but therein lies the power of Theano: since it has a representation of the whole expression graph for the Efun function, it can optimize it and run it on both CPU and GPU.
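If you want to see what the compiler actually produced, Theano ships a graph pretty-printer; the snippet below uses theano.printing.debugprint on the compiled function. Moving the computation to a GPU is then, in the Theano versions I have used, mostly a matter of setting the device configuration flag (e.g. THEANO_FLAGS=device=gpu) rather than changing any code:

```python
# Print the optimized expression graph behind the compiled function;
# this shows the rewritten/fused operations Theano will actually run
theano.printing.debugprint(Efun)
```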
Even more impressive is that it can compute symbolic functions of the graph, in particular gradients:
```python
# Compute the gradient of E w.r.t. w the old-fashioned way;
# for the negative log-likelihood above this is X^T (mu - y)
g1 = (mu - y).dot(X)

# Compute it with the power of symbolic evaluation
g2 = T.grad(E, w)

gfun = theano.function([X, w, y], [g1, g2], allow_input_downcast=True)
g1_, g2_ = gfun(X_, w_, y_)
print(g1_)
print(g2_)
```
Same thing! Of course, this is a toy example where deriving the gradient by hand is easy. For more complex models, hand-derived gradients quickly become error-prone; symbolic evaluation means you can explore complex model architectures without worrying about whether you've computed the gradient correctly.
That's just a flavour of what Theano can do. Have a look at the tutorials for Theano itself, or at example applications in the context of deep neural nets.
