Classifying With Linear Models

In this lesson we have a look at machine learning with TensorFlow.

We will create our own linear classifier, and use TensorFlow’s built-in optimisation algorithm to train it.

First, we will have a look at the data and what we are trying to do. For those new to machine learning, the task we are performing is called supervised learning, specifically classification.

The task is to work out the relationship between some input data and an output value. In practical terms, the input data could be measurements, such as height or weight, and the output value would be the expected prediction, such as “cat” or “dog”.

This lesson extends the work of our Convergence lesson, which can be found here. I recommend you complete that lesson first.

Let’s create and visualise some data:

from sklearn.datasets import make_blobs
from sklearn.preprocessing import OneHotEncoder
import numpy as np

X_values, y_flat = make_blobs(n_features=2, n_samples=800, centers=3, random_state=500)
y = OneHotEncoder().fit_transform(y_flat.reshape(-1, 1)).todense()
y = np.array(y)

%matplotlib inline

from matplotlib import pyplot as plt

# Optional line: Sets a default figure size to be a bit larger.
plt.rcParams['figure.figsize'] = (24, 10)

plt.scatter(X_values[:,0], X_values[:,1], c=y_flat, alpha=0.4, s=150)

Here we have three blobs of data: the yellows, blues and purples. They are plotted in two dimensions, which we will call x0 and x1.

These values are stored in the X array.

When we perform machine learning, it is necessary to split our data into a training set, which we use for creating the model, and a testing set, which we use to evaluate it. If we don’t do that, then we can simply create a “cheating classifier” that just remembers our training data. By splitting, our classifier must learn the relationship between the inputs (the position on the plot) and the outputs.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test, y_train_flat, y_test_flat = train_test_split(X_values, y, y_flat)

# Add some noise to the testing data, to make the task a little harder.
X_test += np.random.randn(*X_test.shape) * 1.5
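
As a quick sanity check on the split, here is a minimal sketch (recreating the data with the same parameters as above) that confirms how many rows end up in each set. By default, train_test_split holds out 25% of the rows for testing:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Recreate the data from above (same parameters as the lesson).
X_values, y_flat = make_blobs(n_features=2, n_samples=800,
                              centers=3, random_state=500)

# The default test_size is 0.25, so 200 of the 800 rows are held out.
X_train, X_test, y_train_flat, y_test_flat = train_test_split(
    X_values, y_flat, random_state=0)

print(X_train.shape)  # (600, 2)
print(X_test.shape)   # (200, 2)
```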

Now we plot our testing data. After learning the relationship between position and colour from the training data, the classifier will be given the following points, and will be evaluated on how accurately it colours the points.

#plt.scatter(X_train[:,0], X_train[:,1], c=y_train_flat, alpha=0.3, s=150)
plt.plot(X_test[:,0], X_test[:,1], 'rx', markersize=20)

Creating a model

Our model will be a simple linear classifier. This means it will draw straight lines between the three colours. Points above a line are given one colour, while those below a line are given another colour. We will call these our decision lines, although they are normally called decision boundaries, because other models can learn more complex shapes than just a line.

To mathematically represent our model, we use this equation:

Y = XW + b
Our weights matrix W has shape (n_features, n_classes) and represents the learned weights of our model. It dictates where the decision lines will sit. X is an (n_rows, n_features) matrix, and holds the position data – where a given point sits on the graph. Finally, b is a (1, n_classes) vector of biases. We need this so that our lines don’t have to pass through the point (0, 0), giving us the ability to “draw” lines in any position on the graph.
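
To see these shapes in action, here is a toy NumPy version of the equation with made-up numbers – note how the (1, n_classes) bias is broadcast across every row of XW:

```python
import numpy as np

# A toy version of Y = XW + b: 4 points, 2 features, 3 classes.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [2.0, 2.0]])          # (n_rows, n_features) = (4, 2)
W = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0,  1.0]])    # (n_features, n_classes) = (2, 3)
b = np.array([[0.5, -0.5, 0.0]])    # (1, n_classes), broadcast over rows

Y = X @ W + b                       # b is added to every row of XW
print(Y.shape)  # (4, 3)
```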

The points in X are fixed – these are the training or testing data, and are called observed data. The values of W and b are the parameters of our model, and we have control over them. Choosing good values for these parameters gives us good decision lines.

The process of choosing good values for the parameters in our model is called training the algorithm, and is the “learning” in machine learning.

Let’s take our mathematical model from above and turn it into a TensorFlow operation.

import tensorflow as tf

n_features = X_values.shape[1]
n_classes = len(set(y_flat))

weights_shape = (n_features, n_classes)

W = tf.Variable(dtype=tf.float32, initial_value=tf.random_normal(weights_shape)) # Weights of the model

X = tf.placeholder(dtype=tf.float32)

Y_true = tf.placeholder(dtype=tf.float32)

bias_shape = (1, n_classes)
b = tf.Variable(dtype=tf.float32, initial_value=tf.random_normal(bias_shape))

Y_pred = tf.matmul(X, W) + b

The Y_pred Tensor represents our mathematical model from above. By passing in observed data (X) we can get the expected values, in our case, the expected colour of a given point. Note the use of broadcasting for applying the bias across all of the predictions.

The actual values in Y_pred are composed of “likelihoods” that the model will select each of the classes for a given point, making it a (n_rows, n_classes) matrix. They aren’t real likelihoods, but we can find out which class our model thinks is most likely by finding the maximum value in each row.
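
For example, with some hypothetical scores for three points (not real model output), np.argmax along axis 1 picks the highest-scoring class for each row:

```python
import numpy as np

# Hypothetical "likelihood" scores for 3 points across 3 classes.
scores = np.array([[ 2.1, -0.3,  0.5],
                   [-1.0,  4.2,  0.1],
                   [ 0.2,  0.3,  3.9]])

# argmax along axis 1 returns the index of the largest value per row,
# i.e. the class the model considers most likely for each point.
predicted_classes = np.argmax(scores, axis=1)
print(predicted_classes)  # [0 1 2]
```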

Next, we need to define a function that evaluates how good a given set of weights is. Note that we haven’t learned any weights yet; they were simply given random values. TensorFlow has built-in loss functions that compare the predicted outputs (i.e. the values that come out of your model) against the actual values (the ground truth we created alongside our testing set). This comparison scores how well our model performed. We call it a loss function because the worse we do, the higher the value – we attempt to minimise the loss.

loss_function = tf.losses.softmax_cross_entropy(Y_true, Y_pred)
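
To make the loss less of a black box, here is a minimal NumPy sketch of what softmax cross-entropy computes – this is an illustration of the underlying maths, not TensorFlow’s actual implementation: the raw scores are turned into probabilities with a softmax, then the loss is the average negative log-probability assigned to the true class.

```python
import numpy as np

def softmax_cross_entropy(y_true_onehot, logits):
    # Softmax: turn raw scores into probabilities per row
    # (subtracting the row max for numerical stability).
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    # Cross-entropy: -log of the probability assigned to the
    # true class, averaged over all rows.
    return -np.mean(np.sum(y_true_onehot * np.log(probs), axis=1))

y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
logits = np.array([[3.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0]])

# Confident, correct predictions give a small loss.
print(softmax_cross_entropy(y_true, logits))
```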

The final step is to create an optimisation step that takes our loss function and finds values for the given variables that minimise the loss. Note that the loss function references Y_pred, which in turn references W and b. TensorFlow picks this relationship up, and alters the values in these Variables to find good values.

learner = tf.train.GradientDescentOptimizer(0.1).minimize(loss_function)

Now for the training bit!

We run the learner in a loop so that it can find the best weights. Each time round the loop, the weights from the previous iteration are improved slightly. The 0.1 in the previous line of code is the learning rate. If you increase this value, the algorithm learns faster; however, smaller values generally converge to better results. A value of 0.1 is a good starting point while you look at other aspects of the model.

In each loop, we pass in our training data to the learner through placeholders. Every 100th loop, we see how well our model is learning by passing the testing data in directly to the loss function.

with tf.Session() as sess:
    # Initialise W and b with their random starting values.
    for i in range(5000):, {X: X_train, Y_true: y_train})
        if i % 100 == 0:
            print("Iteration {}:\tLoss={:.6f}".format(
                i,, {X: X_test, Y_true: y_test})))
    y_pred =, {X: X_test})
    W_final, b_final =[W, b])
predicted_y_values = np.argmax(y_pred, axis=1)
h = 1
x_min, x_max = X_values[:, 0].min() - 2 * h, X_values[:, 0].max() + 2 * h
y_min, y_max = X_values[:, 1].min() - 2 * h, X_values[:, 1].max() + 2 * h
x_0, x_1 = np.meshgrid(np.arange(x_min, x_max, h),
                       np.arange(y_min, y_max, h))
decision_points = np.c_[x_0.ravel(), x_1.ravel()]

A little complex, but we are effectively creating a two dimensional grid covering the possible values for x0 and x1.

# We recreate our model in NumPy
Z = np.argmax(decision_points @ W_final + b_final, axis=1)

# Create a contour plot of the x_0 and x_1 values
Z = Z.reshape(x_0.shape)
plt.contourf(x_0, x_1, Z, alpha=0.1)

plt.scatter(X_train[:,0], X_train[:,1], c=y_train_flat, alpha=0.3)
plt.scatter(X_test[:,0], X_test[:,1], c=predicted_y_values, marker='x', s=200)

plt.xlim(x_0.min(), x_0.max())
plt.ylim(x_1.min(), x_1.max())

There you have it! Our model will classify anything in the yellow region as yellow, and so on. If you overlay the actual test values (stored in y_test_flat), you can highlight any differences.
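
One simple way to quantify those differences is the prediction accuracy: the fraction of test points coloured correctly. The sketch below uses small made-up arrays standing in for the lesson’s variables – in the real notebook, predicted_y_values comes from np.argmax(y_pred, axis=1) and y_test_flat from the train/test split above:

```python
import numpy as np

# Hypothetical stand-ins for the lesson's variables.
predicted_y_values = np.array([0, 1, 2, 2, 1, 0, 0, 2])
y_test_flat        = np.array([0, 1, 2, 1, 1, 0, 2, 2])

# Accuracy: the fraction of test points given the correct class.
accuracy = np.mean(predicted_y_values == y_test_flat)
print(accuracy)  # 0.75
```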

  1. Plot the relationship between iteration and loss. What shape appears, and how do you think this will continue?
  2. Using TensorBoard, write the graph to file and have a look at the values of the variables in TensorBoard. See our tutorial for more information.
  3. Create a non-linear model by performing some transformation on X before passing it into our linear model. This can be done in a large number of ways, and your model’s accuracy will vary depending on your choice.
  4. Use the following code to load a dataset with 64 dimensions, called digits, and pass it through your classifier. What prediction accuracy do you get?
from sklearn.datasets import load_digits
digits = load_digits()
X =
y =