When we operate on arrays of different dimensionality, they can combine in different ways, either elementwise or through broadcasting.
Let’s start from scratch and build up to more complex examples. In the below example, we have a TensorFlow constant representing a single number.
Not much of a surprise there! We can also do computations, such as adding another number to it:
Let’s extend this concept to a list of numbers. To start, let’s create a list of three numbers, and then another list of numbers to it:
This is known as an elementwise operation, where the elements from each list are considered in turn, added together and then the results combined.
What happens if we just add a single number to this list?
Is this what you expected? This is known as an broadcasted operation. Our primary object of reference was
a, which is a list of numbers, also called an array or a one-dimensional vector. Adding a single number (called a scalar) results in an broadcasted operation, where the scalar is added to each element of the list.
Now let’s look at an extension, which is a two-dimensional array, also known as a matrix. This extra dimension can be thought of as a “list of lists”. In other words, a list is a combination of scalars, and a matrix is a list of lists.
That said, how do operations on matrices work?
That’s elementwise. If we add a scalar, the results are fairly predictable:
Here is where things start getting tricky. What happens if we add a one-dimensional array to a two-dimensional matrix?
In this case, the array was broadcasted to the shape of the matrix, resulting in the array being added to each row of the matrix. Using this terminology, a matrix is a list of rows.
What if we didn’t want this, and instead wanted to add
b across the columns of the matrix instead?
This didn’t work, as TensorFlow attempted to broadcast across the rows. It couldn’t do this, because the number of values in b (2) was not the same as the number of scalars in each row (3).
We can do this operation by creating a new matrix from our list instead.
What happened here? To understand this, let’s look at matrix shapes.
You can see from these two examples that
a has two dimensions, the first of size 2 and the second of size 2. In other words, it has two rows, each with three scalars in it.
b constant also has two dimensions, two rows with one scalar in each. This is not the same as a list, nor is it the same as a matrix if one row of two scalars.
Due to the fact that the shapes match on the first dimension but not the second, the broadcasting happened across columns instead of rows. For more on broadcasting rules, see here.
tf.shape(it’s an operation) to get a constant’s shape during operation of the graph.