Modeling and Evaluation – Proprietary models using Tensorflow & Keras – Part I

July 11, 2022

Dr. Marios Skevofylakas

While the out-of-the-box structures provided by Tensorflow are more than enough to cover most AI scenarios, sometimes we need to implement a fully proprietary AI structure that solves a specific problem that is not supported. Tensorflow provides us with the tools to implement these kinds of structures, allowing us to focus on building the core of the AI behaviour while still leveraging the very rich ecosystem of Tensorflow tools that can be used to train, monitor, evaluate and productionise our solution.

At the core of a Tensorflow AI architecture lies the Model, which can be constructed from numerous pre-existing layers such as Dense, Conv2D, LSTM and many more. Tensorflow allows us to inherit from and extend both the Model and the Layer class, giving us the opportunity to build any type of proprietary layer and any combination of layers that make up an AI core. Extending the Model class lets us architect any type of model structure from pre-existing layers, while extending the Layer class lets us create custom architectures at the layer level. Furthermore, we can override any part of the pipeline, from a custom training process and custom gradients to selectively controlling the optimization flow within the structure: which parts of the structure remain intact and how the other parts get optimized. With these toolsets at hand, Tensorflow opens up endless possibilities for architecting AI cores.
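As a rough illustration of extending both classes, the following is a minimal sketch rather than a recommended architecture. The names ScaledDense and TinyModel, the layer sizes and the learnable scale factor are all hypothetical and chosen only for this example.

```python
import tensorflow as tf


class ScaledDense(tf.keras.layers.Layer):
    """Hypothetical custom layer: a dense transform times a learnable scale."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the size of the input is known.
        self.w = self.add_weight(
            name="w", shape=(input_shape[-1], self.units),
            initializer="random_normal", trainable=True)
        self.b = self.add_weight(
            name="b", shape=(self.units,), initializer="zeros", trainable=True)
        self.scale = self.add_weight(
            name="scale", shape=(), initializer="ones", trainable=True)

    def call(self, inputs):
        return self.scale * (tf.matmul(inputs, self.w) + self.b)


class TinyModel(tf.keras.Model):
    """Hypothetical custom model mixing the custom layer with a built-in one."""

    def __init__(self):
        super().__init__()
        self.hidden = ScaledDense(4)
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self.out(self.hidden(inputs))


model = TinyModel()
y = model(tf.ones((2, 3)))  # a batch of 2 feature vectors with 3 features each
print(y.shape)
print([v.name for v in model.trainable_variables])
```

Because both classes derive from the Keras base classes, the usual tooling (compile, fit, saving, callbacks) should remain available to the custom structure.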

Installing Tensorflow on Anaconda

Simply open a conda PowerShell prompt and type conda install -c intel tensorflow to install the latest available library for your distribution. It is recommended to keep a separate environment to avoid conflicts between libraries, as Tensorflow will install a rich ecosystem of packages.
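Once the installation completes, a quick check such as the one below can confirm that the library imports correctly inside the environment. This is only a sanity check; the GPU list will simply be empty on a CPU-only machine.

```python
import tensorflow as tf

# Print the installed version and any GPUs that Tensorflow can see.
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))
```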

A primer on Tensorflow functions under the hood

Before we start writing a proprietary AI core, let’s briefly discuss a few of the core concepts of Tensorflow, starting with tensors. We opt to do this because, many times during AI core architecting, we need to be able to interpret the specifications of the tensors flowing within the structure, as well as the operations and their lineage within Tensorflow. Tensors are uniform, immutable, multidimensional arrays. Supported types include floatX and intX (for example float32 and int32), booleans and strings. Most tensors are rectangular, meaning that along each axis every element has the same size; for other cases there are specialized tensor types such as the RaggedTensor and the SparseTensor. All the basic math functions are provided by Tensorflow and can be applied to tensors. Some important vocabulary on tensors includes shape, rank, axis or dimension, and the size of the tensor. Let’s create a tensor and discuss its properties:

    	
import tensorflow as tf

t = tf.ones([1, 3, 2, 4])
print(t)

tf.Tensor(
[[[[1. 1. 1. 1.]
   [1. 1. 1. 1.]]

  [[1. 1. 1. 1.]
   [1. 1. 1. 1.]]

  [[1. 1. 1. 1.]
   [1. 1. 1. 1.]]]], shape=(1, 3, 2, 4), dtype=float32)

We can see from the output that this is a rank 4 tensor with a shape of (1, 3, 2, 4): axis 0 has size 1, axis 1 has size 3, axis 2 has size 2 and the last axis (axis -1) has size 4, while the tensor’s type is float32. We can always transform tensors using functions such as tf.reshape() and tf.transpose(), or flatten them with tf.reshape(t, [-1]). The concept of broadcasting also applies to tensors: in essence, a smaller tensor is automatically stretched, where possible, to match the shape of the larger one during an operation. As an example:

    	
a = tf.constant([1, 1, 1])
b = tf.constant([2])
print(a * b)

tf.Tensor([2 2 2], shape=(3,), dtype=int32)

We can see that the single-element tensor b was expanded to accommodate element-wise multiplication with a.
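The vocabulary and the transformations discussed so far can be inspected directly on a tensor. The snippet below is a small sketch; the target shapes and the col and row tensors are arbitrary choices made for illustration.

```python
import tensorflow as tf

t = tf.ones([1, 3, 2, 4])

print(t.shape)              # shape of the tensor
print(t.ndim)               # rank, i.e. the number of axes
print(t.shape[-1])          # size of the last axis
print(tf.size(t).numpy())   # total number of elements

reshaped = tf.reshape(t, [3, 8])                 # same elements, new shape
flat = tf.reshape(t, [-1])                       # -1 lets Tensorflow infer the size
transposed = tf.transpose(t, perm=[0, 2, 1, 3])  # swap axes 1 and 2
print(reshaped.shape, flat.shape, transposed.shape)

# Broadcasting a (3, 1) column against a (1, 4) row gives a (3, 4) result.
col = tf.constant([[1], [2], [3]])
row = tf.constant([[10, 20, 30, 40]])
print((col * row).shape)
```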

NumPy arrays can be converted to tensors using tf.convert_to_tensor(), and tensors can be converted to arrays using .numpy(). As tensors are immutable, Tensorflow provides Variables to allow for state manipulation: essentially, variables are structures that allow us to “change” the tensors they contain through operations. It is often useful to name Tensorflow variables appropriately, to make tracking and debugging easier.

var_a = tf.Variable(a, name="A")

Variables are extremely important during the optimization process and can be excluded from gradient calculations using the trainable=False parameter.

var_a = tf.Variable(a, name="A", trainable=False)
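To make the conversions and the idea of mutable state concrete, here is a short sketch. The array values and the name A_frozen are illustrative only.

```python
import numpy as np
import tensorflow as tf

arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
a = tf.convert_to_tensor(arr)   # NumPy array -> tensor
back = a.numpy()                # tensor -> NumPy array

var_a = tf.Variable(a, name="A")
var_a.assign([4.0, 5.0, 6.0])       # replace the stored values in place
var_a.assign_add([1.0, 1.0, 1.0])   # increment the stored values in place
print(var_a.numpy())

frozen = tf.Variable(a, name="A_frozen", trainable=False)
print(var_a.trainable, frozen.trainable)
```

The tensor a itself never changes; only the variable that wraps a copy of its values is updated.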

During the optimization process, Tensorflow records all operations executed on the forward pass in a gradient tape object and then traverses them in reverse order to compute the gradients. Here’s an example of a simple forward pass on a neural network. We assume a layer of 2 neurons and a 3-element feature vector of ones passing through it; we initialize the weights with tf.random.normal and the biases to zeros:

    	
w = tf.Variable(tf.random.normal((3, 2)), name='weights')  # 3 inputs, 2 neurons
x = [[1., 1., 1.]]  # feature vector of ones
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='biases')
with tf.GradientTape(persistent=True) as tape:
    # forward pass: the tape records the operations on w and b
    y = tf.nn.sigmoid(tf.matmul(x, w) + b)
print(y)

tf.Tensor([[0.06065261 0.36010492]], shape=(1, 2), dtype=float32)

The output will vary between runs, as the weights are initialized randomly. We can control what the GradientTape tracks through the trainable parameter we mentioned before, e.g. if for any reason we wanted a layer with constant biases of ones we could write:

b = tf.Variable(tf.ones(2, dtype = tf.float32), name='biases', trainable=False)

We can list all watched variables by using:

[v.name for v in tape.watched_variables()]

['weights:0', 'biases:0']

Notice how Tensorflow appends an index (:0) to the variable name; names are not guaranteed to be unique, as you can give variables the same names. If a variable is in the list, any operation on it is recorded as part of a chain of mathematical functions, and since all the operations applied are Tensorflow operations, the tape knows how to calculate the partial derivatives of the tensors flowing through the network with respect to any tracked variable. Now we can calculate the partial derivatives with respect to the weights and biases:

    	
gradients = tape.gradient(y, [w, b])
print(gradients)

[<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[0.05697387, 0.23042937],
       [0.05697387, 0.23042937],
       [0.05697387, 0.23042937]], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.05697387, 0.23042937], dtype=float32)>]
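These numbers can be checked by hand: the derivative of the sigmoid with respect to its input is y(1 - y), so for y = 0.06065261 we get roughly 0.0607 × 0.9393 ≈ 0.0570, which matches the first bias gradient. The sketch below runs the same check numerically for a fresh random initialization. It assumes that each output depends only on its own column of weights, which holds for this single-layer example.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name="weights")
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name="biases")
x = tf.constant([[1., 1., 1.]])

with tf.GradientTape() as tape:
    y = tf.nn.sigmoid(tf.matmul(x, w) + b)

grad_w, grad_b = tape.gradient(y, [w, b])

# Analytic derivative of the sigmoid with respect to its input: y * (1 - y).
sig_prime = (y * (1.0 - y)).numpy()[0]           # shape (2,)

# d y_j / d b_j = sigmoid'(z_j) and d y_j / d w_ij = x_i * sigmoid'(z_j).
expected_b = sig_prime
expected_w = np.outer(x.numpy()[0], sig_prime)   # shape (3, 2)

print(np.allclose(grad_b.numpy(), expected_b, atol=1e-5))
print(np.allclose(grad_w.numpy(), expected_w, atol=1e-5))
```

Since every input feature equals one, all three rows of the weight gradient coincide with the bias gradient, which is exactly the pattern visible in the printed output.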

It is not uncommon to receive a gradient of None during proprietary AI core architecting, and this can mean a multitude of things. Perhaps there is a disconnection within your graph flow, such as a calculation happening outside the tape. Other times a variable is not tracked because it was used as a tensor; unless this is expected behaviour, you will need to revisit that part of the code. In rare cases you are using a calculation that is not differentiable, as we can see in the Quantum Tensorflow Article VQNN…. In this case you will need to go a step further and define the differentiation yourself.
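The situations described above can be reproduced in a few lines. This is a sketch under stated assumptions: the variable names are hypothetical, and the custom gradient at the end uses a "straight-through" rule (passing the upstream gradient unchanged through a rounding step), which is an illustrative choice and not necessarily the approach taken in the referenced article.

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0], name="w")
frozen = tf.Variable([1.0, 1.0], name="frozen", trainable=False)
c = tf.constant([3.0, 4.0])

# Case 1: the calculation involving w happens outside the tape.
z = w * 2.0
with tf.GradientTape() as tape:
    y = tf.reduce_sum(z + 1.0)
g_outside = tape.gradient(y, w)      # None: the tape never saw w

# Case 2: non-trainable variables and plain tensors are not watched by default.
with tf.GradientTape(persistent=True) as tape:
    y = tf.reduce_sum(w * frozen * c)
g_frozen = tape.gradient(y, frozen)  # None
g_const = tape.gradient(y, c)        # None

# Watching a tensor explicitly restores its gradient.
with tf.GradientTape() as tape:
    tape.watch(c)
    y = tf.reduce_sum(w * c)
g_watched = tape.gradient(y, c)      # d(sum(w * c)) / dc = w


# Defining the differentiation yourself with tf.custom_gradient.
@tf.custom_gradient
def round_straight_through(x):
    def grad(upstream):
        # Assumption for illustration: treat rounding as the identity
        # on the backward pass.
        return upstream
    return tf.round(x), grad


v = tf.Variable([0.2, 1.7])
with tf.GradientTape() as tape:
    h = v * 2.0
    y = tf.reduce_sum(round_straight_through(h))
g_custom = tape.gradient(y, v)       # upstream (ones) times d(h)/d(v) = 2

print(g_outside, g_frozen, g_const)
print(g_watched.numpy(), g_custom.numpy())
```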

The results of the partial derivatives can be used to minimize a loss function and update all the trainable variables within the structure.
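A minimal sketch of such an update loop follows, using plain gradient descent on a mean squared error loss. The target values, the learning rate and the number of steps are hypothetical choices made only for this example.

```python
import tensorflow as tf

tf.random.set_seed(0)

w = tf.Variable(tf.random.normal((3, 2)), name="weights")
b = tf.Variable(tf.zeros(2), name="biases")
x = tf.constant([[1., 1., 1.]])
target = tf.constant([[0., 1.]])   # hypothetical target for the 2 outputs
learning_rate = 0.5                # hypothetical step size


def loss_fn():
    y = tf.nn.sigmoid(tf.matmul(x, w) + b)
    return tf.reduce_mean(tf.square(y - target))


initial_loss = loss_fn().numpy()
for step in range(100):
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grads = tape.gradient(loss, [w, b])
    # Plain gradient descent: move each variable against its gradient.
    for var, grad in zip([w, b], grads):
        var.assign_sub(learning_rate * grad)

final_loss = loss_fn().numpy()
print(initial_loss, final_loss)
```

In practice the manual assign_sub step is usually replaced by an optimizer object, but writing it out shows exactly where the gradients from the tape are consumed.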
