# Keras Backend

In this notebook we will use the Keras backend module, which provides an abstraction over both Theano and TensorFlow.

Let's try to re-implement the Logistic Regression Model using the keras.backend APIs.

The following code looks very similar to what we would write in Theano or TensorFlow, the only difference being that it can run on either backend.

import keras.backend as K
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Using TensorFlow backend.

from kaggle_data import load_data, preprocess_data, preprocess_labels

X_train, labels = load_data('../data/kaggle_ottogroup/train.csv', train=True)
X_train, scaler = preprocess_data(X_train)
Y_train, encoder = preprocess_labels(labels)

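# Assumption: the test set sits next to train.csv and is loaded with train=False
X_test, ids = load_data('../data/kaggle_ottogroup/test.csv', train=False)
X_test, _ = preprocess_data(X_test, scaler)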

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

dims = X_train.shape[1]
print(dims, 'dims')

9 classes
93 dims

feats = dims
training_steps = 25

x = K.placeholder(dtype="float", shape=X_train.shape)
target = K.placeholder(dtype="float", shape=Y_train.shape)

# Set model weights
W = K.variable(np.random.rand(dims, nb_classes))
b = K.variable(np.random.rand(nb_classes))

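# Define model and loss
y = K.dot(x, W) + b  # unnormalised class scores (logits)
# Keyword arguments are used because the positional order of this call differs across Keras versions
loss = K.categorical_crossentropy(target=target, output=y, from_logits=True)

activation = K.softmax(y)  # class probabilities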

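# Plain gradient descent: one (variable, new_value) pair per parameter
lr = K.constant(0.01)
grads = K.gradients(loss, [W, b])
updates = [(W, W - lr * grads[0]), (b, b - lr * grads[1])]

train = K.function(inputs=[x, target], outputs=[loss], updates=updates)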

# Training
loss_history = []
for epoch in range(training_steps):
    current_loss = train([X_train, Y_train])[0]
    loss_history.append(current_loss)
    if epoch % 20 == 0:
        print("Loss: {}".format(current_loss))

Loss: [ 2.13178873  1.99579716  3.72429109 ...,  2.75165343  2.29350972
1.77051127]
Loss: [ 2.95424724  0.10998608  1.07148504 ...,  0.23925911  2.9478302
2.90452051]
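
Each printed loss is a vector because categorical cross-entropy is computed per sample, which is why every entry of the history is reduced with np.mean before plotting. The following is a minimal NumPy sketch of that computation on a tiny made-up batch. The helpers `softmax` and `categorical_crossentropy` are illustrative stand-ins written for this example, not the Keras backend functions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(target, probs, eps=1e-7):
    """Per-sample cross-entropy between one-hot targets and probabilities."""
    probs = np.clip(probs, eps, 1.0)
    return -(target * np.log(probs)).sum(axis=1)

# Hypothetical batch: 3 samples, 4 classes (made-up numbers)
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [-1.0, 3.0, 0.2, 0.4]])
target = np.eye(4)[[0, 2, 1]]  # one-hot labels for classes 0, 2 and 1

probs = softmax(logits)
per_sample_loss = categorical_crossentropy(target, probs)

print(per_sample_loss.shape)   # one loss value per sample
print(per_sample_loss.mean())  # scalar suitable for a loss curve
```

The second sample has identical scores for all four classes, so its probabilities are uniform and its loss equals log(4).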

loss_history = [np.mean(lh) for lh in loss_history]

# plotting
plt.plot(range(len(loss_history)), loss_history, 'o', label='Logistic Regression Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()


Please switch to the Theano backend and restart the notebook.

You should see no difference in the execution!

Reminder: you can execute shell commands from a notebook by prepending a ! sign. Thus:

    !cat ~/.keras/keras.json


should show you the content of your Keras configuration file.
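
The same check can be done from Python without importing Keras. The sketch below assumes the configuration file lives at its default location and contains a "backend" entry; `read_keras_backend` is a hypothetical helper name introduced only for this example. It also looks at the KERAS_BACKEND environment variable first, since Keras lets that variable override the file.

```python
import json
import os

def read_keras_backend(config_path="~/.keras/keras.json"):
    """Return the configured backend name, or None if it cannot be determined.

    Hypothetical helper: it checks the KERAS_BACKEND environment variable
    first, then the "backend" entry of the JSON configuration file.
    """
    env_backend = os.environ.get("KERAS_BACKEND")
    if env_backend:
        return env_backend
    path = os.path.expanduser(config_path)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        config = json.load(f)
    return config.get("backend")

print(read_keras_backend())
```

To switch backends, edit the "backend" entry of the file (or set the environment variable) and restart the notebook kernel.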

### Moreover

Try playing with the learning rate parameter to see how the loss history changes...

## Exercise: Linear Regression

To get familiar with automatic differentiation, we start by learning a simple linear regression model using Stochastic Gradient Descent (SGD).

Recall that given a dataset $\{(x_i, y_i)\}_{i=0}^N$, with $x_i, y_i \in \mathbb{R}$, the objective of linear regression is to find two scalars $w$ and $b$ such that $y = w\cdot x + b$ fits the dataset. In this tutorial we will learn $w$ and $b$ using SGD and a Mean Square Error (MSE) loss:

$$\mathcal{l} = \frac{1}{N} \sum_{i=0}^N (w\cdot x_i + b - y_i)^2$$

Starting from random values, parameters $w$ and $b$ will be updated at each iteration via the following rule:

$$w_t = w_{t-1} - \eta \frac{\partial \mathcal{l}}{\partial w}$$

$$b_t = b_{t-1} - \eta \frac{\partial \mathcal{l}}{\partial b}$$

where $\eta$ is the learning rate.
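
For the MSE loss above, the two derivatives work out to $\frac{\partial \mathcal{l}}{\partial w} = \frac{2}{N}\sum_i (w\cdot x_i + b - y_i)\,x_i$ and $\frac{\partial \mathcal{l}}{\partial b} = \frac{2}{N}\sum_i (w\cdot x_i + b - y_i)$. As a rough illustration of the update rule, here is a NumPy sketch that applies it with these hand-derived gradients. The synthetic data, the ground-truth values, the learning rate, and the helper name `mse_gradients` are all assumptions chosen for this example, and each step uses the whole dataset rather than a random mini-batch.

```python
import numpy as np

def mse_gradients(w, b, x, y):
    """Return (dl/dw, dl/db) for the MSE loss of the model w * x + b."""
    residual = w * x + b - y
    grad_w = 2.0 * np.mean(residual * x)
    grad_b = 2.0 * np.mean(residual)
    return grad_w, grad_b

rng = np.random.RandomState(0)
x = rng.rand(200)
y = 2.0 * x - 1.0              # hypothetical ground truth: w = 2, b = -1

w, b = rng.rand(), rng.rand()  # random starting point
eta = 0.1                      # learning rate (assumed value)

for step in range(2000):
    grad_w, grad_b = mse_gradients(w, b, x, y)
    w -= eta * grad_w          # w_t = w_{t-1} - eta * dl/dw
    b -= eta * grad_b          # b_t = b_{t-1} - eta * dl/db

final_loss = np.mean((w * x + b - y) ** 2)
print(w, b, final_loss)
```

With a backend such as Keras, the gradients are not written by hand; they are derived automatically from the definition of the loss.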

NOTE: Recall that linear regression is equivalent to a single neuron with a linear activation function!

### Definition: Placeholders and Variables

First of all, we define the necessary variables and placeholders for our computational graph. Variables maintain state across executions of the computational graph, while placeholders are ways to feed the graph with external data.

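For the linear regression example, we need three quantities: w, b, and the learning rate for SGD, lr. The cell below creates w and b; lr was already defined earlier in this notebook as K.constant(0.01).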

Two placeholders x and target are created to store $x_i$ and $y_i$ values.

# Placeholders and variables
x = K.placeholder()
target = K.placeholder()
w = K.variable(np.random.rand())
b = K.variable(np.random.rand())


#### Notes:

In case you're wondering what the difference between a placeholder and a variable is, in short:

• Use K.variable() for trainable variables such as weights (W) and biases (b) for your model.
• Use K.placeholder() to feed actual data (e.g. training examples)

### Model definition

Now we can define the $y = w\cdot x + b$ relation as well as the MSE loss in the computational graph.

# Define model and loss

# %load ../solutions/sol_2311.py


Then, given the gradient of MSE with respect to w and b, we can define how we update the parameters via SGD:

# %load ../solutions/sol_2312.py
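
Whichever way the gradients are obtained, they can be sanity-checked numerically. The sketch below is a NumPy illustration, not the exercise solution: it compares hand-derived MSE gradients with central finite differences at an arbitrary point. The helper names and the evaluation point are assumptions made for this example.

```python
import numpy as np

def mse_loss(w, b, x, y):
    """Mean squared error of the model w * x + b."""
    return np.mean((w * x + b - y) ** 2)

def analytic_gradients(w, b, x, y):
    """Hand-derived (dl/dw, dl/db) for the MSE loss."""
    residual = w * x + b - y
    return 2.0 * np.mean(residual * x), 2.0 * np.mean(residual)

def numeric_gradients(w, b, x, y, h=1e-6):
    """Central finite-difference approximation of (dl/dw, dl/db)."""
    grad_w = (mse_loss(w + h, b, x, y) - mse_loss(w - h, b, x, y)) / (2 * h)
    grad_b = (mse_loss(w, b + h, x, y) - mse_loss(w, b - h, x, y)) / (2 * h)
    return grad_w, grad_b

rng = np.random.RandomState(42)
x = rng.rand(100)
y = 0.96 * x + 0.24   # same linear relation as the training data in this exercise
w, b = 0.3, 0.7       # arbitrary evaluation point

analytic = analytic_gradients(w, b, x, y)
numeric = numeric_gradients(w, b, x, y)
print(analytic)
print(numeric)
```

The two pairs should agree to several decimal places; a large discrepancy usually points to a sign or scaling mistake in the loss or in the update rule.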


The whole model can be encapsulated in a function that takes x and target as input, returns the current loss value, and updates the parameters according to updates.

train = K.function(inputs=[x, target], outputs=[loss], updates=updates)


### Training

Training is now just a matter of calling the function we have just defined. Each time train is called, w and b are updated using the SGD rule.

Having generated some random training data, we will feed the train function for several epochs and observe the values of w, b, and loss.

# Generate data
np_x = np.random.rand(1000)
np_target = 0.96*np_x + 0.24

# Training
loss_history = []
for epoch in range(200):
    current_loss = train([np_x, np_target])[0]
    loss_history.append(current_loss)
    if epoch % 20 == 0:
        print("Loss: %.03f, w, b: [%.02f, %.02f]" % (current_loss, K.eval(w), K.eval(b)))


We can also plot the loss history:

# Plot loss history

# %load ../solutions/sol_2313.py


### Final Note:

Please switch your backend back to TensorFlow before moving on. It may be useful for the next notebooks!