Race track, not race track

We were recently approached by Cam Software, developers of software for an aftermarket motorcycle dashboard, who wanted to know if we could use our machine learning expertise to solve a problem they were having.

The dashboard collects a myriad of data from a motorbike, including the usual telemetry such as speed and location, but also items such as brake lever pressure and suspension travel. Cam Software's system then interprets this data in a variety of ways with the intention of helping riders improve their performance, particularly in a race setting.

The problem Cam Software were having relates to displaying an image of the race track on which a user is currently recording data. Given the nature of GPS tracking, some readings are invalid or produce track profiles with large discrepancies (for example, tracks with missing segments, or extreme points and lines that are clearly not representative of a valid circuit), while subsequent readings provide a good representation of the track. They wanted to be able to detect whether a given lap reading represents a valid race track, without resorting to the expense of maintaining a database of valid tracks.

Our plan was to use a Convolutional Neural Network (CNN) trained to detect whether a given set of GPS coordinates represents a valid race track. CNNs are commonly used to solve image classification problems and work by training sets of layers, or maps, that are applied to sub-sections of an image rather than to the image as a whole. For example, if a standard fully connected neural network is trained to detect a bunch of flowers that appear in the bottom right-hand corner of an image, it will not be very good at detecting the flowers if they appear in, say, the top left corner. CNNs solve this problem: a trained network will be able to detect the flowers regardless of where they appear in the image.

While powerful, CNNs typically require a fairly large data set for training, and this was not readily available to us. We therefore decided to build a program to generate the training data for us. We first built a stochastic algorithm that generates valid track profiles by randomly producing a series of straights and corners of differing angles; for the 'bad race track' data, it sufficed to output a series of random lines. Examples of the two types of data are shown below, with a sketch of the generator after them:

[Image: a generated valid track profile]
Race track
[Image: a series of random lines]
Not a race track
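
To give a flavour of the approach, the sketch below shows how such a generator might look. This is a minimal illustration rather than our production code: the function names, segment counts and angle ranges are assumptions, and the real generator also ensures the circuit closes back on itself.

import numpy as np

def generate_track(n_segments=20):
    # Walk a heading around the plane, alternating straights and corners.
    # Illustrative only: closing the circuit is omitted for brevity.
    points = [np.zeros(2)]
    heading = 0.0
    for _ in range(n_segments):
        if np.random.rand() < 0.5:
            # Straight of a random length in the current direction.
            length = np.random.uniform(5.0, 20.0)
            points.append(points[-1] + length * np.array([np.cos(heading), np.sin(heading)]))
        else:
            # Corner: sweep a random turn angle in short steps.
            turn = np.random.uniform(-np.pi / 2, np.pi / 2)
            for _ in range(8):
                heading += turn / 8
                points.append(points[-1] + 2.0 * np.array([np.cos(heading), np.sin(heading)]))
    return np.array(points)

def generate_not_track(n_lines=8):
    # 'Bad' data: endpoints of a series of random, disconnected lines.
    return np.random.uniform(0.0, 100.0, size=(n_lines, 2, 2))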

Next, we built a simple CNN in TensorFlow based on the LeNet-5 image classification architecture. This architecture is typically used to classify handwritten digits, so we adapted the output layer of our network to produce a binary classification of 'race track' or 'not race track'. Our network is defined in TensorFlow as follows:

import tensorflow as tf

# Input images are 100x100 RGB (matching the Scala interop code below).
height, width, channels = 100, 100, 3
# Optimizer hyperparameters: assumed values for illustration.
learning_rate = 0.01
momentum = 0.9

# X holds the training instances, y the true class labels.
X = tf.placeholder(shape=(None, height, width, channels), dtype=tf.float32, name="X")
y = tf.placeholder(shape=[None], dtype=tf.int64, name="y")

he_init = tf.contrib.layers.variance_scaling_initializer()

# Alternating convolution and average-pooling layers, as in LeNet-5.
c1 = tf.layers.conv2d(X, filters=6, kernel_size=5, strides=[1, 1], padding="SAME")
s2 = tf.nn.avg_pool(c1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
c3 = tf.layers.conv2d(s2, filters=16, kernel_size=5, strides=[1, 1], padding="SAME", activation=tf.nn.tanh)
s4 = tf.nn.avg_pool(c3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
c5 = tf.layers.conv2d(s4, filters=120, kernel_size=5, strides=[1, 1], padding="SAME", activation=tf.nn.tanh)

# Two 2x2 pooling layers reduce the 100x100 input to 25x25 feature maps.
c5_flat = tf.reshape(c5, shape=[-1, 120 * 25 * 25])
fc_layer = tf.layers.dense(c5_flat, units=100, kernel_initializer=he_init, activation=tf.nn.tanh)
logits = tf.layers.dense(fc_layer, units=2, kernel_initializer=he_init, name="logits")
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)

loss = tf.reduce_mean(cross_entropy, name="loss")
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=momentum, use_nesterov=True)
training_op = optimizer.minimize(loss)

The X and y placeholders store the training instances and the true classifications respectively. The CNN itself is based mostly on the LeNet-5 architecture, using the same filter, kernel size and stride hyperparameters. We diverged slightly by feeding the CNN output through a fully-connected layer of 100 nodes before a final output layer of just two nodes to represent our binary classification. We then use a Nesterov momentum optimizer to minimize our loss function, which is constructed using softmax cross entropy.

We trained our network on a training set of 30,000 images with a batch size of 1,000. After 500 training epochs, we observed an accuracy of approximately 95% on a held-out test set that played no part in training. A sketch of the training loop is shown below.
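
For completeness, a training loop over the graph defined above would look roughly like the following. The next_batch helper, and the X_train/y_train and X_test/y_test arrays, are assumptions standing in for our actual data pipeline:

# Accuracy op: does the highest-scoring logit match the true label?
correct = tf.equal(tf.argmax(logits, axis=1), y)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

n_epochs, batch_size = 500, 1000

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        # next_batch is a hypothetical helper yielding shuffled mini-batches.
        for X_batch, y_batch in next_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    print("Test accuracy:", sess.run(accuracy, feed_dict={X: X_test, y: y_test}))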

Cam Software predominantly writes software using Scala rather than Python, so we also provided an interop library for the model. To do this, we first updated the Python code to save our TensorFlow model as a bundle using the simple_save utility function:

simple_save.simple_save(sess, "./track-classifier", inputs={"X": X}, outputs={"logits": logits})

As the model will be used only for classifying new instances and not for training, we only save the 'X' placeholder and not the 'y' placeholder.
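
As a sanity check, the saved bundle can be loaded back in Python before handing it over to the Scala side. The tensor names follow from the placeholder and layer names defined earlier, while some_images stands in for a hypothetical (N, 100, 100, 3) array of inputs:

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # "serve" matches the tag used by SavedModelBundle.load on the Scala side.
    tf.saved_model.loader.load(sess, ["serve"], "./track-classifier")
    X = sess.graph.get_tensor_by_name("X:0")
    logits = sess.graph.get_tensor_by_name("logits/BiasAdd:0")
    print(sess.run(logits, feed_dict={X: some_images}))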

To load the model using Scala, we provided the following utility class:

import java.awt.image.BufferedImage
import java.nio.FloatBuffer
import java.nio.file.Path

import org.tensorflow.{SavedModelBundle, Tensor}

object ModelInterrogator {
  def load(modelPath: Path): ModelInterrogator = new ModelInterrogator(SavedModelBundle.load(modelPath.toString, "serve"))
}

class ModelInterrogator private(bundle: SavedModelBundle) {

  def classify(image: BufferedImage): Int = classify(Array(image)).head

  def classify(images: Array[BufferedImage]): Array[Int] = {
    val input = Tensor.create(Array[Long](images.length, 100, 100, 3), imagesToFloatBuffer(images))
    val output = bundle.session().runner().feed("X", input).fetch("logits/BiasAdd").run().get(0)
    output.toFloatArrays.map(_.argmax)
  }

  private def imagesToFloatBuffer(images: Array[BufferedImage]): FloatBuffer = {
    require(images.forall(image => image.getWidth == 100 && image.getHeight == 100), "Images must have dimensions of 100x100")

    def toNormalisedRGB(pixel: Int): (Float, Float, Float) = {
      val red = (pixel >> 16) & 0xff
      val green = (pixel >> 8) & 0xff
      val blue = pixel & 0xff
      (red / 255.0f, green / 255.0f, blue / 255.0f)
    }

    // Pixels are written row by row (height, then width) to match the
    // NHWC layout of the 'X' placeholder.
    FloatBuffer.wrap(
      images.flatMap { image =>
        (0 until image.getHeight).toArray.flatMap { y =>
          (0 until image.getWidth).toArray.flatMap { x =>
            val (r, g, b) = toNormalisedRGB(image.getRGB(x, y))
            Array(r, g, b)
          }
        }
      })
  }
}

ModelInterrogator makes use of a couple of implicit extension methods that are defined as follows:

// Returns the index of the largest element in an array.
implicit class Argmax[T](a: Array[T]) {
  def argmax(implicit cmp: Ordering[T]): Int = a.zipWithIndex.maxBy(_._1)._2
}

// Copies a two-dimensional Tensor out into an Array[Array[Float]].
implicit class TensorExt(tensor: Tensor[_]) {
  def toFloatArrays: Array[Array[Float]] = {
    require(tensor.shape().length == 2, "Only two-dimensional tensors are supported")
    val (d0, d1) = (tensor.shape()(0).toInt, tensor.shape()(1).toInt)
    val floatBuffer = FloatBuffer.allocate(d0 * d1)
    tensor.writeTo(floatBuffer)
    floatBuffer.position(0)

    (0 until d0).toArray.map(_ => (0 until d1).toArray.map(_ => floatBuffer.get))
  }
}

To load the model, we call TensorFlow's SavedModelBundle.load method. ModelInterrogator then offers two classify methods, one for a single image and one for multiple images. Each takes a Java-based representation of an image (BufferedImage), converts it to a flattened FloatBuffer that can be fed through TensorFlow's API, and returns the predicted classification.
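
Putting this together, classifying a new lap image from Scala might look like the following. The paths, the image file and the assumption that class 1 encodes 'race track' are all illustrative:

import java.io.File
import java.nio.file.Paths
import javax.imageio.ImageIO

val interrogator = ModelInterrogator.load(Paths.get("./track-classifier"))
val image = ImageIO.read(new File("lap.png")) // must be a 100x100 image
val label = interrogator.classify(image)
println(if (label == 1) "Race track" else "Not a race track") // assumes class 1 = 'race track'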

As part of developing and testing our CNN, and in particular the Scala interop logic, we wrote an example GUI application to demonstrate the classifier in action. A short video of it is available here:

While there are other possible ways of tackling the original problem, we hope this article has highlighted the potential of CNNs as an elegant solution to such tasks. It can sometimes be difficult to procure enough training data to train a model accurately, and being able to generate additional examples from a small data set is a useful tool in combating this issue. In addition, the model can be improved over time through the use of real, user-driven training examples.