Build Inference Pipelines with Seldon Core

Eugene Fedorenko
3 min read · Aug 16, 2020

Seldon Core is a popular open-source framework for deploying machine learning models on a Kubernetes cluster. It is lightweight and simple, and it provides a number of powerful tools for building an ML inference service. One of its most interesting features, in my opinion, is the ability to build scoring pipelines out of Seldon Core predictor components. In this post I am going to demonstrate that feature by building a sample inference pipeline.

Seldon Core introduces a custom K8s resource, SeldonDeployment. This is the resource that we describe in a K8s manifest file and deploy to our cluster. It contains the definition of the key workhorse: a predictor. Whenever Seldon Core receives a prediction request, it forwards the request to the predictor, which takes an ML model and produces the prediction with it. This is the simplest and most common scenario. A predictor consists of a number of components. In most cases it contains a single Model component that works with a model file and returns the prediction result. However, there are several component types that can be combined within a predictor, turning it into a complex inference pipeline: Model, Router, Combiner, Transformer and Output_Transformer.

The sample pipeline we are going to build in this post is shown in the following diagram.

For the sake of simplicity, and in order to focus on the pipeline concept itself, it performs only very basic transformations. The input to the pipeline is a JSON document with an integer payload. The Input Transformer component adds 100 to the payload and forwards the request to the Router. The Router component randomly routes the request either to Model A or to the Combiner. Every model emulates a prediction by incrementing the payload. For example, the prediction made by Model A for {"data": "100"} is {"data": "101"}. Model B takes the prediction from Model A as its input and returns the prediction {"data": "102"}. The Combiner component aggregates the results of its children: it concatenates the predictions from Model C and Model D and returns {"data": "101101"}. The Output Transformer wraps the result into a JSON document with a "prediction_result" attribute.

I implemented all these components in a single simple web service with Python and Flask. To develop a specific component, you need to implement the API for that component type. For example, to build a Model component the web service must respond on the "/predict" URL, and to implement a Router component it must respond on the "/route" URL.

This is a sample implementation of all our components:
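Here is a rough sketch of what such a service could look like. It follows the payload format from the examples above; the remaining component types respond on analogous URLs ("/transform-input", "/transform-output", "/aggregate"), and the exact request/response envelopes the Seldon orchestrator uses, especially for the Router and the Combiner, depend on the Seldon Core version, so treat this as an illustration rather than the actual implementation (the full source is linked at the end of the post).

```python
import os
import random

from flask import Flask, jsonify, request

app = Flask(__name__)


def payload(req):
    # Extract the integer payload from the incoming {"data": "<number>"} message
    return int(req.get_json(force=True)["data"])


@app.route("/transform-input", methods=["POST"])
def transform_input():
    # Input Transformer: add 100 to the payload before it reaches the Router
    return jsonify({"data": str(payload(request) + 100)})


@app.route("/route", methods=["POST"])
def route():
    # Router: randomly pick child 0 (Model A) or child 1 (the Combiner);
    # the response envelope expected by the orchestrator may differ between versions
    return jsonify({"data": {"ndarray": [[random.choice([0, 1])]]}})


@app.route("/predict", methods=["POST"])
def predict():
    # Model: emulate a prediction by incrementing the payload
    return jsonify({"data": str(payload(request) + 1)})


@app.route("/aggregate", methods=["POST"])
def aggregate():
    # Combiner: concatenate the predictions coming from Model C and Model D;
    # assume the children's responses arrive as a list of messages
    msgs = request.get_json(force=True)
    if isinstance(msgs, dict):
        msgs = msgs.get("seldonMessages", [msgs])
    return jsonify({"data": "".join(str(m["data"]) for m in msgs)})


@app.route("/transform-output", methods=["POST"])
def transform_output():
    # Output Transformer: wrap the final result into a "prediction_result" attribute
    return jsonify({"prediction_result": request.get_json(force=True)["data"]})


if __name__ == "__main__":
    # SERVICE_PORT is just a convention of this sketch, so that every container
    # built from the same image can listen on its own port
    app.run(host="0.0.0.0", port=int(os.environ.get("SERVICE_PORT", "5000")))
```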

Let’s wrap it into a Docker image with this Dockerfile:
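A minimal Dockerfile could look like the one below (the app.py and requirements.txt file names are just the ones assumed by this sketch):

```dockerfile
# Base image and file names are illustrative
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# The port each container actually listens on is configured per component via SERVICE_PORT
CMD ["python", "app.py"]
```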

Having done that, we are ready to define our inference pipeline in a SeldonDeployment manifest:
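Roughly, the manifest could look like the following. The component names, ports and image name are illustrative, and I place the Output Transformer at the root of the graph so that it post-processes the final response on its way back to the client; double-check the exact schema against the Seldon Core version you are running.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sample-pipeline
spec:
  predictors:
    - name: default
      replicas: 1
      componentSpecs:
        - spec:
            containers:
              - name: output-transformer
                image: sample-pipeline:latest   # the same image for every component
                env:
                  - name: SERVICE_PORT
                    value: "9000"
              - name: input-transformer
                image: sample-pipeline:latest
                env:
                  - name: SERVICE_PORT
                    value: "9001"
              # router, model-a, model-b, combiner, model-c and model-d follow
              # the same pattern on ports 9002-9007
      graph:
        name: output-transformer
        type: OUTPUT_TRANSFORMER
        endpoint: { type: REST, service_port: 9000 }
        children:
          - name: input-transformer
            type: TRANSFORMER
            endpoint: { type: REST, service_port: 9001 }
            children:
              - name: router
                type: ROUTER
                endpoint: { type: REST, service_port: 9002 }
                children:
                  - name: model-a
                    type: MODEL
                    endpoint: { type: REST, service_port: 9003 }
                    children:
                      - name: model-b
                        type: MODEL
                        endpoint: { type: REST, service_port: 9004 }
                  - name: combiner
                    type: COMBINER
                    endpoint: { type: REST, service_port: 9005 }
                    children:
                      - name: model-c
                        type: MODEL
                        endpoint: { type: REST, service_port: 9006 }
                      - name: model-d
                        type: MODEL
                        endpoint: { type: REST, service_port: 9007 }
```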

Note that all the components in this sample are implemented with the same Docker image; in real life they would most likely use different images. On the other hand, to keep things clear, every component here runs in its own Docker container inside a K8s pod, distinguished by port number. In practice some of them could be combined to reduce the number of containers in the pod.

Once the SeldonDeployment manifest is applied to a K8s cluster, we can invoke our predictor, which is an instance of an inference pipeline, with the following commands:
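Assuming an Istio or Ambassador ingress exposed at $INGRESS_HOST and the deployment from the sketch above running in the default namespace, the calls could look like this:

```bash
# Deploy the pipeline (the manifest file name is illustrative)
kubectl apply -f sample-pipeline.yaml

# Send a prediction request through the standard Seldon REST endpoint
curl -s -X POST "http://$INGRESS_HOST/seldon/default/sample-pipeline/api/v1.0/predictions" \
     -H "Content-Type: application/json" \
     -d '{"data": "0"}'

# Expected response: {"prediction_result": "102"} if the Router picks the Model A branch,
# or {"prediction_result": "101101"} if it picks the Combiner branch
```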

The source code of the full implementation for this sample is available here.

That’s it!
