Deploying CatBoost Models with KServe
This guide demonstrates how to deploy CatBoost models using KServe's InferenceService and how to send inference requests using the Open Inference Protocol.
Prerequisites
Before you begin, make sure you have:
- A Kubernetes cluster with KServe installed.
- kubectl CLI configured to communicate with your cluster.
- Basic knowledge of Kubernetes concepts and CatBoost models.
- Access to cloud storage (like Google Cloud Storage) to store your model artifacts.
Training a Sample CatBoost Model
The first step is to train a sample CatBoost model and serialize it to CatBoost's native .cbm format using the save_model API.
import numpy as np
from catboost import CatBoostClassifier

# Generate a small synthetic dataset: 100 rows of 10 integer features
# with binary labels.
train_data = np.random.randint(0, 100, size=(100, 10))
train_labels = np.random.randint(0, 2, size=(100))

# Train a deliberately tiny classifier so the example runs quickly.
model = CatBoostClassifier(
    iterations=2,
    depth=2,
    learning_rate=1,
    loss_function="Logloss",
    verbose=True,
)
model.fit(train_data, train_labels)

# Serialize the model in CatBoost's native .cbm format.
model.save_model("model.cbm")
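Optionally, you can reload the serialized model to confirm that the .cbm file round-trips correctly. This is a minimal sanity check and assumes model.cbm was written to the current directory:

import numpy as np
from catboost import CatBoostClassifier

# Optional sanity check: reload the serialized model and score one random row.
loaded_model = CatBoostClassifier()
loaded_model.load_model("model.cbm")

sample = np.random.randint(0, 100, size=(1, 10))
print(loaded_model.predict(sample))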
Testing the Model Locally
Once you have your model serialized as model.cbm, you can use MLServer to create a local model server. For more details, check the CatBoost example documentation.
This local testing step is optional. You can skip to the deployment section if you prefer.
Prerequisites
To use MLServer locally, install the mlserver package and the CatBoost runtime:
pip install mlserver mlserver-catboost
Model Settings
Next, provide model settings so that MLServer knows:
- The inference runtime to serve your model (i.e. mlserver_catboost.CatboostModel)
- The model's parameters
These can be specified through environment variables or by creating a local model-settings.json file:
{
  "implementation": "mlserver_catboost.CatboostModel",
  "name": "catboost-classifier",
  "parameters": {
    "uri": "./model.cbm",
    "version": "v0.1.0"
  }
}
Starting the Model Server Locally
With the mlserver package installed, create the model-settings.json file and start your server:
cat << EOF > model-settings.json
{
  "implementation": "mlserver_catboost.CatboostModel",
  "name": "catboost-classifier",
  "parameters": {
    "uri": "./model.cbm",
    "version": "v0.1.0"
  }
}
EOF
mlserver start .
If everything is set up correctly, you should see messages like the following in the mlserver process output:
2025-09-18 01:15:05,483 [mlserver.parallel] DEBUG - Starting response processing loop...
2025-09-18 01:15:05,484 [mlserver.rest] INFO - HTTP server running on http://0.0.0.0:8080
...
2025-09-18 01:15:06,528 [mlserver][catboost-classifier:v0.1.0] INFO - Loaded model 'catboost-classifier' successfully.
2025-09-18 01:15:06,529 [mlserver][catboost-classifier:v0.1.0] INFO - Loaded model 'catboost-classifier' successfully.
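Once the model is loaded, you can send a quick test request to the local server. The sketch below uses the requests library and assumes the server is listening on http://localhost:8080 with the model name from model-settings.json; the tensor name and datatype are illustrative, not prescribed by the runtime.

import requests

# Open Inference Protocol (v2) request against the local MLServer instance.
# The tensor name and datatype below are assumptions for illustration; adjust
# them if your runtime expects something different.
inference_request = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 10],
            "datatype": "FP32",
            "data": [23, 1, 57, 14, 88, 3, 61, 42, 9, 70],
        }
    ]
}

response = requests.post(
    "http://localhost:8080/v2/models/catboost-classifier/infer",
    json=inference_request,
)
print(response.json())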
Deploying the Model with InferenceService
When deploying the model with InferenceService, KServe injects sensible defaults that work out of the box without additional configuration. However, you can override these defaults by providing a model-settings.json file similar to your local one. You can even provide multiple model-settings.json files to load multiple models.
To use the Open Inference Protocol (v2) for inference with the deployed model, set the protocolVersion field to v2. In this example, the model artifacts have already been uploaded to a Google Cloud Storage bucket and can be accessed at gs://kfserving-examples/models/catboost/classifier.
Apply the YAML manifest:
kubectl apply -f - << EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "catboost-example"
spec:
  predictor:
    model:
      runtime: kserve-mlserver
      modelFormat:
        name: catboost
      protocolVersion: v2
      storageUri: "gs://kfserving-examples/models/catboost/classifier"
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF
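Before sending traffic, check that the InferenceService has become ready. In the output, the READY column should report True and the URL column shows the endpoint KServe assigned:

kubectl get inferenceservice catboost-example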
Testing the Deployed Model
You can test your deployed model by sending a sample request following the Open Inference Protocol.
Use our sample input file catboost-input.json to test the model.
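The published catboost-input.json is not reproduced here and its exact contents may differ, but a payload with the same overall Open Inference Protocol structure (one row of 10 features, matching the training data above) can be generated with a short script like this; the tensor name and datatype are illustrative assumptions:

import json

# Write an illustrative V2-protocol payload with one row of 10 features.
# The tensor name and datatype are assumptions, not taken from the published file.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 10],
            "datatype": "FP32",
            "data": [23, 1, 57, 14, 88, 3, 61, 42, 9, 70],
        }
    ]
}

with open("catboost-input.json", "w") as f:
    json.dump(payload, f)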
Determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
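How you determine these values depends on your cluster's ingress setup. As one example, assuming an Istio ingress gateway exposed through a LoadBalancer service in the istio-system namespace:

INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')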
SERVICE_HOSTNAME=$(kubectl get inferenceservice catboost-example -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v \
-H "Host: ${SERVICE_HOSTNAME}" \
-H "Content-Type: application/json" \
-d @./catboost-input.json \
http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/catboost-example/infer
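You should receive a response similar to the following: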
{
  "model_name": "catboost-example",
  "id": "70062817-f7de-4105-93ef-e0e2ea3d214e",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [1, 1],
      "datatype": "INT64",
      "parameters": {"content_type": "np"},
      "data": [0]
    }
  ]
}