Customized Metrics

This example demonstrates how to add custom Python-side Prometheus metrics.

Mosec already exposes Rust-side metrics, including:

  • throughput for the inference endpoint

  • duration for each stage (including the IPC time)

  • batch size (only for workers with max_batch_size > 1)

  • number of remaining tasks to be processed

If you need to monitor the inference process in more detail, you can add Python-side metrics, for example the distribution of inference results, the duration of a CPU-bound or GPU-bound processing step, or the IPC time (computed as rust_step_duration - python_step_duration).
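The IPC-time calculation mentioned above can be sketched as follows. This is only an illustration using the standard library: the `timed` helper and the `ipc_time` function are hypothetical names, and the two durations are assumed to be measured in the same unit (seconds) for the same stage.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(durations: dict, name: str):
    """Record the wall-clock duration of the wrapped block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[name] = time.perf_counter() - start


def ipc_time(rust_step_duration: float, python_step_duration: float) -> float:
    """Estimate the IPC overhead of one stage.

    The Rust-side duration covers the whole stage (including IPC), while the
    Python-side duration covers only the work done inside the worker, so the
    difference approximates the time spent on IPC. Clamp at zero to absorb
    small measurement noise.
    """
    return max(rust_step_duration - python_step_duration, 0.0)
```

Inside a worker, the Python-side duration could be collected with `with timed(durations, "forward"): ...` around the processing step and then exported through a Prometheus metric of your choice.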

This example uses a simple WSGI app as the metrics service. In each worker process, the Counter collects the inference results and exports them to the metrics service. The inference itself parses the batch data and compares each value with the batch average.

For more information about the multiprocess mode for metrics, see the Prometheus client documentation.

python_side_metrics.py

# Copyright 2022 MOSEC Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Example: Adding metrics service."""

import os
import pathlib
import tempfile
from typing import List

from prometheus_client import (  # type: ignore
    CollectorRegistry,
    Counter,
    multiprocess,
    start_http_server,
)

from mosec import Server, ValidationError, Worker, get_logger

logger = get_logger()


# Make sure PROMETHEUS_MULTIPROC_DIR is set so that the Prometheus client can
# run in multiprocess mode; fall back to a directory under the system temp dir.
if not os.environ.get("PROMETHEUS_MULTIPROC_DIR"):
    metric_dir_path = os.path.join(tempfile.gettempdir(), "prometheus_multiproc_dir")
    pathlib.Path(metric_dir_path).mkdir(parents=True, exist_ok=True)
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = metric_dir_path


metric_registry = CollectorRegistry()
multiprocess.MultiProcessCollector(metric_registry)
counter = Counter(
    "inference_result",
    "statistic of result",
    ("status", "worker_id"),
    registry=metric_registry,
)


class Inference(Worker):
    """Sample Inference Worker."""

    def __init__(self):
        super().__init__()
        self.worker_id = str(self.worker_id)

    def deserialize(self, data: bytes) -> int:
        json_data = super().deserialize(data)
        try:
            res = int(json_data.get("num"))
        except Exception as err:
            raise ValidationError(err) from err
        return res

    def forward(self, data: List[int]) -> List[bool]:
        avg = sum(data) / len(data)
        ans = [x >= avg for x in data]
        counter.labels(status="true", worker_id=self.worker_id).inc(sum(ans))
        counter.labels(status="false", worker_id=self.worker_id).inc(
            len(ans) - sum(ans)
        )
        return ans


if __name__ == "__main__":
    # Run the metrics server in another thread.
    start_http_server(5000, registry=metric_registry)

    # Run the inference server
    server = Server()
    server.append_worker(Inference, num=2, max_batch_size=8)
    server.run()

Start

python python_side_metrics.py

Test

http POST :8000/inference num=1
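If HTTPie is not available, the same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names and the `base_url` default are assumptions based on the command above, and the server is assumed to be listening on port 8000.

```python
import json
import urllib.request


def build_inference_request(num: int, base_url: str = "http://127.0.0.1:8000"):
    """Build a POST request carrying the JSON body {"num": <num>}."""
    body = json.dumps({"num": num}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/inference",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_inference(num: int) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_inference_request(num), timeout=5) as resp:
        return resp.read()
```

Calling `send_inference(1)` a few times while the server is running should make the counters move.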

Check the Python side metrics

http :5000
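The metrics endpoint returns the Prometheus text exposition format, with one sample per status and worker_id. As a rough sketch, the per-status totals across all workers can be computed with a small parser. The sample text below is made up for illustration, and the metric name `inference_result_total` assumes that the Prometheus client exposes the `inference_result` counter with a `_total` suffix.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_status(text: str, metric: str = "inference_result_total") -> dict:
    """Sum the samples of `metric` over all workers, grouped by `status`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = SAMPLE_LINE.match(line)
        if match is None or match["name"] != metric:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        totals[labels.get("status", "")] += float(match["value"])
    return dict(totals)


# Illustrative input only, not captured from a real run.
SAMPLE = """\
# TYPE inference_result_total counter
inference_result_total{status="true",worker_id="1"} 3.0
inference_result_total{status="false",worker_id="1"} 1.0
inference_result_total{status="true",worker_id="2"} 2.0
"""

print(sum_by_status(SAMPLE))
```

To use it against the running example, pass in the text fetched from the port given to `start_http_server` in the script (5000).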

Check the Rust side metrics

http :8000/metrics

How to build a monitoring system for Mosec

This tutorial explains how to build a monitoring system for Mosec using Prometheus and Grafana.

Prerequisites

Before starting, you need Docker and Docker Compose installed on your machine. If they are not installed yet, follow the get-docker and compose instructions to install them.

Starting the monitoring system

Clone the repository containing the docker-compose.yaml file:

git clone https://github.com/mosecorg/mosec.git

Navigate to the directory containing the docker-compose.yaml file:

cd mosec/examples/monitor

Start the monitoring system by running the following command:

docker-compose up -d

This command will start three containers: Mosec, Prometheus, and Grafana.
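To confirm that the three services are reachable, a quick TCP check can help. This is a sketch under the assumption that the ports used elsewhere in this tutorial (8000 for Mosec, 9090 for Prometheus, 3000 for Grafana) are published on the local host; `port_open` and `check_services` are hypothetical helper names.

```python
import socket

# Ports as used in this tutorial; adjust if your compose file maps others.
SERVICES = {"Mosec": 8000, "Prometheus": 9090, "Grafana": 3000}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

A successful connection only shows that the port is open; it does not guarantee that the service behind it has finished starting.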

Test

Send a test request so that Prometheus has metrics to collect.

http POST :8000/inference num=1

Accessing Prometheus

Prometheus is a monitoring and alerting system that collects metrics from Mosec. You can access the Prometheus UI by visiting http://127.0.0.1:9090 in your web browser.
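Besides the web UI, Prometheus can be queried over its HTTP API (`/api/v1/query`). The sketch below builds and sends an instant query with the standard library; the helper names are hypothetical and the base URL is the one given above. The built-in `up` metric is a convenient first query because it lists the targets Prometheus is scraping.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr: str, base_url: str = "http://127.0.0.1:9090") -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(expr: str) -> list:
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(expr), timeout=5) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

For example, `instant_query("up")` returns one entry per scrape target once the containers are running.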

Accessing Grafana

Grafana is a visualization tool for monitoring and analyzing metrics. You can access the Grafana UI by visiting http://127.0.0.1:3000 in your web browser. The default username and password are both admin.

Stopping the monitoring system

To stop the monitoring system, run the following command:

docker-compose down

This command will stop and remove the containers created by Docker Compose.