Customized GPU Allocation

This example shows how to give each worker process its own environment variables, for instance to control GPU device allocation.

Assume your machine has 4 GPUs and you want to deploy your model to all of them so that inference requests are handled in parallel, maximizing your service’s throughput. MOSEC supports this by letting you start parallel workers, each with its own customized environment variables.

As shown in the code below, we define our inference worker together with a list of environment variable dictionaries, each of which is passed to the corresponding worker process. For example, if the four dictionaries set CUDA_VISIBLE_DEVICES to 0, 1, 2, and 3 respectively, a copy of the same model is deployed on each of the 4 GPUs and the copies are queried in parallel, which can substantially improve the system’s throughput. You can verify this from either the server logs or the client response.

custom_env.py

# Copyright 2022 MOSEC Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Example: Custom Environment setup"""

import os

from mosec import Server, Worker, get_logger

logger = get_logger()


class Inference(Worker):
    """Customisable inference class."""

    def __init__(self):
        super().__init__()
        # initialize your model here and place it on its dedicated device
        device = os.environ["CUDA_VISIBLE_DEVICES"]
        logger.info("initializing model on device=%s", device)

    def forward(self, data: dict) -> dict:
        device = os.environ["CUDA_VISIBLE_DEVICES"]
        # NOTE self.worker_id is 1-indexed
        logger.info("worker=%d on device=%s is processing...", self.worker_id, device)
        return {"device": device}


if __name__ == "__main__":
    NUM_DEVICE = 4

    def _get_cuda_device(cid: int) -> dict:
        return {"CUDA_VISIBLE_DEVICES": str(cid)}

    server = Server()

    server.append_worker(
        Inference, num=NUM_DEVICE, env=[_get_cuda_device(x) for x in range(NUM_DEVICE)]
    )
    server.run()
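
The list passed as env contains one dictionary per worker. As a minimal sketch of how such a list could be generated for other layouts, the helper below (build_env_list is a hypothetical name, not part of MOSEC) gives each worker a disjoint group of GPU ids. It assumes the GPUs are numbered consecutively from 0 and relies on CUDA_VISIBLE_DEVICES accepting a comma-separated list of ids.

```python
from typing import Dict, List


def build_env_list(num_workers: int, gpus_per_worker: int = 1) -> List[Dict[str, str]]:
    """Return one env dict per worker, each exposing a disjoint group of GPU ids.

    Hypothetical helper for illustration; assumes GPU ids start at 0 and are
    consecutive on this machine.
    """
    if num_workers < 1 or gpus_per_worker < 1:
        raise ValueError("num_workers and gpus_per_worker must be positive")
    envs = []
    for worker in range(num_workers):
        first = worker * gpus_per_worker
        ids = range(first, first + gpus_per_worker)
        # CUDA_VISIBLE_DEVICES takes a comma-separated list of device ids.
        envs.append({"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in ids)})
    return envs


# One GPU per worker, equivalent to the list comprehension in custom_env.py.
print(build_env_list(4))
# Two workers, each seeing two GPUs.
print(build_env_list(2, gpus_per_worker=2))
```

The result could be passed as the env argument of server.append_worker, with num set to the same number of workers so that each dictionary has a corresponding worker process.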

Start

python custom_env.py

Test

http :8000/inference dummy=0
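
To check the client response from Python instead of the command line, a sketch along these lines could be used. It uses only the standard library; the address 127.0.0.1:8000 and the JSON payload mirror the command above and are assumptions about your setup, and query_device and tally_devices are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Callable
from urllib import request

# Assumed address, taken from the ":8000/inference" shorthand in the Test command.
URL = "http://127.0.0.1:8000/inference"


def query_device(url: str = URL) -> str:
    """POST a dummy JSON payload and return the "device" field of the reply."""
    body = json.dumps({"dummy": "0"}).encode("utf-8")
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["device"]


def tally_devices(num_requests: int, send: Callable[[], str] = query_device) -> Counter:
    """Call `send` num_requests times and count how often each device answered."""
    return Counter(send() for _ in range(num_requests))
```

With the server running, print(tally_devices(20)) shows which device values were returned and how often. Which worker picks up a given request is up to the server, so the counts need not be even, and a small number of sequential requests may not reach every device.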