# NVIDIA NeMo Inference Microservice Deploy Job

> Deploy a W&B model artifact to NVIDIA NeMo Inference Microservice using W&B Launch for scalable model serving.

Deploy a model artifact from W\&B to an NVIDIA NeMo Inference Microservice using W\&B Launch. W\&B Launch converts the model artifact to an NVIDIA NeMo model and deploys it to a running NIM/Triton server.

W\&B Launch currently accepts the following compatible model types:

1. [Llama2](https://llama.meta.com/llama2/)
2. [StarCoder](https://github.com/bigcode-project/starcoder)
3. NV-GPT (coming soon)

Deployment time varies by model and machine type. The base Llama2-7b config takes about 1 minute on Google Cloud's `a2-ultragpu-1g`.

## Quickstart

1. [Create a launch queue](/platform/launch/add-job-to-queue/) if you don't have one already. See the example queue config below.

   ```yaml theme={null}
   net: host
   gpus: all # a specific set of GPUs, or `all` to use every GPU
   runtime: nvidia # requires the NVIDIA container runtime on the host
   volume:
     - model-store:/model-store/
   ```

2. Create this job in your project:

   ```bash theme={null}
   wandb job create -n "deploy-to-nvidia-nemo-inference-microservice" \
     -e "$ENTITY" \
     -p "$PROJECT" \
     -E jobs/deploy_to_nvidia_nemo_inference_microservice/job.py \
     -g andrew/nim-updates \
     git https://github.com/wandb/launch-jobs
   ```

3. Start a launch agent on your GPU machine:

   ```bash theme={null}
   wandb launch-agent -e "$ENTITY" -p "$PROJECT" -q "$QUEUE"
   ```

4. Submit the deployment launch job with your desired configuration from the [Launch UI](https://wandb.ai/launch). You can also submit it from the CLI:

   ```bash theme={null}
   wandb launch -d gcr.io/playground-111/deploy-to-nemo:latest \
     -e "$ENTITY" \
     -p "$PROJECT" \
     -q "$QUEUE" \
     -c "$CONFIG_JSON_FNAME"
   ```

5. You can track the deployment process in the Launch UI.

6. When the deployment completes, you can immediately curl the endpoint to test the model. The model name is always `ensemble`.

   ```bash theme={null}
   #!/bin/bash
   curl -X POST "http://0.0.0.0:9999/v1/completions" \
     -H "accept: application/json" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "ensemble",
       "prompt": "Tell me a joke",
       "max_tokens": 256,
       "temperature": 0.5,
       "n": 1,
       "stream": false,
       "stop": "string",
       "frequency_penalty": 0.0
     }'
   ```
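
The commands in steps 2 to 4 read `$ENTITY`, `$PROJECT`, `$QUEUE`, and `$CONFIG_JSON_FNAME` from your shell but never define them. The following is a minimal sketch of setting them and failing fast when one is empty. The default values and the `require_vars` helper are hypothetical placeholders, not part of W\&B Launch, and the helper uses bash indirect expansion, so run it with bash.

```bash
#!/usr/bin/env bash
# Hypothetical placeholder values: replace them with your own W&B entity,
# project, launch queue, and launch-config file. Existing values are kept.
ENTITY="${ENTITY:-my-team}"
PROJECT="${PROJECT:-nim-deploy}"
QUEUE="${QUEUE:-gpu-queue}"
CONFIG_JSON_FNAME="${CONFIG_JSON_FNAME:-deploy-config.json}"

# require_vars NAME...: report each named variable that is unset or empty
# on stderr, and return 1 if any are missing (0 otherwise).
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_vars ENTITY PROJECT QUEUE && wandb launch-agent -e "$ENTITY" -p "$PROJECT" -q "$QUEUE"` starts the agent only when all three variables are set.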
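
The test request in step 6 uses a hand-written JSON body. The following is a minimal sketch of wrapping that request in shell functions so you can try other prompts without editing JSON by hand. It assumes the server answers at `http://0.0.0.0:9999/v1/completions`, as in the curl example, and it keeps only the core fields from that example. `NIM_URL`, `json_escape`, `completion_payload`, and `complete` are hypothetical names, and `json_escape` only escapes backslashes and double quotes, so prompts must fit on a single line.

```bash
#!/usr/bin/env bash
# Hypothetical variable: endpoint of the deployed model (from step 6).
NIM_URL="${NIM_URL:-http://0.0.0.0:9999/v1/completions}"

# json_escape TEXT: escape backslashes, then double quotes, so single-line
# TEXT can be embedded in a JSON string. Newlines are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# completion_payload PROMPT [MAX_TOKENS] [TEMPERATURE]: print a request body.
# The model name is always "ensemble" for this deployment.
completion_payload() {
  local prompt max_tokens="${2:-256}" temperature="${3:-0.5}"
  prompt="$(json_escape "$1")"
  printf '{"model": "ensemble", "prompt": "%s", "max_tokens": %s, "temperature": %s, "n": 1, "stream": false}\n' \
    "$prompt" "$max_tokens" "$temperature"
}

# complete PROMPT [MAX_TOKENS] [TEMPERATURE]: POST the request and print
# the raw JSON response from the server.
complete() {
  curl -sS -X POST "$NIM_URL" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "$(completion_payload "$@")"
}
```

For example, `complete 'Write a haiku about GPUs' 64 0.7` sends a shorter, slightly more random request than the default and prints the server's JSON response.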