This example shows how a multitenant service can distribute requests evenly among multiple Azure OpenAI Service instances and manage tokens per minute (TPM) for multiple tenants.
kubernetes grafana prometheus openai grafana-dashboard tpm load-balancing aks azure-kubernetes-service azure-openai azure-openai-service tokens-per-minute
Updated Feb 26, 2024 - C#