
📈 Prometheus metrics

info

✨ Prometheus metrics is on LiteLLM Enterprise


LiteLLM exposes a /metrics endpoint for Prometheus to poll.

Quick Start​

If you run the LiteLLM CLI with litellm --config proxy_config.yaml, you need to install the Prometheus client first: pip install prometheus_client==0.20.0. It is already pre-installed on the litellm Docker image.

Add this to your proxy config.yaml

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  callbacks: ["prometheus"]

Start the proxy

litellm --config config.yaml --debug

Test Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'

View the metrics on the /metrics endpoint, for example by visiting:

http://localhost:4000/metrics

# <proxy_base_url>/metrics
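
To inspect the endpoint from a script rather than a browser, a minimal Python sketch using only the standard library might look like the following. It assumes the proxy listens on http://localhost:4000 as in the steps above. `parse_samples` and `fetch_metrics` are hypothetical helper names, and the parser only handles plain `name{labels} value` lines of the Prometheus text format, not edge cases such as escaped quotes or trailing timestamps.

```python
import re
import urllib.request

# Matches 'name{label="v",...} value' or 'name value' sample lines.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url="http://localhost:4000"):
    """Download <base_url>/metrics; the default URL is an assumption."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Against a running proxy, `parse_samples(fetch_metrics())` returns one tuple per sample, which can then be filtered on names starting with `litellm_`.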

Virtual Keys, Teams, Internal Users Metrics​

Use this for tracking per user, key, team, etc.

| Metric Name | Description |
|---|---|
| litellm_spend_metric | Total Spend, per "user", "key", "model", "team", "end-user" |
| litellm_total_tokens | input + output tokens per "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
| litellm_input_tokens | input tokens per "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
| litellm_output_tokens | output tokens per "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |

Proxy Level Tracking Metrics​

Use this to track overall LiteLLM Proxy usage.

  • Track the actual traffic rate to the proxy
  • Track the number of client-side requests and failures for requests made to the proxy

| Metric Name | Description |
|---|---|
| litellm_proxy_failed_requests_metric | Total number of failed responses from proxy - the client did not get a success response from litellm proxy. Labels: "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class" |
| litellm_proxy_total_requests_metric | Total number of requests made to the proxy server - track number of client side requests. Labels: "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "status_code" |
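
As an illustration of how the two counters combine, the sketch below computes a failure ratio per label value from a snapshot of samples. It assumes the samples have already been parsed into `(name, labels, value)` tuples and that the counters are exposed with a `_total` suffix; `failure_ratio_by` is a hypothetical helper, not part of LiteLLM.

```python
from collections import defaultdict

# Sample names as assumed to appear on /metrics (counter name + "_total").
FAILED = "litellm_proxy_failed_requests_metric_total"
TOTAL = "litellm_proxy_total_requests_metric_total"


def failure_ratio_by(samples, label):
    """Return {label_value: failed / total} from (name, labels, value) samples."""
    failed = defaultdict(float)
    total = defaultdict(float)
    for name, labels, value in samples:
        key = labels.get(label, "")
        if name == FAILED:
            failed[key] += value
        elif name == TOTAL:
            total[key] += value
    # Only report label values that saw at least one request.
    return {key: failed[key] / count for key, count in total.items() if count > 0}


# Illustrative, made-up sample values.
samples = [
    (TOTAL, {"team": "team-a", "status_code": "200"}, 90.0),
    (TOTAL, {"team": "team-a", "status_code": "429"}, 10.0),
    (FAILED, {"team": "team-a", "exception_status": "429"}, 10.0),
    (TOTAL, {"team": "team-b", "status_code": "200"}, 50.0),
]
ratios = failure_ratio_by(samples, "team")
```

Because counters are cumulative, this ratio covers the whole lifetime of the counters; for a windowed failure rate, the same division is normally done in Prometheus over rate() of each series.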

LLM API / Provider Metrics​

Use this for LLM API Error monitoring and tracking remaining rate limits and token limits

Labels Tracked for LLM API Metrics​

| Label | Description |
|---|---|
| litellm_model_name | The name of the LLM model used by LiteLLM |
| requested_model | The model sent in the request |
| model_id | The model_id of the deployment. Autogenerated by LiteLLM; each deployment has a unique model_id |
| api_base | The API base of the deployment |
| api_provider | The LLM API provider of the deployment, for example azure, openai, vertex_ai |
| hashed_api_key | The hashed API key of the request |
| api_key_alias | The alias of the API key used |
| team | The team of the request |
| team_alias | The alias of the team used |
| exception_status | The status of the exception, if any |
| exception_class | The class of the exception, if any |

Success and Failure Metrics for LLM API​

| Metric Name | Description |
|---|---|
| litellm_deployment_success_responses | Total number of successful LLM API calls for deployment. Labels: "requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias" |
| litellm_deployment_failure_responses | Total number of failed LLM API calls for a specific LLM deployment. Labels: "requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class" |
| litellm_deployment_total_requests | Total number of LLM API calls for deployment - success + failure. Labels: "requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias" |

Remaining Requests and Tokens Metrics​

| Metric Name | Description |
|---|---|
| litellm_remaining_requests_metric | Tracks x-ratelimit-remaining-requests returned from the LLM API deployment. Labels: "model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias" |
| litellm_remaining_tokens | Tracks x-ratelimit-remaining-tokens returned from the LLM API deployment. Labels: "model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias" |

Deployment State Metrics​

| Metric Name | Description |
|---|---|
| litellm_deployment_state | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: "litellm_model_name", "model_id", "api_base", "api_provider" |
| litellm_deployment_latency_per_output_token | Latency per output token for deployment. Labels: "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias" |
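
The numeric values of litellm_deployment_state are easier to act on once they are mapped back to names. A minimal sketch, assuming the gauge samples are available as `(labels, value)` pairs (the input shape and the helper name `unhealthy_deployments` are illustrative, not part of LiteLLM):

```python
from enum import IntEnum


class DeploymentState(IntEnum):
    """Values of litellm_deployment_state as documented in the table."""
    HEALTHY = 0
    PARTIAL_OUTAGE = 1
    COMPLETE_OUTAGE = 2


def unhealthy_deployments(samples):
    """Return (model_name, api_base, state) for every non-healthy deployment.

    `samples` is an iterable of (labels, value) pairs for the
    litellm_deployment_state gauge; this input shape is an assumption.
    """
    result = []
    for labels, value in samples:
        state = DeploymentState(int(value))
        if state is not DeploymentState.HEALTHY:
            result.append(
                (labels.get("litellm_model_name"), labels.get("api_base"), state)
            )
    return result
```

A value outside 0 to 2 raises ValueError here, which seems preferable to silently treating an unknown state as healthy.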

Fallback (Failover) Metrics​

| Metric Name | Description |
|---|---|
| litellm_deployment_cooled_down | Number of times a deployment has been cooled down by LiteLLM load balancing logic. Labels: "litellm_model_name", "model_id", "api_base", "api_provider", "exception_status" |
| litellm_deployment_successful_fallbacks | Number of successful fallback requests from primary model -> fallback model. Labels: "requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class" |
| litellm_deployment_failed_fallbacks | Number of failed fallback requests from primary model -> fallback model. Labels: "requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class" |

Request Latency Metrics​

| Metric Name | Description |
|---|---|
| litellm_request_total_latency_metric | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
| litellm_overhead_latency_metric | Latency overhead (seconds) added by LiteLLM processing - tracked for labels "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
| litellm_llm_api_latency_metric | Latency (seconds) for just the LLM API call - tracked for labels "model", "hashed_api_key", "api_key_alias", "team", "team_alias", "requested_model", "end_user", "user" |
| litellm_llm_api_time_to_first_token_metric | Time to first token for LLM API call - tracked for labels "model", "hashed_api_key", "api_key_alias", "team", "team_alias" [Note: only emitted for streaming requests] |

Virtual Key - Budget, Rate Limit Metrics​

Metrics used to track LiteLLM Proxy Budgeting and Rate limiting logic

| Metric Name | Description |
|---|---|
| litellm_remaining_team_budget_metric | Remaining Budget for Team (A team created on LiteLLM). Labels: "team_id", "team_alias" |
| litellm_remaining_api_key_budget_metric | Remaining Budget for API Key (A key created on LiteLLM). Labels: "hashed_api_key", "api_key_alias" |
| litellm_remaining_api_key_requests_for_model | Remaining Requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: "hashed_api_key", "api_key_alias", "model" |
| litellm_remaining_api_key_tokens_for_model | Remaining Tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: "hashed_api_key", "api_key_alias", "model" |
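
One way to use the remaining-budget gauge is a simple threshold check. The sketch below assumes samples of litellm_remaining_api_key_budget_metric are available as `(labels, value)` pairs. `keys_below_budget` and the threshold are hypothetical, and the budget unit is whatever the proxy reports.

```python
def keys_below_budget(samples, threshold):
    """Return {key_name: remaining_budget} for keys under `threshold`.

    `samples` holds (labels, value) pairs for
    litellm_remaining_api_key_budget_metric; the shape is an assumption.
    """
    low = {}
    for labels, remaining in samples:
        if remaining < threshold:
            # Fall back to the hashed key when no alias was set.
            name = labels.get("api_key_alias") or labels.get("hashed_api_key", "unknown")
            low[name] = remaining
    return low
```

The same pattern would apply to litellm_remaining_team_budget_metric by reading "team_alias" and "team_id" instead.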

[BETA] Custom Metrics​

Track custom metrics in Prometheus on all of the events mentioned above.

  1. Define the custom metrics in the config.yaml

model_list:
  - model_name: openai/gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  callbacks: ["prometheus"]
  custom_prometheus_metadata_labels: ["metadata.foo", "metadata.bar"]

  2. Make a request with the custom metadata labels

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <LITELLM_API_KEY>' \
  -d '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          }
        ]
      }
    ],
    "max_tokens": 300,
    "metadata": {
      "foo": "hello world"
    }
  }'

  3. Check your /metrics endpoint for the custom metrics

... "metadata_foo": "hello world" ...
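
The same request can be built from Python. This is a sketch using only the standard library, assuming the proxy address and placeholder key from the curl example. `build_chat_request` is a hypothetical helper, and the message body is simplified to a plain string.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, metadata):
    """Build (but do not send) a chat completion request carrying metadata."""
    payload = {
        "model": "openai/gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "what llm are you"}],
        # Keys here should match the configured labels, e.g. "foo" for "metadata.foo".
        "metadata": metadata,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://0.0.0.0:4000", "<LITELLM_API_KEY>", {"foo": "hello world"}
)
# To send it against a running proxy: urllib.request.urlopen(request)
```

After sending such a request, the "foo" value should show up on /metrics under the metadata_foo label, as in the output shown in step 3.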

Monitor System Health​

To monitor the health of LiteLLM-adjacent services (Redis / Postgres), add the following to your config:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  service_callback: ["prometheus_system"]

| Metric Name | Description |
|---|---|
| litellm_redis_latency | Histogram of latency for Redis calls |
| litellm_redis_fails | Number of failed Redis calls |
| litellm_self_latency | Histogram of latency for successful LiteLLM API calls |

🔥 LiteLLM Maintained Grafana Dashboards ​

Link to Grafana Dashboards maintained by LiteLLM

https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard

Here is a screenshot of the metrics you can monitor with the LiteLLM Grafana Dashboard

Deprecated Metrics​

| Metric Name | Description |
|---|---|
| litellm_llm_api_failed_requests_metric | Deprecated; use litellm_proxy_failed_requests_metric instead |
| litellm_requests_metric | Deprecated; use litellm_proxy_total_requests_metric instead |

FAQ​

What are _created vs. _total metrics?​

  • _created metrics record when a counter was first created, as a Unix timestamp; they are not request counts
  • _total metrics are the counters that are incremented for each request

Use the _total metrics for counting.
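
With the Python prometheus_client library, each counter is exposed as a `_total` sample holding the count and a `_created` sample holding the Unix timestamp at which the series was created. A small sketch of separating the two, assuming samples are available as `(name, value)` pairs (`split_counter_series` is a hypothetical helper):

```python
from datetime import datetime, timezone


def split_counter_series(samples):
    """Split (name, value) samples into counts and creation times.

    Returns two dicts keyed by the base metric name: the *_total values,
    and the *_created Unix timestamps converted to UTC datetimes.
    """
    counts, created = {}, {}
    for name, value in samples:
        if name.endswith("_total"):
            counts[name[: -len("_total")]] = value
        elif name.endswith("_created"):
            base = name[: -len("_created")]
            created[base] = datetime.fromtimestamp(value, tz=timezone.utc)
    return counts, created
```

Only the `counts` dict is useful for dashboards and alerts. The creation time mainly helps to explain why a counter restarted from zero.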