Warning: You are viewing v0.15 of the documentation, which is not the latest version.
Scaling
HTTP Add-on scaling decisions, including metrics, scale-to-zero, and cold-start behavior
The HTTP Add-on provides two scaling metrics: concurrency and request rate. KEDA uses these metrics to adjust the replica count of backend workloads through the Horizontal Pod Autoscaler (HPA).
Concurrency measures the number of in-flight requests at any instant for a given route. The interceptor increments a counter when a request arrives and decrements it when the response completes.
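The counting described above can be sketched in Python as a per-route dictionary guarded by a lock. `InFlightCounter`, `handle`, and the `forward` callable are hypothetical names used only for illustration; this is not the interceptor's actual code. The `try`/`finally` keeps the count accurate even when forwarding a request fails.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class InFlightCounter:
    """Hypothetical per-route count of in-flight requests (sketch only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = {}

    def increment(self, route: str) -> None:
        # Called when a request for `route` arrives.
        with self._lock:
            self._counts[route] = self._counts.get(route, 0) + 1

    def decrement(self, route: str) -> None:
        # Called when the response for `route` completes.
        with self._lock:
            self._counts[route] -= 1

    def value(self, route: str) -> int:
        # Current number of in-flight requests for `route`.
        with self._lock:
            return self._counts.get(route, 0)


def handle(counter: InFlightCounter, route: str, forward: Callable[[], T]) -> T:
    """Keep the request counted for exactly as long as `forward` runs."""
    counter.increment(route)
    try:
        return forward()
    finally:
        # Decrement even if forwarding raises, so the count never drifts upward.
        counter.decrement(route)
```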
Request rate measures the number of requests per second, averaged over a sliding time window. The windowed averaging smooths out short bursts and provides a stable signal for scaling.
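A sliding-window average can be sketched by keeping arrival timestamps and dividing the number still inside the window by the window length. The class name `SlidingWindowRate`, the window length, and the per-arrival timestamp list are assumptions for illustration. The add-on's own implementation may bucket arrivals differently. Timestamps can be passed explicitly so the behavior is easy to check.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowRate:
    """Requests per second, averaged over the last `window_seconds` (sketch)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._arrivals = deque()  # arrival timestamps, oldest first

    def _evict(self, now: float) -> None:
        # Drop arrivals that fall outside the window (now - window, now].
        cutoff = now - self.window_seconds
        while self._arrivals and self._arrivals[0] <= cutoff:
            self._arrivals.popleft()

    def record(self, now: Optional[float] = None) -> None:
        # Register one request arrival.
        now = time.monotonic() if now is None else now
        self._arrivals.append(now)
        self._evict(now)

    def rate(self, now: Optional[float] = None) -> float:
        # Average rate = arrivals still in the window / window length.
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._arrivals) / self.window_seconds
```

With a 10-second window, for example, a burst of 50 requests arriving at the same instant reports as 5 requests per second, not as a spike of 50.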
The default configuration is:
When an InterceptorRoute specifies both concurrency and request rate targets, KEDA evaluates each metric independently and uses whichever one requires more replicas. This is standard HPA behavior when multiple metrics are configured.
The desired replica count for a single metric follows the standard HPA formula:
desiredReplicas = ceil(currentMetricValue / targetValue)
For example, with a concurrency target of 100 and a current concurrency of 250, the desired count is ceil(250 / 100) = 3.
This calculation happens within the Kubernetes HPA based on the metrics and targets that KEDA provides.
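The per-metric formula and the rule that the highest replica count wins can be combined in a short Python sketch. The function names and metric labels are hypothetical. The sketch also leaves out HPA details such as the tolerance band, minimum and maximum replica bounds, and scaling behavior policies.

```python
import math
from typing import Mapping, Tuple


def desired_replicas(current_value: float, target_value: float) -> int:
    """Per-metric formula: ceil(currentMetricValue / targetValue)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_value / target_value)


def desired_replicas_for_route(metrics: Mapping[str, Tuple[float, float]]) -> int:
    """Evaluate each (current, target) pair independently; the highest count wins."""
    return max(
        (desired_replicas(current, target) for current, target in metrics.values()),
        default=0,
    )
```

Suppose concurrency is 250 against a target of 100, and request rate is 900 against a target of 200. The per-metric counts are 3 and 5, so the route scales to 5 replicas.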
When all metrics for a route drop to zero and the ScaledObject’s cooldownPeriod expires, KEDA scales the backend workload to zero replicas.
The cooldownPeriod is a field on the KEDA ScaledObject, not on the InterceptorRoute. It defines how long all metrics must remain at zero before KEDA scales the workload down to zero. If it is not set, KEDA uses a default of 300 seconds.
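The cooldown rule can be sketched as a tracker that remembers when any metric was last non-zero. `CooldownTracker` is a hypothetical name. The sketch assumes metrics are sampled periodically with explicit timestamps and treats the tracker's creation time as the last moment of activity.

```python
from typing import Sequence


class CooldownTracker:
    """Decides when a workload may be scaled to zero (sketch)."""

    def __init__(self, cooldown_seconds: float, now: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        # Assumption: treat creation time as the last moment of activity.
        self._last_active = now

    def observe(self, metric_values: Sequence[float], now: float) -> bool:
        """Record one sample of all metrics; return True once scale-to-zero is due."""
        if any(value > 0 for value in metric_values):
            # Any non-zero metric restarts the cooldown clock.
            self._last_active = now
            return False
        return now - self._last_active >= self.cooldown_seconds
```

With a 300-second cooldown, for example, traffic at t=10 s followed by silence makes scale-to-zero due at t=310 s. Any new request before then restarts the clock.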
When a request arrives for a backend that has been scaled to zero, the interceptor holds the request while KEDA scales the backend up. The sequence is:
If the backend does not become ready within the readiness timeout, the interceptor either routes to a fallback service (if configured) or returns an error. See Configure Cold-Start Behavior for fallback configuration and Configure Timeouts for timeout settings.
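The hold-then-route-or-fail flow can be sketched with `asyncio`. The callables `wait_until_ready`, `forward_to_backend`, and `forward_to_fallback`, and the `BackendUnavailable` exception, are hypothetical stand-ins. A real interceptor would watch the backend's readiness in Kubernetes rather than await a Python coroutine.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class BackendUnavailable(Exception):
    """Raised when the backend is not ready in time and no fallback is set."""


async def handle_cold_start(
    wait_until_ready: Callable[[], Awaitable[None]],
    forward_to_backend: Callable[[], Awaitable[str]],
    readiness_timeout: float,
    forward_to_fallback: Optional[Callable[[], Awaitable[str]]] = None,
) -> str:
    """Hold a request while the backend scales up from zero (sketch)."""
    try:
        # Hold this request until the backend reports ready, or time out.
        await asyncio.wait_for(wait_until_ready(), timeout=readiness_timeout)
    except asyncio.TimeoutError:
        # Not ready in time: use the fallback if one is configured, else fail.
        if forward_to_fallback is not None:
            return await forward_to_fallback()
        raise BackendUnavailable("backend not ready within readiness timeout")
    return await forward_to_backend()
```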