Getting Started
Sample HTTP application deployment and autoscaling with the KEDA HTTP Add-on
In this tutorial, we will install the KEDA HTTP Add-on and use it to autoscale an HTTP application based on incoming traffic — including scaling to zero when idle.
By the end, we will have:
- A sample HTTP application running in our cluster
- The HTTP Add-on intercepting and counting requests
- KEDA scaling the application up under load and back to zero when traffic stops
Prerequisites
Before we begin, we need:
- A Kubernetes cluster (kind, minikube, or a cloud provider)
- kubectl configured to access the cluster
- Helm 3 installed
- KEDA core installed:
helm install keda kedacore/keda --namespace keda --create-namespace
See the KEDA deployment docs for other installation methods.
Step 1: Add the KEDA Helm Repository
If we have not already added the KEDA Helm repository, we add it now and update our local chart index:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
Step 2: Install the HTTP Add-on
We install the HTTP Add-on into the same keda namespace where KEDA core is running:
helm install http-add-on kedacore/keda-add-ons-http --namespace keda
We verify that all components are running:
kubectl get pods -n keda
We will see pods for the operator, interceptor, and scaler — all with a Running status:
NAME READY STATUS RESTARTS AGE
keda-add-ons-http-interceptor-... 1/1 Running 0 30s
keda-add-ons-http-operator-... 1/1 Running 0 30s
keda-add-ons-http-scaler-... 1/1 Running 0 30s
keda-admission-webhooks-... 1/1 Running 0 2m
keda-operator-... 1/1 Running 0 2m
keda-operator-metrics-apiserver-... 1/1 Running 0 2m
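The readiness check above can also be scripted instead of re-running `kubectl get pods` by hand. The following is a minimal sketch, assuming the add-on runs in the keda namespace as installed above; the function name `wait_for_keda_pods`, the `KUBECTL` override, and the 120-second default timeout are illustrative choices, not part of the add-on.

```shell
# Hypothetical helper: block until every pod in the keda namespace is Ready.
# KUBECTL may be overridden, for example with a wrapper or a dry-run stub.
# Optional argument: timeout passed to kubectl wait (default 120s).
wait_for_keda_pods() {
  "${KUBECTL:-kubectl}" wait --for=condition=Ready pods --all \
    --namespace keda --timeout="${1:-120s}"
}

# Usage (requires cluster access):
#   wait_for_keda_pods 180s
```

The command exits non-zero if any pod is still not Ready when the timeout expires, which makes it convenient in setup scripts.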
Step 3: Deploy a Sample Application
We create a namespace and deploy a sample HTTP application using traefik/whoami, a lightweight HTTP server that responds with request metadata.
kubectl create namespace demo
We deploy a Deployment and Service:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: demo
spec:
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: sample-app
          image: traefik/whoami
          args: ["--port=8080"]
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: demo
spec:
  selector:
    app: sample-app
  ports:
    - port: 80
      targetPort: 8080
EOF
We verify the Deployment was created:
kubectl get deployment -n demo
We will see the Deployment with 1 replica running (the Kubernetes default):
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 1/1 1 1 10s
Step 4: Create an InterceptorRoute
The InterceptorRoute tells the interceptor how to route requests to our sample app and what scaling metric to use.
kubectl apply -f - <<EOF
apiVersion: http.keda.sh/v1beta1
kind: InterceptorRoute
metadata:
  name: sample-app
  namespace: demo
spec:
  target:
    service: sample-app
    port: 80
  rules:
    - hosts:
        - sample-app.example.com
  scalingMetric:
    requestRate:
      targetValue: 5
      window: 1m
      granularity: 1s
EOF
The requestRate metric scales based on requests per second, averaged over the configured window.
A targetValue: 5 means the add-on targets 5 requests per second per replica.
We use a low value here so that scaling is visible during testing.
See Scaling for details on scaling metrics and how to tune them.
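To get a feel for what a targetValue of 5 means, we can sketch the proportional rule that the Kubernetes Horizontal Pod Autoscaler generally applies to average-value metrics: the replica count is the observed rate divided by the target, rounded up and clamped to the configured bounds. This is a rough approximation under that assumption, not the add-on's exact logic; the function name `desired_replicas` is hypothetical, and it only handles whole numbers.

```shell
# Sketch of the proportional rule: replicas = ceil(rate / target), clamped.
# Arguments: observed requests/second, targetValue, min replicas, max replicas.
# Integer inputs only; an approximation, not the add-on's exact algorithm.
desired_replicas() {
  dr_rate=$1 dr_target=$2 dr_min=$3 dr_max=$4
  dr_want=$(( (dr_rate + dr_target - 1) / dr_target ))   # integer ceiling
  if [ "$dr_want" -lt "$dr_min" ]; then dr_want=$dr_min; fi
  if [ "$dr_want" -gt "$dr_max" ]; then dr_want=$dr_max; fi
  echo "$dr_want"
}

desired_replicas 12 5 0 10   # prints 3
desired_replicas 80 5 0 10   # prints 10 (capped at the maximum)
desired_replicas 0 5 0 10    # prints 0
```

Because the reported rate is averaged over the configured window, a short burst usually yields fewer replicas than its peak rate alone would suggest.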
We verify the InterceptorRoute is ready:
kubectl get interceptorroute -n demo
We will see:
NAME TARGETSERVICE READY AGE
sample-app sample-app True 10s
Step 5: Create a ScaledObject
The ScaledObject tells KEDA how to scale our sample-app deployment.
It uses the external-push trigger type, which receives metrics from the HTTP Add-on’s scaler component.
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sample-app
  namespace: demo
spec:
  scaleTargetRef:
    name: sample-app
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 30
  triggers:
    - type: external-push
      metadata:
        scalerAddress: keda-add-ons-http-external-scaler.keda:9090
        interceptorRoute: sample-app
EOF
The interceptorRoute value must match the name of the InterceptorRoute we created in the previous step.
See Architecture for details on how these components connect.
We verify the ScaledObject was created:
kubectl get scaledobject -n demo
We will see:
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS ...
sample-app apps/v1.Deployment sample-app 0 10 external-push ...
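Since the interceptorRoute value in the trigger must match an existing InterceptorRoute, a quick scripted check can catch typos early. The sketch below assumes the resource names and namespace used in this tutorial and reads the first trigger of the ScaledObject; `check_route_reference` is a hypothetical helper name.

```shell
# Hypothetical check: does the ScaledObject's first trigger reference an
# InterceptorRoute that exists in the same namespace?
# Arguments: namespace (default demo), ScaledObject name (default sample-app).
check_route_reference() {
  cr_ns=${1:-demo} cr_so=${2:-sample-app}
  cr_ref=$(kubectl get scaledobject "$cr_so" -n "$cr_ns" \
    -o jsonpath='{.spec.triggers[0].metadata.interceptorRoute}')
  if kubectl get interceptorroute "$cr_ref" -n "$cr_ns" >/dev/null 2>&1; then
    echo "ok: trigger references InterceptorRoute '$cr_ref'"
  else
    echo "mismatch: no InterceptorRoute named '$cr_ref' in namespace '$cr_ns'"
    return 1
  fi
}

# Usage (requires cluster access):
#   check_route_reference demo sample-app
```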
Step 6: Send Traffic and Observe Scaling
Now we test that autoscaling works. Since there is no traffic, KEDA has scaled the deployment to 0 replicas. We verify this:
kubectl get deployment sample-app -n demo
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 0/0 0 0 2m
For testing, we use kubectl port-forward to access the interceptor proxy.
In production, your ingress or gateway must route traffic to the interceptor proxy service (keda-add-ons-http-interceptor-proxy) instead of directly to your application — see Configure Ingress for details.
kubectl port-forward -n keda svc/keda-add-ons-http-interceptor-proxy 8090:8080
In another terminal, we send a request with the matching Host header:
curl -H "Host: sample-app.example.com" localhost:8090
The first request may take a few seconds. This is the cold start: KEDA is scaling the deployment from 0 to 1 replica, and the interceptor holds the request until the pod is ready. We will see a response from the sample app once the pod starts.
We check replicas again:
kubectl get deployment sample-app -n demo
We will see 1 replica running:
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 1/1 1 1 3m
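The cold-start delay described above can be measured by asking curl to report how long a request took. This is a minimal sketch, assuming the port-forward from this step is still running; `time_request` is a hypothetical helper, and the host and address defaults are the ones used in this tutorial.

```shell
# Hypothetical helper: print the total time in seconds that curl spent on
# one request through the port-forwarded interceptor.
# Arguments: Host header (default sample-app.example.com),
#            address (default localhost:8090).
time_request() {
  curl -s -o /dev/null -w '%{time_total}\n' \
    -H "Host: ${1:-sample-app.example.com}" "${2:-localhost:8090}"
}

# Usage (requires the port-forward to be active):
#   time_request
```

Running it once while the deployment is at 0 replicas and again while a pod is up should show the difference between a cold and a warm request.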
Step 7: Generate Load and Watch It Scale Up
To see scaling beyond 1 replica, we generate a burst of traffic.
The wait=50ms query parameter tells whoami to hold each response for 50 milliseconds. Because the loop sends requests one at a time, this caps the rate at about 20 requests per second, which is enough to trigger scaling with our targetValue of 5:
for i in $(seq 1 300); do curl -s -H "Host: sample-app.example.com" "localhost:8090/?wait=50ms" > /dev/null; done
After the burst finishes, we check the deployment:
kubectl get deployment sample-app -n demo
We will see the replica count has increased:
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 2/2 2 2 5m
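The arithmetic behind the burst can be made explicit. In a sequential loop each request takes at least the wait time, so the wait sets an upper bound on the rate and a lower bound on the duration; real numbers will be somewhat worse because of connection and process overhead. The helper name `burst_estimate` is hypothetical, and it uses integer arithmetic only.

```shell
# Upper-bound estimate for a sequential request loop.
# Arguments: number of requests, per-request wait in milliseconds.
# Each request takes at least wait_ms, so the rate cannot exceed
# 1000 / wait_ms requests per second.
burst_estimate() {
  be_count=$1 be_wait_ms=$2
  echo "max_rate=$(( 1000 / be_wait_ms ))rps min_duration=$(( be_count * be_wait_ms / 1000 ))s"
}

burst_estimate 300 50   # prints max_rate=20rps min_duration=15s
```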
Step 8: Observe Scale to Zero
After the burst ends and the cooldown period passes (30 seconds, as configured in our ScaledObject), KEDA scales the deployment back to 0. We can watch this happen:
kubectl get deployment sample-app -n demo -w
We will see replicas decrease to 0:
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 2/2 2 2 5m
sample-app 1/1 1 1 6m
sample-app 0/0 0 0 7m
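In a script, watching with -w is less convenient than polling until the replica count reaches 0. The following is a minimal sketch, assuming the deployment name and namespace from this tutorial; `wait_for_zero` and its default timeout and interval are hypothetical choices.

```shell
# Hypothetical helper: poll spec.replicas of sample-app until it reaches 0
# or the timeout expires.
# Arguments: timeout in seconds (default 120), poll interval (default 5).
wait_for_zero() {
  wz_timeout=${1:-120} wz_interval=${2:-5} wz_waited=0
  while [ "$wz_waited" -le "$wz_timeout" ]; do
    wz_replicas=$(kubectl get deployment sample-app -n demo \
      -o jsonpath='{.spec.replicas}')
    if [ "$wz_replicas" = "0" ]; then
      echo "scaled to zero after ~${wz_waited}s"
      return 0
    fi
    sleep "$wz_interval"
    wz_waited=$(( wz_waited + wz_interval ))
  done
  echo "still at ${wz_replicas:-unknown} replicas after ${wz_timeout}s" >&2
  return 1
}

# Usage (requires cluster access):
#   wait_for_zero 180 5
```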
Step 9: Clean Up
To remove the sample application and all its resources:
kubectl delete namespace demo
What’s Next
- Architecture — Understand how the interceptor, scaler, and operator work together.
- Autoscale an App — Apply this pattern to your own services.
- Configure Ingress — Set up Gateway API or Ingress for production traffic.
- Configure Scaling Metrics — Tune concurrency targets or switch to request-rate scaling.
Getting Help
- Kubernetes Slack — #keda channel (join here)
- GitHub Issues — Bug reports and feature requests
- GitHub Discussions — Questions and general conversation