Kubernetes Services, networking and DNS without hand-waving

Kubernetes networking becomes much less mysterious when you stop treating a Service as a small load balancer floating in the cluster and start treating it as a contract.

The contract says: “Clients can use this stable name and port. Kubernetes will keep the current set of matching Pods behind it.” That is the simple idea. The details matter because most beginner networking problems are not deep packet-routing failures. They are usually label mismatches, wrong ports, namespace assumptions, or DNS names that point to a Service with no ready endpoints.

This post is for the stage where you know what a Pod is, maybe you have deployed something with a Deployment, and now the question is: why can I reach it sometimes, from some places, with some names, but not from others?

Start with the problem Services solve

A Pod gets an IP address. In most clusters, that IP is routable from other Pods. If a Pod called web-7b9d4c8d6f-q2xk9 has the IP 10.244.1.23, another Pod may be able to call http://10.244.1.23:8080.

But you should almost never build an application around that.

Pods are replaceable. A Deployment can delete one Pod and create another during a rollout. A node can fail. A readiness probe can remove a Pod from traffic. The replacement Pod gets a different name and usually a different IP. If your client remembers the old Pod IP, it is now holding a stale address.

This is why Services exist. A Service gives clients a stable front door for a changing group of Pods.

The group is selected by labels:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: ghcr.io/example/checkout:1.0.0
          ports:
            - containerPort: 8080

And the Service points at Pods with the same label:

apiVersion: v1
kind: Service
metadata:
  name: checkout
spec:
  type: ClusterIP
  selector:
    app: checkout
  ports:
    - name: http
      port: 80
      targetPort: 8080

Read that Service slowly. Clients call checkout on port 80. Kubernetes forwards to matching Pods on port 8080. port is the port the Service exposes to clients. targetPort is the port the application listens on inside the Pod. Mixing those up is one of the most common beginner mistakes.

The Service is stable, the endpoints are not

When you create a Service with a selector, Kubernetes watches for Pods matching that selector. It then records the actual backend addresses in EndpointSlices. Older commands and tutorials often mention Endpoints; newer clusters use EndpointSlices internally, but the idea is the same: “these are the current backend IPs and ports for this Service.”

That gives you a useful debugging path:

kubectl get svc checkout
kubectl get pods -l app=checkout -o wide
kubectl get endpointslice -l kubernetes.io/service-name=checkout
kubectl describe svc checkout

If the Service exists but has no endpoints, networking is not your first suspect. The Service is not selecting ready Pods. Check labels, namespaces, readiness, and ports.
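As a rough mental model (not how the real EndpointSlice controller is implemented), the backend list is "Pods in the Service's namespace whose labels match the selector and which are ready". The Python sketch below uses made-up Pod records; the Pod class and its field names are illustrative, not the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    # Illustrative record, not the real Kubernetes Pod schema.
    name: str
    namespace: str
    ip: str
    ready: bool
    labels: dict = field(default_factory=dict)


def ready_backends(service_namespace, selector, pods):
    """Return the IPs of Pods a selector-based Service would route to."""
    backends = []
    for pod in pods:
        if pod.namespace != service_namespace:
            continue  # a selector only sees Pods in the Service's namespace
        # Every key/value pair in the selector must be present on the Pod.
        if any(pod.labels.get(k) != v for k, v in selector.items()):
            continue
        if not pod.ready:
            continue  # unready Pods are kept out of normal traffic
        backends.append(pod.ip)
    return backends


pods = [
    Pod("checkout-a", "default", "10.244.1.23", True, {"app": "checkout"}),
    Pod("checkout-b", "default", "10.244.2.17", False, {"app": "checkout"}),
    Pod("checkout-c", "staging", "10.244.3.5", True, {"app": "checkout"}),
    Pod("cart-a", "default", "10.244.1.40", True, {"app": "cart"}),
]

print(ready_backends("default", {"app": "checkout"}, pods))   # one ready match
print(ready_backends("default", {"app": "check-out"}, pods))  # typo: no matches
```

An empty result from a model like this corresponds to "Service exists, no endpoints": nothing is wrong with packet routing, there is simply nobody to route to.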

Useful checks:

kubectl get pods --show-labels
kubectl describe pod <pod-name>
kubectl get pods -l app=checkout

A Service selector must match the labels on the Pods themselves, which come from the Pod template, not from the Deployment object's own metadata. This is subtle. You can put the label your Service selects on the Deployment's metadata.labels and still not have it under spec.template.metadata.labels. The Service does not send traffic to Deployments. It sends traffic to Pods.
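To make the "template labels, not Deployment labels" point concrete, here is a small Python sketch that compares a Service selector with the labels a Deployment would stamp onto its Pods. The manifests are plain dicts shaped like the YAML above, trimmed to the relevant fields. The helper name selector_matches_template and the tier label are my own inventions for illustration.

```python
def selector_matches_template(service, deployment):
    """True if Pods created from the Deployment would match the Service selector."""
    selector = service["spec"].get("selector", {})
    # Labels that end up on Pods live under spec.template.metadata.labels.
    # The Deployment's own metadata.labels are deliberately ignored here.
    template_labels = (
        deployment["spec"]["template"].get("metadata", {}).get("labels", {})
    )
    return bool(selector) and all(
        template_labels.get(key) == value for key, value in selector.items()
    )


service = {"spec": {"selector": {"app": "checkout", "tier": "backend"}}}

good = {
    "metadata": {"labels": {"app": "checkout", "tier": "backend"}},
    "spec": {
        "template": {"metadata": {"labels": {"app": "checkout", "tier": "backend"}}}
    },
}

# Hypothetical mistake: "tier" was added to the Deployment's metadata only.
broken = {
    "metadata": {"labels": {"app": "checkout", "tier": "backend"}},
    "spec": {"template": {"metadata": {"labels": {"app": "checkout"}}}},
}

print(selector_matches_template(service, good))
print(selector_matches_template(service, broken))
```

Both Deployments look identical in kubectl get deployments --show-labels, which is exactly why this mistake is easy to miss.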

What ClusterIP really means

The default Service type is ClusterIP. Kubernetes allocates a virtual IP for the Service, for example:

kubectl get svc checkout

You might see:

NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
checkout   ClusterIP   10.96.42.117   <none>        80/TCP

That CLUSTER-IP is stable for the lifetime of the Service. It is not a Pod IP. It is not bound to one container. In many clusters, kube-proxy or an equivalent dataplane programs rules on nodes so traffic to that virtual IP is distributed to one of the ready backends.

As a beginner, you do not need to memorize every iptables or eBPF implementation detail. But the mental model matters: the ClusterIP is a stable virtual address inside the cluster. It only works from places that can reach the cluster network. Your laptop usually cannot call it directly.

So if curl http://10.96.42.117 fails from your terminal, that does not prove the Service is broken. Try from inside the cluster:

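# Start a throwaway Pod with an interactive shell (deleted when you exit)
kubectl run tmp-shell --rm -it --image=curlimages/curl -- sh
# The next two commands run inside that shell, not on your laptop
curl -v http://checkout
curl -v http://checkout.default.svc.cluster.local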

Use a temporary debug Pod carefully in shared clusters. Some organizations restrict ad-hoc Pods for good reasons. In a local cluster, it is a very useful learning tool.
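If the "virtual IP" idea still feels abstract, a toy model can help. The Python sketch below treats the dataplane as a lookup table from (ClusterIP, Service port) to a list of ready backends and rewrites the destination to one of them. This is only a conceptual stand-in: the table layout and the translate function are assumptions of mine, and a random pick stands in for whatever balancing the real dataplane does.

```python
import random


def translate(dest_ip, dest_port, services, rng=random):
    """Rewrite a (ClusterIP, port) destination to one backend (pod IP, targetPort).

    Returns None when the destination is not a known Service port
    or the Service currently has no ready backends.
    """
    service = services.get((dest_ip, dest_port))
    if service is None or not service["backends"]:
        return None
    pod_ip = rng.choice(service["backends"])
    return (pod_ip, service["target_port"])


# Toy table keyed by (ClusterIP, Service port); all values are illustrative.
services = {
    ("10.96.42.117", 80): {
        "target_port": 8080,
        "backends": ["10.244.1.23", "10.244.2.17", "10.244.3.5"],
    },
}

print(translate("10.96.42.117", 80, services))    # some Pod IP, port 8080
print(translate("10.96.42.117", 8080, services))  # wrong Service port: None
```

Nothing in this model "listens" on 10.96.42.117. The address only means something to machines that carry the translation table, which is why it works from Pods and nodes but usually not from your laptop.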

DNS is the friendly layer over Services

Kubernetes usually runs CoreDNS. When you create a Service named checkout in namespace default, DNS records are created so Pods can resolve it.

From a Pod in the same namespace:

checkout

This usually works, because the Pod's DNS search path includes its own namespace.

From another namespace:

checkout.default
checkout.default.svc
checkout.default.svc.cluster.local

The fully qualified form is:

<service-name>.<namespace>.svc.cluster.local

The cluster domain is commonly cluster.local, but it can be configured differently. If you are debugging something odd, inspect DNS from inside a Pod:

kubectl exec -it <pod-name> -- cat /etc/resolv.conf
kubectl exec -it <pod-name> -- nslookup checkout
kubectl exec -it <pod-name> -- nslookup checkout.default.svc.cluster.local

The short name checkout works because Kubernetes sets search domains in /etc/resolv.conf. A Pod in payments typically searches payments.svc.cluster.local, then svc.cluster.local, then cluster.local. That convenience can also hide namespace mistakes. If you have a checkout Service in two namespaces, the short name resolves to the one in the caller’s namespace. If the caller’s namespace has no checkout Service, the short name will not find the Service in another namespace, because none of the search domains contain that other namespace.

When in doubt, use the fully qualified name during debugging. It removes one variable.
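The search-domain behavior is easier to reason about when you see the names that actually get tried. The Python sketch below mimics a glibc-style stub resolver under two assumptions: the search list shown above for a Pod in payments, and the ndots:5 option that Kubernetes normally writes into /etc/resolv.conf. Other resolver libraries can behave differently, so treat this as a model, not a specification.

```python
def candidate_names(name, search, ndots=5):
    """List the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # trailing dot: already absolute, search list is skipped
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # "Enough" dots: try the name as written first, then the search list.
        return [absolute] + expanded
    # Fewer dots than ndots: walk the search list first, then the name as written.
    return expanded + [absolute]


# Assumed search list for a Pod in the payments namespace.
search = ["payments.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in ("checkout", "checkout.default", "checkout.default.svc.cluster.local."):
    print(query)
    for candidate in candidate_names(query, search):
        print("   ", candidate)
```

Two things stand out in the output. The short name checkout never expands to anything in the default namespace, and checkout.default only works because its second candidate happens to be the Service's real name.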

Service types are about exposure, not app identity

Beginners often ask, “Should my app be ClusterIP, NodePort, LoadBalancer, or Ingress?” The answer depends on who needs to reach it.

ClusterIP is for internal communication inside the cluster. Most Services should start here.

NodePort opens a port on every node and forwards to the Service. It is useful for learning and some infrastructure cases, but it is not usually how I want to expose a normal web app in production.

LoadBalancer asks the underlying platform, usually a cloud provider, to create an external load balancer pointing at the Service. In local clusters, it may stay pending unless you install something like MetalLB or use a local feature that emulates it.

ExternalName returns a DNS CNAME to an external name. It does not create normal endpoints or proxy traffic.

Ingress is not a Service type. Ingress is a separate API for HTTP routing, usually backed by an ingress controller. A typical production path is:

Internet -> cloud load balancer -> ingress controller -> ClusterIP Service -> Pods

That looks like many layers, but each layer has a job. The Service keeps the backend Pod set stable. The ingress controller understands HTTP hosts and paths. The external load balancer gets traffic into the cluster.

Ports: where small words cause big confusion

Service YAML uses names that are easy to read too quickly:

ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP

port is what clients use on the Service.

targetPort is where traffic lands on the Pod.

containerPort in a Deployment is mostly documentation and metadata for humans and tools. It does not open a port like a firewall rule. Your application still has to listen on that port inside the container.

You can use a named target port:

containers:
  - name: checkout
    image: ghcr.io/example/checkout:1.0.0
    ports:
      - name: http
        containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: checkout
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: http

I like named ports when they are used consistently. They make intent clearer and survive some port-number changes better. They also add another thing to spell correctly.
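Here is a minimal Python sketch of how I think about targetPort resolution. A number is used as-is, while a name has to be looked up in the ports declared on the Pod's containers. The function resolve_target_port is hypothetical and works on plain dicts shaped like the YAML above.

```python
def resolve_target_port(target_port, container_ports):
    """Resolve a Service targetPort against one Pod's declared container ports.

    A numeric targetPort is used as-is. A named targetPort must match the
    name of a declared container port, otherwise there is nothing to resolve.
    """
    if isinstance(target_port, int):
        return target_port
    for port in container_ports:
        if port.get("name") == target_port:
            return port["containerPort"]
    return None


declared = [{"name": "http", "containerPort": 8080}]

print(resolve_target_port(8080, declared))    # 8080
print(resolve_target_port("http", declared))  # 8080
print(resolve_target_port("htpp", declared))  # None: the name is misspelled
```

The last call is the "another thing to spell correctly" case: a misspelled name gives the Service nothing to forward to, even though the Pods themselves are healthy.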

A practical debugging flow

When a Service does not work, I try to avoid guessing. I move from the client toward the backend.

First, confirm where you are testing from, and which cluster and namespace kubectl is pointed at. Inside or outside the cluster changes everything.

kubectl config current-context
# Prints the namespace set on the current context; empty output means "default"
kubectl config view --minify --output 'jsonpath={..namespace}'

Then check the Service:

kubectl get svc checkout -o wide
kubectl describe svc checkout

Look at selector, type, ClusterIP, and ports.

Then check whether Pods match:

kubectl get pods -l app=checkout -o wide
kubectl get pods --show-labels

Then check endpoints:

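# Legacy Endpoints object (older tutorials use this)
kubectl get endpoints checkout -o yaml
# EndpointSlices, the current API for the same information
kubectl get endpointslice -l kubernetes.io/service-name=checkout -o yaml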

If there are no endpoints, check readiness:

kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}'

If endpoints exist, test DNS and HTTP from inside the cluster:

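# Start a throwaway Pod with an interactive shell (deleted when you exit)
kubectl run netcheck --rm -it --image=curlimages/curl -- sh
# The remaining commands run inside that shell
nslookup checkout
curl -v http://checkout
curl -v http://checkout.default.svc.cluster.local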

If DNS resolves but HTTP fails, the problem may be the app, port, protocol, NetworkPolicy, or the app only listening on localhost inside the container. That last one is surprisingly common: a process listening on 127.0.0.1 is not reachable through the Pod IP. It should usually listen on 0.0.0.0.
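The localhost trap is easy to reproduce with a toy server. The Python sketch below is a stand-in for your application, not a recommendation for how to write one. The Hello handler, the make_server helper, and port 8080 are illustrative assumptions, and the only thing that matters is the host argument.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Answer every GET with a tiny plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host, port=8080):
    """Create (but do not start) an HTTP server bound to host:port."""
    return HTTPServer((host, port), Hello)


# Reachable only from inside the same Pod (loopback interface):
#   make_server("127.0.0.1").serve_forever()
#
# Reachable through the Pod IP, and therefore through a Service:
#   make_server("0.0.0.0").serve_forever()
```

With the first variant, curl localhost:8080 from a shell inside the Pod succeeds while the Service times out or gets connection refused, which is a strong hint that the bind address is the problem and not the Service.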

Common misconceptions

The first misconception: “A Service restarts my Pods.” It does not. Deployments and other controllers manage Pod lifecycle. Services route traffic.

The second: “A Service waits until my application is healthy.” Not by itself. A Service only routes to Pods that Kubernetes reports as ready, and what "ready" means is defined by your readiness probe. If you have no readiness probe, a Pod is typically considered ready as soon as its containers have started. That can be too early.

The third: “DNS means the app is reachable.” DNS only answers the name. A name can resolve to a Service that has no endpoints, wrong ports, blocked traffic, or an application returning errors.

The fourth: “NodePort is the normal next step after ClusterIP.” Sometimes in tutorials, yes. In production, often no. Many clusters use Ingress, Gateway API, service meshes, or cloud load balancers depending on platform standards.

The fifth: “Namespaces are just folders.” They affect DNS search behavior, policy, RBAC, quotas, and how you address Services. A missing namespace in a command is not a small detail.

The mental model I keep

Pods are cattle with IP addresses. Services are stable contracts over changing Pod sets. EndpointSlices are the current passenger list behind that contract. DNS gives the contract a name humans and applications can use.

When something fails, do not start with “Kubernetes networking is broken.” Start with:

Am I testing from the right place?
Does the Service exist in the namespace I think it does?
Does the selector match ready Pods?
Do EndpointSlices contain backend addresses?
Does DNS resolve the name I am using?
Is the application listening on the expected port and interface?
Is policy or ingress/load-balancer configuration blocking the path?

That sequence will not solve every networking problem. Real clusters add CNI plugins, network policies, ingress controllers, service meshes, cloud load balancers, and sometimes old decisions nobody wants to touch. But this baseline prevents a lot of fog.

Kubernetes networking is not magic. It is a set of contracts and translations. Once you learn which object owns which part of the path, the system becomes much easier to question.