GitOps on OpenShift: a practical guide

GitOps on vanilla Kubernetes is already a discipline. GitOps on OpenShift adds another layer: the cluster is not only Kubernetes. It is Red Hat packaging, operators, Security Context Constraints, Routes, and often a platform team that owns parts of the control plane you never touch.

I use Argo CD on OpenShift the same way I use it elsewhere — desired state in Git, reconciliation in the cluster, drift visible in a UI. The difference is what happens when sync meets platform guardrails. This post is a practical guide for application teams and platform engineers who share a cluster and need GitOps to survive contact with day-two reality.

What OpenShift adds to the GitOps picture

Upstream GitOps assumes you can declare most of what runs and the controller will converge. OpenShift assumes the same, then adds opinionated defaults.

Routes, alongside or instead of Ingress, for many external entry points.

SCCs that mutate or reject Pod security settings your manifest declares.

Cluster Operators that install and upgrade platform components outside your Application scope.

Managed-cluster policies on ROSA or ARO where some objects are read-only or owned by the cloud provider.

Namespace-scoped quotas and limits enforced before your Deployment ever schedules.

None of this makes GitOps wrong on OpenShift. It means the reconciliation loop has neighbors. Your Application syncs your Deployment; the platform mutates the Pod; the quota controller rejects a scale-up; Argo shows OutOfSync and you need to know which diff matters.

The OpenShift GitOps operator

Red Hat ships OpenShift GitOps as an operator installed through OperatorHub and managed by OLM. It installs and manages Argo CD instances: typically a default instance in the openshift-gitops namespace and optionally additional instances for team isolation.

You do not install Argo CD from raw YAML the way a lab cluster might. You install the operator, configure an ArgoCD custom resource, and let OLM handle upgrades. That is a feature: supported lifecycle, integrated SSO patterns, and documented upgrade paths. It is also a boundary: your team’s GitOps tooling is tied to cluster version and operator health.
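
Upgrade ownership starts at the Subscription. A minimal sketch, assuming a cluster-wide install into openshift-operators from the redhat-operators catalog; the channel name varies by release, so check it against the package manifest on your cluster before applying. Setting installPlanApproval to Manual makes OLM wait for someone to approve each upgrade instead of rolling it out on its own.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-gitops-operator
  namespace: openshift-operators
spec:
  # Channel is an assumption; list the real ones with:
  # oc get packagemanifest openshift-gitops-operator -n openshift-marketplace
  channel: latest
  name: openshift-gitops-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  # OLM creates an InstallPlan for each upgrade but does not apply it until approved
  installPlanApproval: Manual

Pending upgrades then appear as unapproved InstallPlans (oc get installplan -n openshift-operators), which gives the platform owner a concrete step to schedule.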

A minimal ArgoCD instance declaration looks like this:

apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  applicationSet: {}
  resourceTrackingMethod: annotation
  server:
    route:
      enabled: true

The operator creates Deployments, Services, Routes, and RBAC for the Argo CD control plane. Application teams usually interact with the Application and AppProject CRs, not the operator itself — but someone on the platform side should own operator upgrades and instance sizing.
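
Instance sizing lives in the same ArgoCD resource. A sketch with hypothetical numbers: the application controller and the repo server are often the first components to need more memory as Application count and manifest size grow, but the right values depend on your workload. Merge these fields into the existing instance rather than replacing its spec, and tune against observed usage.

apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  controller:
    # Reconciles Applications; memory tends to grow with the number of tracked resources
    resources:
      requests:
        cpu: 250m
        memory: 1Gi
      limits:
        cpu: "2"
        memory: 3Gi
  repo:
    # Renders manifests (kustomize build, helm template) on refresh and sync
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi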

Practical note: know which instance your team uses. Multi-instance setups separate platform Applications from tenant Applications. Connecting to the wrong Argo CD URL is a common onboarding mistake.
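
When a team gets its own instance, OpenShift GitOps uses a namespace label to let that instance manage a target namespace. A sketch assuming a hypothetical team instance running in the payments-gitops namespace; the label is effectively an RBAC grant, so the platform team usually applies it rather than the application repo.

apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    # Value is the namespace where the managing ArgoCD instance runs (hypothetical here)
    argocd.argoproj.io/managed-by: payments-gitops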

Argo CD Applications on OCP

An Application ties a Git source to a cluster destination. On OpenShift the destination is still https://kubernetes.default.svc from inside the cluster, or an external API URL if you run a centralized Argo CD elsewhere.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: shop-api
  namespace: openshift-gitops
spec:
  project: team-payments
  source:
    repoURL: "https://github.com/example-org/shop-api.git"
    targetRevision: main
    path: deploy/overlays/production
  destination:
    server: "https://kubernetes.default.svc"
    namespace: payments-prod
  syncPolicy:
    automated:
      prune: false
      selfHeal: false
    syncOptions:
      - CreateNamespace=true
      - ApplyOutOfSyncOnly=true
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas

Several fields deserve attention on OpenShift:

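CreateNamespace=true — convenient for self-service namespaces, but a Namespace created this way skips the OpenShift project request template, so any default quotas, LimitRanges, or network policies the platform attaches to new projects will be missing. Coordinate with platform naming and quota policies before enabling it broadly.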

ApplyOutOfSyncOnly=true — applies only the resources Argo reports as OutOfSync instead of every object in the Application; helpful in large Applications where each apply also passes through admission webhooks.

ignoreDifferences on replicas — common when a Horizontal Pod Autoscaler owns the replica count. Without it, you get OutOfSync noise or, with selfHeal on, a sync fight with the HPA.

automated prune: false — my default on production until the team has reviewed what prune would delete in a shared namespace.
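
One subtlety with the replicas rule: ignoreDifferences changes what Argo CD compares, not what it applies. A sync can still write spec.replicas from Git and momentarily undo the HPA's scaling. The RespectIgnoreDifferences=true sync option, available in recent Argo CD releases, keeps fields matched by ignoreDifferences at their live values during sync; leaving replicas out of the manifest is the other common fix. A fragment showing only the shop-api syncOptions with the option added:

spec:
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ApplyOutOfSyncOnly=true
      # Sync leaves fields matched by ignoreDifferences (here /spec/replicas) at their live values
      - RespectIgnoreDifferences=true

The option only affects resources that already exist; when a resource is first created, the full manifest from Git is applied.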

Use oc and the argocd CLI to inspect what Argo will touch before trusting the UI:

oc get application shop-api -n openshift-gitops -o yaml
argocd app diff shop-api --local deploy/overlays/production

Exact CLI wiring depends on your Argo CD CLI login and RBAC. The habit matters more than the command: read the diff in Git and in Argo before sync on production.

AppProject boundaries

AppProject is where multi-tenant GitOps succeeds or leaks. On OpenShift, tie AppProject allowed destinations to namespaces the platform team actually provisioned.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: openshift-gitops
spec:
  description: "Payments squad production and staging"
  sourceRepos:
    - "https://github.com/example-org/shop-api.git"
    - "https://github.com/example-org/payments-platform.git"
  destinations:
    - namespace: payments-prod
      server: "https://kubernetes.default.svc"
    - namespace: payments-staging
      server: "https://kubernetes.default.svc"
  clusterResourceWhitelist: []
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange

Blacklisting ResourceQuota and LimitRange from application repos is a pattern I have seen work: platform owns quota objects; applications own Deployments, Services, Routes, and ConfigMaps inside the envelope.

Review AppProject with the same seriousness as cluster RBAC. A misconfigured project can sync cluster-scoped resources or write into a neighbor namespace.

App-of-apps and where caution starts

The app-of-apps pattern bootstraps a tree of Applications from one root Application. Platform teams love it for baseline cluster add-ons. Application teams adopt it to manage many microservices from one repo.

Structure example:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-root
  namespace: openshift-gitops
spec:
  project: team-payments  # this project must also allow openshift-gitops as a destination
  source:
    repoURL: "https://github.com/example-org/payments-gitops.git"
    targetRevision: main
    path: apps
  destination:
    server: "https://kubernetes.default.svc"
    namespace: openshift-gitops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The apps folder contains more Application manifests — one per service. Clean in diagrams. Fragile in production if you treat it as fire-and-forget.
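
The root writes Application objects into openshift-gitops, a destination the team-payments AppProject shown earlier does not allow, so Argo CD would refuse the root as written. One fix is adding that destination; another is a separate project for the root. A sketch of the second option, with a hypothetical payments-bootstrap project that may only create Application objects in openshift-gitops, from the one GitOps repo:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-bootstrap
  namespace: openshift-gitops
spec:
  description: "Root of the payments app-of-apps; creates child Applications only"
  sourceRepos:
    - "https://github.com/example-org/payments-gitops.git"
  destinations:
    - namespace: openshift-gitops
      server: "https://kubernetes.default.svc"
  # No cluster-scoped resources at all
  clusterResourceWhitelist: []
  # Inside openshift-gitops, only Application objects may be synced
  namespaceResourceWhitelist:
    - group: argoproj.io
      kind: Application

The root's project field would then point at payments-bootstrap. This does not stop a child manifest in apps from naming a more permissive project such as default, so review the project field of every child with the same CODEOWNERS rigor as the root.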

Cautions I carry from real clusters:

Blast radius. One bad merge in the root repo affects every child Application. Require CODEOWNERS on the root path and separate repos for experimental services if needed.

Prune at the root. Auto-sync with prune on the app-of-apps root has deleted child Applications when someone renamed a folder. I prefer manual sync on roots until the team has run at least one incident drill on accidental prune.

Sync ordering. App-of-apps does not replace sync waves. CRDs, namespaces, operators, then Deployments — order still matters. OpenShift adds Routes and certificates that fail loudly if the backend Service does not exist yet.

Circular ownership. Child Application points at Git; parent Application lives in Git; someone kubectl-patches the parent during an incident — drift at the root is hard to see.

Platform Applications mixed with tenant Applications. Keep cluster baseline (logging agents, monitoring hooks) in a platform-owned app-of-apps with different RBAC than tenant repos.

App-of-apps is a scaling tool, not a substitute for review. Start with a flat list of Applications until pain justifies the tree.
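
Ordering inside one Application is expressed with sync-wave annotations: Argo CD applies lower waves first and waits for resources in a wave to report healthy before starting the next. A metadata-only sketch, assuming the shop-api overlay carries a ConfigMap (hypothetically named shop-api-config), a Service, a Deployment, and a Route; specs are omitted, and only the relative wave order matters. Waves across child Applications in an app-of-apps need extra configuration, because recent Argo CD versions do not assess the health of Application resources by default.

apiVersion: v1
kind: ConfigMap
metadata:
  name: shop-api-config
  annotations:
    # Config first, so Pods never start against missing settings
    argocd.argoproj.io/sync-wave: "-1"
---
apiVersion: v1
kind: Service
metadata:
  name: shop-api
  annotations:
    argocd.argoproj.io/sync-wave: "0"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shop-api
  annotations:
    argocd.argoproj.io/sync-wave: "0"
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: shop-api
  annotations:
    # Expose traffic only after the Service and Deployment wave is healthy
    argocd.argoproj.io/sync-wave: "1"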

Sync policy versus platform constraints

Auto-sync and self-heal sound virtuous until they fight OpenShift defaults.

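SCC admission acts on Pods, not on the Deployment Argo compares. It may fill in runAsUser, fsGroup, or dropped capabilities, or reject a Pod your manifest asks for. Argo can show the Deployment Synced while its ReplicaSet fails to create Pods, so the signal is Degraded or Progressing health and FailedCreate events rather than a diff, and syncing again will not fix it.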

Cluster Network Operator and DNS own objects application repos should not touch. Keep them out of Application sources.

Operator-managed operands — Service Mesh, Serverless, custom operators — often add labels, annotations, or sidecars. ignoreDifferences or server-side apply options need periodic review so they do not hide real problems.

Managed OpenShift may forbid or revert changes to security-sensitive resources. Sync succeeds in staging; fails or drifts in production with an opaque message.
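
For operator-managed operands, one of those server-side apply options is the ServerSideApply=true sync option in recent Argo CD releases. It makes Argo CD write objects with Kubernetes server-side apply, so the API server records which fields Argo CD manages and which belong to another controller. A minimal sketch; by default it changes how objects are applied rather than how diffs are computed (check your version's documentation on server-side diff), so keep reviewing ignoreDifferences alongside it and try it on staging first.

spec:
  syncPolicy:
    syncOptions:
      # Apply with Kubernetes server-side apply; Argo CD becomes a named field manager
      - ServerSideApply=true

The same option can be scoped to individual resources with the annotation argocd.argoproj.io/sync-options: ServerSideApply=true, which is a gentler way to start.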

Example ignore rule for common Deployment mutation noise:

ignoreDifferences:
  - group: apps
    kind: Deployment
    jqPathExpressions:
      - .spec.template.metadata.annotations
  - group: route.openshift.io
    kind: Route
    jsonPointers:
      - /status
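
Ignoring every pod template annotation is broad: it also hides an annotation change someone merged on purpose. A narrower sketch, assuming the noise comes from oc rollout restart, which stamps kubectl.kubernetes.io/restartedAt on the pod template; swap in whichever key your platform actually adds.

ignoreDifferences:
  - group: apps
    kind: Deployment
    jqPathExpressions:
      # Only the restart timestamp, not every pod template annotation
      - '.spec.template.metadata.annotations."kubectl.kubernetes.io/restartedAt"'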

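Prefer fixing Git when live state is correct, for example when the platform team patched a Route host for a cutover. Prefer fixing live when live is wrong, for example when someone hand-edited an image tag during an incident; a sync restores the intended state. When Git itself is wrong, such as a merged image tag typo, fix it with a new commit rather than patching the cluster.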

Sync is a production change. Platform constraints are why manual sync on production remains a reasonable default for many teams, even when staging auto-syncs freely.

Drift on managed and shared clusters

Drift is not a moral failure. On managed OpenShift it is often expected.

Incident hotfixes — scale, ConfigMap patch, Route weight change — may land before Git catches up.

HPA and cluster autoscaler — replica counts and node counts diverge from manifests by design.

Sealed Secrets or External Secrets — Git stores encrypted or referenced material; live Secrets differ.

Platform maintenance — node drain, operator upgrade, certificate rotation — changes status fields and sometimes spec defaults.

Read-only managed policies — live cluster holds fields your Application cannot write; Argo stays OutOfSync permanently until you ignore or accept the diff.

Questions before clicking Sync:

Did platform or SRE patch this during an incident?
Is the diff only metadata or status?
Will sync restart Pods during business hours?
Does prune remove a shared resource another team relies on?
Is OutOfSync because Git is wrong or because OpenShift is right?
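
The business-hours question can be enforced rather than remembered: AppProject sync windows block or allow syncs on a cron schedule. A sketch to merge into the team-payments project that denies syncs on weekdays from 08:00 for ten hours while still allowing a deliberate manual sync. The schedule is an assumption, and windows are evaluated in UTC unless your Argo CD version supports a time zone setting and you set it.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: openshift-gitops
spec:
  syncWindows:
    # Deny syncs starting 08:00 Monday to Friday, lasting 10 hours
    - kind: deny
      schedule: "0 8 * * 1-5"
      duration: 10h
      applications:
        - "*"
      # A human may still sync on purpose during the window, e.g. to ship a fix
      manualSync: true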

On ROSA and ARO, also ask whether the diff touches a cloud-provider-managed object. Fighting that loop wastes on-call energy.

Weekly drift review beats silent auto-heal: list OutOfSync Applications, assign owners, document whether to fix Git, ignore, or escalate to platform.
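
To seed that weekly list, Argo CD exports an argocd_app_info metric with sync_status and health_status labels. A sketch of an alert for Applications OutOfSync longer than a day, assuming the openshift-gitops metrics are scraped by cluster monitoring; the rule name, threshold, and severity are hypothetical. Recent OpenShift GitOps versions may already ship a similar sync alert, so look for an existing rule before adding one.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gitops-drift-review
  namespace: openshift-gitops
spec:
  groups:
    - name: gitops-drift
      rules:
        - alert: ArgoApplicationOutOfSyncTooLong
          # One series per Application; fires after 24h continuously OutOfSync
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 24h
          labels:
            severity: info
          annotations:
            summary: "{{ $labels.name }} in project {{ $labels.project }} has been OutOfSync for over 24h"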

Routes, secrets, and manifests that need local knowledge

OpenShift Routes are first-class GitOps objects for many teams:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: shop-api
  namespace: payments-prod
spec:
  host: "shop-api.apps.cluster.example.com"
  to:
    kind: Service
    name: shop-api
    weight: 100
  port:
    targetPort: http
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect

Quote hosts and URLs. Coordinate host naming with platform DNS patterns before merging.

Secrets rarely belong in plain Git. Use Sealed Secrets, External Secrets Operator, or the Vault integration your platform documents, then verify the Secret exists in the namespace after sync, not only that the Application is Healthy.
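
A sketch with the External Secrets Operator, assuming the platform provides a SecretStore named payments-vault in the namespace (hypothetical, as is the remote key path). Git holds only the reference; the operator reads the backing store and creates the Secret. The apiVersion depends on the installed ESO release: v1beta1 is shown, and newer releases also serve external-secrets.io/v1.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: shop-api-db
  namespace: payments-prod
spec:
  # Re-read the backing store hourly
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: payments-vault
  target:
    # Secret the operator creates and owns; this is what the Deployment mounts
    name: shop-api-db
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: payments/shop-api/db
        property: password

After sync, oc get secret shop-api-db -n payments-prod confirms the Secret exists, and oc get externalsecret shop-api-db -n payments-prod shows whether the operator reported an error.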

RBAC and who may sync production

OpenShift GitOps integrates with cluster OAuth and Argo CD RBAC policies. Define who may:

create Applications in a namespace
sync production projects
override sync options
delete Applications with cascade prune

If everyone with a cluster login can sync production, the review step before sync is weakened. Platform teams often expose read-only Argo CD to developers and restrict sync to CI or release engineers.
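
On the default instance, that split is configured in the ArgoCD resource. A sketch of the read-only-by-default pattern with hypothetical group names: anyone who logs in can view, cluster admins keep full access, and only a release-engineering group may sync Applications in team-payments. The instance already carries policy entries, so merge these lines into them rather than overwriting.

apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  rbac:
    # Anyone who authenticates gets read-only access
    defaultPolicy: role:readonly
    # Match subjects against OpenShift group membership from the SSO token
    scopes: "[groups]"
    policy: |
      g, system:cluster-admins, role:admin
      p, role:payments-release, applications, get, team-payments/*, allow
      p, role:payments-release, applications, sync, team-payments/*, allow
      g, payments-release-engineers, role:payments-release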

Document break-glass: who may oc apply during an outage, and the SLA to backport changes into Git afterward. Permanent drift is debt.

A practical rollout path

What I recommend teams try on OpenShift:

Install or inherit OpenShift GitOps — confirm instance URL, SSO, and upgrade owner.
One non-production namespace, one Application, manual sync — learn diff, hooks, and SCC mutations without customer blast radius.
AppProject locked to that namespace — expand destinations only after quota and network policy exist.
CI renders manifests — kustomize build or helm template in the pipeline; attach rendered YAML to PRs.
Add auto-sync on staging — keep production manual or semi-auto until prune behavior is understood.
App-of-apps only when Application count hurts — not on day one.
Drift report weekly — OutOfSync with owners; tune ignoreDifferences on schedule, not reactively forever.
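
A sketch of the CI rendering step, assuming the GitOps repo lives on GitHub with GitHub Actions (OpenShift Pipelines or another CI works the same way) and that kubectl is available on the runner image. It renders the production overlay on each pull request that touches deploy/ and attaches the output to the workflow run, so reviewers read the YAML Argo CD will apply. If Argo CD renders with a different kustomize version than kubectl bundles, output can differ, so pin the same version where you can.

name: render-manifests
on:
  pull_request:
    paths:
      - "deploy/**"
jobs:
  render:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Same overlay path the shop-api Application points at
      - name: Render production overlay
        run: kubectl kustomize deploy/overlays/production > rendered-production.yaml
      # Attach the rendered YAML to the workflow run for reviewers
      - uses: actions/upload-artifact@v4
        with:
          name: rendered-production
          path: rendered-production.yaml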

No silver bullet. Iteration beats a big-bang “everything through Git” mandate from a slide deck.

Observability beyond Healthy in the UI

Argo CD Health is necessary, not sufficient.

Watch deployment success rate, error budget burn, and saturation during sync windows. An Application can report Healthy while the service behind the Route returns errors after a ConfigMap change: health reflects Kubernetes resource status, not whether requests succeed.

Correlate merges in the GitOps repo with incident timestamps. If every page follows a platform Application sync, review depth — not tool brand — is the issue.

Closing

GitOps on OpenShift works when teams respect both halves of the equation: Git as intent and the platform as a partner with its own operators, constraints, and upgrade calendar.

Use the OpenShift GitOps operator as supported infrastructure. Treat app-of-apps as a scaling pattern with real prune and ordering risks. Choose sync policies that match production judgment, not only staging speed. Treat drift on managed clusters as signal — verify, then fix Git or fix live with eyes open.

If you run Argo CD on OCP today, pick one production Application and walk through the last sync: what would prune have removed if it were enabled? What is OutOfSync right now, and is OpenShift telling you something Git should learn? Five minutes of verify beats another hour of trusting the green icon.