GitOps with ArgoCD: Step-by-Step Guide
MOJAHID UL HAQUE
DevOps Engineer
GitOps becomes valuable the moment more than one engineer, cluster, or environment can change deployment state. Without a clear source of truth, each cluster gradually accumulates manual patches, emergency fixes, and configuration drift that nobody fully trusts. ArgoCD addresses that by making Git the declared source of truth and continuously converging the cluster toward what has been approved there.
The point is not only cleaner deployments. The deeper benefit is operational truth. During incidents, auditors and responders can compare live state against Git and know immediately whether the cluster drifted, whether the last release was applied fully, and whether rollback is a matter of reverting configuration instead of reverse engineering what a human changed in the runtime.
Why this matters in production
GitOps matters because environment control is otherwise surprisingly fragile. Teams can have a strong CI system and still suffer from manual kubectl edits, unclear promotion paths, or configuration that drifts differently across staging and production. Once deployment authority moves into reviewed Git changes, release history becomes easier to trust, rollback becomes more deterministic, and platform ownership becomes visible rather than tribal.
Implementation approach
A clean starting model separates application code from deployment configuration. CI builds the image and publishes it. A config repository holds Helm values, Kustomize overlays, or raw manifests per environment. ArgoCD watches those paths and reconciles the cluster to the desired state. Promotion becomes a pull request that updates image digests or manifests in Git, not a direct push into the cluster. That structure gives you clearer approvals and a more reliable audit trail.
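A minimal Application manifest for that model, pointing one environment overlay at one namespace, looks like this: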
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-staging
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/acme/platform-config.git
    targetRevision: main
    path: apps/payments/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true     # remove resources that were deleted from Git
      selfHeal: true  # revert manual changes made directly in the cluster

Real-world use case
Imagine a platform team running dev, staging, and production clusters for several services. Developers merge code throughout the day. CI publishes versioned images, then a bot or release job creates a pull request in the config repository to update staging. ArgoCD syncs the change, tests run, and production is updated only after an approved manifest change. If somebody hotfixes a Deployment manually in the cluster, ArgoCD exposes the drift immediately instead of letting the change disappear into operational memory.
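As a sketch of what that promotion pull request actually touches, the staging overlay can pin the image by digest in its Kustomize file. The path matches the Application source above, but the image name and digest here are illustrative, not from a real registry:

# apps/payments/overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: payments
resources:
  - ../../base
images:
  # The release job's pull request changes only this digest, so promotion
  # and rollback are both single-line, reviewable Git changes.
  - name: payments
    newName: registry.example.com/acme/payments
    digest: sha256:3e1f0c2b9a8d7e6f5a4b3c2d1e0f9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4d3e2f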
Common mistakes and operating risks
The biggest risk is automating reconciliation without designing approval boundaries. Another common mistake is allowing image tags like latest to stand in for real artifact promotion, which weakens traceability. Teams also get into trouble when ArgoCD projects and destinations are too permissive, because one configuration mistake can then affect the wrong namespace or cluster. Strong GitOps is conservative by design: explicit ownership, explicit destinations, explicit promotion.
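One concrete guardrail is a scoped AppProject, so a misconfigured Application cannot target an unexpected repo, cluster, or namespace. This sketch reuses the project name from the Application above; treat the exact allowlists as assumptions to adapt:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd
spec:
  description: Payments workloads, reviewed config repo only
  # Only the reviewed config repository may serve as a source.
  sourceRepos:
    - https://github.com/acme/platform-config.git
  # Only this cluster and namespace may be a destination.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments
  # Empty whitelist: applications in this project cannot create
  # cluster-scoped resources.
  clusterResourceWhitelist: []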
When this pattern fits best
This pattern fits Kubernetes environments where more than one service, engineer, or environment needs a predictable release model. It is especially valuable for teams that care about drift detection, reproducibility, and environment approvals. It adds less value if deployments are still rare and single-cluster, but the discipline pays off quickly once platform scale increases.
Checklist
- Keep deployment configuration in reviewed Git paths per environment.
- Promote immutable image tags or digests, not moving references.
- Restrict ArgoCD destinations and RBAC aggressively.
- Route drift and sync failures into the normal incident channel (see the notification sketch after this checklist).
- Document rollback as a Git operation before production usage scales.
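For the alert-routing item above, one way to wire sync failures into an incident channel is Argo CD's notifications ConfigMap. This is a minimal sketch; the Slack service, its token secret, and the channel name are assumptions for your own setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Token is read from argocd-notifications-secret (assumed to exist).
  service.slack: |
    token: $slack-token
  template.app-sync-failed: |
    message: |
      Sync failed for {{.app.metadata.name}}: {{.app.status.operationState.message}}
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]

Each Application opts in through an annotation such as notifications.argoproj.io/subscribe.on-sync-failed.slack: incidents, where incidents is a hypothetical channel name. A similar trigger on app.status.sync.status can surface drift, though many teams treat persistent drift as a dashboard signal rather than a page.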
How to roll this out safely
The safest rollout path is usually narrower than teams expect. Start with one service, one environment, or one clear platform boundary and baseline the metrics that matter before changing everything at once. Document ownership, define rollback or fallback behavior, and review the first few changes with the people who will support the system during real incidents. That approach prevents architecture optimism from outpacing operational reality. Mature patterns spread well because they are tested in small steps first, not because they looked complete in a design document.
What to measure after adoption
Success should be visible in operating outcomes, not only in implementation status. Good patterns reduce surprise, shorten diagnosis time, improve release confidence, or create a more predictable cost and performance profile. If the change only adds process, dashboards, or YAML without improving those outcomes, the design is probably too heavy. Measure the behaviors that matter to responders and service owners, then simplify aggressively anywhere the pattern creates ceremony without making production safer or easier to understand.
What teams usually learn after the first real test
The first serious deployment, spike, or incident almost always reveals something the design discussion missed. Maybe ownership was less clear than expected, maybe the observability path was too thin, or maybe the new process worked but took longer than planned because one dependency was not included in the original mental model. That is normal. Production patterns mature when teams capture that feedback immediately and adjust the defaults before the next rollout. In practice, the best patterns are not the most complicated ones. They are the ones that survive contact with real operations and become easier to use with every review.
Ownership and review cadence
Every useful platform practice needs a review loop. After the first few real uses, revisit the pattern with fresh evidence from deployments, incidents, and operator feedback. Ask what was confusing, what created noise, what saved time, and what controls were worth keeping. The strongest engineering patterns usually become smaller and clearer over time because teams trim the parts that do not change behavior. Review cadence turns a one-time implementation into a dependable operating habit.
That final review step is easy to skip when the initial rollout appears successful, but it is usually where the best long-term improvements are found. Small refinements in defaults, ownership, and observability often create more value than another wave of tooling.
A good rule is to treat the first month after adoption as part of the implementation rather than as an afterthought. Watch how the pattern behaves under normal changes, under stress, and during one real support event. If it remains understandable in all three cases, it is probably strong enough to become a team standard.
If the pattern is difficult to explain to a new engineer after that first month, it still needs refinement. Clarity is one of the most reliable indicators that a production practice is ready to scale across teams.
Documentation should evolve along with the pattern. Keep the shortest possible notes that explain ownership, the expected success signals, the rollback or fallback path, and the dashboards or logs responders should check first. Teams often over-document implementation detail and under-document the operational decisions that matter during a real event. A concise, current operating note is usually more valuable than a long design artifact nobody opens once the initial rollout is complete.
That knowledge-transfer step is especially important when more than one team or on-call rotation will depend on the pattern. A practice is not really finished until another engineer can use it confidently without needing the original author in the room.
FAQ
Quick answers to the questions teams usually ask when implementing this pattern.
What problem does GitOps actually solve?
It gives the platform one declared source of truth for deployment state. Instead of clusters drifting through manual changes, the runtime is continuously reconciled to reviewed configuration in Git.
Do I still need CI if I use ArgoCD?
Yes. CI still builds, tests, scans, and publishes artifacts. GitOps takes over environment state management and reconciliation after those artifacts exist.
Should app code and manifests live together?
Either can work, but many teams keep deployment configuration in a separate repo because it makes environment ownership and approval boundaries much clearer.
What is the most common ArgoCD mistake?
Using it like a push-button deployment UI instead of a reconciliation system. Its biggest value is drift detection, declared state, and controlled promotion through Git.