[{"content":"I keep coming back to dev containers because they remove a lot of the friction around setup, tooling drift, and \u0026ldquo;works on my machine\u0026rdquo; nonsense.\nMy rule is pretty simple: if a new project or idea requires me to install real tooling, it gets its own dev container.\nProblem I’ve been using dev containers since the early VS Code Remote Containers days, mostly because they solved a very real problem in the work I was doing at the time.\nI was an automation solution architect, which meant building demos and proofs of concept that had to line up with whatever the customer needed. That usually meant moving between Python and JavaScript, often in the same project, and doing it fast enough to keep the demo moving.\nPython was especially annoying. I would constantly run into Python 2 versus Python 3 confusion, and half the time I had to stop and check what version I was actually using. On top of that, library conflicts were everywhere. If I forgot to create or activate a venv, the environment would slowly turn into a mess. Unless something like direnv handled it for me, that usually happened eventually.\nInstalling tools directly on my laptop made things worse. Every new demo, every new POC, and every new dependency added more version clashes and more cleanup later. Reproducing the same setup on another machine was painful, and sharing an environment with a colleague or a customer was even worse.\nThat was the real problem: the work itself was already moving fast, and my local environment kept getting in the way.\nWhy It Matters Dev containers took my pain away by letting me define the environment once and stop rebuilding it manually every time.\nThe repository became the environment contract:\nsame tooling versions same CLIs and extensions same shell environment and bootstrap process It also reduced the classic “works on my machine” onboarding problem. 
Instead of maintaining setup docs that slowly drift over time, new environments become mostly self-service.\nFor example, some of my projects need Hugo, Wrangler, Node.js, GitHub CLI, Kubernetes CLIs, and Codex installed together. I do not want that stack leaking into every machine I use.\nBecause the environment lives in the repository and runs in a container, it depends far less on whatever random tooling already exists on the host machine.\nAnother thing I ended up appreciating later is that the same dev container definition also works well in remote environments like GitHub Codespaces. Once the environment is defined properly, moving between local and remote development becomes much easier.\nThe Core Idea What finally clicked for me was treating the environment as part of the project instead of something I manually rebuilt every time I changed machines or started a new demo.\nInstead of treating setup as a separate tribal-knowledge step, the repo becomes the source of truth. The container image, the extensions, the shell tools, the language runtimes, and the startup behavior all live in one place.\nThat changes the workflow in a useful way:\nopen the repo start the dev container get the same baseline environment every time No guessing what is installed locally. No manually rebuilding the same setup for each project. No trying to remember which version of Python, Node.js, or some CLI tool that particular demo depended on.\nFor me, that was the real value. The environment stopped being a side problem and became part of the project itself.\nHow I Use Them A dev container keeps the environment isolated from the start. If I need Python, Node.js, CLIs, or anything else that could turn into version drift later, I prefer putting it in the container instead of polluting my laptop.\nNot every project needs that. If it is just a simple HTML page with static JavaScript, I usually skip the dev container entirely. 
In that case, I just use a folder and run the Live Server extension in VS Code. That is enough for quick static work, and adding a container would just be unnecessary overhead.\nThe nice thing is that the dev container ecosystem has grown a lot. You can usually find something close to what you need already. If something is missing, you can add it in the Dockerfile or extend the setup in devcontainer.json.\nflowchart LR repo[Git Repository] devcontainer[devcontainer.json] vscode[VS Code] runtime[Docker or Podman] environment[Development Environment] repo --\u0026gt; devcontainer vscode --\u0026gt; devcontainer devcontainer --\u0026gt; runtime runtime --\u0026gt; environment To keep it simple, I also rely on a couple of basics:\nDev Containers extension in VS Code A container runtime installed on the machine Note: Dev Containers do not require Docker specifically. A Docker-compatible container engine, like Podman, can work too. (More details) Once those pieces are in place, the workflow stays consistent across projects instead of changing every time I start something new.\nWhat I Learned The biggest thing I learned is that dev containers work best when you keep them boring.\n1. Use Dev Container features Use the features option as much as possible. It covers a lot of common setup without making you maintain everything yourself. If you need extra tools, check the features catalog first before you start writing custom installation steps.\n2. Persistence is tricky Persistence is the other thing to think about. Tools like GitHub CLI, Codex, or anything else that stores auth state only inside the container can require you to log in again after rebuilds, unless you mount that state or the tool integrates with host credential forwarding.\nYour code is different. 
When you open a local folder in a dev container, the project itself is usually safe because it is bind mounted from the folder you started with, so the repository stays on disk even if the container goes away.\n3. Clean up stale containers and volumes One downside is that it is easy to accumulate orphaned containers and volumes over time. If you use dev containers a lot, it is worth checking cleanup every once in a while instead of letting old environments pile up.\nUsually this is enough:\ndocker system prune -f And sometimes:\ndocker volume prune -f 4. Reduce building time If the dev container keeps rebuilding from scratch and it starts taking too long, then it is usually worth moving the static parts into a Dockerfile so Docker can cache them as layers. That keeps rebuilds more predictable and saves time when you are working on the same project repeatedly.\nTL;DR Dev containers save me from turning my laptop into a dependency graveyard.\nThey give me a reproducible environment, make onboarding easier, and keep demos and POCs from depending on whatever random tools I happened to install last week.\nIf a project needs real tooling, I start with a dev container. If it is just a static page, I keep it simple and skip it. You do not need to rebuild your entire workflow overnight.\nTry it on the next project that requires Kubernetes tooling, Python, Node.js, cloud CLIs, or anything that normally pollutes your laptop. 
That is usually the point where dev containers immediately start making sense.\nVS Code Dev Containers: https://code.visualstudio.com/docs/devcontainers/containers ","permalink":"https://joseluisgomez.com/posts/2026-05-17-why-i-use-devcontainers-every-day/","summary":"Why dev containers became my default workflow for demos, cloud native projects, and avoiding local environment drift.","title":"Why I Use Dev Containers Every Day"},{"content":"After automating certificates with cert-manager, I realized I still had another repetitive problem.\nEvery new application meant opening Cloudflare, creating DNS records manually, copying ingress IPs, and cleaning up stale entries later. That gets annoying quickly when you constantly rebuild clusters, deploy temporary apps, or work with ephemeral demo environments.\nI wanted DNS records to behave the same way Kubernetes infrastructure behaves: declarative, automatic, and disposable.\nThat’s exactly what ExternalDNS solves.\nIn my case, I’m using the Nutanix Kubernetes Platform (NKP), where ExternalDNS is available directly from the application catalog in NKP Pro and Ultimate.\nThe same concepts still apply to any Kubernetes distribution.\ncert-manager vs ExternalDNS A common misconception is assuming cert-manager and ExternalDNS solve the same problem.\nThey don’t.\nComponent Responsibility cert-manager Certificates ExternalDNS DNS records Ingress Controller Traffic routing ExternalDNS watches Kubernetes resources and automatically creates DNS records in your DNS provider. 
That can include Ingress resources, Gateway API resources, or Services of type LoadBalancer.\nIn this example, I’m using Cloudflare as the DNS provider.\nflowchart LR ingress[Ingress] externaldns[ExternalDNS] cloudflare[Cloudflare DNS] ingress --\u0026gt;|\u0026#34;1 - Detect hostname\u0026#34;| externaldns externaldns --\u0026gt;|\u0026#34;2 - Create DNS record\u0026#34;| cloudflare Create a Cloudflare API Token In Cloudflare Dashboard :\nGo to: Manage account, on the left sidebar menu Account API tokens Create Token Use: DNS → Edit Avoid using the Global API Key and scope the token only to the DNS zone(s) you actually need.\nEven though cert-manager and ExternalDNS can technically share the same token, I usually prefer using separate tokens for operational separation.\nCreate the Cloudflare Secret Before enabling ExternalDNS in NKP, create the Cloudflare API token secret in the workspace namespace of each target cluster.\nkubectl --kubeconfig=\u0026lt;selected-cluster\u0026gt; \\ --namespace \u0026lt;workspace-namespace\u0026gt; \\ create secret generic cloudflare-api-token \\ --from-literal=cloudflare_api_token=\u0026lt;secret\u0026gt; In NKP, workspace-scoped applications propagate to the clusters attached to that workspace, which makes this a convenient place to manage shared platform services like ExternalDNS.\nEnable ExternalDNS in NKP In NKP Pro and Ultimate, ExternalDNS can be enabled directly from the application catalog.\nNKP exposes the configuration through application overrides, which can be configured globally at the workspace level or customized per cluster.\nThe same configuration values can also be used with NKP Starter, but ExternalDNS must be deployed and managed manually instead of through the NKP application catalog. 
That also means the ExternalDNS lifecycle becomes the user\u0026rsquo;s responsibility.\nFor this setup, I’m using a combination of global workspace overrides and per-cluster overrides.\nThe Workspace Application Configuration Override is:\npolicy: sync domainFilters: - example.com provider: cloudflare cloudflare: secretName: cloudflare-api-token proxied: false sources: - service - ingress - gateway-httproute In this setup, gateway-httproute enables Gateway API support in addition to traditional Ingress resources, policy: sync lets ExternalDNS fully reconcile records, including deleting records it owns, instead of only creating them, and proxied: false makes sense for internal or private environments where the services are not publicly reachable through Cloudflare Proxy.\nSince I have a couple of NKP clusters in my Workspace, I want to set a custom value using the Cluster Application Configuration Override:\ntxtOwnerId: \u0026lt;cluster-name\u0026gt; txtOwnerId helps identify which ExternalDNS instance owns the records. After enabling the application, verify the deployment:\nkubectl --namespace \u0026lt;workspace-namespace\u0026gt; get deployment external-dns Create an Ingress Now create an ingress with a hostname annotation.\nCreate ingress.yaml:\napiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: demo annotations: external-dns.alpha.kubernetes.io/hostname: demo.example.com spec: ingressClassName: kommander-traefik rules: - host: demo.example.com http: paths: - path: / pathType: Prefix backend: service: name: demo-service port: number: 80 Note: If you are using a different Kubernetes distribution, change the ingressClassName. Also, you don\u0026rsquo;t need a running application with a service to test ExternalDNS. 
Just creating the Ingress is enough.\nApply it:\nkubectl apply -f ingress.yaml Watch ExternalDNS Create the Record Check the logs:\nkubectl -n \u0026lt;workspace-namespace\u0026gt; logs deploy/external-dns -f Within a few seconds, you should see ExternalDNS detecting the hostname and creating the DNS record in Cloudflare.\ntime=\u0026#34;2026-05-14T15:58:05Z\u0026#34; level=info msg=\u0026#34;All records are already up to date\u0026#34; time=\u0026#34;2026-05-14T15:59:06Z\u0026#34; level=info msg=\u0026#34;Changing record.\u0026#34; action=CREATE record=demo.example.com ttl=1 type=A zone=485bbd1ca5c0f55aed6707d8d52d2a46 time=\u0026#34;2026-05-14T15:59:06Z\u0026#34; level=info msg=\u0026#34;Changing record.\u0026#34; action=CREATE record=a-demo.example.com ttl=1 type=TXT zone=485bbd1ca5c0f55aed6707d8d52d2a46 time=\u0026#34;2026-05-14T16:00:08Z\u0026#34; level=info msg=\u0026#34;All records are already up to date\u0026#34; You can also verify directly from Cloudflare or using dig:\ndig demo.example.com At this point:\nKubernetes created the Ingress ExternalDNS detected the hostname Cloudflare received the DNS record The application became reachable through the new FQDN Ownership TXT Records You may notice ExternalDNS also creates TXT records automatically.\nThese ownership records help ExternalDNS track which records it manages and avoid conflicts between multiple clusters or ExternalDNS instances sharing the same DNS zone. 
That’s expected behavior.\nGateway API Support This example used a traditional Ingress resource to keep the workflow compact and easy to follow, but ExternalDNS also supports Gateway API resources through the gateway-httproute source configured earlier.\nThat becomes especially useful if you are already moving toward Gateway API for Kubernetes traffic management.\nTL;DR ExternalDNS automatically creates and manages DNS records directly from Kubernetes resources.\nFor me, this removed another repetitive operational task from rebuilding Kubernetes demo and lab environments: manually creating DNS records, copying ingress IPs, and cleaning up stale DNS entries later.\nNow DNS behaves much more like Kubernetes infrastructure itself: declarative, automated, and disposable.\nIn the next post, we’ll connect ExternalDNS and cert-manager together so applications automatically get:\nDNS records Trusted HTTPS certificates Kubernetes TLS secrets ","permalink":"https://joseluisgomez.com/posts/2026-05-15-automate-dns-creation-with-cloudflare-and-externaldns/","summary":"Automate Kubernetes DNS records in Cloudflare using ExternalDNS instead of manually creating and deleting DNS entries for every application.","title":"Automatically Create Cloudflare DNS Records from Kubernetes with ExternalDNS"},{"content":"As someone constantly building demo environments and recording videos publicly, browser certificate warnings were driving me insane. Every new cluster meant:\nRequesting certificates manually Copying TLS secrets around Checking expiration dates Because I work a lot with ephemeral Kubernetes environments, repeating the same certificate setup process across clusters adds up quickly.\nThat’s where cert-manager completely changed the workflow for me. 
Now I keep a small set of Kubernetes manifests ready to go:\nDeploy cluster Configure Cloudflare token Apply issuer Request certificate This removed the repetitive manual work of requesting, renewing, and reconfiguring certificates every time I built a new Kubernetes environment.\nIn my case, my demos are based in the Nutanix Kubernetes Platform (NKP) , where cert-manager is already included across Starter, Pro, and Ultimate editions, but the same setup works with any Kubernetes distribution.\ncert-manager vs ExternalDNS One of the biggest misconceptions around cert-manager is assuming it also manages your application DNS records.\nIt doesn’t.\ncert-manager is responsible for:\nRequesting certificates from Let\u0026rsquo;s Encrypt Creating temporary TXT records in Cloudflare for DNS validation Renewing certificates automatically You still have to manage your application DNS records separately, either manually or through automation. For the latter, I\u0026rsquo;ll be writing another blog on how ExternalDNS solves that problem.\nflowchart LR certmanager[cert-manager] cloudflare[Cloudflare DNS] letsencrypt[Let\u0026#39;s Encrypt] secret[Kubernetes TLS Secret] certmanager --\u0026gt;|\u0026#34;1 - Create TXT record\u0026#34;| cloudflare cloudflare --\u0026gt;|\u0026#34;2 - Validate domain\u0026#34;| letsencrypt letsencrypt --\u0026gt;|\u0026#34;3 - Issue certificate\u0026#34;| certmanager certmanager --\u0026gt;|\u0026#34;4 - Store TLS secret\u0026#34;| secret Verify cert-manager Exists If you’re using NKP, cert-manager is already deployed.\nVerify it exists:\nkubectl get pods -n cert-manager You should see:\ncert-manager cert-manager-cainjector cert-manager-webhook If you’re using another Kubernetes distribution, install cert-manager first from the upstream project.\ncert-manager Installation Documentation Create a Cloudflare API Token In Cloudflare Dashboard :\nGo to: Manage account, on the left sidebar menu Account API tokens Create Token Use: DNS → Edit Avoid using 
the Global API Key and scope the token only to the DNS zone(s) you actually need.\nStore the Token in Kubernetes The token must be created in the cert-manager namespace to ensure that cert-manager has access to the secret when you create the ClusterIssuer.\nkubectl create secret generic cloudflare-api-token-secret \\ -n cert-manager \\ --from-literal=api-token=\u0026#39;YOUR_CLOUDFLARE_API_TOKEN\u0026#39; Create the ClusterIssuer A ClusterIssuer is a cluster-scoped cert-manager resource that defines how certificates should be requested, which ACME server to use, and how domain validation should happen. Because it is cluster-scoped, applications and users from any namespace can later reference it when requesting certificates.\nCreate clusterissuer.yaml and update the value for email:\napiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: letsencrypt-cloudflare spec: acme: email: you@example.com server: https://acme-v02.api.letsencrypt.org/directory privateKeySecretRef: name: letsencrypt-cloudflare solvers: - dns01: cloudflare: apiTokenSecretRef: name: cloudflare-api-token-secret key: api-token Note: privateKeySecretRef is where cert-manager stores the ACME account private key used to communicate with Let\u0026rsquo;s Encrypt. This is not your application TLS certificate secret.\nApply it:\nkubectl apply -f clusterissuer.yaml Verify it becomes ready:\nkubectl get clusterissuer Request a Certificate At this point you have two common options. 
You can request a wildcard certificate and reuse it across multiple apps, or you can request a dedicated certificate per FQDN.\nFor homelabs and demo environments, I usually prefer the wildcard approach because it reduces the number of certificates I need to manage.\napiVersion: cert-manager.io/v1 kind: Certificate metadata: name: wildcard-homelab namespace: cert-manager spec: secretName: wildcard-homelab-tls issuerRef: name: letsencrypt-cloudflare kind: ClusterIssuer dnsNames: - \u0026#34;*.homelab.example.com\u0026#34; This stores the wildcard TLS secret, wildcard-homelab-tls, in the cert-manager namespace. That can make sense if you treat it as shared platform infrastructure and want to reuse it across multiple applications or namespaces.\nThe other option is to request a certificate directly in the application namespace:\napiVersion: cert-manager.io/v1 kind: Certificate metadata: name: demo-app namespace: demo spec: secretName: demo-app-tls issuerRef: name: letsencrypt-cloudflare kind: ClusterIssuer dnsNames: - demo.homelab.example.com This creates the TLS secret demo-app-tls in the same namespace as the app, which is usually cleaner for application-specific certificates because the Ingress or Gateway can reference the secret locally.\nVerify the TLS Secret Watch the certificate:\nkubectl get certificate -A -w The full process usually takes around a minute while cert-manager creates the temporary TXT record in Cloudflare, waits for DNS propagation, and Let\u0026rsquo;s Encrypt verifies the challenge before issuing the certificate.\nThen verify the generated secret exists in the namespace where the Certificate resource was created:\nkubectl get secret -n cert-manager Or for an application-specific certificate:\nkubectl get secret -n demo You should see your generated TLS secret:\nwildcard-homelab-tls or:\ndemo-app-tls At this point:\ncert-manager requested the certificate Cloudflare handled DNS validation Let\u0026rsquo;s Encrypt issued the certificate 
Kubernetes stored it as a TLS secret in the target namespace What I Learned cert-manager can automatically generate and renew trusted Let\u0026rsquo;s Encrypt certificates directly from Kubernetes using DNS validation.\nIn this example I used Cloudflare , but cert-manager supports many other DNS providers as well. You can check the supported providers in the cert-manager documentation: https://cert-manager.io/docs/configuration/acme/dns01/ For me, this removed one of the most repetitive parts of rebuilding Kubernetes demo and lab environments:\nmanually requesting certificates dealing with browser warnings repeating the same TLS setup process on every new cluster Now it’s just:\napply manifests wait for the TLS secret use HTTPS In the next post, we’ll automate the other half of the workflow: Creating DNS records automatically with ExternalDNS in Kubernetes.\n","permalink":"https://joseluisgomez.com/posts/2026-05-13-generate-letsencrypt-certificate-kubernetes-cloudflare/","summary":"Automate trusted Let\u0026rsquo;s Encrypt certificates in Kubernetes using cert-manager and Cloudflare DNS validation without manually managing TLS secrets anymore.","title":"Generate Let's Encrypt Certificates in Kubernetes with cert-manager and Cloudflare DNS"},{"content":"For a long time I wanted to securely expose APIs and webhooks behind Cloudflare Access without opening them publicly or disabling authentication entirely.\nThe problem was always machine-to-machine traffic.\nCloudflare Access works great when a human is opening an application through a browser and authenticating with SSO, email OTP, or another login method. 
But automation tools, webhooks, or agents do not handle interactive authentication flows very well.\nI finally spent some time testing this properly, and it ended up being much simpler than I expected.\nThe workflow is an Apple Shortcut triggering a webhook in self-hosted n8n, behind Cloudflare Access and Cloudflare Tunnel.\nThe Idea Cloudflare Access supports non-interactive authentication using Service Tokens.\nInstead of authenticating through the browser, the client sends two HTTP headers:\nCF-Access-Client-Id CF-Access-Client-Secret Cloudflare validates the headers before forwarding the request to the protected application.\nArchitecture The important part is that the application still remains protected behind Cloudflare Access. You are not creating a public bypass or exposing a separate unauthenticated endpoint just for automation traffic.\nCreating the Service Token In Cloudflare Zero Trust go to:\nAccess controls → Service credentials → Service Tokens Create a new Service Token.\nExample:\nService token name: n8n-shortcuts Service Token Duration: 1 year Cloudflare generates two values:\nCF-Access-Client-Id CF-Access-Client-Secret Save both securely because the secret will not be shown again afterwards.\nAllow the Service Token Next, open the protected application:\nAccess controls → Applications Edit the application and create a new policy:\nAction: Service Auth Include: Selector → Service Token Select the Service Token you created earlier.\nAfter saving the policy, requests containing the correct headers can authenticate programmatically without going through the interactive login portal.\nTesting with curl Before testing this with Apple Shortcuts or n8n, I first validated the flow with curl:\ncurl -X POST https://n8n.example.com/webhook/test \\ -H \u0026#34;CF-Access-Client-Id: YOUR_CLIENT_ID\u0026#34; \\ -H \u0026#34;CF-Access-Client-Secret: YOUR_CLIENT_SECRET\u0026#34; \\ -H \u0026#34;Content-Type: application/json\u0026#34; \\ -d 
\u0026#39;{\u0026#34;message\u0026#34;:\u0026#34;hello\u0026#34;}\u0026#39; If the headers are valid, Cloudflare Access forwards the request normally to the backend service.\nOne thing I liked while testing this is that you do not even need a real API behind the tunnel initially. You can simply protect a normal website with Cloudflare Access and use curl with the headers to validate that authentication works end-to-end before involving your automation tooling.\nIf authentication does not work, Cloudflare Zero Trust also makes troubleshooting pretty easy. In the Access controls → Service credentials → Service Tokens page, you can check the Last Seen column for your token. If the timestamp updates after your curl request, Cloudflare is receiving and validating the token, which helps narrow down whether the issue is in Access policies or your backend application.\nWhat I Learned I originally assumed Service Tokens were going to involve something more complex, but they are actually pretty straightforward. Cloudflare Access simply validates the headers and forwards the request if they match a valid token.\nFor homelab and self-hosted setups, this ends up being extremely useful for things like:\nwebhooks automation workflows AI agents internal APIs behind Cloudflare Tunnel I also expected this to be a paid-only enterprise feature, but Cloudflare includes Service Tokens in the Zero Trust free plan. 
The free tier currently supports up to 50 Service Tokens, which is honestly plenty for most homelab and automation use cases.\nYou keep the service protected while still allowing secure machine-to-machine communication without exposing the application publicly.\n","permalink":"https://joseluisgomez.com/posts/2026-05-10-bypassing-cloudflare-access-with-service-tokens/","summary":"Securely expose self-hosted webhooks and APIs behind Cloudflare Access using Service Tokens for machine-to-machine authentication without making services public.","title":"Bypassing Cloudflare Access for Machine-to-Machine Traffic with Service Tokens"},{"content":"Hi, I’m Jose — I work on Cloud Native, Kubernetes, and AI at Nutanix.\nThis blog is where I document things I learn, problems I solve, and notes I want to remember.\nWhat you\u0026rsquo;ll find here Kubernetes Artificial Intelligence DevOps/Platform Engineering workflows Architecture \u0026amp; Design notes Much of the content is AI-assisted. I use AI tools to help structure, refine, and validate ideas, while keeping the technical direction and conclusions my own.\nWhy this blog I write mostly for myself — to build a reference — but if it helps others, even better.\nContact If something here is useful, feel free to reach out.\n","permalink":"https://joseluisgomez.com/about/","summary":"\u003cp\u003eHi, I’m Jose — I work on Cloud Native, Kubernetes, and AI at Nutanix.\u003c/p\u003e\n\u003cp\u003eThis blog is where I document things I learn, problems I solve, and notes I want to remember.\u003c/p\u003e\n\u003ch2 id=\"what-youll-find-here\"\u003eWhat you\u0026rsquo;ll find here\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eKubernetes\u003c/li\u003e\n\u003cli\u003eArtificial Intelligence\u003c/li\u003e\n\u003cli\u003eDevOps/Platform Engineering workflows\u003c/li\u003e\n\u003cli\u003eArchitecture \u0026amp; Design notes\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eMuch of the content is AI-assisted. 
I use AI tools to help structure, refine, and validate ideas, while keeping the technical direction and conclusions my own.\u003c/p\u003e","title":"About"},{"content":"Sometimes you just want one specific node gone.\n⚠️ Anti-Pattern Alert!\nYes, in theory nodes are cattle, not pets. But in real environments—especially in enterprise setups—you don’t always have that luxury. Maybe a node is flaky, maybe it’s tied to something broken, or maybe you just know that one needs to go.\nHere’s how to do it in a Cluster API (CAPI) cluster without it coming back.\nThe problem CAPI is declarative. If your cluster says “I want N nodes,” it will keep N nodes.\nSo if you run:\nkubectl delete node \u0026lt;node\u0026gt; that node will come back. The same happens if you delete the Machine object—CAPI will simply recreate it to match the desired state.\nThe trick The only reliable way to control which node gets removed is to mark it before reducing capacity.\nIf you scale down first, CAPI will pick a Machine on its own, and at that point you’ve already lost control.\nSo the flow is simple:\nflowchart TD A[Find target node] --\u0026gt; B[Find corresponding Machine] B --\u0026gt; C[Annotate Machine for deletion] C --\u0026gt; D{Using Cluster Autoscaler?} D --\u0026gt;|Yes| E[Update min/max size] D --\u0026gt;|No| F[Scale MachineDeployment replicas] E --\u0026gt; G[Node is removed] F --\u0026gt; G[Node is removed] Step 1 — Find the Machine Start by listing your nodes and picking the one you want to remove:\nkubectl get nodes Then list the Machines. You’re looking for the Machine that matches your node. The easiest way is to use:\nkubectl get machines -A -o wide This adds a NODE column, so you can match it directly with the node name:\nNAMESPACE NAME NODE default worker-abc123 node-1 The NODE value should match the Kubernetes node you want to remove. 
If that column is empty, you can check manually:\nkubectl get machine \u0026lt;machine-name\u0026gt; -n \u0026lt;namespace\u0026gt; -o yaml and look for:\nstatus: nodeRef: name: \u0026lt;node-name\u0026gt; At the end of this step, you should have the Machine name and its namespace.\nStep 2 — Annotate the Machine Now mark that Machine for deletion:\nkubectl annotate machine \u0026lt;machine-name\u0026gt; -n \u0026lt;namespace\u0026gt; \\ cluster.x-k8s.io/delete-machine=\u0026#34;true\u0026#34; This tells CAPI: “when you scale down, remove this one.”\nStep 3 — Reduce capacity At this point, you’ve told CAPI which Machine to remove. Now you just need to reduce the desired number of nodes so that a scale-down actually happens.\nHow you do that depends on whether your cluster is using the Cluster Autoscaler.\nIf you’re NOT using autoscaler First, find your MachineDeployment:\nkubectl get machinedeployments -A Then scale it down:\nkubectl scale machinedeployment \u0026lt;md-name\u0026gt; -n \u0026lt;namespace\u0026gt; --replicas=\u0026lt;new-count\u0026gt; If you ARE using autoscaler In this case, you don’t scale replicas directly because the autoscaler controls the node count.\nIf you’re not sure whether autoscaler is enabled, you can check quickly:\nkubectl get machinedeployment \u0026lt;md-name\u0026gt; -n \u0026lt;namespace\u0026gt; -o yaml | grep autoscaler If you see annotations like:\ncluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size then autoscaler is managing your nodes. 
This is common on platforms like Nutanix Kubernetes Platform.\nInstead of scaling replicas, update those limits:\nkubectl annotate machinedeployment \u0026lt;md-name\u0026gt; -n \u0026lt;namespace\u0026gt; \\ cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=\u0026#34;\u0026lt;new-min\u0026gt;\u0026#34; \\ cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=\u0026#34;\u0026lt;new-max\u0026gt;\u0026#34; \\ --overwrite Once the autoscaler decides to scale down, it will pick the Machine you annotated.\nVerify Finally, check that the node is gone and not replaced:\nkubectl get machines -A kubectl get nodes TL;DR flowchart LR A[Find the Machine] --\u0026gt; B[[Annotate it]] B --\u0026gt; C[Then reduce capacity] No autoscaler → scale the MachineDeployment Autoscaler → update min/max instead If you do it in that order, CAPI will remove the node you chose—and it won’t come back.\n","permalink":"https://joseluisgomez.com/posts/2026-05-05-removing-a-specific-worker-node-in-a-capi-cluster/","summary":"Learn how to safely remove a specific worker node in a Cluster API (CAPI) Kubernetes cluster without it being recreated.","title":"Removing a Specific Worker Node in a CAPI Cluster"}]