Update documentation to call out --update-status-on-shutdown for external DNS #1877

@jordanjennings

Description

Summary

I think it would be a great idea to have a callout for users of external DNS right on the main README.md that they probably want to set --update-status-on-shutdown=false, or they might experience DNS downtime.
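For concreteness, a minimal sketch of what that could look like in the controller Deployment (the container name, image tag, and the other args here are illustrative assumptions, not taken from this issue):

```yaml
# Hypothetical excerpt of the nginx ingress controller Deployment spec.
containers:
- name: nginx-ingress-controller
  image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
  args:
  - /nginx-ingress-controller
  - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
  # Leave the published ingress status in place on shutdown, so that
  # external DNS never sees an empty status and deletes its records:
  - --update-status-on-shutdown=false
```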

Full details

Is this a BUG REPORT or FEATURE REQUEST?:
Documentation request

NGINX Ingress controller version:
0.9.0

What happened:
DNS records were deleted by external DNS during a cluster rolling update, because the nginx ingress controller cleared the ingress status fields on shutdown. This caused unexpected downtime while DNS re-propagated: after the nginx ingress controller came back online it re-updated the ingress status, and external DNS then recreated the DNS records.

What you expected to happen:
No DNS changes when the nginx ingress controller is evicted or redeployed. From reading the code I can see there's a flag for this, --update-status-on-shutdown, that isn't very well called out, and I now see the original issue that requested that flag, #881.

How to reproduce it (as minimally and precisely as possible):
Run one instance of the nginx ingress controller along with external DNS set to watch ingresses, then delete the nginx ingress controller pod. While the pod is shutting down it deletes the ingress status, so external DNS does a DELETE on the DNS records; once the new nginx ingress pod comes up and becomes leader, it re-applies the ingress status, and external DNS does a CREATE for the DNS record.
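To make the mechanism concrete, the field in play is the standard Ingress status block, roughly as below (the IP is a placeholder): the controller clears the list on shutdown, and external DNS treats the now-empty list as a deletion.

```yaml
# Ingress status as published while the controller is running (IP illustrative).
# On shutdown the controller clears status.loadBalancer.ingress, and external
# DNS reacts to the empty list with a DELETE of the corresponding DNS records.
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10
```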

Anything else we need to know:
Even when running more than one nginx ingress controller, the issue came up from time to time when the leader was evicted. One way I can see this happening is if both nginx ingress controllers are scheduled on the same node, that node gets rolled, and the non-leader nginx ingress shuts down more quickly than the leader. That said, I applied pod anti-affinity (along the lines of the sketch below) and still sometimes saw issues when the leader was evicted, even with another nginx ingress controller running. The logic in status.go for determining whether more than one controller is running doesn't seem bulletproof. I haven't been able to pinpoint the exact issue there, and once I found the --update-status-on-shutdown flag I stopped investigating.
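A minimal sketch of pod anti-affinity that spreads the controller replicas across nodes (the app label and topology key are assumptions):

```yaml
# Spread controller replicas across nodes so that draining a single node
# cannot take down the leader and a standby at the same time.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: nginx-ingress-controller
      topologyKey: kubernetes.io/hostname
```

Even with that in place, the race can still hit when the leader itself is evicted, which is why the flag is the more reliable fix.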

Labels

area/docs, help wanted, lifecycle/rotten
