Dns and outbound connect prechecks #5665
Merged: zhoxing-ms merged 69 commits into Azure:main from rohan-dassani:dns-and-outbound-connect-prechecks on Jan 30, 2023.
Commits (69):

- 314c19d (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- d34a3cb (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- 6842e24 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- 9aafd26 (rohan-dassani) modified: src/connectedk8s/HISTORY.rst
- 69ce330 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 8b7f6d3 (rohan-dassani) modified: src/connectedk8s/HISTORY.rst
- 03e59e8 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- 64c1935 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- c80ae9d (rohan-dassani) Merge branch 'Azure:main' into dns-and-outbound-connect-prechecks
- 970830d (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- dfa09f1 (rohan-dassani) Merge branch 'dns-and-outbound-connect-prechecks' of https://github.c…
- d92e0f1 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 3560a39 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- 199f8b1 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- a278610 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 07469ae (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 278556d (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- e7896f2 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- 617abfe (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 8379900 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- ad4ed21 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- a290d22 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 500a8ac (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_utils.py
- e1f21a9 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- ddebdc8 (rohan-dassani) Merge branch 'dns-and-outbound-connect-prechecks' of https://github.c…
- 37dc6ac (rohan-dassani) Your branch is ahead of 'origin/dns-and-outbound-connect-prechecks' b…
- a1b6995 (rohan-dassani) Merge https://github.com/Azure/azure-cli-extensions into dns-and-outb…
- 0cc38b4 (rohan-dassani) modified: src/connectedk8s/HISTORY.rst
- 6810ee1 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_utils.py
- 4ef28a6 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 0d007c2 (rohan-dassani) Merge https://github.com/Azure/azure-cli-extensions into dns-and-outb…
- 9b932c0 (rohan-dassani) modified: src/connectedk8s/setup.py
- c1cc9a4 (rohan-dassani) modified: src/connectedk8s/HISTORY.rst
- 48b64c6 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_utils.py
- 5300ed6 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_utils.py
- cd86de8 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 9f43bc4 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- ad27d1e (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- 7097f17 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- f345579 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- 1904597 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- e8c9e8d (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- 24a6b04 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/custom.py
- eb7a337 (sikasire) pass location param to precheckcharts
- 184d305 (rohan-dassani) Merge https://github.com/Azure/azure-cli-extensions into dns-and-outb…
- 36d6bda (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- c76b89b (sikasire) Merge branch 'dns-and-outbound-connect-prechecks' of https://github.c…
- cae9612 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- bb02d2b (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- fa8846a (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- f4717de (sikasire) some naming changes
- df6e66c (sikasire) fix clusterDNS and network check in utils
- 6d2b31f (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 5ca78bb (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- d715923 (sikasire) install precheck chart in azure-arc-release
- 01ecf35 (sikasire) Merge branch 'dns-and-outbound-connect-prechecks' of https://github.c…
- 48c8a03 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- 4ab9ab0 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 84dc48b (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- bbf2192 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- 402b5d8 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 9354118 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- cc6845c (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 551ef0d (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- ca6c97a (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_constants.py
- 840c656 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_utils.py
- 01fd351 (rohan-dassani) modified: src/connectedk8s/azext_connectedk8s/_precheckutils.py
- 871e0ac (sikasire) add handling based on cloud
- 33ee014 (rohan-dassani) modified: src/connectedk8s/HISTORY.rst
File changed: src/connectedk8s/azext_connectedk8s/_precheckutils.py (new file, +221 lines)
```python
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

import os
import shutil
import subprocess
from subprocess import Popen, PIPE, run, STDOUT, call, DEVNULL
import time
import json
import datetime
import yaml
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from argparse import Namespace
from kubernetes import client, config, watch, utils
from kubernetes import client as kube_client
from kubernetes.client.rest import ApiException
from knack.util import CLIError
from knack.log import get_logger
from knack.prompting import NoTTYException, prompt_y_n
from azure.cli.core import get_default_cli, telemetry
from azure.cli.core.commands.client_factory import get_subscription_id
from azure.cli.core.util import send_raw_request
from azure.cli.core.azclierror import CLIInternalError, ClientRequestError, ArgumentUsageError, ManualInterrupt, AzureResponseError, AzureInternalError, ValidationError
from azure.core.exceptions import ResourceNotFoundError, HttpResponseError
from msrest.exceptions import AuthenticationError, HttpOperationError, TokenExpiredError
from msrest.exceptions import ValidationError as MSRestValidationError
from azext_connectedk8s._client_factory import _resource_client_factory, _resource_providers_client
import azext_connectedk8s._constants as consts
import azext_connectedk8s._utils as azext_utils

logger = get_logger(__name__)
# pylint: disable=unused-argument, too-many-locals, too-many-branches, too-many-statements, line-too-long

diagnoser_output = []
```
The first helper runs the diagnostic-checks job, scrapes its container log, and folds the DNS and outbound-connectivity results into a single passed/failed/incomplete verdict:

```python
def fetch_diagnostic_checks_results(corev1_api_instance, batchv1_api_instance, helm_client_location, kubectl_client_location, kube_config, kube_context, location, http_proxy, https_proxy, no_proxy, proxy_cert, azure_cloud, filepath_with_timestamp, storage_space_available):
    global diagnoser_output
    try:
        # Mark both checks as not yet evaluated
        dns_check = "Starting"
        outbound_connectivity_check = "Starting"
        # Execute the cluster_diagnostic_checks job and fetch the logs it produced
        cluster_diagnostic_checks_container_log = executing_cluster_diagnostic_checks_job(corev1_api_instance, batchv1_api_instance, helm_client_location, kubectl_client_location, kube_config, kube_context, location, http_proxy, https_proxy, no_proxy, proxy_cert, azure_cloud)
        # Evaluate the results only if the container produced any log output
        if cluster_diagnostic_checks_container_log is not None and cluster_diagnostic_checks_container_log != "":
            cluster_diagnostic_checks_container_log_list = cluster_diagnostic_checks_container_log.split("\n")
            cluster_diagnostic_checks_container_log_list.pop(-1)
            dns_check_log = ""
            counter_container_logs = 1
            # Retrieve only the DNS check lines from the output; the outbound
            # connectivity result is expected on the last line of the log
            for outputs in cluster_diagnostic_checks_container_log_list:
                if consts.Outbound_Connectivity_Check_Result_String in outputs:
                    counter_container_logs = 1
                elif consts.DNS_Check_Result_String in outputs:
                    dns_check_log += outputs
                    counter_container_logs = 0
                elif counter_container_logs == 0:
                    dns_check_log += " " + outputs
            dns_check, storage_space_available = azext_utils.check_cluster_DNS(dns_check_log, filepath_with_timestamp, storage_space_available, diagnoser_output)
            outbound_connectivity_check, storage_space_available = azext_utils.check_cluster_outbound_connectivity(cluster_diagnostic_checks_container_log_list[-1], filepath_with_timestamp, storage_space_available, diagnoser_output)
        else:
            return consts.Diagnostic_Check_Incomplete, storage_space_available

        # If both checks passed, report the cluster diagnostic checks as Passed
        if dns_check == consts.Diagnostic_Check_Passed and outbound_connectivity_check == consts.Diagnostic_Check_Passed:
            return consts.Diagnostic_Check_Passed, storage_space_available
        # If either check remained Incomplete, report Incomplete
        elif dns_check == consts.Diagnostic_Check_Incomplete or outbound_connectivity_check == consts.Diagnostic_Check_Incomplete:
            return consts.Diagnostic_Check_Incomplete, storage_space_available
        else:
            return consts.Diagnostic_Check_Failed, storage_space_available

    # Handle any exception that may occur during execution
    except Exception as e:
        logger.warning("An exception has occurred while trying to execute cluster diagnostic checks container on the cluster. Exception: {}".format(str(e)) + "\n")
        telemetry.set_exception(exception=e, fault_type=consts.Cluster_Diagnostic_Checks_Execution_Failed_Fault_Type, summary="Error occurred while executing the cluster diagnostic checks container")

    return consts.Diagnostic_Check_Incomplete, storage_space_available
```
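The marker-based splitting is easiest to see on a concrete log. Below is a minimal, self-contained sketch of the same parsing logic; the marker strings and the sample log are made-up stand-ins for illustration (the real values live in `_constants.py` and in the job's container output):

```python
# Stand-ins for consts.DNS_Check_Result_String and
# consts.Outbound_Connectivity_Check_Result_String (illustrative values only).
DNS_MARKER = "DNS check:"
OUTBOUND_MARKER = "Outbound connectivity check:"

# A fabricated container log: a multi-line DNS section followed by the
# outbound result, which the parser expects on the last line.
sample_log = "DNS check: Passed\nresolved kubernetes.default\nOutbound connectivity check: Passed\n"

lines = sample_log.split("\n")
lines.pop(-1)  # drop the empty element left by the trailing newline

dns_check_log = ""
collecting_dns = False  # plays the role of counter_container_logs == 0
for line in lines:
    if OUTBOUND_MARKER in line:
        collecting_dns = False       # the outbound result ends the DNS section
    elif DNS_MARKER in line:
        dns_check_log += line        # the first DNS line opens the section
        collecting_dns = True
    elif collecting_dns:
        dns_check_log += " " + line  # continuation lines are joined with spaces

print(dns_check_log)  # -> "DNS check: Passed resolved kubernetes.default"
print(lines[-1])      # -> "Outbound connectivity check: Passed"
```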
The job runner deletes any stale release, installs the precheck chart, watches for the job to complete, and returns the container log:

```python
def executing_cluster_diagnostic_checks_job(corev1_api_instance, batchv1_api_instance, helm_client_location, kubectl_client_location, kube_config, kube_context, location, http_proxy, https_proxy, no_proxy, proxy_cert, azure_cloud):
    job_name = "cluster-diagnostic-checks-job"
    # Start with an empty log output
    cluster_diagnostic_checks_container_log = ""

    cmd_helm_delete = [helm_client_location, "uninstall", "cluster-diagnostic-checks", "-n", "azure-arc-release"]
    if kube_config:
        cmd_helm_delete.extend(["--kubeconfig", kube_config])
    if kube_context:
        cmd_helm_delete.extend(["--kube-context", kube_context])

    # Handle user keyboard interrupts and other failures
    try:
        # Load the kube config for the cluster the job will run on
        config.load_kube_config(kube_config, kube_context)
        # Attempt deletion of cluster diagnostic checks resources in case any stale resources are present
        response_helm_delete = Popen(cmd_helm_delete, stdout=PIPE, stderr=PIPE)
        output_helm_delete, error_helm_delete = response_helm_delete.communicate()
        # If any error occurred while executing the delete command
        if response_helm_delete.returncode != 0:
            # Convert the string of multiple errors into a list
            error_msg_list = error_helm_delete.decode("ascii").split("\n")
            error_msg_list.pop(-1)
            valid_exception_list = []
            # Check whether a real exception occurred ("not found" and "deleted" are expected)
            exception_occurred = False
            for ind_errors in error_msg_list:
                if 'not found' in ind_errors or 'deleted' in ind_errors:
                    pass
                else:
                    valid_exception_list.append(ind_errors)
                    exception_occurred = True
            # If an exception occurred, log it and return
            if exception_occurred:
                logger.warning("Cleanup of previous diagnostic checks helm release failed and hence couldn't install the new helm release. Please clean up the older release using \"helm delete cluster-diagnostic-checks -n azure-arc-release\" and try onboarding again")
                telemetry.set_exception(exception=error_helm_delete.decode("ascii"), fault_type=consts.Cluster_Diagnostic_Checks_Release_Cleanup_Failed, summary="Error while executing cluster diagnostic checks Job")
                return

        chart_path = azext_utils.get_chart_path(consts.Cluster_Diagnostic_Checks_Job_Registry_Path, kube_config, kube_context, helm_client_location, consts.Pre_Onboarding_Helm_Charts_Folder_Name, consts.Pre_Onboarding_Helm_Charts_Release_Name)

        helm_install_release_cluster_diagnostic_checks(chart_path, location, http_proxy, https_proxy, no_proxy, proxy_cert, azure_cloud, kube_config, kube_context, helm_client_location)

        # Watch for the cluster diagnostic checks job to reach the completed state
        w = watch.Watch()
        is_job_complete = False
        is_job_scheduled = False
        # Watch the job state until it completes, or give up after 60 seconds
        for event in w.stream(batchv1_api_instance.list_namespaced_job, namespace='azure-arc-release', label_selector="", timeout_seconds=60):
            try:
                # Check whether the job got scheduled
                if event["object"].metadata.name == "cluster-diagnostic-checks-job":
                    is_job_scheduled = True
                # Check whether the job reached the completed state
                if event["object"].metadata.name == "cluster-diagnostic-checks-job" and event["object"].status.conditions[0].type == "Complete":
                    is_job_complete = True
                    w.stop()
            except Exception:
                continue

        if not is_job_scheduled:
            telemetry.set_exception(exception="Couldn't schedule cluster diagnostic checks job in the cluster", fault_type=consts.Cluster_Diagnostic_Checks_Job_Not_Scheduled,
                                    summary="Couldn't schedule cluster diagnostic checks job in the cluster")
            logger.warning("Unable to schedule the cluster diagnostic checks job in the kubernetes cluster. Possible reasons are the presence of a security policy or security context constraint (SCC), or a lack of ResourceQuota.\n")
            Popen(cmd_helm_delete, stdout=PIPE, stderr=PIPE)
            return
        elif is_job_scheduled and not is_job_complete:
            telemetry.set_exception(exception="Couldn't complete cluster diagnostic checks job after scheduling in the cluster", fault_type=consts.Cluster_Diagnostic_Checks_Job_Not_Complete,
                                    summary="Couldn't complete cluster diagnostic checks job after scheduling in the cluster")
            logger.warning("Cluster diagnostics job didn't reach the completed state in the kubernetes cluster. A possible reason is resource constraints on the cluster.\n")
            Popen(cmd_helm_delete, stdout=PIPE, stderr=PIPE)
            return
        else:
            # Fetch the cluster diagnostic checks container logs
            all_pods = corev1_api_instance.list_namespaced_pod('azure-arc-release')
            # Traverse all pods in the release namespace
            for each_pod in all_pods.items:
                # Read the diagnostic container's log from the job's pod
                pod_name = each_pod.metadata.name
                if pod_name.startswith(job_name):
                    cluster_diagnostic_checks_container_log = corev1_api_instance.read_namespaced_pod_log(name=pod_name, container="cluster-diagnostic-checks-container", namespace='azure-arc-release')
            # Clear all the resources after fetching the cluster diagnostic checks container logs
            Popen(cmd_helm_delete, stdout=PIPE, stderr=PIPE)

    # Handle any exception that may occur during execution
    except Exception as e:
        logger.warning("An exception has occurred while trying to execute the cluster diagnostic checks in the cluster. Exception: {}".format(str(e)) + "\n")
        Popen(cmd_helm_delete, stdout=PIPE, stderr=PIPE)
        telemetry.set_exception(exception=e, fault_type=consts.Cluster_Diagnostic_Checks_Execution_Failed_Fault_Type, summary="Error while executing cluster diagnostic checks Job")
        return

    return cluster_diagnostic_checks_container_log
```
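For context, this is roughly how a caller could drive the job end to end. A hedged sketch, assuming `helm` and `kubectl` are on PATH and the default kubeconfig points at the target cluster; the location and cloud values are examples, and the real caller in `custom.py` supplies them from CLI arguments:

```python
from kubernetes import client, config

# Build the API clients the helpers expect (mirrors the CLI's own setup).
config.load_kube_config(config_file=None, context=None)
corev1 = client.CoreV1Api()
batchv1 = client.BatchV1Api()

# Run the precheck job and print whatever the diagnostic container logged.
logs = executing_cluster_diagnostic_checks_job(
    corev1, batchv1,
    helm_client_location="helm",        # assumed to be on PATH
    kubectl_client_location="kubectl",  # assumed to be on PATH
    kube_config=None, kube_context=None,
    location="eastus",                  # example Azure region
    http_proxy="", https_proxy="", no_proxy="", proxy_cert="",
    azure_cloud="AzureCloud",           # example cloud name
)
if logs:
    print(logs)
```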
Finally, the install helper assembles the `helm upgrade --install` command, threading through location, cloud, and proxy settings as chart values:

```python
def helm_install_release_cluster_diagnostic_checks(chart_path, location, http_proxy, https_proxy, no_proxy, proxy_cert, azure_cloud, kube_config, kube_context, helm_client_location, onboarding_timeout="60"):
    cmd_helm_install = [helm_client_location, "upgrade", "--install", "cluster-diagnostic-checks", chart_path, "--namespace", "{}".format(consts.Release_Install_Namespace), "--create-namespace", "--output", "json"]
    # Pass the remaining helm parameters as chart values
    cmd_helm_install.extend(["--set", "global.location={}".format(location)])
    cmd_helm_install.extend(["--set", "global.azureCloud={}".format(azure_cloud)])
    if https_proxy:
        cmd_helm_install.extend(["--set", "global.httpsProxy={}".format(https_proxy)])
    if http_proxy:
        cmd_helm_install.extend(["--set", "global.httpProxy={}".format(http_proxy)])
    if no_proxy:
        cmd_helm_install.extend(["--set", "global.noProxy={}".format(no_proxy)])
    if proxy_cert:
        cmd_helm_install.extend(["--set-file", "global.proxyCert={}".format(proxy_cert)])

    if kube_config:
        cmd_helm_install.extend(["--kubeconfig", kube_config])
    if kube_context:
        cmd_helm_install.extend(["--kube-context", kube_context])

    # Convert the timeout into the format the helm client understands
    onboarding_timeout = onboarding_timeout + "s"
    cmd_helm_install.extend(["--wait", "--timeout", "{}".format(onboarding_timeout)])

    response_helm_install = Popen(cmd_helm_install, stdout=PIPE, stderr=PIPE)
    _, error_helm_install = response_helm_install.communicate()
    if response_helm_install.returncode != 0:
        if 'forbidden' in error_helm_install.decode("ascii") or 'timed out waiting for the condition' in error_helm_install.decode("ascii"):
            telemetry.set_user_fault()
        telemetry.set_exception(exception=error_helm_install.decode("ascii"), fault_type=consts.Cluster_Diagnostic_Checks_Helm_Install_Failed_Fault_Type,
                                summary='Unable to install cluster diagnostic checks helm release')
        raise CLIInternalError("Unable to install cluster diagnostic checks helm release: " + error_helm_install.decode("ascii"))
```
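To make the flag plumbing concrete, here is an illustrative reconstruction of the command line the function would assemble for a proxy-less invocation, without running it. The chart path is hypothetical, and `consts.Release_Install_Namespace` is taken to be `azure-arc-release` as used elsewhere in this PR:

```python
# Rebuild (but do not execute) the helm command for a sample set of inputs.
helm_cmd = [
    "helm", "upgrade", "--install", "cluster-diagnostic-checks",
    "/tmp/prechecks-chart",                  # hypothetical chart_path
    "--namespace", "azure-arc-release", "--create-namespace",
    "--output", "json",
    "--set", "global.location=eastus",
    "--set", "global.azureCloud=AzureCloud",
    "--wait", "--timeout", "60s",            # default onboarding_timeout plus "s"
]
print(" ".join(helm_cmd))
```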