Deploying an AKS cluster with an ARM template occasionally fails with a PutNetworkSecurityGroupOperation error

Problem description

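I am deploying an AKS cluster with an Azure template. Most of the time the deployment succeeds, but occasionally, with the same inputs, it fails with "Operation PutNetworkSecurityGroupOperation (XXXXXXXX) was canceled and superseded by operation PutNetworkSecurityGroupOperation". The Azure template and the deployment error are included below. What could be causing this?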

Template

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "resourceGroupName": {
      "type": "string",
      "metadata": {
        "description": "The resource group name."
      }
    },
    "subscriptionId": {
      "type": "string",
      "metadata": {
        "description": "The subscription id."
      }
    },
    "region": {
      "type": "string",
      "metadata": {
        "description": "The region of AKS resource."
      }
    },
    "gbPerNode": {
      "type": "int",
      "defaultValue": 20,
      "metadata": {
        "description": "Disk size (in GB) to provision for each of the agent pool nodes. This value ranges from 0 to 1023. Specifying 0 will apply the default disk size for that agentVMSize."
      },
      "minValue": 1,
      "maxValue": 1023
    },
    "numNodes": {
      "type": "int",
      "defaultValue": 3,
      "metadata": {
        "description": "The number of agent nodes for the cluster."
      },
      "minValue": 1,
      "maxValue": 50
    },
    "machineType": {
      "type": "string",
      "defaultValue": "Standard_D2_v2",
      "metadata": {
        "description": "The size of the Virtual Machine."
      }
    },
    "servicePrincipalClientId": {
      "metadata": {
        "description": "Client ID (used by cloudprovider)"
      },
      "type": "securestring"
    },
    "servicePrincipalClientSecret": {
      "metadata": {
        "description": "The Service Principal Client Secret."
      },
      "type": "securestring"
    },
    "osType": {
      "type": "string",
      "defaultValue": "Linux",
      "allowedValues": [
        "Linux"
      ],
      "metadata": {
        "description": "The type of operating system."
      }
    },
    "kubernetesVersion": {
      "type": "string",
      "defaultValue": "1.11.5",
      "metadata": {
        "description": "The version of Kubernetes."
      }
    },
    "maxPods": {
      "type": "int",
      "defaultValue": 30,
      "metadata": {
        "description": "Maximum number of pods that can run on a node."
      }
    }
  },
  "variables": {
    "deploymentEventTopic": "deploymenteventtopic",
    "resourceGroupName": "[parameters('resourceGroupName')]",
    "omswsName": "[concat('omsws-', parameters('resourceGroupName'))]",
    "clustername": "cluster"
  },
  "resources": [
    {
      "apiVersion": "2018-03-31",
      "type": "Microsoft.ContainerService/managedClusters",
      "location": "[parameters('region')]",
      "name": "[variables('clustername')]",
      "properties": {
        "kubernetesVersion": "[parameters('kubernetesVersion')]",
        "enableRBAC": true,
        "dnsPrefix": "clust",
        "addonProfiles": {
          "httpApplicationRouting": {
            "enabled": true
          },
          "omsagent": {
            "enabled": false
          }
        },
        "agentPoolProfiles": [
          {
            "name": "agentpool",
            "osDiskSizeGB": "[parameters('gbPerNode')]",
            "count": "[parameters('numNodes')]",
            "vmSize": "[parameters('machineType')]",
            "osType": "[parameters('osType')]",
            "storageProfile": "ManagedDisks"
          }
        ],
        "servicePrincipalProfile": {
          "ClientId": "[parameters('servicePrincipalClientId')]",
          "Secret": "[parameters('servicePrincipalClientSecret')]"
        },
        "networkProfile": {
          "networkPlugin": "kubenet"
        }
      }
    }
  ]
}

Error

{
   "code":"DeploymentFailed",
   "message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
   "details":[
      {
         "code":"Conflict",
         "message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Canceled\",\r\n \"message\": \"Operation was canceled.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Canceled\",\r\n \"message\": \"Operation was canceled.\",\r\n \"details\": [\r\n {\r\n \"code\": \"CanceledAndSupersededDueToAnotherOperation\",\r\n \"message\": \"Operation PutNetworkSecurityGroupOperation (XXXXXXX) was canceled and superseded by operation PutNetworkSecurityGroupOperation (XXXXX).\"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"
      }
   ]
}
azure azure-resource-manager azure-kubernetes
1 Answer

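The cause appears to be deploying the AKS cluster with the httpApplicationRouting add-on enabled. To work around the problem, deploy the cluster from the template without httpApplicationRouting, then enable the add-on programmatically with the Azure Java SDK once the cluster has been deployed: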

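import java.util.HashMap;
import java.util.Map;

// KubernetesCluster and ManagedClusterAddonProfile come from the Azure management SDK for Java.
// Look up the cluster that the template deployed without the add-on.
final KubernetesCluster kCluster = serviceManager.kubernetesClusters()
  .getByResourceGroup(resourceGroupName, deploymentName);

// Build an add-on profile map that turns HTTP application routing on.
final Map<String, ManagedClusterAddonProfile> addonProfileMap = new HashMap<>();
addonProfileMap.put("httpApplicationRouting",
   new ManagedClusterAddonProfile().withEnabled(true));

// Apply the add-on profile as an update to the existing cluster.
kCluster.update()
  .withAddOnProfiles(addonProfileMap)
  .apply();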

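I opened a support ticket with Azure, and the support engineer confirmed that this is a bug the AKS team is working on. So if you would rather not implement the workaround, a fix should be available soon.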
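
Because the failure is intermittent, it can help to tell this particular error apart from other deployment failures before deciding whether the workaround applies. The following is only a sketch: the class and method names are hypothetical, it uses plain string matching on the raw error text rather than any Azure SDK call, and the only values taken from the question are the error code and the wording of the message.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Azure SDK.
public class SupersededOperationCheck {
    // Error code copied from the deployment error in the question.
    static final String SUPERSEDED_CODE = "CanceledAndSupersededDueToAnotherOperation";

    // Matches "Operation <name> (<id>) was canceled and superseded by operation <name> (<id>)".
    private static final Pattern SUPERSEDED_MESSAGE = Pattern.compile(
        "Operation (\\w+) \\(([^)]*)\\) was canceled and superseded by operation (\\w+) \\(([^)]*)\\)");

    // True if the raw error text contains the superseded-operation error code.
    static boolean isSuperseded(String rawError) {
        return rawError != null && rawError.contains(SUPERSEDED_CODE);
    }

    // Name of the operation that was canceled, if the message has the expected shape.
    static Optional<String> supersededOperationName(String rawError) {
        if (rawError == null) {
            return Optional.empty();
        }
        Matcher m = SUPERSEDED_MESSAGE.matcher(rawError);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Innermost part of the error from the question, with the placeholder IDs kept as-is.
        String sample = "\"code\": \"CanceledAndSupersededDueToAnotherOperation\", "
            + "\"message\": \"Operation PutNetworkSecurityGroupOperation (XXXXXXX) was canceled "
            + "and superseded by operation PutNetworkSecurityGroupOperation (XXXXX).\"";
        System.out.println(isSuperseded(sample));                   // prints "true"
        System.out.println(supersededOperationName(sample).get()); // prints "PutNetworkSecurityGroupOperation"
    }
}
```

Matching on the raw text sidesteps the fact that the inner error is a JSON document escaped inside the outer message string. A real implementation would more likely parse both levels of JSON.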
