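I'm deploying an AKS cluster with an Azure template. Most of the time the deployment succeeds, but occasionally, with exactly the same inputs, it fails with "Operation PutNetworkSecurityGroupOperation (XXXXXXXX) was canceled and superseded by operation PutNetworkSecurityGroupOperation".
The Azure template and the deployment error are included below. What could be causing this?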
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "resourceGroupName": {
      "type": "string",
      "metadata": {
        "description": "The resource group name."
      }
    },
    "subscriptionId": {
      "type": "string",
      "metadata": {
        "description": "The subscription id."
      }
    },
    "region": {
      "type": "string",
      "metadata": {
        "description": "The region of AKS resource."
      }
    },
    "gbPerNode": {
      "type": "int",
      "defaultValue": 20,
      "metadata": {
        "description": "Disk size (in GB) to provision for each of the agent pool nodes. This value ranges from 0 to 1023. Specifying 0 will apply the default disk size for that agentVMSize."
      },
      "minValue": 1,
      "maxValue": 1023
    },
    "numNodes": {
      "type": "int",
      "defaultValue": 3,
      "metadata": {
        "description": "The number of agent nodes for the cluster."
      },
      "minValue": 1,
      "maxValue": 50
    },
    "machineType": {
      "type": "string",
      "defaultValue": "Standard_D2_v2",
      "metadata": {
        "description": "The size of the Virtual Machine."
      }
    },
    "servicePrincipalClientId": {
      "metadata": {
        "description": "Client ID (used by cloudprovider)"
      },
      "type": "securestring"
    },
    "servicePrincipalClientSecret": {
      "metadata": {
        "description": "The Service Principal Client Secret."
      },
      "type": "securestring"
    },
    "osType": {
      "type": "string",
      "defaultValue": "Linux",
      "allowedValues": [
        "Linux"
      ],
      "metadata": {
        "description": "The type of operating system."
      }
    },
    "kubernetesVersion": {
      "type": "string",
      "defaultValue": "1.11.5",
      "metadata": {
        "description": "The version of Kubernetes."
      }
    },
    "maxPods": {
      "type": "int",
      "defaultValue": 30,
      "metadata": {
        "description": "Maximum number of pods that can run on a node."
      }
    }
  },
  "variables": {
    "deploymentEventTopic": "deploymenteventtopic",
    "resourceGroupName": "[parameters('resourceGroupName')]",
    "omswsName": "[concat('omsws-', parameters('resourceGroupName'))]",
    "clustername": "cluster"
  },
  "resources": [
    {
      "apiVersion": "2018-03-31",
      "type": "Microsoft.ContainerService/managedClusters",
      "location": "[parameters('region')]",
      "name": "[variables('clustername')]",
      "properties": {
        "kubernetesVersion": "[parameters('kubernetesVersion')]",
        "enableRBAC": true,
        "dnsPrefix": "clust",
        "addonProfiles": {
          "httpApplicationRouting": {
            "enabled": true
          },
          "omsagent": {
            "enabled": false
          }
        },
        "agentPoolProfiles": [
          {
            "name": "agentpool",
            "osDiskSizeGB": "[parameters('gbPerNode')]",
            "count": "[parameters('numNodes')]",
            "vmSize": "[parameters('machineType')]",
            "osType": "[parameters('osType')]",
            "storageProfile": "ManagedDisks"
          }
        ],
        "servicePrincipalProfile": {
          "ClientId": "[parameters('servicePrincipalClientId')]",
          "Secret": "[parameters('servicePrincipalClientSecret')]"
        },
        "networkProfile": {
          "networkPlugin": "kubenet"
        }
      }
    }
  ]
}
Deployment error:
{
  "code":"DeploymentFailed",
  "message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
  "details":[
    {
      "code":"Conflict",
      "message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Canceled\",\r\n \"message\": \"Operation was canceled.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Canceled\",\r\n \"message\": \"Operation was canceled.\",\r\n \"details\": [\r\n {\r\n \"code\": \"CanceledAndSupersededDueToAnotherOperation\",\r\n \"message\": \"Operation PutNetworkSecurityGroupOperation (XXXXXXX) was canceled and superseded by operation PutNetworkSecurityGroupOperation (XXXXX).\"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"
    }
  ]
}
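Because the failure is intermittent, it may help to recognize this specific error programmatically, for example to tell it apart from other deployment failures. The sketch below assumes the raw error text is available as a single string, in the form shown above, and collects every "code" value, including the ones nested inside the escaped "message" string. The class and method names are hypothetical and are not part of any Azure SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Azure SDK class): pulls error codes out of a raw ARM error string.
public class DeploymentErrorCodes {

    // Matches "code": "SomeCode" once escaping backslashes have been removed.
    private static final Pattern CODE = Pattern.compile("\"code\"\\s*:\\s*\"([A-Za-z]+)\"");

    // Returns every error code in document order, outermost first.
    public static List<String> errorCodes(String rawError) {
        // Drop backslashes so codes nested in the escaped "message" look like top-level ones.
        String flat = rawError.replace("\\", "");
        Matcher m = CODE.matcher(flat);
        List<String> codes = new ArrayList<>();
        while (m.find()) {
            codes.add(m.group(1));
        }
        return codes;
    }

    // True if the failure is the superseded-operation conflict described in the question.
    public static boolean isSupersededNsgConflict(String rawError) {
        return errorCodes(rawError).contains("CanceledAndSupersededDueToAnotherOperation");
    }

    public static void main(String[] args) {
        // Shortened version of the error above, keeping its escaped nesting.
        String sample = "{\"code\":\"Conflict\",\"message\":\"{\\r\\n \\\"code\\\": \\\"Canceled\\\", "
                + "\\\"details\\\": [{\\\"code\\\": \\\"CanceledAndSupersededDueToAnotherOperation\\\"}]}\"}";
        System.out.println(errorCodes(sample));              // [Conflict, Canceled, CanceledAndSupersededDueToAnotherOperation]
        System.out.println(isSupersededNsgConflict(sample)); // true
    }
}
```

A regex keeps the sketch dependency-free. A more robust version would parse the outer JSON, then parse the nested "message" string, with a JSON library.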
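The cause appears to be deploying the AKS cluster with the httpApplicationRouting add-on enabled. To work around it, deploy the cluster from the template with httpApplicationRouting disabled, then enable the add-on programmatically once the cluster has been deployed, using the Azure Java SDK: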
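import com.microsoft.azure.management.containerservice.KubernetesCluster;
import com.microsoft.azure.management.containerservice.ManagedClusterAddonProfile;
import java.util.HashMap;
import java.util.Map;

// Look up the cluster the template deployed (the second argument is the cluster's resource name).
final KubernetesCluster kCluster = serviceManager.kubernetesClusters()
        .getByResourceGroup(resourceGroupName, deploymentName);
// Enable the HTTP application routing add-on that was left out of the template.
final Map<String, ManagedClusterAddonProfile> addonProfileMap = new HashMap<>();
addonProfileMap.put("httpApplicationRouting",
        new ManagedClusterAddonProfile().withEnabled(true));
// Apply the change as an update to the existing cluster.
kCluster.update()
        .withAddOnProfiles(addonProfileMap)
        .apply();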
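I opened a support ticket with Azure, and the support engineer confirmed that this is a bug the AKS team is working on. So if you'd rather not implement the workaround, a fix should be rolled out soon.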