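I'm struggling to understand why my plan needs to replace resources.
Here is the context: I have a dictionary of AAD groups, where the values are the tags and the custom role I want to apply to them.
I have two such dictionaries. One is for the groups that get Azure Machine Learning compute instances; in a module I create a managed identity that is linked to the resource. The second is for the groups that are allowed read access on Databricks.
So I take the intersection of the two dictionaries, keeping only the groups for which a managed identity will be created, send it to my Databricks module, and create a service principal linked to that managed identity.
Every time a new user is added to a group, Terraform correctly creates the compute instance with its managed identity, without deleting the group's other compute instances. But it deletes all service principals of all groups in Databricks, waits for the compute instance to be created, and then re-adds all service principals for all groups.
This is how I create the compute instances and managed identities: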
data "azuread_group" "all_groups_compute_instance" {
  for_each     = local.azuread_group_info
  display_name = each.key
}

data "azuread_users" "all_users_compute_instance" {
  for_each       = data.azuread_group.all_groups_compute_instance
  ignore_missing = true
  object_ids     = each.value.members
}

# Here I create an azurerm_user_assigned_identity, add RBAC roles, and create an
# azurerm_machine_learning_compute_instance with the azurerm_user_assigned_identity linked to it.
module "azurerm_machine_learning_compute_instance" {
  source = "./modules/aml_compute_instance_creation"
  for_each = merge([
    for group_name, group_data in local.azuread_group_info : {
      for user in data.azuread_users.all_users_compute_instance[group_name].users : "${group_name}-${user.display_name}" => {
        group = group_name
        role  = group_data.role
        user  = user
      }
    }
  ]...)
  context                            = var.context
  user                               = each.value.user
  role                               = each.value.role
  compute_instance_tags              = local.azuread_group_info[each.value.group]["tags"]
  ressoure_group                     = module.resource_group_01
  azurerm_machine_learning_workspace = module.mlw_01
  subnet_ressource                   = module.subnet_mlw
  tenant_id                          = var.secrets.TENANT_ID
  keyvault                           = module.keyvault_gb
  tags                               = local.tags[var.context.environment]
}
This is how I create the SP in Databricks and assign it to a group in Databricks:
# Keep only the groups that have a compute instance on AML AND roles on Databricks
locals {
  aml_groups_for_databricks = {
    for key, group in local.groups :
    key => group if contains(keys(local.azuread_group_info), key)
  }
}
# local.groups is the dict for Databricks roles
# local.azuread_group_info is the dict for AML compute instance creation

data "azuread_group" "all_groups_databricks_for_aml" {
  for_each     = local.aml_groups_for_databricks
  display_name = each.key
}

data "azuread_users" "all_users_databricks_for_aml" {
  for_each       = data.azuread_group.all_groups_databricks_for_aml
  ignore_missing = true
  object_ids     = each.value.members
}

# This module only takes the name of the managed identity, creates it in Databricks as a
# service principal, and links it to a group that has the same name as the group key from the previous dict
module "compute_instance_policy" {
  source = "./modules/databricks_aml_compute_policy"
  for_each = merge([
    for group_name, group_data in local.aml_groups_for_databricks : {
      for user in data.azuread_users.all_users_databricks_for_aml[group_name].users : "${group_name}-${user.display_name}" => {
        id = "${group_name}-${user.display_name}"
      }
    }
  ]...)
  tenant_id             = var.secrets.TENANT_ID
  id                    = module.databricks_01.id
  managed_identity_name = module.azurerm_machine_learning_compute_instance[each.key].managed_identity
  # Custom dependency
  # If I don't add this, Terraform tries to create the SP without waiting for the managed identity
  # to be created by the azurerm_machine_learning_compute_instance module
  dependency_resource = [module.azurerm_machine_learning_compute_instance[each.key]]
}
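So every time I plan/apply after the number of users in one of the groups has changed, Terraform reads all the resources linked to the other service principals in Databricks, deletes them, waits for the compute instance to be created (5-10 minutes), and then recreates all the SPs in Databricks. In the meantime, none of the other users can connect to Databricks from their compute instances.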
  # module.compute_instance_policy["USERNAME-GROUP_NAME"].data.azuread_service_principal.compute_managed_identity will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "azuread_service_principal" "compute_managed_identity" {
      + account_enabled               = (known after apply)
      + alternative_names             = (known after apply)
      + app_role_assignment_required  = (known after apply)
      + app_role_ids                  = (known after apply)
      + app_roles                     = (known after apply)
      + application_id                = (known after apply)
      + application_tenant_id         = (known after apply)
      + client_id                     = (known after apply)
      + description                   = (known after apply)
      + display_name                  = "USERNAME-GROUP_NAME"
      + feature_tags                  = (known after apply)
      + features                      = (known after apply)
      + homepage_url                  = (known after apply)
      + id                            = (known after apply)
      + login_url                     = (known after apply)
      + logout_url                    = (known after apply)
      + notes                         = (known after apply)
      + notification_email_addresses  = (known after apply)
      + oauth2_permission_scope_ids   = (known after apply)
      + oauth2_permission_scopes      = (known after apply)
      + object_id                     = (known after apply)
      + preferred_single_sign_on_mode = (known after apply)
      + redirect_uris                 = (known after apply)
      + saml_metadata_url             = (known after apply)
      + saml_single_sign_on           = (known after apply)
      + service_principal_names       = (known after apply)
      + sign_in_audience              = (known after apply)
      + tags                          = (known after apply)
      + type                          = (known after apply)
    }

  # module.compute_instance_policy["USERNAME-GROUP_NAME"].databricks_group_member.compute_group_sp must be replaced
-/+ resource "databricks_group_member" "compute_group_sp" {
      ~ id        = "314248904982378|8705618699735534" -> (known after apply)
      ~ member_id = "8705618699735534" # forces replacement -> (known after apply) # forces replacement
        # (1 unchanged attribute hidden)
    }
This is the inside of the module azurerm_machine_learning_compute_instance:
data "azuread_service_principal" "compute_managed_identity" {
  display_name = var.managed_identity_name
  depends_on   = [var.dependency_resource]
}

resource "databricks_group" "compute_group" {
  display_name               = "aml-compute-group"
  workspace_access           = true
  databricks_sql_access      = true
  allow_cluster_create       = false
  allow_instance_pool_create = false
  force                      = true
}

resource "databricks_service_principal" "sp" {
  application_id             = data.azuread_service_principal.compute_managed_identity.client_id
  display_name               = var.managed_identity_name
  active                     = true
  external_id                = data.azuread_service_principal.compute_managed_identity.client_id
  workspace_access           = true
  databricks_sql_access      = true
  allow_cluster_create       = false
  allow_instance_pool_create = false
  force                      = true
}

resource "databricks_group_member" "compute_group_sp" {
  group_id  = databricks_group.compute_group.id
  member_id = databricks_service_principal.sp.id
}
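I found my mistake.
The mistake was that I used a data source instead of a reference to the resource to get the existing managed identity that was created earlier. With the data source, Terraform deletes the old service principal that has the same name and recreates it with the same name.
I can avoid this problem by passing the ID directly, instead of sending the managed identity name into the module and "fetching it back" there.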
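A minimal sketch of that change, under some assumptions: the output and variable name managed_identity_client_id and the resource label "this" are hypothetical and do not appear in my original code, and I am assuming the identity inside the AML module is an azurerm_user_assigned_identity, which exports client_id. As far as I understand it, the plan message "depends on a resource or a module with changes pending" comes from the depends_on on the data source: Terraform defers the read until apply, client_id becomes unknown for every instance, and the service principals and group memberships that depend on it get replaced. Passing the attribute by reference should give Terraform the ordering it needs without depends_on.

```hcl
# modules/aml_compute_instance_creation/outputs.tf
# Hypothetical output; "this" stands for whatever label the identity resource has in the module.
output "managed_identity_client_id" {
  value = azurerm_user_assigned_identity.this.client_id
}

# modules/databricks_aml_compute_policy/variables.tf
# Hypothetical variable that replaces the azuread_service_principal data source lookup.
variable "managed_identity_client_id" {
  type = string
}

# modules/databricks_aml_compute_policy/main.tf
# Same resource as before, but the client ID now comes from the variable,
# so there is no data source and no depends_on.
resource "databricks_service_principal" "sp" {
  application_id             = var.managed_identity_client_id
  display_name               = var.managed_identity_name
  active                     = true
  external_id                = var.managed_identity_client_id
  workspace_access           = true
  databricks_sql_access      = true
  allow_cluster_create       = false
  allow_instance_pool_create = false
  force                      = true
}

# Root module: pass the ID by reference and drop dependency_resource.
module "compute_instance_policy" {
  source = "./modules/databricks_aml_compute_policy"
  for_each = merge([
    for group_name, group_data in local.aml_groups_for_databricks : {
      for user in data.azuread_users.all_users_databricks_for_aml[group_name].users : "${group_name}-${user.display_name}" => {
        id = "${group_name}-${user.display_name}"
      }
    }
  ]...)
  tenant_id                  = var.secrets.TENANT_ID
  id                         = module.databricks_01.id
  managed_identity_name      = module.azurerm_machine_learning_compute_instance[each.key].managed_identity
  managed_identity_client_id = module.azurerm_machine_learning_compute_instance[each.key].managed_identity_client_id
}
```

With this shape, the client ID of an existing, unchanged identity is already known at plan time, so only the module instance for the newly added user should show up as a change.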