-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Usability improvements for *_iam_policy and *_iam_binding resources #8354
Comments
Hi, Eric - thanks for the detailed issue! Unfortunately, there would be some significant problems with making the provider be strictly additive on create. From what I understand, that's how the provider used to work, and it caused a lot of confusion for folks. A cleaner solution would require changes to how Terraform providers work more generally; we will make the case to Hashicorp to provide an API that allows us to better support this type of resource. |
Hey @melinath, thanks for following up! Would it be possible for you to link some more information on what kind of problems folks had ran into in the past? If I understand correctly the In this case, however, I am not suggesting we make it configurable. It should only be non-authoritative during creation, and revert to the current behavior otherwise. This means that from a purely user-interacting-with-Terraform standpoint there is no difference during the initial apply, but subsequent plans would show a diff - and rightfully so, as it will provide information to the user about IAM principles that are not present in their Terraform configuration. I'm not sure that I agree that it would be more confusing this way; from internal usage at my company it is far more confusing that people have important IAM bindings blown away without any warning, compared to any confusion that might arise from seeing a diff on your next plan. The issues caused by removed IAM bindings are incredibly hard to debug, as it surfaces to users as their subsequent Google API calls failing for reasons that they do not understand. It is only after digging through Cloud Audit logs, or by having being bit by this in the past, that you realize that the user accidentally removed IAM permissions from a critical Google-managed service account. My fear with trying to get this into Terraform core is that it would fundamentally be a new Terraform paradigm, might not be accepted by upstream, and even if so, would be out of reach of users at my company for quite some time. Making a change to the Google provider, however, is something that could be done quickly, is relatively simple, and (as far as I can tell) is safe. That being said, an acceptable Terraform core solution could be for resources to optionally auto-import themselves. There is only ever one |
FYI this is not true. Things like Shared VPC/cross project service-accounts/services routinely require you to modify IAM roles granted to Google Service Agents. I understand the frustations with This module is very helpful https://github.com/terraform-google-modules/terraform-google-iam/tree/master/modules/projects_iam Acts like |
@upodroid good point! You can scratch that from my original proposal.
Obviously I'm biased here, but I believe my initial suggestion of "additive on create" is easy to use and easy to implement. In fact, the logic is present in I fail to see what drawbacks there are to this approach - yes, there may be some confusion for folks who see IAM principles show up in a diff in their next plan, but that confusion is much simpler compared to "these IAM principles were removed and no Terraform logs will show that", which is the current case.
Thanks for that link, wasn't aware. That being said, my company often uses I agree that it reduces the blast radius, as conflicts would be limited to only roles that a human would conceivably want to use (e.g. not the Finally, I'd like to add that I was able to achieve something close to a "safe" I prefer my original suggestion as the above feels a lot more complex than what a module should be taking on, and it apparently relies on some undocumented API output, but I would be open to discussing other options too. |
@ericnorris yeah, the issue with making it non-authoritative is exactly what you said: the initial apply goes through and then the following apply has a diff. Although this doesn't bother everyone, there are a significant number of users for whom it's a large concern, either because they are new to terraform and don't understand what's happening, or because they want terraform to make their state be exactly what they expect after a single apply and don't want to have to determine whether some resources need to be updated again. TLDR the solution we want to propose to hashicorp is exactly what you suggested at the end of your previous message: being able to specify that a particular resource should try an auto-import if it doesn't exist in state yet. :-) |
Understood, and it may end up being that we fundamentally disagree about the following point:
I want to emphasize the "are new to terraform and don't understand what's happening" group. With additive on create, they will at least be told what is happening. With how it is now, they are not; the Terraform logs will show nothing out of the ordinary during the plan and apply. These users are also the least equipped to be able to then trudge through Google documentation and Cloud Audit logs to determine what roles are missing after they've removed them silently. If they came to me (or you) with confusion over an unexpected diff, it would be far easier to explain what is happening. I cannot see how the current state of affairs is better than upsetting the other group of folks, the "they want terraform to make their state be exactly what they expect after a single apply" group. This already is not the case in a number of situations - in fact, applying a More concretely: resource "google_project_iam_binding" "editor" {
role = "roles/editor"
members = [
# ...
]
}
resource "google_project_service" "containerregistry" {
service = "containerregistry.googleapis.com"
} ...without knowing ahead of time and doing something with Even if the "auto-import" feature became a thing, again, these users will almost certainly always have to deal with the fact that they are working with abstractions over abstractions, and eventual consistency is the only thing they can count on. At a minimum I would be happy seeing this as either a separate resource, or a parameter on the existing resource, if you wanted to preserve behavior for the latter group while allowing my company to use something that I believe is friendlier to the former group. All that said though, I appreciate your time discussing this! |
I wanted to weigh in with a couple of points of my own here:
I think that carrying forward a discussion to HashiCorp about a pattern for "import on create" for a subset of resources (I personally refer to it as singleton resources, as there's exactly one and they always exist) would be the best solution here. It could be introduced in a minor version of the provider as it would be consistent with the current |
Thanks @rileykarson!
I understand why this would be the ideal, but I mentioned a case where it happens now, and it would continue to happen even with an auto-import functionality. I am sure you are well aware, but there will almost certainly always be some resources that cause delayed effects that cannot be expressed in a single apply. As another (non-IAM) example, consider: resource "google_project_service" "appengine" {
service = "appengine.googleapis.com"
}
resource "google_app_engine_application" "needed_for_cloud_scheduler" {
...
}
resource "google_cloud_scheduler_job" "a_job" {
...
} ...even with
@dossett is actually a member of my company :) I believe the confusion you are referring to here in the linked issue comes with competing IAM resources, which I think will continue to be a problem regardless of my proposal or using auto-import. That being said, if I pointed out that there will still be problems over competing IAM resources, but I believe that it's far easier for me to establish training that explain to folks "pay attention when bindings are being removed" than "don't ever use this resource without running
My understanding (and I could be wrong!) is that the behavior you are referring to was that it could optionally be authoritative or additive with the
I understand and agree, but I was mainly proposing this as solution that would allow for preserving behavior for what I believe is a small minority of users, the ones that really want their IAM principles blown away on create. I'd prefer my suggested behavior to be the only option. I really do think that this would be a simple change that would net a huge win for the people I work with and our ability to use the authoritative versions of resources. If Terraform ever supported auto-import, this functionality easily could be reverted, but at least we'd have something in the interim to help prevent the kinds of issues I frequently see. Otherwise our only recourse would be to potentially "ban" anything other than Anyhow, I really do appreciate the discussion and your time. Feel free to close this issue if you'd like, and tag me in any public GitHub issues with Hashicorp so I could follow along. |
Definitely! The difference here is introducing that behaviour intentionally, which I'd like to avoid.
Acknowledged! Misread on my part, you're right. I've seen other cases similar sentiment to this section in isolation in the past & I think it stands:
That's how I think the summary of my current take is:
I'm assigning to the |
hashicorp/terraform#19017 is tracking work to implement some kind of read-before-create to solve exactly this kind of situation; anything on our end would be blocked by that work. |
Can somebody help with this? I'm trying to create a service account and assign it a role, and it succeeds but it deletes all the default service accounts, including the Google API's Service Agent. How do we preservice the service agent? (<PROJECT_NUMBER>@cloudservices.gserviceaccount.com) |
@kaizenlabs: Are you using an |
One of my coworkers just reached out to me on Slack and said:
I hesitate to say this but |
We could probably use heftier docs somewhere than an individual reference page to cover how to use the IAM resources- we expect You'd want to use a policy validation layer like Sentinel, |
That would be a use case for the Sentinel feature of Terraform Cloud.
I for instance rely on Terraform to ensure the revocation of dangerous permissions granted by Google to e.g. the default Compute Engine service account. (In most cases I accomplish this by using |
This is no longer required now. There is an organization policy |
Community Note
Description
I'm sure you know by now there is a decent amount of care required when using the
*_iam_policy
and*_iam_binding
versions of IAM resources. There are a number of "be careful!" and "note" warnings in the resources that outline some of the potential pitfalls, but there are hidden dangers as well. For example, using thegoogle_project_iam_policy
resource may inadvertently remove Google's service agents' (https://cloud.google.com/iam/docs/service-agents) IAM roles from the project. Or, the dangers of usinggoogle_storage_bucket_iam_policy
andgoogle_storage_bucket_iam_binding
, which may remove the default IAM roles granted toprojectViewers:
,projectEditors:
, andprojectOwners:
of the containing project.The largest issue I encounter with people running into the above situations is that the initial
terraform plan
does not show that anything is being removed. While the documentation forgoogle_project_iam_policy
notes that it's best toterraform import
the resource beforehand, this is in fact applicable to all*_iam_policy
and*_iam_binding
resources. Unfortunately this is tedious, potentially forgotten, and not something that you can abstract away in a Terraform module.On the other hand, the more authoritative forms of IAM resources are beneficial for ensuring repeatability, discoverability, and security, so I prefer them over the simpler
*_iam_member
resources.I'd like to suggest that on creation the
*_iam_policy
and*_iam_binding
resources act as strictly additive, similar to the*_iam_member
resources, but remain authoritative on update. Doing this will provide feedback in the nextterraform plan
of IAM principles that may have accidentally been left out and would have potentially been removed.If this is undesirable, an alternative may be to fail hard on an existing IAM policy or binding and requiring that the resource be imported, but I prefer the above suggestion.
Finally, I'd like to propose that the specific resource implementations be more opinionated in how they operate. Authoritative
google_project_iam_*
resources could (optionally, of course) filter out https://cloud.google.com/iam/docs/service-agents from the diffs, as there is likely very little reason to manipulate their permissions. Similarly the authoritativegoogle_storage_bucket_*
resources could, again optionally, preserve the bindings that are granted toprojectViewers
, etc.This may be a harder sell than the first suggestion, but I think anything that Google can do to reduce the burden on developers understanding the implicit IAM grants that their products get would be a huge benefit to many. I could potentially see Google exposing API endpoints of "expected bindings" or similar to remove the burden from the provider itself, but that is likely beyond the scope of this issue.
New or Affected Resource(s)
Potential Terraform Configuration
No changes, though this could be a new resource name or parameter.
References
Not exhaustive, and likely has some false postives:
The text was updated successfully, but these errors were encountered: