-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(embedded): Hide sensitive payload data from guest users #25878
fix(embedded): Hide sensitive payload data from guest users #25878
Conversation
superset/dashboards/api.py
Outdated
@@ -342,6 +342,11 @@ def get( | |||
$ref: '#/components/responses/404' | |||
""" | |||
result = self.dashboard_get_response_schema.dump(dash) | |||
if security_manager.is_guest_user(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfrag1 we should adhere to the "shift left" mentality and filter out inaccessible fields at the DAO layer as opposed to the API. This ensures we adhere to the DRY principle avoiding the need to repeat (and/or miss) said logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@john-bodley Thanks for reviewing, but I'm not sure how we're supposed to properly filter out certain fields of a record in the DAO layer. As far as I'm aware the security logic in the DAO layer is only "row-level", protecting entire objects, there's no column-level protections. Are you suggesting in the appropriate DAO method do something like
dashboard = db.session.query(...
if is_guest:
dashboard.owners = []
dashboard.changed_by = None
I don't like this since it introduces the possibility of these changes accidentally being committed, but please let me know if you had something else in mind or if this pattern does exist currently but I'm just not aware of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betodealmeida @dpgaspar @michael-s-molina @villebro what are your thoughts on where sanitization of the data response should occur? Note this doesn't seem to be API specific and thus I'm not sure we should explicitly be defining said logic there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing this at the API layer is problematic because there's no single point of verification and also because you will query data that you don't have access to and filter it out later, wasting resources (imagine a table with many columns and you only have access to one column). This will also be true for the suggested code below:
dashboard = db.session.query(...
if is_guest:
dashboard.owners = []
dashboard.changed_by = None
Considering our current architecture, where we offer a RLS (Row Level Security) solution instead of depending on the database layer for these types of controls, the correct way of implementing this is to augment our security model to also support Column Level Security (CLS) or even Object Level Security (OLS).
I don't like this since it introduces the possibility of these changes accidentally being committed
Ideally, you won't query for what you don't have access to, eliminating this concern because the values won't be in the session.
Pinging @villebro for awareness given his current work on an improved security model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do appreciate the discussion here, but this really feels like premature optimization to me. The changes in this PR only affect two endpoints, and I don't predict that this kind of logic will needed in a widespread manner.
If I'm wrong and there is a larger need/desire for this kind of security measure, I definitely agree we should re-assess, but I personally don't see a need to come up with a more "perfect" solution at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do appreciate the discussion here, but this really feels like premature optimization to me. The changes in this PR only affect two endpoints, and I don't predict that this kind of logic will needed in a widespread manner.
I thought that by referencing the concept of Object Level Security (OLS) it would be more clear why this is not a premature optimization and why we will have the same need in a widespread manner.
I'd argue the API layer is where that kind of security check belongs
If you research a little bit about CLS you'll see that many of these security checks are implemented at the database layer, even further from the API layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's risky to implement this fix in a single interface, as it gives no guarantees that this information won't be exposed via some other interface, either pre-existing, or one that's added later. Mind you, this is not the only security concern regarding guest tokens - the issue of exposing granular information about the underlying query has been raised multiple times in the community, and is also something I feel we need to tackle soon.
Could we setup a meeting to discuss viable long-term solutions to these issues? I propose meeting to agree on the the following:
- A long term vision for fixing this and other similar security concerns. I agree with @michael-s-molina that baking this into the security work I've started sounds like a good idea. I can share what I have so far and my thoughts of how this can be taken forward.
- Agreeing on a quick fix to unblock heavy users of guest tokens to bridge the gap between now and when we have a proper security mechanism in place for handling this and other similar security issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that by referencing the concept of Object Level Security (OLS) it would be more clear why this is not a premature optimization and why we will have the same need in a widespread manner.
This is referring to a security feature similar to row-level security where Superset could apply security filters to queries against connected databases, while the security issue at hand in this PR has to do with data from the Superset metadata DB.
If you research a little bit about CLS you'll see that many of these security checks are implemented at the database layer, even further from the API layer.
My understanding is that CLS is a security feature commonly implemented in data warehouses/BI tools, not a guide for how a server should implement security checks for data it owns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a great idea to meet on long term solutions. If we were to move the logic for this endpoint into the schema (I left one more suggestion that would remove this from the api altogether) should we move forward with this for now as @villebro suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a great idea to meet on long term solutions. If we were to move the logic for this endpoint into the schema (I left one more suggestion that would remove this from the api altogether) should we move forward with this for now as @villebro suggested?
@eschutho I also think moving this into the Marshmallow schema is probably the best solution for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more suggestion.
superset/dashboards/api.py
Outdated
schema = ( | ||
self.dashboard_get_response_schema_guest | ||
if security_manager.is_guest_user() | ||
else self.dashboard_get_response_schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfrag1 I think we can even move the schema logic into Marshmallow dynamically with either a type function or a @marshmallow.post_dump decorator. That way the logic isn't in the api layer and you won't need two schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfrag1 I would prefer it if we remove the owners property all together if guest tokens aren't supposed to know who the owners are. Or is there some other reason we want to keep the property there? Also, I feel we really should add tests for both cases here (hence "Request changes").
superset/dashboards/schemas.py
Outdated
@post_dump() | ||
def post_dump(self, serialized: dict[str, Any], **kwargs: Any) -> dict[str, Any]: | ||
if security_manager.is_guest_user(): | ||
serialized["owners"] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to fully remove the property owners
, as replacing it with an empty list is IMO conveying incorrect info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro I agree. The only caveat would be if the owners field is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@john-bodley ideally I think we should relax that requirement if we want to make this data conditional on the permissions of the logged in user. Based on the proposed changes, I think it should be as follows:
- regular user: all relevant owner info is conveyed (current behavior)
- guest token: no owner property is present in the response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro I initially opted to keep the owners property in the payload because the dashboard frontend expects it to always be there, and fails to render if it's not.
I found a way to exclude it from the response payload and have the client tack the property on after the dashboard is requested, which allows the embedded dashboard to render successfully.
superset/dashboards/schemas.py
Outdated
@post_dump() | ||
def post_dump(self, serialized: dict[str, Any], **kwargs: Any) -> dict[str, Any]: | ||
if security_manager.is_guest_user(): | ||
serialized["owners"] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@villebro Are you able to take another look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for all the iterations @jfrag1 !
(cherry picked from commit 386d4e0)
(cherry picked from commit 386d4e0)
SUMMARY
Currently, guest users can get access to more data than they need to when viewing an embedded dashboard. The
api/v1/dashboard/{id}
andapi/v1/dashboard/{id}/datasets
endpoints would return owners' first and last names. The latter would return information about the underlying database(s) the datasets belong to, which could be useful for an attacker. None of this data is necessary for the embedded dashboard to render.I recognize this PR does break typical API patterns by stripping out certain data depending on the type of user requesting, which is pretty ugly. However, this approach avoids making breaking changes for regular API use, while making the embedded experience more secure.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION