[Python][C++] Support AWS_ROLE_ARN env variable for pyarrow.fs.S3FileSystem input #45702

Open
lozbrown opened this issue Mar 7, 2025 · 3 comments


lozbrown commented Mar 7, 2025

Describe the enhancement requested

Documentation here https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html

mentions support for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, but strangely the role can only be supplied through the role_arn argument.

Request: support the AWS_ROLE_ARN environment variable instead.
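To make the request concrete, here is a minimal sketch of what I currently have to do by hand: read the variable and forward it as the role_arn argument. The helper names (s3_kwargs_from_env, open_s3_from_env) are made up for illustration and are not part of pyarrow. Forwarding AWS_ROLE_SESSION_NAME as session_name is my own assumption about what would be convenient, not documented behaviour.

```python
import os


def s3_kwargs_from_env(env=None):
    """Build keyword arguments for pyarrow.fs.S3FileSystem from role-related
    environment variables. Hypothetical helper, not a pyarrow API."""
    env = os.environ if env is None else env
    kwargs = {}
    role_arn = env.get("AWS_ROLE_ARN")
    if role_arn:
        kwargs["role_arn"] = role_arn
        # Assumption: reuse AWS_ROLE_SESSION_NAME for the session_name argument.
        session_name = env.get("AWS_ROLE_SESSION_NAME")
        if session_name:
            kwargs["session_name"] = session_name
    return kwargs


def open_s3_from_env():
    """Create the filesystem; pyarrow is imported lazily so the helper above
    can be used without it."""
    from pyarrow import fs

    return fs.S3FileSystem(**s3_kwargs_from_env())
```

With no AWS_ROLE_ARN set, the helper returns an empty dict and S3FileSystem falls back to its default credential resolution.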

Component(s)

Python

Member

raulcd commented Mar 7, 2025

We use the AWS SDK and it seems this is already handled through the AWS SDK. From our code comment:
https://github.com/apache/arrow/blob/main/cpp/src/arrow/filesystem/s3fs.cc#L282-L289
and from the AWS SDK CPP:
https://github.com/aws/aws-sdk-cpp/blob/9f6bfafbb9a82efe1c18e7d457fc97b848de3ee6/src/aws-cpp-sdk-core/source/auth/STSCredentialsProvider.cpp#L40

Have you tried using the environment variable? Maybe it is just a matter of updating the documentation to reflect that it can be set?

I have no experience with this part of the codebase; I'm asking just for validation purposes.

@raulcd raulcd changed the title Support AWS_ROLE_ARN env variable for pyarrow.fs.S3FileSystem input [Python][C++] Support AWS_ROLE_ARN env variable for pyarrow.fs.S3FileSystem input Mar 7, 2025
@lozbrown
Author

Hi @raulcd

Yes, I've tested this. Attempting to use the role results in an error of the following form:

OSError: When getting information for key 'schemas/meta.db/trino_queries_iceberg/metadata/00000-41568416-bc76-4236-afab-a7bec772eb32.metadata.json' in bucket 'REDACTED-BUCKET': AWS Error ACCESS_DENIED during HeadObject operation: No response body.

The following is a workaround, but probably not a great one, as it turns a short-lived credential into a static one.

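import os
import boto3

# Resolve credentials once via boto3's default chain (this handles the role).
session = boto3.session.Session()
creds = session.get_credentials().get_frozen_credentials()
# Export a snapshot for the AWS SDK used by pyarrow; it is never refreshed.
os.environ['AWS_ACCESS_KEY_ID'] = creds.access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = creds.secret_key
os.environ['AWS_SESSION_TOKEN'] = creds.token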

This may be related to #38421.
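A variant of the same workaround that avoids mutating the process environment is to pass the snapshot straight to the constructor through its access_key, secret_key and session_token arguments. This is only a sketch: s3_kwargs_from_credentials is a made-up helper name, and the credentials still go stale once the session token expires.

```python
def s3_kwargs_from_credentials(creds):
    """Map an object with access_key / secret_key / token attributes (for
    example boto3's frozen credentials) to pyarrow.fs.S3FileSystem keyword
    arguments. Hypothetical helper, not a pyarrow or boto3 API."""
    kwargs = {
        "access_key": creds.access_key,
        "secret_key": creds.secret_key,
    }
    # The token is only present for temporary (role or STS) credentials.
    if creds.token:
        kwargs["session_token"] = creds.token
    return kwargs


def open_s3_with_boto3_credentials():
    """Resolve credentials with boto3 and hand them to pyarrow explicitly.
    Both libraries are imported lazily, so they are only needed here."""
    import boto3
    from pyarrow import fs

    creds = boto3.session.Session().get_credentials().get_frozen_credentials()
    return fs.S3FileSystem(**s3_kwargs_from_credentials(creds))
```

The filesystem would need to be recreated whenever the temporary credentials expire, so this does not replace proper role support.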

Member

raulcd commented Mar 10, 2025

So, if I understand correctly, using the AWS_ROLE_ARN environment variable works, but there seems to be an issue in the AWS SDK where it falls back to an anonymous user when a role ARN is used. This happens both with the environment variable and with the role_arn parameter from our API, as described in the linked issue, right?
