Overview
This document describes how to use Anaconda Enterprise (AE) to load data from Amazon S3 using per-account AWS credentials. This approach is useful when multiple collaborators share a project or deployment in AE and each user wants to use their own private AWS credentials, without committing credentials to files in the shared project or exposing them to other users.
Using the per-account credential functionality in AE 5.2.0, credentials can be associated with individual AE user accounts. The credentials are stored securely as a Kubernetes secret and are available as a file within a notebook session or deployment container.
For example, suppose an Amazon S3 bucket named “anaconda-enterprise-data” contains Files A and B, accessible to User A, and Files C and D, accessible to User B. Each user defines their own private AWS credentials in their AE account, so both users can collaborate on the same project while working only with the data they have access to. When a project is deployed, only the credentials of the user who deployed it are available.
Configuration of Amazon AWS credentials in Anaconda Enterprise
Configure your Amazon S3 credentials by adding a new credential in Anaconda Enterprise on the Settings screen. Provide an arbitrary hostname for the credential, an arbitrary username, and your Amazon S3 credentials in the Git API Token field.
The credential should follow the INI format of an AWS configuration file, with the keys placed under a profile section such as [default]:

[default]
aws_access_key_id=AKIA...
aws_secret_access_key=69EH...
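As a quick sanity check, the stored contents can be parsed with Python's configparser, which handles the INI format used by AWS configuration files. This is a minimal sketch: the sample key values below are placeholders, and the [default] profile name is an assumption (boto3 and s3fs expect the keys to sit under a profile section in the config file).

```python
import configparser

# Hypothetical contents of the stored credential, following the INI
# format of an AWS configuration file (keys under a [default] profile).
sample = """\
[default]
aws_access_key_id=AKIAEXAMPLEKEY
aws_secret_access_key=69EHexamplesecret
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Verify that both required keys are present under the profile section.
for key in ("aws_access_key_id", "aws_secret_access_key"):
    assert config.has_option("default", key), f"missing {key}"

print("credential parses as a valid AWS config file")
```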
Accessing data in Amazon S3 from Anaconda Enterprise
After the credentials have been stored in Anaconda Enterprise, they will be accessible from a notebook session or deployment at the following location in the container:

/var/run/secrets/credentials/token-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
where token-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx is the filename generated for the credential that was provided on the Settings screen: the token- prefix followed by a random hash, e.g., token-73a49867-e464-4404-b7b1-3529cf28cebb.
Data can be accessed using typical Python packages for Amazon S3, including boto, boto3, and s3fs. The location of the configuration file can be specified via an environment variable in a notebook or script using the AWS_CONFIG_FILE environment variable:
import os
os.environ['AWS_CONFIG_FILE'] = '/var/run/secrets/credentials/token-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
or in a terminal using:

export AWS_CONFIG_FILE=/var/run/secrets/credentials/token-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
or in the anaconda-project.yml file within a project, under the variables section:

variables:
  AWS_CONFIG_FILE: /var/run/secrets/credentials/token-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Only the credentials for the logged-in AE user will be available in their own sessions and deployments. Credentials for other users will not be available.
Example: Loading data from Amazon S3 into a pandas dataframe
Import libraries:
import pandas as pd
from s3fs.core import S3FileSystem
Specify the location of the AWS configuration file, which will only contain the credentials for the logged-in AE user:

import os
os.environ['AWS_CONFIG_FILE'] = '/var/run/secrets/credentials/token-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
Specify the Amazon S3 bucket and key (i.e., data file) to use:
bucket = 'anaconda-enterprise-data'
key = 'my-data.csv'
We can list the contents of the bucket (s3fs returns full bucket/key paths, so we keep only the key names):

s3 = S3FileSystem(anon=False)
[path.split('/')[-1] for path in s3.ls(bucket)]

['my-data.csv']

Load the data from Amazon S3 into pandas:

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))

Print the contents of the dataframe:

print(df)
Using per-account credentials with other applications
This document has described how to use Anaconda Enterprise to store API keys and secrets for Amazon S3 using per-account credentials. The same approach can also be used to store other account credentials, such as API tokens, database credentials, and account tokens, and to use them in a notebook or script within a session or deployment, without saving or committing secrets to code within a project.