How to connect to AWS S3 buckets with Python

The boto3 package provides quick and easy methods to connect to, download content from and upload content into existing AWS S3 buckets. The boto3 package is the AWS SDK for Python and allows access to manage S3 services along with other services such as EC2 instances.

To use the package you will need to make sure that you have your AWS account access credentials. Your account access credentials can be found at https://console.aws.amazon.com/iam/home under Users by selecting your username and going to Security credentials.

Your Access key ID should be available at this location, and you will also need your Secret Access Key, which is only shown once, so it will need to be saved in a safe location. If you have lost your Secret Access Key, you can generate a new access key pair at any time.
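Rather than hard-coding the keys into every script, boto3 can also read them from the standard AWS credentials file at ~/.aws/credentials. A minimal example, with placeholder key values:

```ini
[default]
aws_access_key_id = my_access_key
aws_secret_access_key = my_secret_key
```

With that file in place, a plain boto3.client('s3') call picks up the credentials automatically.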

The first step required is to download and install the boto3 package, fortunately it is already available on PyPI so becomes an easy install

# pip install boto3
import boto3

Although you could pass your security credentials in every call, it's often easier to specify them once at the beginning of the code

s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id="my_access_key",
                  aws_secret_access_key="my_secret_key")

From here we can start exploring the buckets and files that the account has permission to access. To get a list of the objects that exist within a bucket

# get a list of objects in the bucket
result = s3.list_objects_v2(Bucket='my_bucket')
for r in result.get("Contents", []):
    print(r["Key"])

With that information available you can now either copy a file from the remote S3 bucket and save it locally, or upload a local file into the destination bucket

# download file to local directory from s3 bucket
s3.download_file('my_bucket', 's3filename.txt', 'localfilename.txt')

# upload file from local directory to s3 bucket
with open('localfilename.txt', "rb") as f:
  s3.upload_fileobj(f, 'my_bucket', 's3filename.txt')

Many S3 buckets appear to use a folder structure, but S3 is actually a flat key space: AWS implements the "folders" as prefixes embedded in the object key, separated by '/', rather than as an explicit file structure. To access objects under such a prefix, simply include it in the key

# download a file locally from a folder in an s3 bucket
s3.download_file('my_bucket', 's3folder/s3filename.txt', 's3filename.txt')

# upload a local file into a folder in an s3 bucket
with open('localfilename.txt', "rb") as f:
  s3.upload_fileobj(f, 'my_bucket', 's3folder/s3filename.txt')
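Since the folder is just part of the key, ordinary string handling is enough to split a key into its "folder" prefix and file name, which is handy when mirroring a bucket locally. A small illustration using the key from the example above:

```python
key = "s3folder/s3filename.txt"

# rpartition splits on the last '/', giving the prefix and the file name
prefix, _, filename = key.rpartition("/")
print(prefix)    # s3folder
print(filename)  # s3filename.txt
```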