How to connect to AWS S3 buckets from R
The aws.s3 library for R provides quick and easy methods to connect to existing AWS S3 buckets and to download and upload content. One benefit of the aws.s3 library is that it uses the AWS S3 REST API and does not require the AWS command-line interface to be installed on the user's system.
To use the package, you will need your AWS account access credentials. These can be found at https://console.aws.amazon.com/iam/home under Users, by selecting your username and going to Security credentials.
Your Access key ID is shown at this location. You will also need your Secret Access Key, which is displayed only once, when the key is created, so it must be saved in a safe location. If you have lost your Secret Access Key, you can generate a new access key pair at any time.
The first step is to download and install the aws.s3 library. Fortunately, it is available on CRAN, so installation is straightforward:
install.packages("aws.s3")
library(aws.s3)
Although you could specify your security credentials in every call, it is often easier to specify them once at the beginning of the code:
Sys.setenv("AWS_ACCESS_KEY_ID" = "my_access_key",
           "AWS_SECRET_ACCESS_KEY" = "my_secret_key",
           "AWS_DEFAULT_REGION" = "us-east-1",
           # only needed for temporary (session) credentials; omit otherwise
           "AWS_SESSION_TOKEN" = "my_token")
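Hard-coding keys in a script makes them easy to leak, so one common alternative is to define the same variables outside the code (for example in ~/.Renviron, which R reads at startup) and only verify in the script that they are present. The following is a minimal sketch of such a check; the helper name missing_aws_vars is hypothetical and not part of aws.s3.

```r
# Hypothetical helper (not part of aws.s3): return the names of the
# required AWS environment variables that are unset or empty.
missing_aws_vars <- function(required = c("AWS_ACCESS_KEY_ID",
                                          "AWS_SECRET_ACCESS_KEY",
                                          "AWS_DEFAULT_REGION")) {
  values <- Sys.getenv(required, unset = "")
  required[values == ""]
}

# Report anything missing before making any S3 calls.
missing <- missing_aws_vars()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```

AWS_SESSION_TOKEN is left out of the required list here on the assumption that long-term access keys are being used.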
From here we can start exploring the buckets and files that the account has permission to access. To get a list of the buckets available to the account, run:
bucketlist()
Alternatively, you can check that a specific bucket exists and then list the files within it:
# check that the bucket exists
bExists <- bucket_exists("my_bucket")
# if the bucket exists, list the files in the bucket
if (bExists) {
  get_bucket(bucket = "my_bucket")
}
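When a bucket holds many files, it can be easier to work with the listing as a data frame and to restrict it to one folder. The sketch below wraps get_bucket_df() from aws.s3 for that purpose; the wrapper name list_objects is hypothetical, and the column names in the commented example (Key, Size, LastModified) are what the function is expected to return, so check them against your installed version.

```r
# Hypothetical wrapper around aws.s3::get_bucket_df(): list the objects in a
# bucket as a data frame, optionally restricted to a key prefix ("folder").
# max = Inf asks for all matching objects rather than a single page.
list_objects <- function(bucket, prefix = NULL, max = Inf) {
  aws.s3::get_bucket_df(bucket = bucket, prefix = prefix, max = max)
}

# Example call (requires valid credentials and an existing bucket):
# objs <- list_objects("my_bucket", prefix = "s3folder/")
# head(objs[, c("Key", "Size", "LastModified")])
```

Defining the wrapper makes no network call; a request is sent only when list_objects() is actually called.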
With that information, you can now either copy a file from the remote S3 bucket and save it locally, or upload a local file into the destination bucket:
# copy a file from the bucket to a local file
save_object("s3filename.txt", file = "localfilename.txt", bucket = "my_bucket")
# upload a local file into the bucket
put_object(file = "localfilename.txt", object = "s3filename.txt", bucket = "my_bucket")
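If the data is already in memory as a data frame, the intermediate local file can be skipped. The following sketch assumes the s3write_using() and s3read_using() helpers from aws.s3, which pass a temporary file to a read or write function of your choice; the wrapper names write_csv_to_s3 and read_csv_from_s3 are hypothetical.

```r
# Hypothetical wrappers (not part of aws.s3) for a CSV round trip.

# Write a data frame to an S3 object as CSV.
write_csv_to_s3 <- function(df, object, bucket) {
  aws.s3::s3write_using(df, FUN = utils::write.csv, row.names = FALSE,
                        object = object, bucket = bucket)
}

# Read a CSV object from S3 back into a data frame.
read_csv_from_s3 <- function(object, bucket) {
  aws.s3::s3read_using(FUN = utils::read.csv, object = object, bucket = bucket)
}

# Example calls (require valid credentials and an existing bucket):
# write_csv_to_s3(head(mtcars), "s3filename.csv", "my_bucket")
# df <- read_csv_from_s3("s3filename.csv", "my_bucket")
```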
Many S3 buckets use a folder structure. S3 has no true directories: AWS implements folders as prefixes on the object name (key) rather than as an explicit file structure. To access files under a folder, include the folder prefix in the object name and otherwise proceed as before:
# copy a file from a folder in an S3 bucket to a local file
save_object("s3folder/s3filename.txt", file = "localfilename.txt", bucket = "my_bucket")
# upload a local file into a folder in an S3 bucket
put_object(file = "localfilename.txt", object = "s3folder/s3filename.txt", bucket = "my_bucket")
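Because a folder is just part of the object name, building names from separate pieces can easily produce doubled or stray slashes. Here is a small sketch of a helper that joins the pieces cleanly; the name s3_key is hypothetical and not part of aws.s3.

```r
# Hypothetical helper (not part of aws.s3): join path pieces into one object
# name, dropping empty pieces and leading or trailing slashes on each piece.
s3_key <- function(...) {
  parts <- c(...)
  parts <- gsub("^/+|/+$", "", parts)
  paste(parts[nzchar(parts)], collapse = "/")
}

s3_key("s3folder/", "s3filename.txt")  # "s3folder/s3filename.txt"

# Example use with aws.s3 (requires valid credentials and an existing bucket):
# save_object(s3_key("s3folder", "s3filename.txt"),
#             file = "localfilename.txt", bucket = "my_bucket")
```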