AWS: Python Setup

This entry is part 1 of 3 in the series AWS & Python

When you want to work with S3 or a Kinesis stream, you first need to set up the connection. At the time of this writing I am using boto3 version 1.3.1.

Next we need to import the package.

import boto3

Next we set up the session and specify which profile we will be using.

profile = boto3.session.Session(profile_name='prod')

The profile name comes from the “credentials” file. You can set the environment variable “AWS_SHARED_CREDENTIALS_FILE” to specify which credentials file to use. You can set up the credentials file like below. You can change “local” to anything you want; I normally use “stage”, “dev” or “prod”.

[local]
aws_access_key_id=##KEY_ID##
aws_secret_access_key=##SECRET_ACCESS_KEY##
region=##REGION##
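Since the credentials file is plain INI syntax, you can sanity-check it with Python’s standard configparser before pointing boto3 at it. A minimal sketch — the path, profile name and values below are hypothetical:

```python
import configparser

def load_profile(path, profile_name):
    """Parse an AWS-style credentials file and return one profile's settings."""
    config = configparser.ConfigParser()
    config.read(path)
    if profile_name not in config:
        raise KeyError("profile '%s' not found in %s" % (profile_name, path))
    section = config[profile_name]
    return {
        "aws_access_key_id": section.get("aws_access_key_id"),
        "aws_secret_access_key": section.get("aws_secret_access_key"),
        "region": section.get("region"),
    }
```

If this raises or returns None values, boto3 will not be able to use the profile either.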

Next we need to set up the connection to S3. To do this we will use the profile we created above.

connection_s3 = profile.resource('s3')

If we also want to use a Kinesis stream, then we need to set up that connection too, again using the profile we created above.

connection_kinesis = profile.client('kinesis')

AWS: Python Kinesis Streams

This entry is part 2 of 3 in the series AWS & Python

If you haven’t already done so, please refer to the AWS setup section which is part of this series. As time goes on I will continually update this section.

To put something on the Kinesis stream you need to utilise the “connection_kinesis” you set up in the previous tutorial on setting up the connection. You will need to set the stream name, the data and a partition key.

response = connection_kinesis.put_record(StreamName=##KINESIS_STREAM##, Data=##DATA##, PartitionKey=##FILE_NAME##)

Depending on your data, you may need to UTF-8 encode it first. For example:

bytearray(##MY_DATA##, 'utf8')
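Kinesis expects the Data parameter as bytes, so a text payload has to be encoded before sending. A quick sketch with a made-up payload:

```python
# A hypothetical text payload; non-ASCII characters make the encoding step necessary.
payload = "sensor reading: 21.5°C"

# bytearray(s, 'utf8') produces the same bytes as s.encode('utf-8').
encoded = bytearray(payload, "utf8")
```

Either form works; `encode` returns immutable bytes while `bytearray` returns a mutable buffer, and Kinesis accepts both.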

To read from the Kinesis stream you set up a shard iterator and then retrieve the data from the stream, not forgetting to grab the new shard iterator from the returned records. Remember not to query the stream too fast.

#shardId-000000000000 is the format of the shard ID
shard_it = connection_kinesis.get_shard_iterator(StreamName=##KINESIS_STREAM##, ShardId='shardId-000000000000', ShardIteratorType='LATEST')["ShardIterator"]

recs = connection_kinesis.get_records(ShardIterator=shard_it, Limit=1)

#This is the new shard iterator returned after reading the data from the stream.
shard_it = recs["NextShardIterator"]
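Putting these steps together, a polling loop might look like the sketch below. It takes the Kinesis client as a parameter; the batch count, record limit and pause interval are assumptions, not part of the original example:

```python
import time

def read_stream(client, stream_name, shard_id="shardId-000000000000",
                batches=3, pause=1.0):
    """Poll one shard a few times, following NextShardIterator each round."""
    shard_it = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id,
        ShardIteratorType="LATEST")["ShardIterator"]
    collected = []
    for _ in range(batches):
        out = client.get_records(ShardIterator=shard_it, Limit=25)
        collected.extend(rec["Data"] for rec in out["Records"])
        shard_it = out["NextShardIterator"]  # always use the fresh iterator
        time.sleep(pause)  # don't query the shard too fast
    return collected
```

The pause between calls is the point: each `get_records` call hands back a new iterator, and hammering the shard without a delay will hit the API rate limits.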


AWS: Python S3

This entry is part 3 of 3 in the series AWS & Python

If you haven’t already done so, please refer to the AWS setup section which is part of this series. As time goes on I will continually update this section.

To work with S3 you need to utilise the “connection_s3” you set up in the previous tutorial on setting up the connection.

To load a file from an S3 bucket you need to know the bucket name and the file name.

connection_s3.Object(##S3_BUCKET##, ##FILE_NAME##).load()

If you want to check whether a file exists on S3, you can do something like the below. However, you will need to import botocore.

import botocore

def keyExists(key):
    file = connection_s3.Object(##S3_BUCKET##, key)

    try:
        file.load()
    except botocore.exceptions.ClientError:
        exists = False
    else:
        exists = True

    return exists, file

If you want to copy a file from one bucket to another, or into a sub-folder, you can do it like below.

connection_s3.Object(##S3_DESTINATION_BUCKET##, ##FILE_NAME##).copy_from(CopySource=##S3_SOURCE_BUCKET## + '/' + ##FILE_NAME##)
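boto3 also accepts CopySource as a dict of Bucket and Key, which avoids the manual string concatenation. A small sketch wrapping the call so it can be exercised with any S3-style resource — the bucket and file names here are illustrative:

```python
def copy_object(s3, source_bucket, dest_bucket, file_name):
    """Copy file_name from source_bucket to dest_bucket via copy_from."""
    s3.Object(dest_bucket, file_name).copy_from(
        CopySource={"Bucket": source_bucket, "Key": file_name})
```

Note that `copy_from` is called on the *destination* object, with the source described by CopySource.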

If you want to delete the file, you can use the “keyExists” function above and then just call “delete” on the returned object.

##FILE##.delete()

If you want to just get a bucket object, you only need to specify the bucket and utilise the S3 connection.

bucket = connection_s3.Bucket(##S3_BUCKET##)

To upload a file to an S3 bucket, you need to set the body, content type and key. Take a look at the example below.

bucket.put_object(Body=##DATA##,ContentType="application/zip", Key=##FILE_NAME##)
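The upload can be wrapped in a small helper so the content type isn’t hard-coded at each call site. A sketch — the default type is carried over from the example above, and the names are illustrative:

```python
def upload(bucket, data, file_name, content_type="application/zip"):
    """Upload bytes to the given bucket under file_name."""
    return bucket.put_object(Body=data, ContentType=content_type, Key=file_name)
```

Pass a different `content_type` (e.g. "text/csv") when the payload isn’t a zip.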

If you want to loop over the objects in a bucket, it’s pretty straightforward.

for key in bucket.objects.all():
    file_name = key.key
    response = key.get()
    data = response['Body'].read()
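The loop can be packaged as a helper that collects every object’s contents, which also makes the `response['Body'].read()` step easy to check against a stand-in bucket. A sketch with illustrative names:

```python
def read_all(bucket):
    """Return a dict mapping each key in the bucket to its raw contents."""
    contents = {}
    for obj in bucket.objects.all():
        response = obj.get()
        contents[obj.key] = response["Body"].read()
    return contents
```

Be careful with large buckets: this reads every object into memory in one pass.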

If you want to filter the objects in a bucket, you can use a prefix.

for key in bucket.objects.filter(Prefix='##PREFIX##').all():
    file_name = key.key
    response = key.get()
    data = response['Body'].read()