Reading & Writing data from/to S3 using NiFi
NiFi is a framework designed to automate the flow of data between systems. It provides a rich set of processors for interacting with systems such as AWS, Azure, Hadoop, Kafka, HBase, MongoDB, and Couchbase.
As the headline suggests, this post covers ListS3 and PutS3Object, which are used to read data from and write data to S3, respectively.
ListS3:
ListS3 lists the objects in a given S3 bucket. It doesn’t require an incoming relationship. For every object listed, it creates a FlowFile that carries the object's metadata (such as its key) as attributes; the object's content is retrieved by a downstream FetchS3Object processor. Like the ListFile processor, it maintains state so that later runs pick up only the objects created after the previous iteration. Although the processor exposes most of the options of the AWS S3 SDK, basic functionality only needs the bucket, region, and credential properties to be set correctly.
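The state-keeping idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not NiFi's actual implementation: the names S3Object, ListingState and list_new_objects are hypothetical, the bucket is an in-memory list, and the state is assumed to be the newest timestamp seen plus the keys seen at that timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class S3Object:
    key: str
    last_modified: int  # epoch milliseconds
    size: int


@dataclass
class ListingState:
    # Assumed state: newest timestamp listed so far and the keys listed at it.
    last_timestamp: int = 0
    keys_at_timestamp: set = field(default_factory=set)


def list_new_objects(bucket_name, objects, state):
    """Return one attribute dict per object not listed in an earlier run."""
    flowfiles = []
    newest = state.last_timestamp
    newest_keys = set(state.keys_at_timestamp)
    for obj in sorted(objects, key=lambda o: o.last_modified):
        if obj.last_modified < state.last_timestamp:
            continue  # older than anything listed before
        if (obj.last_modified == state.last_timestamp
                and obj.key in state.keys_at_timestamp):
            continue  # already listed in a previous run
        # Metadata only: the object's content is not read here.
        flowfiles.append({
            "s3.bucket": bucket_name,
            "filename": obj.key,
            "s3.length": obj.size,
        })
        if obj.last_modified > newest:
            newest = obj.last_modified
            newest_keys = {obj.key}
        else:
            newest_keys.add(obj.key)
    state.last_timestamp = newest
    state.keys_at_timestamp = newest_keys
    return flowfiles


state = ListingState()
bucket = [S3Object("a.csv", 100, 10), S3Object("b.csv", 200, 20)]
first_run = list_new_objects("source-bucket", bucket, state)
second_run = list_new_objects("source-bucket", bucket, state)
bucket.append(S3Object("c.csv", 300, 30))
third_run = list_new_objects("source-bucket", bucket, state)
```

Here the first run lists both objects, the second run lists nothing because nothing changed, and the third run lists only the newly added c.csv.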
PutS3Object:
Writes the content of each FlowFile to Amazon S3 as an object. For example, to copy objects, the output of ListS3 can be routed through FetchS3Object (which loads the content of each listed object) and then into PutS3Object, which writes every FlowFile produced by the previous step. Like ListS3, it supports most S3 features, such as multipart upload, storage class, and server-side encryption. However, FlowFiles can also be written with the default settings: only the target bucket has to be given, and the object key is taken from the filename generated by the upstream processor.
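As a rough illustration of "defaults unless overridden", the sketch below builds the arguments for a single S3 put from a FlowFile's attributes. The function build_put_request and the bucket and file names are hypothetical; the dictionary keys follow the parameter names of boto3's put_object call, and the object key is assumed to come from the "filename" attribute.

```python
def build_put_request(attributes, bucket, storage_class=None,
                      server_side_encryption=None):
    """Build keyword arguments for an S3 put, leaving optional features unset."""
    request = {
        "Bucket": bucket,
        # The object key comes from the filename set by the upstream step.
        "Key": attributes["filename"],
    }
    # Optional features are only sent when explicitly configured;
    # otherwise S3 applies its defaults for the bucket.
    if storage_class is not None:
        request["StorageClass"] = storage_class
    if server_side_encryption is not None:
        request["ServerSideEncryption"] = server_side_encryption
    return request


flowfile_attributes = {"filename": "reports/2020.csv"}

default_request = build_put_request(flowfile_attributes, "target-bucket")
tuned_request = build_put_request(
    flowfile_attributes,
    "target-bucket",
    storage_class="STANDARD_IA",
    server_side_encryption="AES256",
)
```

The first request carries only the bucket and key, which mirrors the minimal configuration; the second shows where storage class and server-side encryption would be supplied.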
The flow created above copies data from one bucket to another in a few seconds (depending on file size). In AWS, the same result can be achieved with either the replication feature or Lambda. NiFi, being open source, lets us implement it without paying for an additional service, although the usual S3 request and data transfer charges and the cost of running NiFi itself still apply.
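For comparison, the same bucket-to-bucket copy can be approximated outside NiFi. The sketch below assumes a list, fetch, put sequence and an S3 client object that behaves like boto3's (get_paginator, get_object, put_object); the function name copy_bucket and the bucket names are hypothetical, and credentials are assumed to be configured elsewhere.

```python
def copy_bucket(s3, source_bucket, target_bucket):
    """Copy every object from source_bucket to target_bucket.

    `s3` is assumed to be a boto3-style S3 client,
    for example the result of boto3.client("s3").
    Returns the list of keys that were copied.
    """
    copied = []
    paginator = s3.get_paginator("list_objects_v2")
    # List step: page through the keys in the source bucket.
    for page in paginator.paginate(Bucket=source_bucket):
        # "Contents" is missing from the page when nothing matches.
        for entry in page.get("Contents", []):
            key = entry["Key"]
            # Fetch step: read the whole object into memory
            # (fine for small files, not for very large ones).
            body = s3.get_object(Bucket=source_bucket, Key=key)["Body"].read()
            # Put step: write it under the same key in the target bucket.
            s3.put_object(Bucket=target_bucket, Key=key, Body=body)
            copied.append(key)
    return copied
```

A call such as copy_bucket(boto3.client("s3"), "source-bucket", "target-bucket") would run it against real buckets. Unlike the NiFi flow, this sketch keeps no state between runs, so every call copies all objects again.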