S3 (Simple Storage Service) is an AWS service designed for scalable file storage. Kondado's S3 data pipeline lets you load the CSV files stored in your buckets into your analytics cloud, such as a Data Warehouse.
Adding the data source
To automate S3 ETL into your database with Kondado, follow the steps below.
1) In your AWS account, click on the top right and select “My Security Credentials”
2) On the new page, select “Access Keys (access key ID and secret access key)”
3) Click on “Create New Access Key”
4) Click on “Show Access Key” and copy the displayed values
5) Give your data source a name, fill in the values obtained in step (4), and enter the name of the bucket that contains your files.
Now just save the data source and start integrating your S3 CSVs into your Data Lake or Data Warehouse.
Pipeline
Relationship Chart
CSV
You can indicate the full name of a file, or just the beginning of the file name (a prefix), and we will integrate every file that matches.
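The prefix matching described above can be sketched as a simple filter. This is an illustrative pure-Python sketch, not Kondado's actual implementation; the file names are made up:

```python
def match_files(filenames, pattern):
    """Select files whose name equals the pattern or starts with it,
    mirroring prefix-style matching over a bucket listing."""
    return [name for name in filenames if name.startswith(pattern)]

files = ["sales_2023.csv", "sales_2024.csv", "inventory.csv"]
# An exact name matches one file; a prefix matches all files that share it
print(match_files(files, "inventory.csv"))  # → ['inventory.csv']
print(match_files(files, "sales_"))         # → ['sales_2023.csv', 'sales_2024.csv']
```

In S3 terms, this corresponds to listing objects under a key prefix rather than requesting a single key.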
When it runs, the pipeline records the most recent modification date among the files it read; on the next run, it only looks for files modified after that date.
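This incremental behavior amounts to keeping a "watermark" timestamp between runs. A minimal sketch, assuming the bucket listing is available as (key, last-modified) pairs; the function name and file names are hypothetical:

```python
from datetime import datetime

def files_to_read(listing, watermark):
    """listing: (key, last_modified) pairs from the bucket.
    Return the files changed after the stored watermark, together with
    the new watermark to persist for the next run."""
    fresh = [(key, mtime) for key, mtime in listing
             if watermark is None or mtime > watermark]
    new_watermark = max((mtime for _, mtime in listing), default=watermark)
    return fresh, new_watermark

listing = [
    ("orders_1.csv", datetime(2023, 1, 1)),
    ("orders_2.csv", datetime(2023, 2, 1)),
]
# First run: no watermark yet, so every file is read
fresh, wm = files_to_read(listing, None)
# Next run with the saved watermark: nothing new, so nothing is re-read
fresh2, _ = files_to_read(listing, wm)
```

Note that a strict `>` comparison is what prevents already-processed files from being read again on the next run.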
To accommodate files with different columns, the data is pivoted at the destination and follows this pattern:
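The idea of pivoting rows so that differently-shaped CSVs share one destination table can be sketched in pure Python. The exact column names Kondado writes to the target are not shown here, so the (row, column, value) layout below is an assumption for illustration:

```python
def unpivot(rows):
    """Flatten wide rows (dicts whose keys may differ from file to file)
    into long (row, column, value) records, so CSVs with different
    columns fit a single destination table."""
    records = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            records.append({"row": i, "column": column, "value": value})
    return records

# Two rows with different columns still produce a uniform shape
rows = [{"id": 1, "name": "a"}, {"id": 2, "price": 9.9}]
long_rows = unpivot(rows)
```

Each record carries its original column name as data, so a new column in a future file simply becomes new rows rather than a schema change.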