S3 (data source)

S3 (Simple Storage Service) is an AWS service for scalable object storage. Kondado's S3 pipeline for the Data Warehouse lets you bring CSV files stored in S3 into your analytics cloud.

Adding the data source

To automate S3 ETL into your database with Kondado, follow the steps below.

1) In your AWS account, click your account name at the top right and select “My Security Credentials”

2) On the new page, select “Access Keys (access key ID and secret access key)”

3) Click on “Create New Access Key”

4) Click on “Show Access Key” and copy the displayed values

5) On the Kondado platform, go to the add data sources page and select the S3 data source

6) Give your data source a name, fill in the values obtained in step (4), and enter the name of the bucket where your files are stored.

Now just save the data source and start integrating your S3 CSVs into your Data Lake or Data Warehouse.
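
If you want to sanity-check the key pair and bucket before saving, the sketch below does it with boto3. This is only an illustration; the key values and bucket name are placeholders, not real credentials.

    import boto3
    from botocore.exceptions import ClientError

    # Placeholder values -- substitute the ones copied in step (4).
    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIA...",        # your access key ID
        aws_secret_access_key="wJalr...",   # your secret access key
    )

    try:
        # head_bucket succeeds only if the bucket exists and the key can reach it
        s3.head_bucket(Bucket="my-analytics-bucket")  # placeholder bucket name
        print("OK: credentials can access the bucket")
    except ClientError as err:
        print(f"Access check failed: {err}")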

Pipeline

Relationship Chart: [diagram omitted]

CSV

You can indicate an exact file name, or just the beginning of a file name (a prefix), and we will integrate every file that matches.
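
Prefix matching works the way S3's own key-prefix listing does. As a rough illustration of what the selection amounts to, assuming boto3 and a hypothetical bucket and prefix:

    import boto3

    s3 = boto3.client("s3")

    # Every object whose key starts with the prefix is picked up,
    # e.g. sales_2024-01.csv, sales_2024-02.csv, ...
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket="my-analytics-bucket",  # hypothetical bucket
        Prefix="sales_2024",           # hypothetical file-name prefix
    )
    for page in pages:
        for obj in page.get("Contents", []):
            print(obj["Key"])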

Once executed, the pipeline saves the most recent modification date among the files it read and, on the next run, only looks for files with a later modification date.
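
Put differently, the pipeline keeps a modification-date watermark. The selection logic amounts to something like the sketch below; this is an illustration of the idea, not Kondado's actual code, and the bucket, prefix, and watermark values are made up.

    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    watermark = datetime(2024, 1, 15, tzinfo=timezone.utc)  # saved by the previous run

    latest = watermark
    new_keys = []
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket="my-analytics-bucket", Prefix="sales_2024"
    )
    for page in pages:
        for obj in page.get("Contents", []):
            if obj["LastModified"] > watermark:  # changed since the last run
                new_keys.append(obj["Key"])
            if obj["LastModified"] > latest:
                latest = obj["LastModified"]

    # new_keys holds the files to integrate on this run;
    # latest is persisted as the watermark for the next one.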

To absorb files with different column layouts, the data is pivoted on the target and follows the pattern below:

Field                 Type
row_number            int
column_number         int
first_column_value    text
value                 text
__file_basename       text
__file_path           text
__file_name           text
__kdd_insert_time     timestamp
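
To make the pattern concrete, the sketch below unpivots a tiny in-memory CSV into the layout above and then rebuilds a wide view with pandas. The sample data is invented, and the file-related metadata fields (__file_basename, __file_path, __file_name, __kdd_insert_time) are omitted for brevity:

    import pandas as pd

    # A tiny CSV as it might arrive from S3 (invented content).
    csv_rows = [
        ["id", "name", "amount"],  # row 0: header
        ["1", "alice", "10.5"],
        ["2", "bob", "7.0"],
    ]

    # Turn every cell into one record of the destination pattern.
    records = [
        {
            "row_number": r,
            "column_number": c,
            "first_column_value": row[0],  # value of the row's first column
            "value": cell,
        }
        for r, row in enumerate(csv_rows)
        for c, cell in enumerate(row)
    ]
    pivoted = pd.DataFrame(records)

    # Rebuilding a wide view: one column per column_number.
    wide = pivoted.pivot(index="row_number", columns="column_number", values="value")
    print(wide)

Because every cell lands as its own row, files whose headers differ can share the same target table; downstream you can filter by file name and pivot back into whatever shape each file calls for.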