S3 (data source)

S3 (Simple Storage Service) is an AWS service designed for scalable object (file) storage. Kondado's S3 data pipeline for the Data Warehouse lets you load the CSV files stored in a bucket into your analytics cloud.

Adding the data source

To automate S3 ETL with Kondado for your database, follow the steps below.

1) In your AWS account, click on the top right and select “My Security Credentials”

2) On the new page, select “Access Keys (access key ID and secret access key)”

3) Click on “Create New Access Key”

4) Click on “Show Access Key” and copy the displayed values

5) On the Kondado platform, go to the add data sources page and select the S3 data source

6) Give your data source a name, then enter the values obtained in step (4) and the name of the bucket that contains your files.

Now save the data source and start integrating your S3 CSV files into your Data Lake or Data Warehouse.

Pipeline

Relationship Chart

Chart of relationships between tables

CSV

You can specify the full name of a file or only the beginning of the file name; in the latter case, every file whose name starts with that text will be integrated.

After each run, the pipeline saves the latest modification date among the files it read. On the next run, it only looks for files modified after that date.
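The selection logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Kondado's actual implementation: the `FileInfo` class and the `select_new_files` function are hypothetical names, and the sketch assumes the prefix is compared against the full file key.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical description of one file in the bucket."""
    key: str                 # file name (object key) in the bucket
    last_modified: datetime  # modification date reported by the storage


def select_new_files(
    files: Iterable[FileInfo],
    prefix: str,
    watermark: Optional[datetime],
) -> Tuple[List[FileInfo], Optional[datetime]]:
    """Pick files that start with `prefix` and changed after `watermark`.

    Returns the selected files and the watermark to save for the next run.
    A watermark of None means "first run": every matching file is selected.
    """
    selected = [
        f for f in files
        if f.key.startswith(prefix)
        and (watermark is None or f.last_modified > watermark)
    ]
    # Keep the previous watermark when nothing new was found.
    new_watermark = max((f.last_modified for f in selected), default=watermark)
    return selected, new_watermark


# Example data (invented for illustration only).
files = [
    FileInfo("sales_2023.csv", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    FileInfo("sales_2024.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    FileInfo("other.csv", datetime(2024, 6, 1, tzinfo=timezone.utc)),
]
first_run, saved = select_new_files(files, "sales_", None)
second_run, saved_again = select_new_files(files, "sales_", saved)
```

In this sketch the first run selects both `sales_` files, and the second run selects nothing because no matching file has a modification date later than the saved one.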

So that files with different columns can be loaded into the same table, the data is pivoted at the destination: each cell of the CSV becomes one row, with the following structure:

Field	Type
row_number	int
column_number	int
first_column_value	text
value	text
__file_basename	text
__file_path	text
__file_name	text
__kdd_insert_time	timestamp
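To make the pivoted layout more concrete, here is a minimal Python sketch that turns CSV text into one record per cell. It is an approximation under stated assumptions, not the pipeline's real code: the function name `pivot_csv` is hypothetical, numbering is assumed to start at 1, the header line is assumed to be treated like any other row, and `first_column_value` is assumed to hold the value of the first column of the same row. The file metadata fields and `__kdd_insert_time` are left out because their exact contents are not described here.

```python
import csv
import io
from typing import Dict, List, Union


def pivot_csv(text: str) -> List[Dict[str, Union[int, str]]]:
    """Pivot CSV text into one record per cell (hypothetical helper)."""
    records: List[Dict[str, Union[int, str]]] = []
    reader = csv.reader(io.StringIO(text))
    # Assumption: rows and columns are numbered starting at 1.
    for row_number, row in enumerate(reader, start=1):
        for column_number, value in enumerate(row, start=1):
            records.append({
                "row_number": row_number,
                "column_number": column_number,
                # Assumption: the first cell of the same row.
                "first_column_value": row[0],
                "value": value,
            })
    return records


# Files with different numbers of columns produce the same record shape.
two_columns = pivot_csv("id,name\n1,Ana\n")
three_columns = pivot_csv("id,name,city\n2,Bia,Recife\n")
```

Because every cell becomes its own record, a file with two columns and a file with three columns can share the same destination table; the original columns can be rebuilt later by grouping on `row_number` and `column_number`.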