S3 (Simple Storage Service) is an AWS service designed for scalable file storage. The data integration from S3 to the Data Warehouse created by Kondado allows you to access CSV files in your analytics cloud.
Adding the data source
Let's consider that you are using a bucket called generic-bucket-name
Step 1: Create a New IAM Policy
- Go to the IAM Console: https://console.aws.amazon.com/iam
- In the left-hand navigation, click on Policies.
- Click the Create policy button.
- Under the JSON tab, paste the following policy document:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::generic-bucket-name", "arn:aws:s3:::generic-bucket-name/*" ] } ] }
- Click Next: Tags, then Next: Review.
- For Policy Name, enter
S3ReadListGenericBucketPolicy. - Click Create policy.
Step 2: Create a New IAM User
- Go to the IAM Console again: https://console.aws.amazon.com/iam
- In the left-hand navigation, click on Users.
- Click Add user.
- Enter the user name
S3ReadUserGenericBucket. - Select Access key - Programmatic access.
- Click Next: Permissions.
Step 3: Attach the Policy to the User
- On the Permissions page, click Attach existing policies directly.
- Search for the policy you just created:
S3ReadListGenericBucketPolicy. - Check the box next to the policy and click Next: Tags, then Next: Review.
- Click Create user.
Step 4: Download Access Keys
- After creating the user, the Access Key ID and Secret Access Key will be displayed.
- Download or copy these credentials, as they will not be shown again.
Now, the user S3ReadUserGenericBucket has programmatic access to read and list files from the generic S3 bucket.
Step 5: Creating it in Kondado
- Access our platform and click on CREATE + > Source > select the S3 data source
- Give your data source a name, fill in the values from Step 4, your bucket name (eg
generic-bucket-name) then click SAVE

Pipelines
CSV files
The integration for reading CSV files will create a table in your destination where all fields will be of type text, and special characters will be replaced.
All files must have a header.
The following parameters are available:
- Start date of reading: Refers to the modification date of the files. It indicates from which date the data will start being read. If you choose to set your integration as Full, this parameter will be ignored
- Column delimiter: Specify which character is used to separate the columns in the file
- File prefix: Indicate the prefix of the files to be included, do not start with the bucket name or with “/” or “s3://”. Do not use wildcards. Example: folder_x/folder_y/file_prefix_
- Compression: Choose GZIP if this compression is applied to your files, or CSV if there is no compression. If you choose GZIP, your file must have the extension “.gz” or “.gzip” and contain only one file inside the archive. If you choose CSV, your file must have the “.csv” extension
- Header: List the first line (header) of your files. It is not necessary to maintain the order of the columns. If a file has fields that are not in this header, these fields will be ignored. If the file does not contain all the fields in the header, those fields will simply not be read without causing errors. Do not use spaces between fields, only commas. For example: col_x,col_y,col_z

After this step, you will be able to choose the replication type for your integration.
If you choose Integral replication, all files that match the prefix will always be read, and if any file is deleted between one execution and another, it will also be deleted from your table. This will not happen with Incremental replication, that's why the former is recommended in cases where files are deleted (not just modified). Integral replication can increase your record count.
When choosing Incremental replication, whenever a file is modified, it will be updated in your destination. You can locate the file that generated the insertion of a given row and when that row was inserted/modified using the columns _kdd_file_name and _kdd_insert_time, respectively.
