S3 (data source)

S3 (Simple Storage Service) is AWS's scalable object storage service. The data integration from S3 to the data warehouse created by Kondado lets you access the CSV files stored in your bucket from your analytics cloud.

Adding the data source

Throughout this guide, we will assume you are using a bucket called generic-bucket-name.

Step 1: Create a New IAM Policy

  1. Go to the IAM Console: https://console.aws.amazon.com/iam
  2. In the left-hand navigation, click on Policies.
  3. Click the Create policy button.
  4. Under the JSON tab, paste the following policy document:
    1. {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Action": [
               "s3:GetObject",
               "s3:ListBucket"
             ],
             "Resource": [
               "arn:aws:s3:::generic-bucket-name",
               "arn:aws:s3:::generic-bucket-name/*"
             ]
           }
         ]
       }
  5. Click Next: Tags, then Next: Review.
  6. For Policy Name, enter S3ReadListGenericBucketPolicy.
  7. Click Create policy.
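The policy document above can also be generated programmatically for any bucket name. A minimal sketch in Python (the helper name is ours, not part of AWS or Kondado):

```python
import json

def make_read_list_policy(bucket_name: str) -> dict:
    """Build the read/list IAM policy document for a given bucket."""
    arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The minimum permissions the integration needs:
                # read objects and list the bucket's contents
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # Both the bucket itself (for ListBucket)
                # and its objects (for GetObject)
                "Resource": [arn, f"{arn}/*"],
            }
        ],
    }

print(json.dumps(make_read_list_policy("generic-bucket-name"), indent=2))
```

Note that the bucket ARN and the `/*` object ARN are both required: ListBucket applies to the bucket resource, while GetObject applies to the objects inside it.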

Step 2: Create a New IAM User

  1. Go to the IAM Console again: https://console.aws.amazon.com/iam
  2. In the left-hand navigation, click on Users.
  3. Click Add user.
  4. Enter the user name S3ReadUserGenericBucket.
  5. Select Access key - Programmatic access.
  6. Click Next: Permissions.

Step 3: Attach the Policy to the User

  1. On the Permissions page, click Attach existing policies directly.
  2. Search for the policy you just created: S3ReadListGenericBucketPolicy.
  3. Check the box next to the policy and click Next: Tags, then Next: Review.
  4. Click Create user.

Step 4: Download Access Keys

  1. After creating the user, the Access Key ID and Secret Access Key will be displayed.
  2. Download or copy these credentials, as they will not be shown again.

Now, the user S3ReadUserGenericBucket has programmatic access to read and list files from the generic S3 bucket.
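Before entering the keys in Kondado, you can sanity-check that the new credentials can list your bucket. A hedged sketch using boto3, the standard AWS SDK for Python (the helper name and extension filter are ours; the function takes any boto3-style client so it can be tested with a stub):

```python
def list_csv_keys(s3_client, bucket: str, prefix: str = "") -> list:
    """Return the CSV-like object keys under a prefix.

    The client only needs a list_objects_v2(Bucket=..., Prefix=...)
    method, so a stub can stand in for a real boto3 client.
    """
    allowed = (".csv", ".gz", ".gzip")
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent when no objects match the prefix
    return [
        obj["Key"]
        for obj in response.get("Contents", [])
        if obj["Key"].endswith(allowed)
    ]

# With the real credentials from Step 4 (requires boto3):
# import boto3
# s3 = boto3.client(
#     "s3",
#     aws_access_key_id="<Access Key ID>",
#     aws_secret_access_key="<Secret Access Key>",
# )
# print(list_csv_keys(s3, "generic-bucket-name", "folder_x/folder_y/"))
```

If this call raises an AccessDenied error, re-check that the policy from Step 1 is attached to the user.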

Step 5: Creating it in Kondado

  1. Access our platform and click CREATE + > Source > select the S3 data source
  2. Give your data source a name, fill in the credentials from Step 4 and your bucket name (e.g. generic-bucket-name), then click SAVE

Pipelines

CSV files

The integration for reading CSV files will create a table in your destination where all fields will be of type text, and special characters will be replaced.

All files must have a header.

The following parameters are available:

  • Start date of reading: Refers to the modification date of the files and indicates the date from which data will start being read. If you set your integration to Integral replication, this parameter is ignored
  • Column delimiter: Specify which character separates the columns in the file
  • File prefix: Indicate the prefix of the files to be included; do not start it with the bucket name, “/”, or “s3://”, and do not use wildcards. Example: folder_x/folder_y/file_prefix_
  • Compression: Choose GZIP if this compression is applied to your files, or CSV if there is no compression. If you choose GZIP, your file must have the extension “.gz” or “.gzip” and contain only one file inside the archive. If you choose CSV, your file must have the “.csv” extension
  • Header: List the first line (header) of your files. It is not necessary to maintain the order of the columns. Fields present in a file but not in this header will be ignored; if a file does not contain all the fields in the header, the missing fields simply will not be read, without causing errors. Do not use spaces between fields, only commas. For example: col_x,col_y,col_z
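The header-matching rules above can be illustrated with Python's standard csv module. This is a sketch of the described behavior, not Kondado's actual implementation:

```python
import csv
import io

def read_with_configured_header(file_text: str, header: list, delimiter: str = ","):
    """Read a CSV, keeping only the configured header's columns (all as text).

    Extra columns in the file are ignored; columns missing from the
    file come back as None instead of raising an error.
    """
    reader = csv.DictReader(io.StringIO(file_text), delimiter=delimiter)
    return [{col: row.get(col) for col in header} for row in reader]

# The file has an extra column (col_extra), lacks col_z,
# and lists its columns in a different order than the header
sample = "col_y,col_x,col_extra\n2,1,ignored\n"
rows = read_with_configured_header(sample, ["col_x", "col_y", "col_z"])
print(rows)
```

Here col_extra is dropped, col_z is empty, and the column order follows the configured header rather than the file.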

After this step, you will be able to choose the replication type for your integration.

If you choose Integral replication, all files matching the prefix are read on every run, and if a file is deleted between one execution and the next, it is also deleted from your table. Incremental replication does not propagate deletions, which is why Integral replication is recommended when files are deleted (not just modified). Because every matching file is re-read on each run, Integral replication can increase your record count.

When choosing Incremental replication, whenever a file is modified, it is updated in your destination. You can identify the file that generated a given row, and when that row was inserted or modified, using the columns _kdd_file_name and _kdd_insert_time, respectively.
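The two replication strategies can be sketched as pure functions over a mapping of file names to rows. The function names and row shapes are illustrative, not Kondado's internals:

```python
from datetime import datetime, timezone

def integral_refresh(files: dict) -> list:
    """Integral: rebuild the table from every currently matching file.

    Rows from files deleted in S3 disappear, because the table is
    rebuilt only from the files that still exist.
    """
    now = datetime.now(timezone.utc).isoformat()
    return [
        {**row, "_kdd_file_name": name, "_kdd_insert_time": now}
        for name, rows in files.items()
        for row in rows
    ]

def incremental_update(table: list, file_name: str, rows: list) -> list:
    """Incremental: replace only the rows from the modified file.

    Rows from other files, including files deleted in S3, are
    left untouched in the destination.
    """
    now = datetime.now(timezone.utc).isoformat()
    kept = [r for r in table if r["_kdd_file_name"] != file_name]
    new = [{**row, "_kdd_file_name": file_name, "_kdd_insert_time": now}
           for row in rows]
    return kept + new
```

The contrast: calling integral_refresh without a deleted file drops its rows, while incremental_update never touches rows belonging to other files.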

Frequently asked questions

What AWS permissions does Kondado need for S3 integration?
Kondado requires s3:GetObject and s3:ListBucket permissions on your specific bucket. These should be configured through a custom IAM policy attached to a dedicated IAM user with programmatic access.
What file format does Kondado support for S3 data sources?
Kondado supports CSV files. All files must include a header row. You can also use GZIP-compressed files with the .gz or .gzip extension, provided the archive contains only one CSV file inside.
How should I format the file prefix parameter?
The file prefix should indicate the path to your files without starting with the bucket name, /, or s3://. Do not use wildcards. Example: folder_x/folder_y/file_prefix_
What happens if my CSV files have different columns than the header I specified?
Fields present in a file but not listed in your configured header will be ignored. If a file is missing some header fields, those fields will simply not be read without causing errors. The column order in the header does not need to match the file.
What is the difference between Integral and Incremental replication for S3?
With Integral replication, all matching files are always read, and deletions in S3 are reflected in your destination table—recommended when files are deleted, though it may increase record counts. With Incremental replication, modified files are updated in your destination, and you can trace rows using _kdd_file_name and _kdd_insert_time columns.
Where can I send my S3 data after connecting it to Kondado?
Once your S3 source is configured, you can send the data to various destinations such as BI tools, data warehouses, or spreadsheets for analysis and visualization.

Published 2023-07-17 · Updated 2026-04-25