Google Cloud Storage

Google Cloud Storage is a service for storing objects and files (e.g., CSV and JSON) on Google Cloud.

By adding the Google Cloud Storage data source in Kondado, you will be able to create ETLs from your files directly to your Data Warehouse or Data Lake with just a few clicks.

Adding the data source

To add the Google Cloud Storage connection, follow the steps below:

1) Log in to your Google Cloud account

2) Go to the Service Accounts section of the Google Cloud console

3) Once in the Service Accounts section, click on “CREATE SERVICE ACCOUNT”

4) In the first step, fill in a name for your service account (e.g., “kondado gcs”) and click on “CREATE”

5) In the second step of the creation process, select the role “Storage Object Admin” and click on “CONTINUE”

6) Now just click on “DONE” to finish the creation

7) Once created, you will be directed to a list of all active service accounts. Locate the one you just created, click on the three vertical dots on the right and then click on “Create key”

8) In the dialog, select the type “JSON” and then click on “CREATE”

9) After clicking “CREATE”, the key will be downloaded to your computer. Open the downloaded file in a text editor; you will need its full contents in step (11)

10) Log in to the Kondado platform, go to the add data source page and select the Google Cloud Storage data source

11) On the add data source page, do the following:

In “Bucket” fill in the name of your bucket

In “JSON Credential”, paste the full contents of the file from step (9)

12) Now just click on “SAVE” and you will be ready to load your files from Google Cloud Storage into your Data Warehouse or Data Lake
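Before pasting the key into “JSON Credential”, it can help to confirm that the file was copied in full. The following is a minimal sketch using only the Python standard library; the function name check_service_account_key and the list of fields it checks are illustrative choices based on the usual layout of a Google Cloud service account key, not something Kondado requires. The sample uses placeholder values only.

```python
import json

# Fields normally present in a Google Cloud service account JSON key.
# The exact set checked here is an illustrative choice.
REQUIRED_FIELDS = {
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "token_uri",
}


def check_service_account_key(raw_text):
    """Return a list of problems found in the pasted key text (empty if none)."""
    try:
        key = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level value is not a JSON object"]
    problems = [
        f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(key))
    ]
    if key.get("type") != "service_account":
        problems.append('field "type" is not "service_account"')
    return problems


# Placeholder values only; this is not a real credential.
sample = json.dumps(
    {
        "type": "service_account",
        "project_id": "my-project",
        "private_key_id": "0000000000",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "kondado-gcs@my-project.iam.gserviceaccount.com",
        "client_id": "000000000000",
        "token_uri": "https://oauth2.googleapis.com/token",
    }
)

print(check_service_account_key(sample))
```

An empty list means the text parses as JSON and carries the fields checked above; it does not prove that the key is still active or that the role was granted.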

Pipelines

Relationship Chart

CSV

You can specify a full file name or only the beginning of a file name; every file that matches will be integrated.

After each run, the pipeline saves the latest change date among the files it read and, on the next run, only looks for files with a later change date.

To accommodate files with different columns, the data is pivoted on the target so that each cell of the file becomes one row, with the following fields:
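The selection logic described above can be sketched as follows. This is only an illustration of the idea, not Kondado's implementation: the function name select_new_files, the (name, changed_at) tuples and the assumption that the prefix is compared against the object name are all hypothetical.

```python
from datetime import datetime, timezone


def select_new_files(files, prefix, last_change=None):
    """Pick files whose name starts with `prefix` and that changed after `last_change`.

    `files` is an iterable of (name, changed_at) tuples.
    Returns the selected names and the change date to save for the next run.
    """
    selected = [
        (name, changed_at)
        for name, changed_at in files
        if name.startswith(prefix)
        and (last_change is None or changed_at > last_change)
    ]
    # Keep the previous saved date when nothing new was found.
    next_change = max((changed_at for _, changed_at in selected), default=last_change)
    return [name for name, _ in selected], next_change


# Hypothetical bucket listing.
listing = [
    ("sales_2024_01.csv", datetime(2024, 1, 31, tzinfo=timezone.utc)),
    ("sales_2024_02.csv", datetime(2024, 2, 29, tzinfo=timezone.utc)),
    ("inventory.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),
]

first_run, saved = select_new_files(listing, "sales_")
second_run, saved_again = select_new_files(listing, "sales_", last_change=saved)
print(first_run, saved)
print(second_run, saved_again)
```

In this sketch a file is picked up again only if its change date moves past the saved one, which mirrors the "later change date" rule stated above.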

Field                 Type
row_number            int
column_number         int
first_column_value    text
value                 text
__file_basename       text
__file_path           text
__file_name           text
__kdd_insert_time     timestamp
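To make the pivoted layout more concrete, here is a minimal sketch of how one CSV file could be turned into rows of this shape, and how the original rows can be rebuilt from them. It rests on several assumptions that the text above does not confirm: numbering starts at 0, the header line is treated like any other row, and first_column_value holds the first cell of the same row. The functions pivot_csv and rebuild_rows are hypothetical, and __file_basename and __file_name are left out because their exact definitions are not stated.

```python
import csv
import io
from datetime import datetime, timezone


def pivot_csv(text, file_path):
    """Turn CSV text into one dict per cell (assumed layout, see lead-in)."""
    inserted_at = datetime.now(timezone.utc)
    cells = []
    for row_number, record in enumerate(csv.reader(io.StringIO(text))):
        first_value = record[0] if record else None
        for column_number, value in enumerate(record):
            cells.append(
                {
                    "row_number": row_number,
                    "column_number": column_number,
                    "first_column_value": first_value,
                    "value": value,
                    "__file_path": file_path,
                    "__kdd_insert_time": inserted_at,
                }
            )
    return cells


def rebuild_rows(cells):
    """Regroup pivoted cells into one list of values per original row."""
    grouped = {}
    for cell in cells:
        grouped.setdefault(cell["row_number"], {})[cell["column_number"]] = cell["value"]
    return [
        [columns[number] for number in sorted(columns)]
        for _, columns in sorted(grouped.items())
    ]


# Two files with different column counts would produce the same target fields;
# here a single file has one row with an extra column.
sample_csv = "id,name\n1,Ana\n2,Bo,extra\n"
cells = pivot_csv(sample_csv, "reports/sales.csv")
print(len(cells))
print(rebuild_rows(cells))
```

Because every cell becomes its own row, a file with an unexpected extra column simply yields more rows rather than a schema change on the target.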