Getting Started with Analyzing Cloud Data: Creating a Postgres database on AWS (Amazon Web Services)

Getting Started with Analyzing Cloud Data: Creating a Postgres database on AWS (Amazon Web Services)

This article is part of our series to help you get started with analyzing cloud data. 

RDS versus EC2

As we mentioned earlier, AWS has its own server option for databases called Relational Database Service. Creating a database through RDS instead of first creating a server (EC2) and then installing a database on it offers several advantages, including:

  • No need to install anything! The database is already pre-configured.
  • AWS has developed several functions specific to databases, such as creating read replicas with just a few clicks and automatic version updating.
  • Specific metrics to monitor databases like used space, latency and performance
  • Super-fast backups and restores

Postgresql (or PG or Postgres)

In this article, we will teach you how to create a Postgresql database. Postgres is an open-source relational database, with a very active development community (and supporting tutorials), which is continually reinventing itself and comes with several extensions that can be easily installed.

In addition to these advantages, Postgres is the ideal database for small and medium data analysis projects (small & medium data), as it supports multiple data insertions (something common when we are integrating data from multiple sources) with little impact on performance. of the data being consulted, in addition to being efficient in crossing tables (joins).

Until recently, Postgres supported operations that MySQL (another popular and commonly used database for Data Warehouses and Data Lakes) did not, such as full joins, window functions and common table expressions (CTEs) – all vital to analyze data efficiently.

Creating a Postgres database on AWS

To create your Postgres database on AWS, log in to your account and follow the steps below:

1) Choose your region. On the right side of the top menu, you have the option to choose the region where you will create the database. AWS Regions indicate which data center your database will be created in, and the hourly prices for that database will vary by region. Creating a database in São Paulo, for example, is much more expensive than creating in Virginia, despite offering better latency. As this article is dedicated to analytical databases (Data Warehouses and Data Lakes), we find a latency of a few microseconds acceptable and will be creating in the US East 1 region (N. Virginia):

1.png

2) Select the RDS service by clicking (1) “Services” on the left of the top bar, then type (2) “RDS” in the search bar and (3) select the service:

2.png

3) On the RDS main screen, select “Create database” as shown in the image below:

3.png

4) Now you will be on your database configuration screen. In the first option, you must select “Standard Create”, which will be the method that we will use in this article:

4.png

5) Just below, you should select which type of database you want to create. Select "PostgreSQL":

5.png

6) In the next section, you can choose a creation template. As we mentioned earlier, AWS offers a benefit for new members called free-tier, which helps us save a lot during the first few months. Let's select this option as shown in the image below:

6.png

7) In the section below, we will configure some database details, among them the server identifier, the master username and its password. Filling it in is quite simple, as shown below. Fill in this information and do not forget to write down the master user password, as this information cannot be retrieved later:

7.png

8) The next section has already been filled in according to the free-tier template we chose, we can keep it as shown in the image below:

8.png

9) The storage section must also be filled in as per the free-tier template and does not require modification:

9.png

10) The same goes for the availability & durability section, which we can leave as per the template:

10.png

11) In the connectivity section, you can choose the VPC where you will create your database, or even create a new one. If your account is recent, just leave the VPC that is selected. Also in this section, it is important to change some details so that your database can be accessed from machines outside your network, such as your visualization and analysis tools (don't worry, we will block the firewall in your VPC later on). Click on Additional connectivity configuration as shown below:

11.png

12) In the additional settings, you'll be asked if you'd like your database to be publicly accessible (again, don't worry about naming, we'll put in other rules to prevent public access later on). Select yes, as shown in the image below. For simplicity, we will also keep the same security group as the VPC where we are creating it:

12.png

13) After this configuration, find the “Additional configuration” section that will be right at the bottom of the page as shown in the image below:

13.png

14) Give your database an initial name, for example “kondado”:

14.png

15) After that, just scroll to the bottom of the page and click on “Create database”:

15.png

16) After a few moments, you will be redirected to the page that lists your databases and you can see that your new database is being created. While this process is running, let's configure the security rules of your database by clicking on it as shown in the image below:

16.png

17) On your database's page, (1) find the Connectivity & security section as shown in the image below and (2) click on the name of your security group, as shown in the image below:

17.png

18) On the security group screen, let's put some rules about who can access your bank: (1) Locate the “Inbound” tab and then (2) click on “Edit” as shown in the image:

18.png

19) As you can see, your bank is open to be accessed only from machines belonging to the same VPC. If you want to keep this setting, you can skip directly to step 25:

19.png

20) To withdraw the original access rule, click on the “X” to the right of the “All traffic” rule:

20.png

21) Add new rules by clicking on “Add Rule” as shown in the figure. A new rule must be added for each IP address you want to access your database:

21.png

22) To add a new rule, you must select the correct port (5432 in the case of Postgres) in the “Type” column as shown in the image below:

22.png

23) Important rules to be placed are your own IP address (to allow access with PowerBI for example) and Kondado servers that can be found by clicking here, as shown in the image below:

23.png

24) After you put all the addresses you want click on “Save”.

25) Go back to the RDS home page (as per step 2) and select your database as per step 16.

26) Your database is created! On the “Connectivity & security” page, you will find the address (host) of your database. Username, password and database name were filled in in steps 7 and 14:

24.png

27) Now you’re ready to add the new database as a destination on the Kondado platform!

To enter your destination on Kondado, pay attention to the information below:

  • Your database address (host) can be found on the “Connectivity & security” page, as shown in the image above (step 26);
  • The user and password fields are the information entered in step 7 of this tutorial (“Master Username” and “Master Password”);
  • The database field must be filled in with the information entered in the “Initial Database Name” field in step 14 of this tutorial.

Centralize data from +80 sources with a few clicks

Start your free trial