Data Engineering with dbt – first steps using PostgreSQL and Oracle

dbt is a Data Engineering tool supporting version control with CI/CD for transformations and materialization.

The approach with dbt differs from tools like SSIS, Data Factory, or Informatica. The developer models the target tables or views and the transformations. dbt uses these models to create the target views or tables and to run the transformations from the source to the target. The models are plain text files, so version control with CI/CD is easily supported.

dbt claims to be a self-service Data Engineering tool so that not only Data Engineers but especially Data Scientists, Data Analysts, and other Data Enthusiasts can use the tool to create their Data Engineering models.

The article shows the first steps to use dbt and connect to PostgreSQL or Oracle. Upcoming articles will deal with building the ETL process.

BTW, dbt stands for data build tool.

The article covers the basic steps: installing dbt with the PostgreSQL and Oracle adapters, configuring the database connections, and testing the setup with the dbt example project. The prerequisites to follow the steps in the article are:

  • Python and pip are available
  • PostgreSQL client is available
  • Oracle client is available, and tnsnames.ora is configured
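The prerequisites above can be checked with a few lines of Python. This is a minimal sketch; the executable names passed in the usage comment (psql for the PostgreSQL client, sqlplus for the Oracle client) are assumptions and may differ on a given machine.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Example call (tool names are assumptions, adjust to the local installation):
# for tool, path in find_tools(["python", "pip", "psql", "sqlplus"]).items():
#     print(f"{tool}: {path or 'not found'}")
```

A tool reported as None is either not installed or not on the PATH of the current shell.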

dbt installation

pip installs dbt-core and the adapters for databases like PostgreSQL or Oracle.

pip install dbt-core dbt-postgres dbt-oracle

The following screenshot shows a sample output from the dbt installation. Python packages dbt-core, dbt-postgres and dbt-oracle are downloaded and installed.

[Screenshot: dbt installation]

And now, check the installation with dbt --version.
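The same check can also be done from Python by asking the package metadata which versions are installed. This is only a sketch that complements the dbt --version command; it uses the standard library and the package names from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string}, with None for packages that are not installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Example call:
# print(installed_versions(["dbt-core", "dbt-postgres", "dbt-oracle"]))
```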

Initialize and create dbt project

The following command creates a new dbt project with some samples. The name of the project is the last parameter, in this case, test: dbt init test.

dbt init

The following screenshots show the created directories:

  • root directories in the screenshot on the left
  • models folder with examples directory in the screenshot on the right
[Screenshot: dbt root directories]
[Screenshot: model directory]
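Without the screenshots at hand, the generated layout can be inspected with a short script. The sketch below assumes only that a dbt project has a dbt_project.yml file in its root directory and that models are .sql files below the models folder; the function name describe_project is hypothetical.

```python
from pathlib import Path


def describe_project(root):
    """Report whether root looks like a dbt project and list its model files."""
    root = Path(root)
    return {
        # dbt_project.yml in the root directory marks a dbt project.
        "has_project_file": (root / "dbt_project.yml").is_file(),
        # Models are SQL files anywhere below the models folder.
        "model_files": sorted(
            p.relative_to(root).as_posix() for p in root.glob("models/**/*.sql")
        ),
    }


# Example call, run from the directory that contains the new project:
# print(describe_project("test"))
```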

Database connection

Configuration of database connections is in the file profiles.yml. The following example file contains two connections.

The first configuration is for a PostgreSQL cluster installed locally on port 5432. Database name and schema have to be specified.

The second configuration is for Oracle. tnsnames.ora must be configured and contain an entry named XEPDB1. A schema is also required.

Additionally, username and password get values from the environment for flexibility and safety:

  • If parameters have to change between environments, their values can be read from environment variables.
  • Sensitive information does not belong in configuration files that are added to version control. The environment variable DBT_USER contains the username, and DBT_ENV_SECRET_PASSWORD contains the password. If DBT_ENV_SECRET_ is used as a prefix, dbt suppresses the value in its output (e.g. it is not written into log files).
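The behaviour described in the two points above can be mimicked in plain Python. This is a sketch of the semantics only, not dbt's implementation: the function names, the error message, and the placeholder string are assumptions made for illustration.

```python
import os

SECRET_PREFIX = "DBT_ENV_SECRET_"


def env_var(name, default=None):
    """Return the variable's value, else the default, else fail loudly."""
    if name in os.environ:
        return os.environ[name]
    if default is not None:
        return default
    raise KeyError(f"environment variable {name!r} is required but not set")


def scrub_secrets(text, placeholder="*****"):
    """Replace the values of all DBT_ENV_SECRET_* variables found in text."""
    for key, value in os.environ.items():
        if key.startswith(SECRET_PREFIX) and value:
            text = text.replace(value, placeholder)
    return text
```

With DBT_USER unset, env_var("DBT_USER", "postgres") falls back to the default, while env_var("DBT_ENV_SECRET_PASSWORD") raises an error, so a missing password is noticed before any connection attempt.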

Finally, target specifies which of the configured outputs is used by default, i.e. the current environment.

test:
  outputs:

    psq-dev:
      type: postgres
      threads: 1
      host: localhost
      port: 5432
      user: "{{env_var('DBT_USER','postgres')}}"
      pass: "{{env_var('DBT_ENV_SECRET_PASSWORD')}}"
      dbname: postgres
      schema: abu

    ora-dev:
      type: oracle
      threads: 1
      tns_name: XEPDB1
      user: "{{env_var('DBT_USER')}}"
      pass: "{{env_var('DBT_ENV_SECRET_PASSWORD')}}"
      schema: abu

  target: psq-dev
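How the target entry picks one of the outputs can be sketched as follows. The dictionary mirrors the structure of the file above (credentials omitted); select_output is a hypothetical helper, not part of dbt. On the command line, the default can be overridden with the --target option, for example dbt run --target ora-dev.

```python
# Simplified copy of the profile above as a Python dictionary (credentials omitted).
profile = {
    "outputs": {
        "psq-dev": {"type": "postgres", "host": "localhost", "port": 5432,
                    "dbname": "postgres", "schema": "abu"},
        "ora-dev": {"type": "oracle", "tns_name": "XEPDB1", "schema": "abu"},
    },
    "target": "psq-dev",
}


def select_output(profile, override=None):
    """Return (name, settings) of the override if given, else of the default target."""
    name = override or profile["target"]
    if name not in profile["outputs"]:
        known = ", ".join(sorted(profile["outputs"]))
        raise ValueError(f"unknown target {name!r}, expected one of: {known}")
    return name, profile["outputs"][name]


# Example calls:
# select_output(profile)             # uses the default target psq-dev
# select_output(profile, "ora-dev")  # explicit override
```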

Debug and run dbt

The next command checks if the dbt configuration is correct. The environment variables have to be set first (Linux: export DBT_USER=… and export DBT_ENV_SECRET_PASSWORD=…) because the debug command also tests the database connection.

dbt debug
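For scripted checks, setting the variables and calling dbt can be combined in one step. The following is a minimal sketch that assumes the dbt executable is on the PATH and that it is run from the project directory; the helper names are hypothetical, and the credentials have to be supplied by the caller.

```python
import os
import subprocess


def dbt_command(subcommand, target=None):
    """Build the argument list for a dbt call, optionally with --target."""
    cmd = ["dbt", subcommand]
    if target:
        cmd += ["--target", target]
    return cmd


def dbt_environment(user, password, base=None):
    """Copy the base environment and add the two variables used in profiles.yml."""
    env = dict(os.environ if base is None else base)
    env["DBT_USER"] = user
    env["DBT_ENV_SECRET_PASSWORD"] = password
    return env


def run_dbt(subcommand, user, password, target=None):
    """Run dbt with the credentials set only for the child process."""
    completed = subprocess.run(
        dbt_command(subcommand, target),
        env=dbt_environment(user, password),
        check=False,
    )
    return completed.returncode
```

Calling run_dbt("debug", user, password, target="ora-dev") would then test the Oracle connection without exporting the password in the interactive shell; a return code of 0 indicates success.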

And finally, the dbt example is run. dbt init creates two example models, my_first_dbt_model.sql and my_second_dbt_model.sql, in the folder ../test/models/example. The run command deploys those models. The example also contains four tests, aka data quality checks, e.g. for the uniqueness of values. These tests are defined in schema.yml in the folder ../test/models/example and are executed with dbt test; dbt build runs the models and the tests together.

dbt run
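To illustrate what such data quality checks verify, here is a sketch of a not-null and a uniqueness check in plain Python. dbt implements its tests as SQL queries that return the failing rows, and a test passes when nothing is returned; the functions below follow the same idea, with hypothetical names and made-up sample rows.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Return the rows in which the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the non-null values that occur more than once in the column."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample data: one null and one duplicated id.
rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": None}]
print(not_null_failures(rows, "id"))  # rows that violate the not-null check
print(unique_failures(rows, "id"))    # values that violate the uniqueness check
```

An empty result from both functions corresponds to a passing test.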

The article showed the first steps to using dbt as a Data Engineering tool. The upcoming article will demonstrate how to build models with different kinds of materialization. 
