Log-based Change Data Capture - lessons learnt
My article on Medium summarizes experiences from various projects with log-based change data capture (CDC). There are many use cases for which CDC is beneficial. Some DBs even have CDC functionality...
Anonymization techniques and data privacy
Anonymization techniques are essential for data analytics or in test/dev databases. Anonymization and pseudonymization are very different but often confused. GDPR does not apply to anonymized data...
PostgreSQL partitioning guide
PostgreSQL partitioning is a powerful feature when dealing with huge tables. Partitioning allows breaking a table into smaller chunks, aka partitions. Logically, there seems to be one table only if...
PostgreSQL columnar extension cstore_fdw
PostgreSQL columnar extension cstore_fdw is a storage extension suited for OLAP-/DWH-style queries and data-intensive applications. Columnar analytical databases have unique characteristics...
PostgreSQL application_name
PostgreSQL application_name can be set in the connection string. The view pg_stat_activity shows the application_name, which helps identify sessions. The article shows how to set...
Data Engineering with dbt – first steps using PostgreSQL and Oracle
dbt is a Data Engineering tool supporting version control with CI/CD for transformations and materialization. The approach with dbt differs from tools like SSIS, DataFactory, Informatica. The...
Materialization examples of Data Engineering with dbt
dbt offers several materialization options to create ETL/ELT processes. The article shows and compares various approaches to using dbt for ETL/ELT. A previous post contains an introduction to dbt:...
Data Vault and Star Schema with PlantUML: Entity Relationship Diagram as Code
Entity Relationship Diagram as code means developers use the same tools for creating the diagrams – or documentation in general – as for coding. Documentation includes more than just source code and...
Predictions about data for 2023 and beyond
Predictions about data for 2023 and beyond. End of the year: it’s the time for predictions. Let’s have a look at some predictions regarding data. There are many predictions for Machine Learning, Deep...
Data visualization with Flourish
Flourish is a data visualization and storytelling platform that helps data enthusiasts understand and communicate complex data. With a wide range of customizable templates and interactive features,...
How to Be Useful: Unpacking Arnold Schwarzenegger’s Secrets to Success
Did you know that the man who conquered bodybuilding, Hollywood, and the political arena believes that his multifaceted success boils down to just seven principles? Yes, Arnold Schwarzenegger, in his...
Vector Database – What, Why, and How
In today’s data-driven world, vector databases are available to handle complex, high-dimensional data. This article describes vector databases including use cases as well as an example with the...
Similarity search in vector databases: a comprehensive guide
Similarity search in vector databases has emerged as a pivotal technique enabling efficient retrieval of information by comparing complex data points within high-dimensional spaces. The ability to...
Vector Indexes in Vector Databases: Semantic Search Performance
Vector indexes are crucial for semantic search performance because they enable efficient querying. In this article, I will delve into various types of vector indexes, their workings, pros and cons, and...
Oracle AI Vector – Semantic Search
With the advent of Large Language Models (LLMs), vector databases are becoming increasingly popular. Vector databases and similar approaches have existed for a long time; geodata, for example, have long been...
Vector Vanguard: Tracking the Pulse of Vector Tech 07/2024
Welcome to “Vector Vanguard: Tracking the Pulse of Vector Tech 07/2024” – a source for the latest developments in vector databases, vector indexes, RAG (Retrieval-Augmented Generation), similarity...
Vector Vanguard: Tracking the Pulse of Vector Tech 08/2024
Welcome to “Vector Vanguard: Tracking the Pulse of Vector Tech 08/2024” – a source for the latest developments in vector databases, vector indexes, RAG (Retrieval-Augmented Generation), similarity...
SQL’s unstoppable evolution: DBMS Innovations and how relational DBs...
Michael Stonebraker and Andrew Pavlo wrote about DBMS innovations in their paper “What Goes Around Comes Around… And Around…” which revisits the evolution of data models and database systems over the...
Vector Vanguard: Tracking the Pulse of Vector Tech 09/2024
Welcome to “Vector Vanguard: Tracking the Pulse of Vector Tech 09/2024” – a source for the latest developments in vector databases, vector indexes, RAG (Retrieval-Augmented Generation), similarity...
Humanizing Data Strategy: A concise summary of Tiankai Feng’s 5 Cs Framework
“Humanizing Data Strategy: Leading Data with the Head and the Heart” by Tiankai Feng focuses on a people-centered approach to data strategy. The book introduces the Five Cs Framework, which highlights...