How to merge new data inserted and updated in Postgres

Introduction

Integrating new data and updates into an existing PostgreSQL table is crucial in ETL workflows. This ensures your target table is always up-to-date with the latest data, preventing duplication or information loss. Using the Merge transformation component in SSIS makes this process efficient and straightforward.

This article will guide you in using the Merge component with PostgreSQL databases, leveraging ZappySys SSIS PowerPack components to simplify the process.

Prerequisites

Before getting started, ensure you have the following:

  • ZappySys SSIS PowerPack: Download and install the ZappySys SSIS PowerPack.
  • A PostgreSQL database with the necessary tables set up.

Step-by-Step Guide

Step 1: Configure Your Source Tables

You’ll need two source tables to perform the merge operation:

  1. SourceTable: Contains the latest data, including new and updated rows.
  2. TargetTable: Holds the existing data that needs to be updated or supplemented with new rows.

In this tutorial, we’ll use:

Step 2: Use the Sort Component

Sorting the data is critical for the Merge transformation to work correctly.

  1. Drag the Sort Component into your Data Flow Task for both SourceTable and TargetTable.
  2. Configure sorting by selecting the columns to sort and specifying their sort order.
  3. Enable the Remove duplicates option to eliminate redundant rows.

Note: For large datasets, avoid using the Sort component, as it loads all data into memory, potentially impacting performance.

Step 3: Configure the Merge Component

Once both datasets are sorted, connect them to the Merge component.

  1. Link the Sort Output of SourceTable and TargetTable to the Merge component.
  2. Define the columns to merge by mapping them in the Merge configuration.

Step 4: Add a PostgreSQL Destination Component

Finally, insert the merged data into your PostgreSQL database using the PostgreSQL Destination Adapter.

  1. Drag the PostgreSQL Destination component into your Data Flow Task.

  2. Configure the connection settings:

  3. Select the target table in your PostgreSQL database:

  4. Map the Merge component output to the PostgreSQL Destination table:

  5. Run the SSIS package to execute the Merge operation and observe the results:
    Example Result

Step 5: A better option, Upsert Destination Component

The better way you can do this is using the Upsert Destination component; you can preview data from the element, select the options to insert or/and update and more.

  1. Select your PostgreSQL connection, the table, map the columns and check the column you want for the update/insert

  2. Go to the Advanced tab for more options if it needs it

  3. Run the package, and you will see the result
    Example result 2

Conclusion

Using the Merge transformation component in SSIS and the ZappySys PostgreSQL components is a powerful and efficient way to integrate new and updated data into your tables with our PostgreSQL destination and Upsert Destination. This approach ensures your datasets remain accurate and synchronized, saving time and improving data quality in your ETL workflows.

Explore the full capabilities of the ZappySys SSIS PowerPack for more advanced features to optimize your data integration tasks.

References