How to remove duplicated data in XML using SSIS

Introduction

Duplicate records can cause data management and reporting issues when working with large datasets, especially when using XML API or files. Properly handling these duplicates is essential for maintaining accurate and clean data. This article will show you how to remove duplicate rows from an XML Source using the ZappySys SSIS PowerPack and create a new XML file with the XML Destination, ensuring your data is organized and ready for further processing or analysis.

Prerequisites

Steps

  1. Add a Data Flow Task to your SSIS package.

  2. Inside the Data Flow, drag and drop an XML Source. Enter a valid URL or XML file path. For this example, we will use Example 1 from the component with duplicate entries in the Author and genre records:

  3. Preview the data in the XML Source, filter to extract the books nodes, and save the configuration.

  4. Drag a Sort Transformation component and connect it to the XML Source. In the Sort Transformation editor, select the column (or columns) where duplicates exist. Check the box Remove rows with duplicate sort values and save the configuration.

  5. Drag an XML Destination component and connect it to the Sort Transformation. Configure it to create a new XML file that stores the deduplicated data and saves the configuration.

  6. Execute the package. The resulting output will show a reduced number of rows. By removing duplicates, the XML data will now contain unique entries. In our example, the 12 original rows are reduced to 5 unique rows.

  7. If necessary, use more than one column in the Sort Transformation to define duplicates. Select two or more columns for advanced deduplication.

  8. After modifying the columns, rerun the package for different results based on your selections.

Video Tutorial

Conclusion

By following these steps, you can easily remove duplicate rows from your XML source and create a clean, organized XML file with ZappySys XML Destination. This streamlined process simplifies data integration and enhances the accuracy of your data reporting. Should you face any issues or need further assistance, our support team is ready to help via chat on our website

References

For more details on working with XML files in SSIS, visit our XML Destination and XML Source documentation.