How to filter XML data using SSIS

Introduction

Filtering data in XML format is a common requirement when working with APIs and structured data sources in SQL Server Integration Services (SSIS). The ZappySys XML Source component allows you to efficiently fetch, parse, and filter XML data. This article will guide you through the process of filtering XML data using XPath expressions in SSIS.

Prerequisites

Steps

  1. Add a Data Flow Task to your SSIS package.
  2. Drag and drop an XML Source component into the Data Flow. Provide a valid URL or local XML file. Here is the sample XML we will be using for demonstration:

<store storeid="s1">
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <tag>tag1</tag>      
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</store>
  1. Now, in the XPath filter, we can use XPath to filter our data. Below are several examples:

Filtering scenarios

Selecting an array

Filtering array on a single condition

Filtering array on multiple conditions

  • To filter books that belong to the genre ‘Computer’ and have a price less than 10:
    $.store.book[?(@genre=='Computer' || @price[*]<'10' )]

  • To filter books where the author is ‘Galos, Mike’ and the genre is ‘Computer’:
    $.store.book[?(@author=='Galos, Mike' && @genre=='Computer' )]

Filtering array using Regular Expressions

  • To filter books where author name is exactly ‘Gambardella, Matthew’ or ‘Ralls, Kim’:
    $.store.books[?(@author=~ /^Gambardella, Matthew|Ralls, Kim$/ )]

  • To filter books where author name ends with ‘Kim’:
    $.store.books[?(@author=~ /Kim$/ )]

Conclusion

Filtering XML data efficiently using ZappySys XML Source and XPath expressions in SSIS helps streamline data processing, reducing unnecessary data loads and improving overall performance. By applying targeted filters, you can extract only the required information, leading to optimized workflows and faster data transformations.

XPath expressions offer powerful filtering capabilities, including conditional selections, nested scans, and regular expressions, making them a versatile tool for handling complex XML structures. Whether working with APIs, local XML files, or large datasets, mastering XML filtering can significantly enhance your data integration processes.

To get started, download and install the ZappySys SSIS PowerPack and begin refining your XML extraction techniques. If you have any questions or require further assistance, feel free to contact ZappySys Support.

References

Contact us

If you encounter any challenges or have specific use cases, please contact our support team via chat or ticket.