How to extract text from a XML in SSIS

Introduction

In data integration tasks, working with XML data is often required. SSIS (SQL Server Integration Services) provides powerful tools through the ZappySys SSIS PowerPack to handle XML data efficiently. A common requirement is extracting text from XML using XPath expressions. The FUN_XML_TO_TEXT function is perfect for this task, allowing you to specify custom path expressions to extract and format XML values into readable text.

In this article, we’ll demonstrate how to use the FUN_XML_TO_TEXT function to extract values from XML and format them with custom separators such as spaces, semicolons, or newlines.

Prerequisites

  • SSIS PowerPack: Download and install the ZappySys SSIS PowerPack from this site.

Steps

We’ll use the function FUN_XML_TO_TEXT, which extracts text from XML using an XPath expression. You can specify expressions like /data/rows/row[0]/@name.

  1. Drag and drop a Data Flow Task into your SSIS package.

  2. Drag a valid source component into the Data Flow that contains XML values in one of its columns. For this example, we’ll use an XML Source, with the following XML data:

[
    {
        "xml1": "<ns1:x xmlns:ns1='http://abc.com'><token>ABCD</token></ns1:x>",
        "xml2": "<ns1:x xmlns:ns1='http://abc.com'><ns1:data sessionid='12345'>abcdefg</ns1:data></ns1:x>",
        "xml3": "<x><row><id>A</id></row><row><id>B</id></row></x>"
    }
]

Each XML element will be extracted differently:

  • XML1: Extract data from element text when namespace prefixes are used.
  • XML2: Extract data from attributes when namespace prefixes are used.
  • XML3: Extract IDs from all row elements and join values with a semicolon.
  1. Connect a Template Transform Component to the XML source. This is where we will apply the FUN_XML_TO_TEXT function.

    Here’s the function syntax:

    <<input_xml|~|xml_path[|~|separator]|~|indent,FUN_XML_TO_TEXT>>
    
    • input_xml: [String] XML document from which you want to extract text.
    • xml_path: [String] XPath filter expression you want to use for extraction.
    • separator (Optional): [String] String to use as a separator to join multiple matches. You can use \r (carriage return), \n (linefeed), \r\n (newline), \t (tab). Default is {space}.
    • indent: Indent selected nodes.

    The examples we will use:

    Example 1: <<<%example.xml1%>|~|/data/rows/row/name,FUN_XML_TO_TEXT>>
    Example 2: <<<%example.xml2%>|~|/data/rows/row/@id|~|;,FUN_XML_TO_TEXT>>
    Example 3: <<<%example.xml3%>|~|/data/rows/row/name|~|\n,FUN_XML_TO_TEXT>>
    

  2. Connect a destination component (e.g., Trash Destination, SQL Server Destination) and run the package.

    Package Result

  3. Here’s the result after extracting and formatting the XML data:

    Extract Result

Video Tutorial

Watch the tutorial

Conclusion

This article explored using the FUN_XML_TO_TEXT function from the ZappySys SSIS PowerPack to extract and format text from XML using XPath expressions. This powerful function allows you to easily extract and format XML data with custom separators, making it simpler to work with complex XML structures. Should you face any issues or need further assistance, our support team is ready to help via chat on our website.

References