Data Transformation

Data is essential to the day-to-day operations of every enterprise. Unfortunately, the ad hoc development of many legacy systems has created information silos that contain redundant and inconsistent data. The recent move towards Service Oriented Architecture (SOA) provides a framework for integrating these information silos and leveraging existing IT assets to create more flexible, agile enterprise systems. A key process of SOA is data transformation.

This article provides an overview of data transformation, introduces some of the tools and technologies used for transforming data, and discusses why data transformation is important for successfully implementing SOA.

What is Data Transformation?

Data transformation is the process of converting data from one format (e.g. a database file, XML document, or Excel sheet) to another. Because data often resides in different locations and formats across the enterprise, data transformation is necessary to ensure data from one application or database is intelligible to other applications and databases, a critical feature for applications integration.

In a typical scenario where information needs to be shared, data is extracted from the source application or data warehouse, transformed into another format, and then loaded into the target location. Extraction, transformation, and loading (together known as ETL) are the central processes of data integration. Depending on the nature of the integration scenario, data may need to be merged, aggregated, enriched, summarized, or filtered.

The first step of data transformation is data mapping. Data mapping determines the relationship between the data elements of two applications and establishes instructions for how the data from the source application is transformed before it is loaded into the target application. In other words, data mapping produces the critical metadata that is needed before the actual data conversion takes place.

For instance, in field mapping, the information in one application might be rendered in lowercase letters while another application stores information in uppercase letters. This means the data from the source application needs to be converted to uppercase letters before being loaded into the corresponding fields in the target application.

The structure of stored data may also vary between applications, requiring semantic mapping prior to the transformation process. For instance, two applications might store the same customer credit card information using slightly different structures:

APPLICATION A EXAMPLE APPLICATION B EXAMPLE
Cardholder First Name JOHN Cardholder Name JOHN DOE
Cardholder Last Name DOE Card Type VISA
Card Type and Card Number VISA
0123 4567 8910 1112
Card Number 0123 4567 8910 1112
Expiration Date 05/2012 Expiration Date 05/2012

 

To ensure that critical data isn’t lost when the two applications are integrated, information from Application A needs to be reorganized to fit the data structure of Application B. In some cases, data can be unstructured, such as an email from a customer, requiring data transformation to make the information contained in it usable.

Transformers: Data Integration Tools and Technologies

System administrators can choose from a variety of data integration products that automate the ETL process with visual mapping tools and drag-and-drop technologies. ETL capabilities can be found as standalone data integration software, as built-in tools in database servers, or as components of EAI middleware. These tools simplify and expedite complex data transformations by reducing the need for writing new code.

Standalone integration software and ETL tools found in database servers are particularly useful solutions for migrating data from enterprise applications to data warehouses and other IT projects geared towards data management. Informatica’s PowerCenter, for instance, is an example data integration solution that is capable of accessing data in a variety of formats, including flat files and web applications, while Talend’s Integration Suite offers similar data integration capabilities based on open source technologies. SQL Server Integration Services (SISS) is an ETL tool included in Microsoft SQL Server that extracts operational data from different sources for transformation and loading into a database.

For enterprises interested in integrating processes and data, middleware tools such as an Enterprise Service Bus (ESB) provide a way to share data between applications. Mule ESB, for instance, is not only capable of exchanging and transforming data through the DataMapper graphical interface, but also service creation and service orchestration - key processes for developing composite applications using existing units of functionality in a Service Oriented Architecture.

Data Transformation and SOA

One of the main goals of SOA is to maximize resource utilization by leveraging existing IT assets to build new business applications. This requires a fundamental shift in how data is understood. Legacy applications were originally designed for individual departments, with business functions tightly connected to data. Over time, however, the number of applications within the enterprise grew, resulting in the fragmentation of data into information silos.

With SOA, data no longer belongs to a single department, but rather to the entire enterprise. This means that data from one application should be usable to other applications and requires the loosening of data from business processes and transformation into the right format. Without data transformation, information and data remain unusable and SOA fails to reach its potential in delivering tangible benefits to the enterprise.

Want to know more about the future of Mule ESB?

The Mule development team has released new Development features to the platform, including: