Pages

Tuesday, January 31, 2012

Custom pipeline component for processing DBF, Excel and other ODBC types

This custom pipeline component save the incoming binary stream to a physical file on the BizTalk machine and then used basic ADO.NET to parse the DBF File into an XML document. I then modified/extended this pipeline component to accept and parse other ODBC files to XML, such as:
DBF
Excel
FoxPro
Possibly others such as Access Files.

At this point in time, this custom pipeline component will only parse Excel and DBF files, but it is possible to modify the component to process other ODBC types.
By using this custom pipeline component in a BizTalk Receive Pipeline it will do the following:
Raw DBF, Excel messages are delivered to BizTalk by any transport such as:
File
FTP
MSMQ
etc. etc.

The raw message will be parsed to XML in a BizTalk Receive Pipeline with the parsed XML message published into the MsgBox.
This component requires no special APIs and uses basic ADO.NET code to parse the ODBC type files into XML.

You can download the full source code for the Custom Pipeline component at the end of this entry.
The component works as below:
1) The incoming file is saved to a temporary file on the BizTalk machine.
2) An OLEDB connection will be used to connect to the file from 1).
3) A Sql query is performed against the OLEDB datasource.
4) The results from the query are stored to an ADO.NET dataset/datatable.
5) The XML is extracted from the datatable and modified for a root node name and target namespace.
6) The temporary file from 1) is deleted
7) The XML from 5) is added back to the pipeline message stream.


The custom pipeline component was coded as a Decoder pipeline component, but it could be modified to implement a Disassembler pipeline component.


The Custom Pipeline Component exposes a number of properties for dynamic configuration.
The connection string and query differs slightly for an Excel and DBF file. Therefore the configuration for an Excel file and DBF file are discussed separately:
Excel
The incoming Excel file to be parsed looks as below:




The resultant parsed XML file will look as below:
Note: Only two Employee nodes are present in the XML file due to a filter condition in the configuration (see below).
The Configuration for this Pipeline is as below:
1) ConnectionString -> The OLEDB Connection string for the Excel file.
The following is set for the ConnectionString property:
Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=Excel 8.0;
But, the final Connection String that is produced by the code looks like below:
Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=Excel 8.0;Data Source=C:\Temp\afgd1234.xls

This is because the code, dumps the Excel File to the TempDropFolderLocation and must dynamically add the Data Source section to the connection string.
Note : Other Connection Properties for an Excel File:
"HDR=Yes;" indicates that the first row contains columnnames, not data
"IMEX=1;" tells the driver to always read "intermixed" data columns as text
(Above From: http://www.connectionstrings.com/ )

2) DataNodeName -> The XML Node name for the Data. In this case Employee
3) DeleteTempMessages -> If set to True, will delete the Excel file that is dropped to the TempDropFolderLocation after processing.
4) Filter -> Filter for the SqlStatement. In this case, will only Select LastNames Like %B%
Note: This is optional. If all data is to be returned, leave blank.

5) Namespace -> NameSpace for the resultant XML message.
6) RootNodeName -> Root Node Name for the resultant XML Message.
7) SqlStatement -> OLEDB Select Statement.
SQL syntax: SELECT * FROM [sheet1$] - i.e. worksheet name followed by a "$" and wrapped in "[" "]" brackets.
(Above From: http://www.connectionstrings.com/ )

Note: The SqlStatement could also look as below:
Select FirstName,LastName FROM [sheet1$]  (only bring back selected columns)
Select FirstName as FName, LastName as LName FROM [sheet1$] (rename the column Names in the resultant XML)

8) TypeToProcess -> In this case Excel File.
DBF

The incoming DBF file to be parsed looks as below:

The resultant parsed XML file will look as below:
Note: Only two Items nodes are present in the XML file due to a filter condition in the configuration (see below).
The Configuration for this Pipeline is as below:
Note: The above is an example of Per Instance Pipeline Configuration for BizTalk 2006.

1) ConnectionString -> The OLEDB Connection string for the DBF file.
The following is set for the ConnectionString property:
Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=dBASE IV;
But, the final Connection String that is produced by the code looks like below:
Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=dBASE IV;Data Source=C:\Temp\

This is because the code, dumps the DBF File to the TempDropFolderLocation and must dynamically add the Data Source section to the connection string.
2) DataNodeName -> The XML Node name for the Data. In this case Items
3) DeleteTempMessages -> If set to True, will delete the DBF file that is dropped to the TempDropFolderLocation after processing.
4) Filter -> Filter for the SqlStatement. In this case, will only Select PRICE >= 200 and PRICE <=500
Note: This is optional. If all data is to be returned, leave blank.

5) Namespace -> NameSpace for the resultant XML message.
6) RootNodeName -> Root Node Name for the resultant XML Message.
7) SqlStatement -> OLEDB Select Statement.
In this case only have the columns part of the Select Statement as below:
Select * 

This is because the code dumps the DBF File to the TempDropFolderLocation and must dynamically add the FROM statement as below:
SELECT * FROM i0lb1gcr.dbf

Note: The SqlStatement could also look as below:
Select COD, PRICE (only bring back selected columns)
Select COD as Id, Price as Amount (rename the Node Names in the resultant XML)

8) TypeToProcess -> In this case DBF File.
Note: When configuring a Pipeline Component in the BizTalk Server 2006 Administration console,
for TypeToProcess :
0 -> Excel
1 -> DBF


Finally:

The not so good things about this Component are:
1) It has to write the ODBC file locally to disk before parsing. This will create
extra disk I/O. I did test it with multiple submissions of 1 MB DBF files. The performance still seemed
pretty good.

2) The types of Excel files it can process are flat. If you're Excel files to process are
complex, not sure how well this Component will parse to XML.

The good things about this component are:
1) The code to parse the ODBC files is dead simple, looks something like the below:
 OleDbDataAdapter oCmd;
 // Get the filter if there is one
 string whereClause = " ";
 if (Filter.Trim() != " ")
   whereClause = " Where " + Filter.Trim();
 if (this.TypeToProcess == odbcType.Excel)
   oCmd = new OleDbDataAdapter(this.SqlStatement.Trim() + whereClause, oConn);
 else // dbf
   oCmd = new OleDbDataAdapter(this.SqlStatement.Trim() + " From " + filename + whereClause, oConn);
 oConn.Open();
 // Perform the Select statement from above into a dataset, into a DataSet.
 DataSet odbcDataSet = new DataSet();
 oCmd.Fill(odbcDataSet, this.DataNodeName);
 oConn.Close();
 // Write the XML From this DataSet into a String Builder
 System.Text.StringBuilder stringBuilder = new StringBuilder();
 System.IO.StringWriter stringWriter = new System.IO.StringWriter(stringBuilder);
 odbcDataSet.Tables[0].WriteXml(stringWriter);


2) This code can be modified to process other types of ODBC files. The modifications
may be minor.

3) You can filter the data in an incoming Excel or DBF file.

1) Prerequisites
VS2005 and BTS2006

Note: If you want to use in VS2003/BTS2004, unzip the file.
Create a new VS2003 Class project (Reference Microsoft.BizTalk.Pipeline) and then copy the code from DecodeODBC.cs 
to your VS2003 Class file. I think the code will work as is. But there might be a little tweaking or a few .Net 2.0 
features that will have to be re-written for .Net 1.1

Note: This code was authored with the RTM version of VS2005 and Beta2 of BTS 2006. Other versions of BTS 2006 should work.

1) Unzip the zip file on the root of your c:\ drive.
A folder called  c:\ODBCCustomPipe will be created.

2) Open up the c:\ODBCCustomPipe\ODBCPipelineComponent.sln in VS2005
Build the project.

Note: The output assembly for the project will be placed in:

C:\Program Files\Microsoft BizTalk Server 2006\Pipeline Components\

If your Pipeline directory is different from the above, right mouse button on the ODBCPipelineComponent project in VS2005 
and then choose Properties. Select the Build Tab and then modify the Output path.

Note: Installing Pipeline components in the GAC is optional.

3) In a new BTS or existing BTS project add a Receive Pipeline.
With this pipeline opened up in Visual Studio:
a) Goto the toolbox
b) Right Mouse Button BizTalk Pipeline Components section in the toolbox.
c) Select Choose Items.. in the pop-up menu.
d) In the Choose ToolBox Items dialog, select the BizTalk Pipeline Components tab.
e) Find the ODBC File decoder in the list and select it and hit OK.
f) The ODBC File Decoder should be added to your Toolbox

4) Drag the ODBC File decoder from the toolbox to the Decode section of your Receive
Pipeline. Click on the ODBC File decoder in your pipeline and go to the Properties window
in VS2005. The properties for the Pipeline can now be set, or a Receive Port/Receive Location
can be created that uses this pipeline, and the properties can be set/overridden for the pipeline at that point (Per 
Instance Pipeline Configuration in the BizTalk Server 2006 Administration Console).
For instructions on how to set the properties, a description will show up for each property in
the VS2005 property window. Also go to :

http://objectsharp.com/blogs/matt/archive/2006/01/29/3778.aspx

for instructions on how to set the properties to parse a DBF or Excel file in the pipeline.

5) Deploy the BizTalk project that contains the Receive Pipeline

6) Create a Receive Port/Receive Location to receive a DBF or Excel file.
Configure the Receive Location to use the receive pipeline.
Note: You can configure the properties for the ODBC File decoder custom component
in the BizTalk Server 2006 administration console (Per Instance Pipeline Configuration).

7) Create a Send Port that subscribes to the messages.

For example, create a filter on the Send Port that subscribes to the messages from
the Receive Port/Receive Location created in 6). This send port should be using the
PassThrough Pipeline.

The filter might look something like the below:

BTS.ReceivePortName == MyReceivePortNameHere


8) Test using your own Excel or DBF files, or there are a couple of test files in:
C:\ODBCCustomPipe\Files. An XML version of the DBF and Excel files should be dropped to the
configured Send Port from Step 7).


9) If you want to create a Schema from the produced XML Files.
a) Get a copy of the XML Files that the Send Port generated.
b) In a BizTalk Project, add Generated Items, choose the Generate Schemas/Well Formed XML
and choose the XML file that was dropped off by the send port.

10) If you need the MessageType property set on your parsed XML messages.
a) Build and Deploy the BizTalk project that contains the Schemas from Step 9)
b) Go back and modify your Receive Pipeline from step 3) and add
a XML disassembler component to your Disassemble stage of your pipeline.
Note: (You do not have to set any properties on it)
c) Redeploy the BTS project that contains the receive pipeline.
Note: (A Host Instance Restart is required)
d) You can then go back to your send ports from Step 7) and change the Filter condition
to something like below:

BTS.MessageType      == http://DBF#Root























1 comment:

  1. Hi Divya,

    I am now looking for this kind of component to process excel files.
    It would be great if you can please share source code and any set up file.

    Thanks
    Venu

    ReplyDelete