Saturday, December 31, 2011

Change Detection

An efficient way to compare data of two database tables 

Oracle Database, SQL Server Database

No question, the most convenient way to detect changes is CDC (Change Data Capture), which means that you only get the changed data rows. In Oracle this can be implemented using Oracle CDC, Golden Gate, Trigger, Materialized View Logs or in SQL Server there is also a CDC functionality.
But due to company rules this is not always an option and sometimes it's also necessary to synchronise the DWH with the source system when you are missing some changes in the DWH, for example after errors in the ETL process or if CDC is disabled for some reason, e.g. after structural changes in the source system. For small dimension tables comparing the source table with the DWH table is not a big deal. But when we need to compare large tables containing millions of rows, this might be a performance issue.

After some testing, I found the by far fastest approach to compare two tables here: On Injecting and Comparing
As we also need updated values, I added an analytic function to partition the result set by the key columns. 
The structure of the result is similiar to the one CDC would give you. You get the inserted rows (I), deleted rows (D) and updated rows (U) with the new values and also the old values (O) if desired.

SELECT Key1,Key2,Attribute1,Attribute2,Attribute3
       CASE WHEN count = 2 and tbl1 = 1 
            THEN 'U'
            WHEN count = 1 and tbl1 = 1
            THEN 'I'
            WHEN count = 1 and tbl1 = 0 
            THEN 'D'
            ELSE 'O' 
       end dmltype
  FROM (SELECT Key1,Key2,Attribute1,Attribute2, Attribute3,
               min(count1) tbl1,         
             count(*) over (partition by Key1,Key2 order by null) count
          FROM(SELECT Key1,Key2,Attribute1,Attribute2,Attribute3, 
                       count(tbl1) count1
                       ,count(tbl2) count2
                  FROM ( SELECT  Key1,Key2,Attribute1,Attribute2,Attribute3
                               1 tbl1, 
                               to_number(null) tbl2 
                           FROM  source_table                     
                        UNION ALL
                         SELECT  Key1,Key2,Attribute1,Attribute2,Attribute3
                               to_number(null) tbl1, 
                               2 tbl2 
                           FROM  DWH_Table                     
                       )
                 GROUP BY Key1,Key2,Attribute1,Attribute2,Attribute3
                 having count(tbl1) != count(tbl2)
               )
               GROUP BY Key1,Key2,Attribute1,Attribute2,Attribute3 
       )

The performance of this query is incredible compared to other solutions, like using an outer join operation. Thank you Tom Kyte!
Please note that you must cast the source and target columns to the same datatype before you can perform the union. If you are comparing tables from databases with different character sets you also have to convert text columns into the same character set, for example using the oracle convert function (e.g. convert('text','WE8EBCDIC500', 'WE8ISO8859P1'))

The query returns inserted, deleted and updated rows, which can be the source for an usual ETL process, that inserts the SCD (1,2,3) data into the DWH.

Saturday, December 24, 2011

QaaWS and web tier architecture

Connection problems using QaaWS in a web tier architecture

SAP Business Objects XI 3.1

In a recent project it took me some days to get QaaWS working in a BO web tier architecture. I didn't find enough information on the internet, so I want to share my experiences here. The idea was to publish a Query as a Web Service on the Webserver, which gets the data from a database connection stored on the BO Server. A public Xcelsius flash object can then access the data via Web Service.

Our web tier architecture consists of
  • 1 BO Server (CMS, Repository)
  • 1 Web Server (Tomcat 5.5) - exposed to the internet
In retrospect there were three main points I had to consider to finally get Xcelsius via QaaWS and web tier architecture working:

Assign ports
To get the connection between the BO Server and Web Server working when using a firewall you have to manually assign port numbers to the 'Request Port' in your CMC for all necessary BO services. Uncheck the Auto assign box and enter a free port number. If you don't do that, the services would pick different random port numbers. Enter for your CMS request port 6401 for example and don't forget to assign to the other servers a port number, if you need them (Webi processing server, etc.)


After that you can open those ports on your firewall. The Web Server should now communicate with the BO Server.

Adjust dsws.properties
If your QaaWS still doesn't provide any data, you probably have to add your CMS server name to the dsws.properties configuration file. To my mind that's a bug, because I already entered the name of the CMS Server when I did the web tier installation and the connection works, but it does not when using QaaWS. Dswsbobje is looking for the CMS on the Web Server and of course can't find it. So open your dsws.properties file and just add/change the line

domain=Server:port (e.g. domain=BOXI31:6400)

Restart your Web Server. Now also QaaWS (the dswsbobje application) knows where your CMS is.

Add crossdomain.xml
Another prerequisite when accessing data external with flash is that you need crossdomain.xml file in your webapps-root directory (error codes: #2170 or #2048) . Therefore you find lots of information on the internet (e.g. here or here). Just add a text file with the file name 'crossdomain.xml' to the ROOT folder of your web server (e.g. C:\Program Files (x86)\Business Objects\Tomcat55\webapps\ROOT) with the following content:

<?xml version="1.0"?><!DOCTYPE cross-domain-policy SYSTEM"http://www.macromedia.com/xml/dtds/cross-domain-policy.dtd "><cross-domain-policy><allow-http-request-headers-from domain="*" headers="*" secure="false" /><allow-access-from domain="*" secure="false" /></cross-domain-policy>

Restart your Web Server. Try to open your flash object again.

Friday, December 23, 2011

Wait event asynch descriptor resize

Oracle Create Materialized View hangs and waiting for asynch descriptor resize - event 

Oracle Database 11g Release 2 x64 (11.2.0.1.0), Windows Server 2008 R2 64bit

Recently I wanted to create a materialized view for a usual query, which took about 1 minute to execute while the creation of the materialized view didn't stop for hours until I killed the process.

Monitoring the session showed me the wait event asynch descriptor resize. The following article explains the event very detailed and suggests to contact oracle support if it causes bigger problems: Asynch descriptor resize wait event in Oracle. Upgrading the database could also solve the issue.

As for me those solutions weren't possible I started to trace the session and saw lots of sequential reads: 

"WAIT #6: nam='db file sequential read' ela= 4384 file#=5 block#=2092840 blocks=1 obj#=77661 tim=7422676905762"

After hours of tracing and googleing I tried to simplify my query step by step and came to the conclusion that some 'special' SQL constructs cause that the materialized view creation never stops. 

The causing SQL part was a subquery that returns the latest record of a table:
(select a, b, c, day
from table x
where day = (select max(day) from table y where x.a=y.a))

I could rewrite the query using oracle analytic functions (Analytic functions by Example) and voilĂ  my materialized view was created in about 1 minute.

The same problem occured in a query with 'NOT IN(' followed by a subquery. Also in that case there are several ways to rewrite and probably tune the query (NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle).


Another try could be setting SGA values manually to fixed values instead of using AMM and memory target.

Sorry that I don't have an explanation why those query constructs cause the materialized view creation to hang. Obviously it's a bug and probably solved in later releases. Anyway, if you have the same issue, I hope my post helps you to identify the cause of it.