Tuesday, May 13, 2008

brief about Slowly Changing Dimensions

Slowly Changing Dimensions:

The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases where the attribute for a record varies over time. We give an example below:
Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the customer lookup table has the following record:

Customer Key Name State
1001 Christina Illinois

At a later date, she moved to Los Angeles, California on January, 2003. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.

There are in general three ways to solve this type of problem, and they are categorized as follows:

Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
Type 3: The original record is modified to reflect the change.

Type 1:
We next take a look at each of the scenarios and how the data model and the data looks like for each of them. Finally, we compare and contrast among the three alternatives.
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:

Customer Key Name State
1001 Christina Illinois

After Christina moved from Illinois to California, the new information replaces the new record, and we have the following table:

Customer Key Name State
1001 Christina California

Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.

Usage:
About 50% of the time.

When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

Type 2:
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The newe record gets its own primary key.
In our example, recall we originally have the following table:

Customer Key Name State
1001 Christina Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table:

Customer Key Name State
1001 Christina Illinois
1005 Christina California

Advantages:
- This allows us to accurately keep all historical information.

Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.

Usage:
About 50% of the time.

When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

Type 3:
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:

Customer Key Name State
1001 Christina Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
• Customer Key
• Name
• Original State
• Current State
• Effective Date

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key Name Original State Current State Effective Date
1001 Christina Illinois California 15-JAN-2003

Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.

Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

Usage:
Type 3 is rarely used in actual practice.

When to use Type 3:
Type III slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur for a finite number of time.

Monday, May 5, 2008

To create a DBLINK using Derived Tables

1. Create a DBLINK in Oracle on Server1 with the following statement:
2. CREATE DATABASE LINK dblink_name CONNECT TO user_name_on_server2 IDENTIFIED BY password USING 'connect_string_to_server2';
3. Create a synonym for the DBLINK on Server1 using the following statement:
4. CREATE SYNONYM synonym_name FOR user_name_on_server2.table_name_on_server2@dblink_name_server2
5. Ensure the synonym for the linked database on Server1 is added to the tnsnames.ora file of the target database on Server2. If not, Oracle will return the ORA-12154 error message.
6. Log in to Server1.
7. Query the DBLINK synonym using the following SQLPlus statement:
8. SELECT * FROM synonym_name
9. Log into Designer.
10. Click Insert Table > Derived Tables.
11. Query the DBLINK synonym using the following SQLPlus statement:
12. SELECT * FROM synonym_name
13. If an error is returned, close the Derived Tables dialog box and reopen. If the message "Parse OK" is returned, then click OK.