Wednesday, August 20, 2008

46. Transformations-2

Aggregator Transformation

Aggregator transformation is an Active and Connected transformation. It is useful for performing calculations such as averages and sums, mainly calculations on multiple rows or groups. For example, it can calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM, etc. can be used in the Aggregator transformation.
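As a rough SQL sketch of what an Aggregator does (the SALES table and its SALE_DATE and AMOUNT columns are made-up names), the group-by ports play the role of the GROUP BY columns:

SELECT SALE_DATE,
       SUM(AMOUNT) AS TOTAL_SALES,   -- daily total
       AVG(AMOUNT) AS AVG_SALE,      -- daily average
       COUNT(*)    AS SALE_COUNT
FROM   SALES
GROUP BY SALE_DATE;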


Expression Transformation

Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target. For example, it can calculate the discount for each product, concatenate first and last names, or convert a date to a string field.
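In SQL terms (Oracle-style syntax; the ORDER_DETAILS table and its columns are illustrative names), the same row-level calculations look like this:

SELECT ORDER_ID,
       PRODUCT_PRICE * 0.10                        AS DISCOUNT,       -- 10% discount per product
       CUST_FIRST_NAME || ' ' || CUST_LAST_NAME    AS FULL_NAME,      -- concatenate names
       TO_CHAR(ORDER_DATE, 'YYYY-MM-DD')           AS ORDER_DATE_STR  -- convert date to string
FROM   ORDER_DETAILS;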


Filter Transformation

Filter transformation is an Active and Connected transformation. It can be used to drop rows in a mapping that do not meet a condition. For example, it can find all the employees who work in Department 10, or the products priced between $500 and $1000.
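The filter condition works like a SQL WHERE clause (EMP and PRODUCTS are assumed sample tables):

-- Employees working in department 10
SELECT * FROM EMP WHERE DEPTNO = 10;

-- Products priced between $500 and $1000
SELECT * FROM PRODUCTS WHERE PRICE BETWEEN 500 AND 1000;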


Joiner Transformation


Joiner Transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location. For example, it can join a flat file and a relational source, two flat files, or a relational source and an XML source.

In order to join two sources, there must be at least one matching port. While joining two sources, you must specify one source as the master and the other as the detail.


The Joiner transformation supports the following types of joins:


1) Normal

2) Master Outer

3) Detail Outer

4) Full Outer


Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.


Master outer join keeps all the rows from the detail source and the matching rows from the master source; it discards the unmatched rows from the master source.


Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.


Full outer join keeps all rows of data from both the master and detail sources.
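In SQL terms, with MASTER and DETAIL as placeholder table names and ID as the matching port, the four join types correspond roughly to:

-- Normal join: only matching rows
SELECT * FROM DETAIL D INNER JOIN MASTER M ON D.ID = M.ID;

-- Master outer join: all detail rows, plus matching master rows
SELECT * FROM DETAIL D LEFT OUTER JOIN MASTER M ON D.ID = M.ID;

-- Detail outer join: all master rows, plus matching detail rows
SELECT * FROM MASTER M LEFT OUTER JOIN DETAIL D ON M.ID = D.ID;

-- Full outer join: all rows from both sources
SELECT * FROM MASTER M FULL OUTER JOIN DETAIL D ON M.ID = D.ID;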


Lookup Transformation

Lookup transformation is a Passive transformation and can be either Connected or Unconnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported from either source or target tables.

For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table. Instead of adding the sales table as one more source, we can use a Lookup transformation to look up the sales data for product ID 10 in the sales table.
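Conceptually, the lookup does the equivalent of a per-row query against the lookup table (PRODUCTS and SALES are assumed names):

SELECT P.PRODUCT_ID,
       P.PRODUCT_NAME,
       (SELECT MAX(S.SALES_AMOUNT)              -- a lookup returns a single value per input row
        FROM   SALES S
        WHERE  S.PRODUCT_ID = P.PRODUCT_ID)  AS SALES_AMOUNT
FROM   PRODUCTS P
WHERE  P.PRODUCT_ID = 10;

In practice the Informatica server usually caches the lookup table rather than querying it row by row.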



Difference between Connected and UnConnected Lookup Transformation:


A Connected lookup receives input values directly from the mapping pipeline, whereas an Unconnected lookup receives values from a :LKP expression in another transformation.


Connected lookup returns multiple columns from the same row whereas UnConnected lookup has one return port and returns one column from each row.


A Connected lookup supports user-defined default values, whereas an Unconnected lookup does not support user-defined default values.


Normalizer Transformation

Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources, where data is often stored in denormalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.
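For example, if a source row holds quarterly sales in four columns (ACCOUNT_SALES with Q1_SALES .. Q4_SALES are hypothetical names), the Normalizer produces one row per quarter, roughly equivalent to this SQL:

SELECT ACCOUNT_ID, 1 AS QUARTER, Q1_SALES AS SALES FROM ACCOUNT_SALES
UNION ALL
SELECT ACCOUNT_ID, 2, Q2_SALES FROM ACCOUNT_SALES
UNION ALL
SELECT ACCOUNT_ID, 3, Q3_SALES FROM ACCOUNT_SALES
UNION ALL
SELECT ACCOUNT_ID, 4, Q4_SALES FROM ACCOUNT_SALES;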



Rank Transformation

Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data.

For example, it can select the top 10 regions where the sales volume was highest, or the 10 lowest-priced products.
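In SQL, the same result could be produced with a top-N query (Oracle-style window-function syntax; REGION_SALES is an assumed table):

-- Top 10 regions by sales volume
SELECT REGION, TOTAL_SALES
FROM (
    SELECT REGION,
           SUM(AMOUNT) AS TOTAL_SALES,
           RANK() OVER (ORDER BY SUM(AMOUNT) DESC) AS SALES_RANK   -- plays the role of RANKINDEX
    FROM   REGION_SALES
    GROUP BY REGION
) ranked
WHERE SALES_RANK <= 10;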


Router Transformation

Router is an Active and Connected transformation. It is similar to the Filter transformation; the difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has the option to capture the rows that do not meet any condition in a default group. It is useful for testing multiple conditions. It has input, output, and default groups.

For example, to split data by State = Michigan, State = California, State = New York, and all other states, it is easy to route each group of rows to a different table.
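Whereas a single Filter keeps only one condition's rows, the Router is like running several filters in one pass over the data (CUSTOMERS and STATE are made-up names), with each group feeding a different target:

-- Group 1: Michigan rows
SELECT * FROM CUSTOMERS WHERE STATE = 'Michigan';
-- Group 2: California rows
SELECT * FROM CUSTOMERS WHERE STATE = 'California';
-- Group 3: New York rows
SELECT * FROM CUSTOMERS WHERE STATE = 'New York';
-- Default group: rows that matched none of the conditions above
SELECT * FROM CUSTOMERS
WHERE  STATE NOT IN ('Michigan', 'California', 'New York');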


Sequence Generator Transformation

Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values, cycle through a sequential range of numbers, or replace missing keys.


It has two output ports, NEXTVAL and CURRVAL, which can be connected to other transformations or targets; you cannot add ports to this transformation.


The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value.
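The behaviour is comparable to a database sequence (Oracle-style syntax; the sequence and table names are made up), although the transformation's CURRVAL port is derived from NEXTVAL rather than queried separately:

CREATE SEQUENCE EMP_KEY_SEQ START WITH 1 INCREMENT BY 1;

-- Assign a new surrogate key to each row loaded into the target
INSERT INTO EMP_DIM (EMP_KEY, EMP_NAME)
SELECT EMP_KEY_SEQ.NEXTVAL, ENAME FROM EMP;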



Stored Procedure Transformation

Stored Procedure transformation is a Passive transformation and can be used in Connected or Unconnected mode. It is useful for automating time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining space in the database, specialized calculations, etc.


The stored procedure must exist in the database before creating a Stored Procedure transformation, and it can exist in the source, target, or any database with a valid connection to the Informatica server. A stored procedure is an executable script containing SQL statements, control-of-flow statements, user-defined variables, and conditional logic.
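A minimal example of such a procedure that a Stored Procedure transformation could call (Oracle PL/SQL syntax; the procedure, table, and parameter names are illustrative):

CREATE OR REPLACE PROCEDURE GET_EMP_COUNT (
    p_deptno IN  NUMBER,
    p_count  OUT NUMBER
) AS
BEGIN
    -- Return the number of employees in the given department
    SELECT COUNT(*) INTO p_count
    FROM   EMP
    WHERE  DEPTNO = p_deptno;
END;
/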



Sorter Transformation

Sorter transformation is a Connected and Active transformation. It allows you to sort data in ascending or descending order according to one or more specified fields. It can also be configured for case-sensitive sorting, and you can specify whether the output rows should be distinct.
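In SQL terms (PRODUCTS is a made-up table), sorting with the distinct option enabled corresponds to:

-- Distinct products sorted by price, highest first
SELECT DISTINCT PRODUCT_NAME, PRICE
FROM   PRODUCTS
ORDER BY PRICE DESC;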


Source Qualifier Transformation

Source Qualifier transformation is an Active and Connected transformation. When adding a relational or flat file source definition to a mapping, it must be connected to a Source Qualifier transformation.

The Source Qualifier performs tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables.
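For instance, a SQL override in the Source Qualifier might replace the generated query with a filtered join (EMP and DEPT are assumed sample tables):

SELECT E.EMPNO, E.ENAME, D.DNAME
FROM   EMP  E,
       DEPT D
WHERE  E.DEPTNO = D.DEPTNO   -- join two tables in the source database
AND    E.SAL    > 2000;      -- filter records before they enter the mapping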


Update Strategy Transformation

Update Strategy transformation is an Active and Connected transformation. It is used to control how rows are applied to the target table, for example to maintain a history of data or only the most recent changes. You can specify how to treat source rows: insert, update, delete, reject, or data driven.
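Informatica flags rows inside the transformation rather than issuing SQL directly, but the net effect on the target is comparable to a conditional insert/update, sketched here as a MERGE with made-up table and column names:

MERGE INTO CUSTOMER_DIM T
USING STAGE_CUSTOMER S
ON (T.CUSTOMER_ID = S.CUSTOMER_ID)
WHEN MATCHED THEN
    UPDATE SET T.ADDRESS = S.ADDRESS      -- existing row: update
WHEN NOT MATCHED THEN
    INSERT (CUSTOMER_ID, ADDRESS)         -- new row: insert
    VALUES (S.CUSTOMER_ID, S.ADDRESS);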



45. Transformations

Informatica - Transformations

In Informatica, transformations help to transform the source data according to the requirements of the target system, and they ensure the quality of the data being loaded into the target.
Transformations are of two types: Active and Passive.


Active Transformation

An active transformation can change the number of rows that pass through it from source to target, i.e., it can eliminate rows that do not meet the condition in the transformation.


Passive Transformation

A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the transformation.

Transformations can be Connected or UnConnected.


Connected Transformation

A connected transformation is connected to other transformations or directly to the target table in the mapping.


UnConnected Transformation

An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.


Following is the list of transformations available in Informatica:

Aggregator Transformation

Expression Transformation

Filter Transformation

Joiner Transformation

Lookup Transformation

Normalizer Transformation

Rank Transformation

Router Transformation


Sequence Generator Transformation

Stored Procedure Transformation

Sorter Transformation

Update Strategy Transformation

Source Qualifier Transformation

XML Source Qualifier Transformation

Advanced External Procedure Transformation

External Procedure Transformation


In the following pages, we will explain all the above Informatica Transformations and their significances in the ETL process in detail.

44. Informatica Components


Guidelines to work with Informatica Power Center


  • Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server would access this repository to retrieve, store and manage metadata.

  • Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions and running workflows, etc.

  • Repository Server: This repository server takes care of all the connections between the repository and the Power Center Client.

  • Power Center Server: The Power Center server extracts data from sources and loads it into targets.

  • Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer. Source Analyzer is used for extracting metadata from source systems.
    Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data from source to target.
    Warehouse Designer is used for extracting metadata from target systems; target metadata can also be created in the Designer itself.

  • Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming and standardizing address data. A person's address may not be the same in all source systems because of typos, or the postal code and city name may not match the address. These errors can be corrected by the data cleansing process so that standardized data is loaded into the target systems (data warehouse).

  • Transformation: Transformations help to transform the source data according to the requirements of target system. Sorting, Filtering, Aggregation, Joining are some of the examples of transformation. Transformations ensure the quality of the data being loaded into target and this is done during the mapping process from source to target.

  • Workflow Manager: Workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, then the target system will pop up an error message since the fact table is violating the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.

  • Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.

  • Power Center Connect: This component helps to extract data and metadata from ERP systems such as PeopleSoft, SAP and Siebel, from messaging systems such as IBM's MQSeries, and from other third-party applications.

  • Power Center Exchange: Like Power Center Connect, this component helps to extract data and metadata from ERP systems and other third-party applications.

43. DWH FAQ's

1) How many types of approaches are there in DWH?
Two approaches: Top-down (Bill Inmon approach) and Bottom-up (Ralph Kimball approach).
2) Explain Star Schema?
A star schema consists of one or more fact tables and a number of dimension tables related through foreign keys. Dimension tables are de-normalized; the fact table is normalized.
Advantages: less database space and simpler queries.
3) Explain Snowflake schema?
A snowflake schema normalizes the dimensions to eliminate redundancy: instead of grouping all dimension data into one large table, it is split into additional related tables. Both dimension and fact tables are normalized.
4) What is a conformed dimension?
If two or more data marts (or fact tables) share the same dimension, that dimension is called a conformed dimension.
5) Explain the DWH architecture?
6) What is a slowly growing dimension?
Slowly growing dimensions are dimension tables that grow over time: new rows are appended without updating existing rows.
7) What is a slowly changing dimension?
Slowly changing dimensions are dimension tables in which existing rows are updated (and new rows added) as the source data changes.
Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In a Type 1 dimension mapping, all rows contain current dimension data.
Use the Type 1 dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 dimension mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension row in the table.
Use the Type 2 dimension/version data mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target: when updating an existing dimension, the Informatica server saves the existing data in different columns of the same row and replaces the existing data with the updates.
8) When do you use a dynamic cache?
When your target table is also the lookup table, you go for a dynamic cache. In a dynamic cache, multiple matches return an error, and only the = operator can be used in the lookup condition.
9) What is a lookup override?
A lookup override replaces the default SQL statement generated for the lookup. You can join multiple tables in the lookup using an override. By default, the Informatica server adds an ORDER BY clause. (A sample override query is shown after question 10 below.)
10) Can we pass null values to a Lookup transformation?
Yes; if the lookup condition finds no match, the Lookup transformation returns NULL for the lookup ports.
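
As mentioned in question 9, a lookup SQL override replaces the generated lookup query. A minimal sketch (table and column names are made up) that joins two tables and filters the cached lookup data:

SELECT C.CUSTOMER_ID    AS CUSTOMER_ID,
       C.CUSTOMER_NAME  AS CUSTOMER_NAME,
       R.REGION_NAME    AS REGION_NAME
FROM   CUSTOMER C,
       REGION   R
WHERE  C.REGION_ID = R.REGION_ID
AND    C.STATUS    = 'ACTIVE'        -- only cache active customers
ORDER BY C.CUSTOMER_ID               -- Informatica normally appends the ORDER BY itself
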
1) What is the target load order?
You specify the target load order based on source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
2) What is the default join that the Source Qualifier provides?
An inner equijoin.
3) What are the differences between the Joiner transformation and the Source Qualifier transformation?
You can join heterogeneous data sources in the Joiner transformation, which you cannot achieve in the Source Qualifier transformation.
You need matching keys to join two relational sources in the Source Qualifier transformation, whereas you do not need matching keys to join two sources in the Joiner.
In the Source Qualifier, the two relational sources must come from the same data source; the Joiner can join relational sources that come from different data sources.
4) What is the Update Strategy transformation?
The Update Strategy transformation determines, for each row, whether it should be inserted, updated, deleted, or rejected in the target table; for example, whether to store historical data or only the current transaction data in the target.
5) Describe the two levels at which the update strategy can be set?
Within a session (treat all source rows as insert, update, or delete, or use data driven) and within a mapping (flag individual rows for insert, update, delete, or reject using the Update Strategy transformation).
6) What is the default treat-source-rows option when a mapping contains an Update Strategy transformation?
Data driven.
7) What is data driven?
The Informatica server follows the instructions coded into the Update Strategy transformations within the session's mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option, the Informatica server ignores all Update Strategy transformations in the mapping.
8) What are the options in the target session properties when using an Update Strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table.
9) Difference between the source filter and the Filter transformation?
A source filter (in the Source Qualifier) filters data only from relational sources, whereas the Filter transformation can filter data from any type of source.

10) What is a tracing level?
The tracing level controls the amount of information the session writes to the log file.
-- What are the types of tracing levels?
Normal, Terse, Verbose Initialization, Verbose Data.
-- Explain the Sequence Generator transformation?
-- Can you connect multiple ports from one group to multiple transformations?
Yes

1) Can you connect more than one group to the same target or transformation?
No.
2) What is a reusable transformation?
A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate it into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes, since each instance of a reusable transformation is a pointer to that transformation. You can change the transformation in the Transformation Developer and its instances automatically reflect the changes. This feature can save you a great deal of work.
-- What are the methods for creating a reusable transformation?
Two methods:
1) Design it in the Transformation Developer.
2) Promote a standard transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of reusable transformation.
Once you promote a standard transformation to reusable status, you cannot demote it back to a standard transformation.
If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking the Revert button.
3) What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
4) Can you use mapping parameters or variables created in one mapping in another mapping?
No. Mapping parameters or variables can be used in any transformation of the same mapping or mapplet in which they were created.
5) Can you use mapping parameters or variables created in one mapping in a reusable transformation?
Yes, because a reusable transformation is not contained within any mapping or mapplet.
6) How does the Informatica server sort string values in the Rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

7) What is the rank index in the Rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica server uses the rank index port to store the ranking position of each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespeople for each quarter, the rank index numbers the salespeople from 1 to 5.
8) What is a mapplet?
A mapplet is a set of transformations that you build in the Mapplet Designer and can reuse in multiple mappings.
9) Difference between a mapplet and a reusable transformation?
A reusable transformation is a single transformation, whereas a mapplet can contain multiple transformations.
10) What is a parameter file?
A parameter file defines the values for parameters and variables.

42. DWH FAQ's

1) What is a Data warehouse?
A data warehouse is a relational database used for query analysis and reporting. By definition, a data warehouse is subject-oriented, integrated, non-volatile and time-variant.
Subject-oriented: the data warehouse is organized around particular subjects.
Integrated: data collected from multiple sources is integrated into a consistent, user-readable format.
Non-volatile: historical data is maintained and not overwritten.
Time-variant: data can be viewed by time period, e.g. weekly, monthly, yearly.
2) What is Data mart?
A subset of data warehouse is called Data mart.
3) Difference between Data warehouse and Data mart?
A data warehouse maintains data for the entire organization and may contain multiple data marts, whereas a data mart is maintained for only a particular subject or department.
4) Difference between OLTP and OLAP?
OLTP (Online Transaction Processing) systems maintain current transactional data, so inserts, updates and deletes must be fast. OLAP (Online Analytical Processing) systems maintain historical data and are optimized for query analysis and reporting.
5) Explain ODS?
An Operational Data Store is a part of the data warehouse environment that maintains only current transactional data. An ODS is subject-oriented, integrated, volatile and holds current data.
6) Difference between Power Center and Power Mart?
Power Center provides all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data: one repository can serve multiple Informatica servers.
Power Mart provides all features except multiple registered servers and data partitioning.
7) What is a staging area?
A staging area is a temporary storage area used for data cleansing and integration rather than transaction processing.
Before data is put into the data warehouse, it is cleaned and processed in the staging area.
8) Explain Additive, Semi-additive, Non-additive facts?
Additive fact: an additive fact can be aggregated by simple arithmetical addition across all dimensions.
Semi-additive fact: a semi-additive fact can be added across some dimensions but not others (for example, an account balance can be summed across accounts but not across time).
Non-additive fact: a non-additive fact cannot be added at all.
9) What is a factless fact table, with an example?
A fact table that has no measures; for example, a fact table that records student attendance events.
10) Explain Surrogate Key?
A surrogate key is a sequentially generated number assigned as the primary key of a table.

Monday, August 11, 2008

41. Sites

1. http://www.java2s.com/Tutorial/Oracle/0020__Introduction
2. http://washingtondc.craigslist.org/mld/res/792098463.html
3. http://www.indiana.edu/~dss/Services/Informatica/etl_developer.html

Monday, July 28, 2008

40. Final Nav

Final Navigation

1. Click the Targets folder in the Navigator.

2. Tools ---- Warehouse Designer (don't delete previous targets)

3. Targets --- Create (manually)

Or

Drag and drop from the source folder/target folder in the Navigator.

Or

Import from DB/DWH.

4. Click the Mappings folder

5. Tools --- Mapping Designer (don't delete old mappings)

6. Mappings --- Create and name it

7. Drag and drop sources (if not available, import them in the Source Analyzer).

8. Drag and drop targets.

9. Create a transformation and enter its columns.

Or

Drag and drop columns to the transformation from sources or targets.

Or (optional)

Drag and drop a transformation from the Transformations folder.

10. Link all sources, transformations and targets.

11. Validate and save.

---------------------------------------------------------------------------------------------------------------------------------