Wednesday, August 20, 2008

46. Transformations-2

Aggregator Transformation

Aggregator transformation is an Active and Connected transformation. It is used to perform calculations such as averages and sums, mainly calculations that operate on multiple rows or groups. For example, it can calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX and SUM can be used in the Aggregator transformation.
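
For intuition, the Aggregator behaves much like a SQL aggregate query: the group-by ports play the role of GROUP BY and the aggregate expressions the role of the aggregate functions. A minimal sketch, assuming a hypothetical SALES table with PRODUCT_ID and AMOUNT columns:

-- group-by port: PRODUCT_ID; aggregate expressions: SUM and AVG of AMOUNT
SELECT product_id,
       SUM(amount) AS total_sales,
       AVG(amount) AS avg_sales
FROM sales
GROUP BY product_id;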


Expression Transformation

Expression transformation is a Passive and Connected transformation. It is used to calculate values within a single row before writing to the target. For example, it can calculate the discount for each product, concatenate first and last names, or convert a date to a string field.
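
A rough SQL equivalent of such row-level expressions, assuming hypothetical PRODUCTS and EMPLOYEES tables:

-- discount calculated row by row, like an Expression transformation output port
SELECT product_name, price - (price * 0.10) AS discounted_price
FROM products;

-- concatenation and date-to-string conversion, also row by row
SELECT first_name || ' ' || last_name AS full_name,
       TO_CHAR(hire_date, 'YYYY-MM-DD') AS hire_date_str
FROM employees;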


Filter Transformation

Filter transformation is an Active and Connected transformation. It is used to filter out rows in a mapping that do not meet a given condition. For example, it can find all the employees who are working in Department 10, or the products priced between $500 and $1000.
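
The filter condition plays the same role as a SQL WHERE clause. A sketch, assuming hypothetical EMPLOYEES and PRODUCTS tables:

-- employees working in Department 10
SELECT * FROM employees WHERE deptno = 10;

-- products in the $500 to $1000 price range
SELECT * FROM products WHERE price BETWEEN 500 AND 1000;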


Joiner Transformation


Joiner Transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location. For example, it can join a flat file and a relational source, two flat files, or a relational source and an XML source.

In order to join two sources, there must be at least one matching port. While joining two sources, you must specify one source as the master and the other as the detail.


The Joiner transformation supports the following types of joins:


1) Normal

2) Master Outer

3) Detail Outer

4) Full Outer


A normal join discards all rows of data from the master and detail sources that do not match the join condition.


A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.


A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.


A full outer join keeps all rows of data from both the master and detail sources.
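
In SQL terms these correspond roughly to inner, left, right, and full outer joins, with the detail source written on the left. A sketch, assuming hypothetical DETAIL and MASTER tables joined on a common ID:

-- Normal join: only matching rows
SELECT * FROM detail d INNER JOIN master m ON d.id = m.id;

-- Master outer join: all detail rows plus matching master rows
SELECT * FROM detail d LEFT OUTER JOIN master m ON d.id = m.id;

-- Detail outer join: all master rows plus matching detail rows
SELECT * FROM detail d RIGHT OUTER JOIN master m ON d.id = m.id;

-- Full outer join: all rows from both sources
SELECT * FROM detail d FULL OUTER JOIN master m ON d.id = m.id;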


Lookup Transformation

Lookup transformation is Passive, and it can be either Connected or Unconnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported either from source or from target tables.

For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table. Instead of adding the sales table as one more source, we can use a Lookup transformation to look up the sales data for product ID 10 in the sales table.
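
Conceptually, the lookup behaves like a scalar subquery against the lookup table. A minimal SQL analogy, assuming hypothetical PRODUCTS and SALES tables:

-- look up the sales amount for each product instead of adding SALES as a second source
SELECT p.product_id,
       p.product_name,
       (SELECT SUM(s.amount)
        FROM sales s
        WHERE s.product_id = p.product_id) AS total_sales
FROM products p
WHERE p.product_id = 10;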



Difference between Connected and UnConnected Lookup Transformation:


A Connected lookup receives input values directly from the mapping pipeline, whereas an Unconnected lookup receives input values from a :LKP expression in another transformation.


A Connected lookup can return multiple columns from the same row, whereas an Unconnected lookup has one return port and returns one column from each row.


A Connected lookup supports user-defined default values, whereas an Unconnected lookup does not.


Normalizer Transformation

Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources, where data is often stored in denormalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.



Rank Transformation

Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data.

For example, to select the top 10 regions where the sales volume was highest, or to select the 10 lowest-priced products.
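
A rough SQL analogy of a top-N rank, assuming a hypothetical SALES table and Oracle-style ROWNUM syntax:

-- top 10 regions by sales volume
SELECT *
FROM (SELECT region, SUM(amount) AS total_sales
      FROM sales
      GROUP BY region
      ORDER BY total_sales DESC)
WHERE ROWNUM <= 10;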


Router Transformation

Router is an Active and Connected transformation. It is similar to the Filter transformation; the difference is that a Filter transformation drops the rows that do not meet the condition, whereas a Router has the option to capture the rows that do not meet any condition. It is useful to test multiple conditions. It has input, output and default groups.

For example, if we want to route data by state, with groups such as State=Michigan, State=California, State=New York, and all other states, a Router makes it easy to send each group to a different table.
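
A rough SQL analogy of that routing, assuming a hypothetical CUSTOMERS source and one target table per group:

INSERT INTO customers_mi SELECT * FROM customers WHERE state = 'Michigan';
INSERT INTO customers_ca SELECT * FROM customers WHERE state = 'California';
INSERT INTO customers_ny SELECT * FROM customers WHERE state = 'New York';

-- default group: rows that matched none of the conditions above
INSERT INTO customers_other
SELECT * FROM customers
WHERE state NOT IN ('Michigan', 'California', 'New York');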


Sequence Generator Transformation

Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values, to cycle through a sequential range of numbers, or to replace missing keys.


It has two output ports, NEXTVAL and CURRVAL, which can be connected to other transformations; you cannot add ports to this transformation.


The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value.
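
The behavior is similar in spirit to a database sequence. An Oracle sketch (note that Oracle's CURRVAL has a slightly different meaning: it returns the last value issued in the current session):

CREATE SEQUENCE emp_key_seq START WITH 1 INCREMENT BY 1;

SELECT emp_key_seq.NEXTVAL FROM dual;  -- generates the next key value
SELECT emp_key_seq.CURRVAL FROM dual;  -- last value issued in this session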



Stored Procedure Transformation

Stored Procedure transformation is a Passive transformation and can be used in Connected or Unconnected mode. It is useful for automating time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining free space in the database, specialized calculations, etc.


The stored procedure must exist in the database before you create a Stored Procedure transformation, and it can exist in the source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script containing SQL statements, control statements, user-defined variables, and conditional statements.
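
For illustration, a minimal hypothetical PL/SQL procedure of the kind such a transformation could call:

-- hypothetical procedure: takes a price and returns a discount amount
CREATE OR REPLACE PROCEDURE get_discount (
    p_price    IN  NUMBER,
    p_discount OUT NUMBER
) AS
BEGIN
    p_discount := p_price * 0.10;
END;
/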



Sorter Transformation

Sorter transformation is a Connected and Active transformation. It allows you to sort data in ascending or descending order according to a specified field. It can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct.
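
The SQL equivalent, assuming a hypothetical EMPLOYEES table, is ORDER BY, with DISTINCT covering the distinct-rows option:

-- distinct values, sorted in descending order
SELECT DISTINCT last_name
FROM employees
ORDER BY last_name DESC;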


Source Qualifier Transformation

Source Qualifier transformation is an Active and Connected transformation. When you add a relational or flat file source definition to a mapping, you must connect it to a Source Qualifier transformation.

The Source Qualifier performs tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables.
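
A sketch of what a SQL override in the Source Qualifier might look like, assuming hypothetical EMP and DEPT tables in the same database:

-- override the default query: join two tables and filter at the source
SELECT e.empno, e.ename, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno
  AND e.deptno = 10;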


Update Strategy Transformation

Update Strategy transformation is an Active and Connected transformation. It is used to control how data is written to the target table, either to maintain a history of the data or only the most recent changes. You can specify how to treat source rows: insert, update, delete, or data driven.
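
The insert-else-update behavior this enables is similar in spirit to a SQL MERGE. A sketch, assuming hypothetical STG_CUSTOMER (source) and DIM_CUSTOMER (target) tables:

MERGE INTO dim_customer t
USING stg_customer s
ON (t.cust_id = s.cust_id)
WHEN MATCHED THEN
    UPDATE SET t.city = s.city         -- existing row: update in place
WHEN NOT MATCHED THEN
    INSERT (cust_id, city)             -- new row: insert
    VALUES (s.cust_id, s.city);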



45. Transformations

Informatica - Transformations

In Informatica, transformations help to transform the source data according to the requirements of the target system, and they ensure the quality of the data being loaded into the target.
Transformations are of two types: Active and Passive.


Active Transformation

An active transformation can change the number of rows that pass through it from source to target, i.e., it can eliminate rows that do not meet the transformation's condition.


Passive Transformation

A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the transformation.

Transformations can be Connected or UnConnected.


Connected Transformation

A connected transformation is connected to other transformations or directly to the target table in the mapping.


UnConnected Transformation

An unconnected transformation is not connected to other transformations in the mapping. It is called from within another transformation and returns a value to that transformation.


The following is the list of transformations available in Informatica:

Aggregator Transformation

Expression Transformation

Filter Transformation

Joiner Transformation

Lookup Transformation

Normalizer Transformation

Rank Transformation

Router Transformation


Sequence Generator Transformation

Stored Procedure Transformation

Sorter Transformation

Update Strategy Transformation

XML Source Qualifier Transformation

Advanced External Procedure Transformation

External Transformation


In the following pages, we will explain all the above Informatica transformations and their significance in the ETL process in detail.

44. Informatica Components


Guidelines to work with Informatica Power Center


  • Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server would access this repository to retrieve, store and manage metadata.

  • Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions and running workflows, etc.

  • Repository Server: This repository server takes care of all the connections between the repository and the Power Center Client.

  • Power Center Server: The Power Center Server does the extraction from the sources and then loads the data into the targets.

  • Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer. Source Analyzer is used for extracting metadata from source systems.
    Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data from source to target.
    Warehouse Designer is used for extracting metadata from target systems; metadata can also be created in the Designer itself.

  • Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming, and standardizing address data. A person's address may not be the same in all source systems because of typos, or because the postal code or city name does not match the address. These errors can be corrected by the data cleansing process, and the standardized data can then be loaded into target systems (the data warehouse).

  • Transformation: Transformations help to transform the source data according to the requirements of target system. Sorting, Filtering, Aggregation, Joining are some of the examples of transformation. Transformations ensure the quality of the data being loaded into target and this is done during the mapping process from source to target.

  • Workflow Manager: A workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error because the fact table violates foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.

  • Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.

  • Power Center Connect: This component helps to extract data and metadata from ERP systems like IBM's MQSeries, PeopleSoft, SAP, Siebel, etc. and other third-party applications.

  • Power Center Exchange: This component likewise helps to extract data and metadata from ERP systems and other third-party applications.

43. DWH FAQ"S

1) How many types of approaches are there in DWH?
Two approaches: Top-down (Bill Inmon approach) and Bottom-up (Ralph Kimball approach).
2) Explain Star Schema?
A star schema consists of one or more fact tables and one or more dimension tables that are related through foreign keys. Dimension tables are de-normalized; the fact table is normalized.
Advantages: less database space and simpler queries.
3) Explain Snowflake schema?
A snowflake schema normalizes the dimensions to eliminate redundancy: instead of grouping all dimension data into one large table, it is split into multiple related tables. Both dimension and fact tables are normalized.
4) What is a conformed dimension?
A dimension that is shared by multiple fact tables or data marts is called a conformed dimension.
5) Explain the DWH architecture?
6) What is a slowly growing dimension?
Slowly growing dimensions are dimension tables whose data grows over time by appending new rows, without updating the existing rows.
7) What is a slowly changing dimension?
Slowly changing dimensions are dimension tables whose data changes over time, requiring updates to existing rows.
Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In a Type 1 dimension mapping, all rows contain current dimension data.
Use the Type 1 dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 dimension mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension row in the table.
Use the Type 2 dimension/version data mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica server saves the existing data in different columns of the same row and replaces the existing data with the updates.
8) When do you use a dynamic cache?
When your target table is also the lookup table, you go for a dynamic cache. In a dynamic cache, multiple matches return an error, and only the = operator can be used.
9) What is a lookup override?
A lookup override replaces the default SQL statement generated for the lookup. You can join multiple tables using a lookup override. By default, the Informatica server adds an ORDER BY clause.
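
A sketch of what an override query might look like, assuming hypothetical CUSTOMER and REGION lookup tables:

-- join two tables in the lookup and control the ORDER BY explicitly
SELECT c.cust_id, c.cust_name, r.region_name
FROM customer c, region r
WHERE c.region_id = r.region_id
ORDER BY c.cust_id;
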
10) Can we pass a null value to a Lookup transformation?
Yes. The Lookup transformation matches null values: a null input is treated as equal to a null value in the lookup table.
1) what is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
2) what is default join that source qualifier provides?
Inner equi join.
3) what are the difference between joiner transformation and source qualifier transformation?
You can join heterogeneous data sources with a Joiner transformation, which cannot be achieved with a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas the Joiner transformation does not need matching keys, only a join condition.
Two relational sources must come from the same database to be joined in a Source Qualifier, whereas the Joiner transformation can join relational sources coming from different databases.
4) what is update strategy transformation?
It determines whether you store historical data or only current transaction data in the target table, by flagging each source row for insert, update, delete, or reject.
5) Describe the two levels at which the update strategy can be set?
Within the session, where you instruct the Informatica server to treat all source rows the same way (insert, update, delete, or data driven), and within the mapping, where an Update Strategy transformation flags rows individually.
6) what is default source option for update strategy transformation?
Data driven.
7) What is data driven?
Data driven means the Informatica server follows the instructions coded into the Update Strategy transformations within the session's mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option, the Informatica server ignores all Update Strategy transformations in the mapping.
8) What are the target options in the session for the Update Strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table.
9) Difference between the source filter and filter?
A source filter filters data from relational sources only, whereas the Filter transformation can filter data from any type of source.

10) what is a tracing level?
The tracing level controls the amount of information written to the session log file.
-- What are the types of tracing levels?
Normal, Terse, Verbose Initialization, Verbose Data.
-- Explain the Sequence Generator transformation?
-- can you connect multiple ports from one group to multiple transformations?
Yes

1) can you connect more than one group to the same target or transformation?
NO
2) what is a reusable transformation?
A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate this transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes, since an instance of a reusable transformation is a pointer to that transformation. You can change the transformation in the Transformation Developer, and its instances automatically reflect those changes. This feature can save you a great deal of work.
-- what are the methods for creating reusable transformation?
Two methods
1) Design it in the transformation developer.
2) Promote a standard transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of a reusable transformation.
Once you promote a standard transformation to reusable status, you cannot demote it back to a standard transformation.
If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking Revert.
3) what are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time you run the session.
4) can you use the mapping parameters or variables created in one mapping into another mapping?
No. We can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which they were created.
5) Can you use the mapping parameters or variables created in one mapping in a reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
6) How the informatica server sorts the string values in rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

7) What is the rank index in rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica server uses the rank index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
8) what is the mapplet?
A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.
9) Difference between mapplet and reusable transformation?
A reusable transformation is a single transformation, whereas a mapplet can contain multiple transformations.
10) What is a parameter file?
A parameter file defines the values for parameters and variables.

42. DWH FAQ's

1) What is Data warehouse?
A data warehouse is a relational database used for query analysis and reporting. By definition, a data warehouse is subject-oriented, integrated, non-volatile, and time-variant.
Subject-oriented: a data warehouse is organized around particular subjects.
Integrated: data collected from multiple sources is integrated into a user-readable, consistent format.
Non-volatile: it maintains historical data.
Time-variant: data can be viewed weekly, monthly, or yearly.
2) What is Data mart?
A subset of data warehouse is called Data mart.
3) Difference between Data warehouse and Data mart?
A data warehouse maintains data for the entire organization, and multiple data marts can be used within a data warehouse, whereas a data mart is maintained for only a particular subject.
4) Difference between OLTP and OLAP?
OLTP is Online Transaction Processing; it maintains current transactional data, so inserts, updates, and deletes must be fast. OLAP is Online Analytical Processing; it is used for query, analysis, and reporting on historical data.
5) Explain ODS?
The Operational Data Store is a part of the data warehouse environment. It maintains only current transactional data. An ODS is subject-oriented, integrated, volatile, and current-valued.
6) Difference between Power Center and Power Mart?
PowerCenter provides all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data.
One repository can serve multiple Informatica servers. PowerMart provides all features except multiple registered servers and data partitioning.
7) What is a staging area?
A staging area is a temporary storage area used for transformation and integration, rather than transaction processing.
Whenever you put data into the data warehouse, you need to clean and process the data first.
8) Explain Additive, Semi-additive, Non-additive facts?
Additive fact: an additive fact can be aggregated by simple arithmetic addition across all dimensions.
Semi-additive fact: a semi-additive fact can be aggregated by simple arithmetic addition across some dimensions but not others (for example, an account balance can be summed across accounts but not across time).
Non-additive fact: a non-additive fact can't be added at all.
9) What is a factless fact table, with an example?
A fact table which has no measures; for example, a fact table that merely records student attendance events, containing only dimension keys.
10) Explain Surrogate Key?
A surrogate key is a series of sequential numbers assigned to be the primary key for a table.

Monday, August 11, 2008

41. Sites

1. http://www.java2s.com/Tutorial/Oracle/0020__Introduction
2. http://washingtondc.craigslist.org/mld/res/792098463.html
3. http://www.indiana.edu/~dss/Services/Informatica/etl_developer.html

Monday, July 28, 2008

40. Final Nav

Final Navigation

1. Clk the Target folder in Navigator.

2. Tools----W. Designer (don’t delete previous targets)

3. Targets---Create (manually)

Or

Drag and drop source folder/target folder in Nav.

Or

Import from DB/DWH.

4. Clk the Mappings folder

5. Tools---M.Designer(don’t delete old mappings)

6. Mappings---Create and Name it

7. Drag and drop sources.(if not available import them in S.Analyzer)

8. Drag and drop targets.

9.Create Transformation and Enter Cols.

Or

Drag and drop cols to transformation from sources or targets.

Or (optional)

Drag and drop transformation from transformation folder.

10. Link all sources, transformations and targets.

11. Validate and save.

---------------------------------------------------------------------------------------------------------------------------------

39. DWH Nav Steps

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Repository Manager:

1.Rt clk on Repository(veerurepo)---2.Connect---3.Enter Username---4.Enter Password--5.Connect.

1.Clk Folder menu---2.Create----3.Enter name(Vfolder)---4.Ok---5.Close window.

Designer:

1.Rt clk on Repo(veerurepo)---2.Connect---3.UN&Password--4.Connect--5.Clk folder (Vfolder)—6.Clk source folder----

Tools---Source Analyzer--1.Sources---2.Import from Database.---3.Enter ODBC dsn------ 4.Username( in caps SCOTT)---5.Enter owner name (in caps SCOTT)---6.Enter Password (tiger)---7.Connect.—8.select tables---9.ok.

Note: If DSN is not created the procedure is as follows.

1.Clk on Ellipse button.---2.System DSN---3.Add---4.Select oracle in OraDB10g-home1(driver)--5.Finish---6.Enter kodad DSN in Data Source Name---7.Select TNS service name (server)---8.User Id (scott)---9.Test connection---10.Password(tiger)---11.Ok---

12.Ok(Oracle ODBC Driver Configuration window)---13.Ok(ODBC data source Adm window)--14.Select ODBC data source(kodad DSN)---15.User name (SCOTT)---16.Owner name (SCOTT)---17.Password(tiger)---18.Connect.---19.select tables---20.ok.

Note: UN must be in Caps.

Tools---Warehouse Designer---1.Clk Target folder------2.Targets---3.Create---4.Enter table name(vtable)---5.select the database---6.create---7.done---8.Dbl clk on Target table--9.Columns tab---10.Clk on icon---11.Enter the fields---12.Apply---13.Ok—14.Targets---15.Generate/ExecuteSQL---16.Enable selected Tables---17.Create table---18.Primary key19.Connect----20. Enter ODBC dsn----21.User name---22.Password ----23.connect-----24.Generate and Execute button---25.close---26.Message box ---27.OK.

Tools---Transformation Developer----1.Create---2.select Transformation type----3.Enter name--4.Create---5.Done---6.Dbl Clk Transformation----7.Clk on icon to enter the fields---8.Do Editings----9.Validate---10.ok---11.Apply---12.ok.

Tools--Mapping Designer---1.Goto navigator---2.Explore source folder---3.Drag & drop source tables—4.Explore target folder---5.Drag & drop targrt tables---6.Explore Transformation—7. Drag and drop Transformation----8.Link Source,Transformation,Target---9.Mappings----10.Validate.

------------------------------------------------------------------------------------------------------------

WORKFLOW MANAGER:

1.From left panel----2.select Task folder----3.Tools---4.Task Developer---5.Tasks---6.create---7.select mapping---8.ok---9.done---

10.connections---11.Relational---12.new---13.Select DB (oracle)---14.ok---

15.enter name---16.UN---17.Pwd---18.connect string---19.code page---20.ok---

21.Repeat from 12 to 20 steps.---22.close window---23.Dbl clk on task(EDIT)---

24.mapping tab---25.select sources from left panel---26.clk down arrow in value column in TYPE row---27.select veerusource in Relational connection Browser---28.ok---

29.select target from left panel---30. clk down arrow in value column in TYPE row .---31.select veerutarget in Relational connection Browser and click normal ---32.ok---33.Apply---34.ok---35.Tools---36.Workflow Designer---37.workflows---38.create---39.enter name for workflow(veeruworkflow)---40.select server and clk Ok---41.clk session from left panel---42.Drag and drop veerutask to right panel---43.Establish link---44.Save---

45.workflow--- start workflow---select * from Target Table(vtable) for OUTPUT.

In DATABASE

For Source/Metadata Database

Connect sys as sysdba;

Create user veerus identified by veerus;

Grant connect, resource to veerus;

Connect veerus/veerus;

Select * from tab;

For Target Database

Connect sys as sysdba;

Create user veerut identified by veerut;

Grant connect, resource to veerut;

Connect veerut/veerut;

Select * from tab;

To copy tables

Connect scott/tiger;

Grant select on emp to veerus;

To paste tables into our sources

Connect veerus/veerus;

Create table emp as (select * from scott.emp);

Select * from tab;

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Precautions

1.In source Analyzer---------- Remove all the FKs.

2.In Warehouse Designer---- Enable selected tables, create table, Pk, connect,

Generate and Execute, close.

3.In Workflow Manager----- In create workflow window after naming select server.

Thursday, July 17, 2008

38. ETL Concepts

Source: http://www.learnbi.com/informatica9.htm

Formatting Columns of Numeric Data:

  1. Open Microsoft Excel and select the columns that consist of numeric data.
  2. Choose Format Cells.
  3. In the Number tab, select Number.
  4. Specify the decimal places and click OK.
  5. Click File Save.
Step IV. Importing Excel source definition:

  1. In the Informatica Designer, connect to the respective repository and open the folder for the source definition.
  2. Open the Source Analyzer and select Sources, then Import from Database.
  3. The Import Tables dialog opens; in the ODBC Data Source name, select the Microsoft data source, which is the Microsoft Excel worksheet name.
  4. Click Browse to open the ODBC administrator.
  5. In the User or System DSN tabs, double-click the Microsoft Excel driver.
  6. Click Select Workbook and browse for the Microsoft Excel file, which is treated as the relational database.
  7. Click OK and connect.
  8. It is not required to enter the database user name or password. The ranges defined in Microsoft Excel appear as the username.
  9. There is no owner name, as we are not using a database.
  10. Select the table we want to import and click OK.
  11. The newly created source definition will appear in the navigator under the database name. Choose Repository Save.


37. Diagram of ETL

source: http://www.learndatamodeling.com/etl.htm

Figure 1.12 : Sample ETL Process Flow

36. ETL Concepts

Source: http://www.learndatamodeling.com/etl.htm

Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping
The definition of the relationship and data flow between source and target objects.

Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

Staging Area
A place where data is processed before entering the warehouse.

Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Wednesday, July 16, 2008

35. INFORMATICA FAQ'S

SOURCE: 1. http://www.allinterview.com/showanswers/69154.html
SOURCE: 2. http://deviinformatica.blogspot.com/2007/09/informatica-interview-questions.html

Re: Difference between filter and router?
Answer
# 1
First, the similarity between Router and Filter:

- Both transformations can discard data based on some filter condition.

The differences are:

1. Using a Router transformation, you can test incoming data against multiple filter conditions; if none of the conditions is met, the incoming rows get routed to the DEFAULT group.

2. Using a Filter transformation, you can pass data to the next transformation based on a single filter condition. There is no option to route data that doesn't match the condition, as there is with the Router transformation.

Monday, July 14, 2008

34. PL/SQL SYLLABUS

techonthenet.com

Oracle is a relational database technology.

PLSQL stands for "Procedural Language extensions to SQL", and can be used in Oracle databases. PLSQL is closely integrated into the SQL language, yet it adds programming constructs that are not native to SQL.

We've categorized Oracle and PLSQL into the following topics:

Data Types
Literals (Constants)
Declaring Variables
Loops and Conditional Statements
Sequences (Autonumber)
Transactions
Cursors
Functions (Built-In) (By Category)
Functions (Built-In) (Alphabetical)
Oracle System Tables
Primary Keys
Foreign Keys
Unique Constraints
Check Constraints
Indexes
Creating Functions
Creating Procedures
Creating Triggers
Exception Handling
Oracle Error Messages
Grant/Revoke Privileges
Roles (set of privileges)
Change Password
Synonyms (create, drop)
Views
SELECT Statement
DISTINCT
COUNT / SUM / MIN / MAX
Is Null / Is Not Null
WHERE Clause
"AND" Condition
"OR" Condition
Combining "AND" with "OR"
"LIKE" Condition
"IN" Function
BETWEEN Condition
EXISTS Condition
GROUP BY
HAVING
ORDER BY (sort by)
JOINS (inner, outer)
Subqueries
UNION Query
UNION ALL Query
INTERSECT Query
MINUS Query
UPDATE Statement
INSERT Statement
DELETE Statement
Tables (create, alter, drop, temp)

33. SQL Syllabus on net

http://download.oracle.com/docs/cd/B10501_01/server.920/a90842/ch13.htm#1012319

Statements, Keywords, Packages, and Functions

32. SQL TUTORIAL


http://www.sql-tutorial.net
SQL Tutorial
SQL Database Table
SQL SELECT
SQL SELECT INTO
SQL DISTINCT
SQL WHERE
SQL LIKE
SQL INSERT INTO
SQL UPDATE
SQL DELETE
SQL ORDER BY
SQL OR & AND
SQL IN
SQL BETWEEN
SQL Aliases
SQL COUNT
SQL MAX
SQL MIN
SQL AVG
SQL SUM
SQL GROUP BY
SQL HAVING
SQL JOIN
SQL Training
SQL Server
SQL Hosting

Sunday, July 13, 2008

31. SQL QUERIES

TIZAG.COM

SQL - Subqueries

MySQL offers very limited support for subqueries, while Oracle and DB2 support them fully. Subqueries are SELECT queries placed within an existing SQL statement. They may exist in any of the following types of SQL statements.

  • Select
  • Insert
  • Update
  • Delete
  • Set
  • Do

Subqueries are great for answering very specific questions about the data inside your database. For instance, as the employer you may notice that employee number 11 had a great day yesterday with sales. Given just this information, we can use a subquery to pull the employee's last name and first name from our database.

SQL Code:

SELECT * FROM employees
WHERE id IN
(SELECT EmployeeID FROM invoices WHERE EmployeeID='11');

SQL Table:

id Lastname Firstname Title
11 Davis Julie MANAGER

Here we have pulled our employee information from the employees table by only knowing the employee number from the invoices table.

SQL - Subquery Inserts

Subqueries can be used to pull old data from your database and insert it into new tables. For instance, if we opened a third store and wanted to place the same manager over all three stores, we could do this by pulling the manager's information with a subquery and then inserting the records. Also note that this form of insert will insert a row for every case where the subquery is true, so several rows may or may not be inserted depending upon how your table is set up.

SQL Code:

INSERT INTO employees3
(id,Lastname,Firstname,Title)
SELECT id,Lastname,Firstname,Title
FROM employees WHERE Title='MANAGER';

With complete mastery of subqueries, you can now see the power of the SQL language. The language is capable of nearly all things imaginable.

30. CREATE VIEW

A view is a virtual table based on the result-set of a SELECT statement.


What is a View?

In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement.

A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from a single table.

Note: The database design and structure will NOT be affected by the functions, where, or join statements in a view.

Syntax

CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Note: The database does not store the view data! The database engine recreates the data, using the view's SELECT statement, every time a user queries a view.


Using Views

A view could be used from inside a query, a stored procedure, or from inside another view. By adding functions, joins, etc., to a view, it allows you to present exactly the data you want to the user.

The sample database Northwind has some views installed by default. The view "Current Product List" lists all active products (products that are not discontinued) from the Products table. The view is created with the following SQL:

CREATE VIEW [Current Product List] AS
SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No

We can query the view above as follows:

SELECT * FROM [Current Product List]

Another view from the Northwind sample database selects every product in the Products table that has a unit price that is higher than the average unit price:

CREATE VIEW [Products Above Average Price] AS
SELECT ProductName,UnitPrice
FROM Products
WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)

We can query the view above as follows:

SELECT * FROM [Products Above Average Price]

Another example view from the Northwind database calculates the total sale for each category in 1997. Note that this view selects its data from another view called "Product Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName

We can query the view above as follows:

SELECT * FROM [Category Sales For 1997]

We can also add a condition to the query. Now we want to see the total sale only for the category "Beverages":

SELECT * FROM [Category Sales For 1997]
WHERE CategoryName='Beverages'

29. SELECT INTO

The SELECT INTO Statement

The SELECT INTO statement is most often used to create backup copies of tables or for archiving records.

Syntax

SELECT column_name(s) INTO newtable [IN externaldatabase]
FROM source


Make a Backup Copy

The following example makes a backup copy of the "Persons" table:

SELECT * INTO Persons_backup
FROM Persons

The IN clause can be used to copy tables into another database:

SELECT Persons.* INTO Persons IN 'Backup.mdb'
FROM Persons

If you only want to copy a few fields, you can do so by listing them after the SELECT statement:

SELECT LastName,FirstName INTO Persons_backup
FROM Persons

You can also add a WHERE clause. The following example creates a "Persons_backup" table with two columns (FirstName and LastName) by extracting the persons who live in "Sandnes" from the "Persons" table:

SELECT LastName,Firstname INTO Persons_backup
FROM Persons
WHERE City='Sandnes'

Selecting data from more than one table is also possible. The following example creates a new table "Empl_Ord_backup" that contains data from the two tables Employees and Orders:

SELECT Employees.Name,Orders.Product
INTO Empl_Ord_backup
FROM Employees
INNER JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID

28. GROUP BY

GROUP BY...

GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called, and without the GROUP BY function it was impossible to find the sum for each individual group of column values.

The syntax for the GROUP BY function is:

SELECT column,SUM(column) FROM table GROUP BY column


GROUP BY Example

This "Sales" Table:

Company Amount
W3Schools 5500
IBM 4500
W3Schools 7100

And This SQL:

SELECT Company, SUM(Amount) FROM Sales

Returns this result:

Company SUM(Amount)
W3Schools 17100
IBM 17100
W3Schools 17100

The above code is invalid because the column returned is not part of an aggregate. A GROUP BY clause will solve this problem:

SELECT Company,SUM(Amount) FROM Sales
GROUP BY Company

Returns this result:

Company SUM(Amount)
W3Schools 12600
IBM 4500


HAVING...

HAVING... was added to SQL because the WHERE keyword could not be used against aggregate functions (like SUM), and without HAVING... it would be impossible to test for result conditions.

The syntax for the HAVING function is:

SELECT column,SUM(column) FROM table
GROUP BY column
HAVING SUM(column) condition value

This "Sales" Table:

Company Amount
W3Schools 5500
IBM 4500
W3Schools 7100

This SQL:

SELECT Company,SUM(Amount) FROM Sales
GROUP BY Company
HAVING SUM(Amount)>10000

Returns this result

Company SUM(Amount)
W3Schools 12600

27. FUNCTIONS

SQL has a lot of built-in functions for counting and calculations.


Function Syntax

The syntax for built-in SQL functions is:

SELECT function(column) FROM table


Types of Functions

There are several basic types and categories of functions in SQL. The basic types of functions are:

  • Aggregate Functions
  • Scalar functions

Aggregate functions

Aggregate functions operate against a collection of values, but return a single value.

Note: If used among many other expressions in the item list of a SELECT statement, the SELECT must have a GROUP BY clause!!

"Persons" table (used in most examples)

Name Age
Hansen, Ola 34
Svendson, Tove 45
Pettersen, Kari 19

Aggregate functions in MS Access

Function Description
AVG(column) Returns the average value of a column
COUNT(column) Returns the number of rows (without a NULL value) of a column
COUNT(*) Returns the number of selected rows
FIRST(column) Returns the value of the first record in a specified field
LAST(column) Returns the value of the last record in a specified field
MAX(column) Returns the highest value of a column
MIN(column) Returns the lowest value of a column
STDEV(column) Returns the standard deviation of a column
STDEVP(column) Returns the population standard deviation of a column
SUM(column) Returns the total sum of a column
VAR(column) Returns the variance of a column
VARP(column) Returns the population variance of a column

Aggregate functions in SQL Server

Function Description
AVG(column) Returns the average value of a column
BINARY_CHECKSUM Returns a binary checksum value computed over a row or a list of expressions
CHECKSUM Returns a checksum value computed over a row or a list of expressions
CHECKSUM_AGG Returns the checksum of the values in a group
COUNT(column) Returns the number of rows (without a NULL value) of a column
COUNT(*) Returns the number of selected rows
COUNT(DISTINCT column) Returns the number of distinct results
FIRST(column) Returns the value of the first record in a specified field (not supported in SQLServer2K)
LAST(column) Returns the value of the last record in a specified field (not supported in SQLServer2K)
MAX(column) Returns the highest value of a column
MIN(column) Returns the lowest value of a column
STDEV(column) Returns the standard deviation of a column
STDEVP(column) Returns the population standard deviation of a column
SUM(column) Returns the total sum of a column
VAR(column) Returns the variance of a column
VARP(column) Returns the population variance of a column


Scalar functions

Scalar functions operate against a single value, and return a single value based on the input value.

Useful Scalar Functions in MS Access

Function Description
UCASE(c) Converts a field to upper case
LCASE(c) Converts a field to lower case
MID(c,start[,end]) Extract characters from a text field
LEN(c) Returns the length of a text field
INSTR(c,char) Returns the numeric position of a named character within a text field
LEFT(c,number_of_char) Return the left part of a text field requested
RIGHT(c,number_of_char) Return the right part of a text field requested
ROUND(c,decimals) Rounds a numeric field to the number of decimals specified
MOD(x,y) Returns the remainder of a division operation
NOW() Returns the current system date
FORMAT(c,format) Changes the way a field is displayed
DATEDIFF(d,date1,date2) Used to perform date calculations

26. ALTER

ALTER TABLE

The ALTER TABLE statement is used to add or drop columns in an existing table.

ALTER TABLE table_name 
ADD column_name datatype
ALTER TABLE table_name 
DROP COLUMN column_name

Note: Some database systems don't allow the dropping of a column in a database table (DROP COLUMN column_name).


Person:

LastName FirstName Address
Pettersen Kari Storgt 20


Example

To add a column named "City" in the "Person" table:

ALTER TABLE Person ADD City varchar(30)

Result:

LastName FirstName Address City
Pettersen Kari Storgt 20

Example

To drop the "Address" column in the "Person" table:

ALTER TABLE Person DROP COLUMN Address

Result:

LastName FirstName City
Pettersen Kari

25. DROP

Drop Index

You can delete an existing index in a table with the DROP INDEX statement.

Syntax for Microsoft SQLJet (and Microsoft Access):

DROP INDEX index_name ON table_name

Syntax for MS SQL Server:

DROP INDEX table_name.index_name

Syntax for IBM DB2 and Oracle:

DROP INDEX index_name

Syntax for MySQL:

ALTER TABLE table_name DROP INDEX index_name


Delete a Table or Database

To delete a table (the table structure, attributes, and indexes will also be deleted):

DROP TABLE table_name

To delete a database:

DROP DATABASE database_name


Truncate a Table

What if we only want to get rid of the data inside a table, and not the table itself? Use the TRUNCATE TABLE command (deletes only the data inside the table):

TRUNCATE TABLE table_name

24. CREATE

Create a Database

To create a database:

CREATE DATABASE database_name


Create a Table

To create a table in a database:

CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
.......

)

Example

This example demonstrates how you can create a table named "Person", with four columns. The column names will be "LastName", "FirstName", "Address", and "Age":

CREATE TABLE Person
(
LastName varchar,
FirstName varchar,
Address varchar,
Age int
)

This example demonstrates how you can specify a maximum length for some columns:

CREATE TABLE Person
(
LastName varchar(30),
FirstName varchar,
Address varchar,
Age int(3)
)

The data type specifies what type of data the column can hold. The table below contains the most common data types in SQL:

Data Type Description
integer(size)
int(size)
smallint(size)
tinyint(size)
Hold integers only. The maximum number of digits is specified in parentheses.
decimal(size,d)
numeric(size,d)
Hold numbers with fractions. The maximum number of digits is specified in "size". The maximum number of digits to the right of the decimal is specified in "d".
char(size) Holds a fixed-length string (can contain letters, numbers, and special characters). The fixed size is specified in parentheses.
varchar(size) Holds a variable-length string (can contain letters, numbers, and special characters). The maximum size is specified in parentheses.
date(yyyymmdd) Holds a date


Create Index

Indexes are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. Users cannot see the indexes; they are just used to speed up queries.

Note: Updating a table containing indexes takes more time than updating a table without them, because the indexes also need to be updated. So it is a good idea to create indexes only on columns that are often used in searches.

A Unique Index

Creates a unique index on a table. A unique index means that two rows cannot have the same index value.

CREATE UNIQUE INDEX index_name
ON table_name (column_name)

The "column_name" specifies the column you want indexed.

A Simple Index

Creates a simple index on a table. When the UNIQUE keyword is omitted, duplicate values are allowed.

CREATE INDEX index_name
ON table_name (column_name)

The "column_name" specifies the column you want indexed.

Example

This example creates a simple index, named "PersonIndex", on the LastName field of the Person table:

CREATE INDEX PersonIndex
ON Person (LastName)

If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name:

CREATE INDEX PersonIndex
ON Person (LastName DESC)

If you want to index more than one column you can list the column names within the parentheses, separated by commas:

CREATE INDEX PersonIndex
ON Person (LastName, FirstName)