Tuesday, 22 April 2014

Interview Questions Set 5

101. What is the difference between an Expression transformation and an Aggregator transformation?
An Expression transformation lets you perform calculations only on a row-by-row basis. In an Aggregator transformation you can perform calculations on groups of rows.
102. What is a Source Qualifier? What is meant by Query Override?
The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.

The PowerCenter Server generates a query for each Source Qualifier transformation when it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier lets you override this default query by changing the transformation properties; this is the query override. The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query.
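Example (a minimal sketch, assuming a hypothetical EMPLOYEES source with EMP_ID, EMP_NAME and DEPT_ID columns):

-- Default query generated for the Source Qualifier
SELECT EMPLOYEES.EMP_ID, EMPLOYEES.EMP_NAME, EMPLOYEES.DEPT_ID
FROM EMPLOYEES

-- Overridden query: same ports in the same order, with an extra filter
SELECT EMPLOYEES.EMP_ID, EMPLOYEES.EMP_NAME, EMPLOYEES.DEPT_ID
FROM EMPLOYEES
WHERE EMPLOYEES.DEPT_ID = 10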
103. What are the contents of a session log, in general?
-    Allocation of system shared memory
-    Execution of pre-session and post-session commands
-    Session initialization
-    Creation of SQL commands for reader/writer threads
-    Start/end timings for target loading
-    Errors encountered during the session
-    Load summary of reader/writer/DTM statistics
104. What are the types of partitioning in Informatica?
I. Round-robin partitioning. The Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
II. Hash partitioning. The Informatica Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the Informatica Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition.
III. Key range partitioning. You specify one or more ports to form a compound partition key. The Informatica Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
IV. Pass-through partitioning. The Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
105. How will you optimize a Lookup transformation?
I. Delete unwanted columns from the lookup, since they increase the size of the lookup cache considerably (see the override sketch after this list).
II. If the Lookup transformation comes right after the Source Qualifier and there is no active transformation in between, you can instead push the lookup logic into a SQL override on the Source Qualifier.
III. If the cache assigned to the lookup is not large enough to hold the lookup data or index, whatever does not fit is spilled to the cache files in $PMCacheDir. When PowerCenter does not find the data you are looking up in the cache, it swaps data from the files into the cache and keeps doing this until it finds the data. This is quite expensive, being an I/O operation, so increase the cache size so that the whole data set resides in memory.
IV. Sequential and concurrent caches: PowerCenter 8.x gives the option to build the lookup caches either concurrently or sequentially, depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.
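As a rough illustration of point I, a lookup SQL override can restrict the cache to just the columns and rows you need (a minimal sketch, assuming a hypothetical DEPARTMENTS lookup table with DEPT_ID, DEPT_NAME and ACTIVE_FLAG columns):

SELECT DEPARTMENTS.DEPT_ID, DEPARTMENTS.DEPT_NAME
FROM DEPARTMENTS
WHERE DEPARTMENTS.ACTIVE_FLAG = 'Y'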

106. What is the difference between a connected lookup and an unconnected lookup?
-    A connected lookup takes input values directly from other transformations in the pipeline. An unconnected lookup does not take inputs directly from any other transformation; it can be invoked from any transformation (such as an Expression) as a function using the :LKP expression, so it can be called multiple times in a mapping.
-    A connected lookup can use dynamic caching. Dynamic caching is not applicable to an unconnected lookup.
-    A connected lookup returns the default value when there is no matching record. An unconnected lookup returns NULL when there is no matching record.
-    A connected lookup can return as many columns as there are in the lookup table. An unconnected lookup has only one return port.
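For example, an unconnected lookup transformation named LKP_GET_DEPT (a hypothetical name) could be called from an output port of an Expression transformation like this:

DEPT_NAME_OUT = :LKP.LKP_GET_DEPT(DEPT_ID)

Here DEPT_ID is the input passed to the lookup, and the single return port of LKP_GET_DEPT supplies the value of DEPT_NAME_OUT.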

107. What is the use of the update override query in the target instance?
I. The update can be performed based on a target column value.
II. The update can be performed without defining a key column in Informatica or in the database (see the sketch below).
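A minimal sketch of a target update override, assuming a hypothetical T_EMPLOYEES target with EMP_NAME and SALARY ports (:TU. refers to the ports of the target instance):

UPDATE T_EMPLOYEES
SET SALARY = :TU.SALARY
WHERE EMP_NAME = :TU.EMP_NAME

Because the WHERE clause is written by hand, the update can match on EMP_NAME even though it is not defined as a key column.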

108. What is a data warehouse?
It is a process for building decision support systems and a knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.
Bill Inmon's definition: "A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."
Ralph Kimball's definition: "A copy of transaction data specifically structured for query and analysis."
109. List which transformations are active and which are passive.
Aggregator - active & connected
Source Qualifier - active & connected
Filter - active & connected
Expression - passive & connected
Joiner - active & connected
Lookup - passive & connected/unconnected
HTTP - passive & connected
Normalizer - active & connected
Rank - active & connected
Router - active & connected
Sequence Generator - passive & connected
Sorter - active & connected
Stored Procedure - passive & connected/unconnected
Union - active & connected
Java - active & connected
SQL - active/passive & connected
110. What is the use of an index? What is the impact on performance when DML operations are performed on an indexed table?
Oracle uses indexes to avoid large-table full-table scans and disk sorts, which are required when the SQL optimizer cannot find an efficient way to service the SELECT query.
Performance of DML operations such as INSERT, UPDATE and DELETE is reduced, because every change to the table also has to maintain its indexes.
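For instance, assuming a hypothetical EMPLOYEES table, the index below lets queries on DEPT_ID avoid a full-table scan, but every INSERT, UPDATE or DELETE on EMPLOYEES must also maintain the index:

CREATE INDEX IDX_EMP_DEPT ON EMPLOYEES (DEPT_ID);

SELECT EMP_NAME
FROM EMPLOYEES
WHERE DEPT_ID = 10;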
111. Which transformations cannot be used upstream from a Union transformation?
Sequence Generator and Update Strategy.
112. How do you configure mapping variables, and how are they used?
In the Designer, go to Mappings --> Parameters and Variables and create the variable $$variable_name.
There are three aggregation types: Count, Min and Max.
Once the variable is created, it can be used in an Expression transformation with functions such as SETCOUNTVARIABLE($$variable_name), SETMAXVARIABLE($$variable_name, value) and SETVARIABLE($$variable_name, value).
SETCOUNTVARIABLE($$variable_name)
Counts the rows evaluated by the function and increments the current value of a mapping variable based on the count. It increases the current value by one for each row marked for insertion, decreases it by one for each row marked for deletion, and keeps it the same for each row marked for update or reject.
SETMAXVARIABLE($$variable_name, value)
Sets the current value of a mapping variable to the higher of the current value and the value passed in. The function executes only if a row is marked as insert.
SETVARIABLE($$variable_name, value), for example SETVARIABLE($$Time, SYSDATE)
Used with mapping variables that have the Max/Min aggregation type. The SETVARIABLE function executes only if a row is marked as insert or update.
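As a small sketch, assuming a hypothetical mapping variable $$MAX_LOAD_DATE with the Max aggregation type and a LOAD_DATE input port, an output or variable port in an Expression transformation could be defined as:

SETMAXVARIABLE($$MAX_LOAD_DATE, LOAD_DATE)

After a successful session run, the final value of the variable is saved to the repository, so the next run can start from it.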
113. How are errors handled when pushdown optimization is enabled?
When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database.
When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.
If you configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations. Instead, the database rolls back the transactions. If the database server fails, it rolls back transactions when it restarts. If the Integration Service fails, the database server rolls back the transaction.
114. What are some common error codes shown in the session log?
BR - errors related to the reader process, including ERP, flat file and relational sources
CMN - errors related to databases, memory allocation, lookups, joiners and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in the Aggregator, Lookup, Joiner and Rank transformations
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer
115. What are the two types of errors that occur in a session run?
Fatal and non-fatal errors.
Non-fatal errors: a non-fatal error does not force the session to stop on its first occurrence. Establish the error threshold in the session properties tab with the "Stop on errors" option. When you enable this option, the server counts non-fatal errors that occur in the reader, writer and transformations.
I. Reader errors can include alignment errors while running a session in Unicode mode.
II. Writer errors can include key constraint violations, loading NULL into a NOT NULL field and database errors.
III. Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL input.
Fatal errors: a fatal error occurs when the server cannot access the source, target or repository. This can include loss of connection or target database errors, such as lack of database space to load data.
If the session uses Normalizer or Sequence Generator transformations and the server cannot update the sequence values in the repository, a fatal error occurs.
116. What are the different types of commit interval?
Target-based commit and source-based commit.
(a)    Target-based commit
-    The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
-    During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval.
-    The server commits data to each target based on primary/foreign key constraints.
(b)    Source-based commit
-    The server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
-    During a session, the server commits data to the target based on the number of rows from an active source in a single pipeline. These rows are referred to as source rows.
-    A pipeline consists of a source qualifier and all the transformations and targets that receive data from the source qualifier.
-    Although the Filter, Router and Update Strategy transformations are active transformations, the server does not use them as active sources in a source-based commit session.
-    When the server runs a session, it identifies the active source for each pipeline in the mapping. The server generates a commit row from the active source at every commit interval.
-    When each target in the pipeline receives the commit row, the server performs the commit.
117. Explain reject loading.
I. During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the server writes the rejected row into the reject file.
II. You can correct the rejected data and reload it into relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.)
III. Reading rejected data:
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
There are two main parts: the row indicator and the column indicators.
Row indicator: the row indicator tells what the writer was trying to do with the row of data when it was rejected.

Row Indicator    Meaning    Rejected By
0                Insert     Writer or target
1                Update     Writer or target
2                Delete     Writer or target
3                Reject     Writer


Column indicator: a column indicator appears after every column of data and defines the type of the data preceding it.

Column Indicator    Meaning       Writer Treats As
D                   Valid data    Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
?                   Overflow      Bad data
N                   Null          Bad data
T                   Truncated     Bad data


118. Explain the various threads created during a session run.

Thread                    Function
Master thread             Handles stop and abort requests from the Load Manager.
Mapping thread            One thread for each session. Fetches session and mapping information, compiles the mapping and cleans up after execution.
Reader thread             One thread for each partition. Relational sources use relational reader threads and flat file sources use file reader threads.
Writer thread             One thread for each partition. Writes to the target.
Transformation thread     One or more transformation threads for each partition.

119. How do you set up Event-Raise and Event-Wait tasks?
Ans:
I. Create an event: right-click the workflow, choose Edit, go to the Events tab and create an event.
II. Place an Event-Raise task in the workflow space and give the created event name in the "User Defined Event" property (Properties tab).
III. Place an Event-Wait task in the workflow space. There are two options in its Events tab:
     Pre-defined: this is a file watch.
     User-defined: this is used for the event created in the workflow properties.
When the Event-Raise task executes, it raises the event, which triggers the Event-Wait task to continue.

120. Timer task:
Ans: You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task.
Two options:
Absolute time - give the exact time when to start the next task, or refer to a date-time variable.
Relative time - give the hours, minutes and seconds to wait, measured:
                - from the start time of this task,
                - from the start time of the parent workflow/worklet, or
                - from the start time of the top-level workflow.

121. Decision task:
Ans: Use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions.
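A small sketch of how this is typically used, assuming hypothetical sessions s_Load_A and s_Load_B and a Decision task named Dec_Check. Set the decision condition to:

$s_Load_A.Status = SUCCEEDED AND $s_Load_B.Status = SUCCEEDED

Then the links that follow only need to test the single predefined variable $Dec_Check.Condition, instead of repeating the full condition on every link.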

122. Assignment task:
Ans: You can assign a value to a user-defined workflow variable with the Assignment task.

123. Control task:
Ans: We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.
Control Option        Description
Fail Me               Fails the Control task.
Fail Parent           Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent           Stops the workflow or worklet that contains the Control task.
Abort Parent          Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF     Fails the workflow that is running.
Stop Top-Level WF     Stops the workflow that is running.
Abort Top-Level WF    Aborts the workflow that is running.

Thanks
Ur's Hari