1. What is a commit interval, and what are its types?
A commit interval is the interval at which the PowerCenter server commits data to targets during a session. The commit interval is the number of rows you want to use as the basis for the commit point.
Target-based commit: the PowerCenter server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit: the PowerCenter server commits data based on the number of source rows read, rather than the rows written to the target.
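To make the idea concrete, here is a minimal Python sketch of commit-interval batching; `connection` and `execute_insert` are hypothetical stand-ins, not Informatica APIs:

```python
# Illustrative sketch only: the batching idea behind a commit interval,
# not Informatica internals.
def load_rows(rows, connection, commit_interval=10_000):
    """Write rows to a target, committing every `commit_interval` rows."""
    pending = 0
    for row in rows:
        connection.execute_insert(row)   # hypothetical target writer
        pending += 1
        if pending >= commit_interval:
            connection.commit()          # commit point reached
            pending = 0
    if pending:
        connection.commit()              # flush the final partial batch
```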
2. Difference between connected and unconnected lookup?

| Connected Lookup | Unconnected Lookup |
| --- | --- |
| Receives input values directly from the pipeline. | Receives input values from the result of a :LKP expression in another transformation. |
| You can use a dynamic or static cache. | You can use only a static cache. |
| The cache includes all lookup columns used in the mapping (that is, lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations). | The cache includes all lookup/output ports in the lookup condition and the lookup/return port. |
| Can return multiple columns from the same row, or insert into the dynamic lookup cache. | You designate one return port (R); it returns one column from each row. |
| If there is no match for the lookup condition, the Informatica server returns the default value for all output ports. If you configure dynamic caching, the Informatica server inserts rows into the cache. | If there is no match for the lookup condition, the Informatica server returns NULL. |
| Passes multiple output values to another transformation; link the lookup/output ports to another transformation. | Passes one output value to another transformation; the lookup/output/return port passes the value to the transformation calling the :LKP expression. |
| Supports user-defined default values. | Does not support user-defined default values. |
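As a rough illustration of the difference, here is a Python sketch (assumed data and function names, not Informatica code): the connected style sits in the row pipeline and can emit several columns, while the unconnected style is called like a function and returns a single value, as a :LKP expression does:

```python
# Conceptual sketch only; PRODUCT_LOOKUP and the field names are assumptions.
PRODUCT_LOOKUP = {                     # lookup source keyed on product_id
    "P1": {"price": 9.99, "category": "book"},
    "P2": {"price": 19.99, "category": "toy"},
}

def connected_lookup(row):
    """Connected style: in the pipeline, can return several columns at once."""
    match = PRODUCT_LOOKUP.get(row["product_id"])
    if match is None:
        match = {"price": 0.0, "category": "unknown"}  # user-defined defaults
    row.update(match)                  # multiple output ports feed downstream
    return row

def lkp_category(product_id):
    """Unconnected style: invoked like :LKP, returns exactly one value."""
    match = PRODUCT_LOOKUP.get(product_id)
    return match["category"] if match else None  # NULL when no match
```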
3. Explain the index cache and the data cache?
The Informatica server stores lookup condition values in the index cache and output values in the data cache.
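A minimal sketch of the split, using plain Python dicts rather than Informatica's cache files:

```python
# Index cache holds the condition (key) values; data cache holds the
# output values, looked up by position.
index_cache = {}   # condition value -> slot in the data cache
data_cache = []    # output values for each cached row

def cache_row(condition_value, output_values):
    index_cache[condition_value] = len(data_cache)
    data_cache.append(output_values)

def lookup(condition_value):
    slot = index_cache.get(condition_value)
    return data_cache[slot] if slot is not None else None

cache_row("P1", {"price": 9.99})
print(lookup("P1"))   # {'price': 9.99}
```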
4. Explain the workflow process?
The PowerCenter server uses both process memory and system shared memory to perform these tasks.
Load Manager process: locks the workflow and workflow tasks, and starts the DTM to run the sessions.
Data Transformation Manager (DTM) process: performs session validations; creates threads to initialize the session, read, write, and transform data; and handles pre- and post-session operations.
The default memory allocation is 12,000,000 bytes.
5. How does the Informatica server sort string values in a Rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
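The following Python snippet mimics a binary sort order by comparing raw byte values (illustrative only):

```python
# Binary-sort ranking: compare strings by their raw byte values.
names = ["apple", "Zebra", "banana", "APPLE"]

def binary_key(s):
    return s.encode("ascii")          # byte value of each character

top2 = sorted(names, key=binary_key, reverse=True)[:2]
print(top2)   # ['banana', 'apple'] -- uppercase sorts below lowercase in ASCII
```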
6. Can you use the mapping parameters or variables created in one mapping in another mapping?
No. Mapping parameters or variables can be used only in the transformations of the same mapping or mapplet in which they were created.
7. Difference between the source filter and the Filter transformation?
A source filter filters data only from relational sources, whereas a Filter transformation can filter data from any type of source.
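A small sketch of the distinction, with assumed table and column names: the source filter becomes part of the source query, while the Filter transformation drops rows after they are read:

```python
# Illustrative only; `cursor` and the orders table are assumptions.
def read_with_source_filter(cursor):
    # Relational source only: the database applies the predicate.
    cursor.execute("SELECT id, amount FROM orders WHERE amount > 100")
    return cursor.fetchall()

def pipeline_filter(rows):
    # Works for any source (flat file, queue, ...) already read into the flow.
    return [row for row in rows if row["amount"] > 100]
```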
8. What is data driven?
The Informatica server follows the instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the Data Driven option in the session settings, the Informatica server ignores all Update Strategy transformations in the mapping.
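As a sketch, the DD_* flag values below match PowerCenter's update strategy constants, but the routing logic itself is a made-up example:

```python
# Data-driven idea: each row is flagged by mapping logic, and the writer
# honors the flag. The rules below are purely illustrative.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, existing_keys):
    if row["status"] == "cancelled":
        return DD_DELETE                 # remove the matching target row
    if row["key"] in existing_keys:
        return DD_UPDATE                 # row already exists in the target
    if row["amount"] is None:
        return DD_REJECT                 # bad data, do not load
    return DD_INSERT                     # brand-new row
```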
9. Difference between OLTP and OLAP

| # | OLTP | OLAP |
| --- | --- | --- |
| 1 | Application oriented (e.g., a purchase order is functionality of an application) | Subject oriented (subject in the sense of customer, product, item, time) |
| 2 | Used to run the business | Used to analyze the business |
| 3 | Detailed data | Summarized data |
| 4 | Repetitive access | Ad-hoc access |
| 5 | Few records accessed at a time (tens); simple queries | Large volumes accessed at a time (millions); complex queries |
| 6 | Small database | Large database |
| 7 | Current data | Historical data |
| 8 | Clerical users | Knowledge users |
| 9 | Row-by-row loading | Bulk loading |
| 10 | Time invariant | Time variant |
| 11 | Normalized data | De-normalized data |
| 12 | E-R schema | Star schema |
10. What are the types of data warehousing?
EDW (Enterprise Data Warehouse)
- It provides a central database for decision support throughout the enterprise.
- It is a collection of data marts.
Data Mart
- It is a subset of a data warehouse.
- It is a subject-oriented database which supports the needs of individual departments in an organization.
- It is called a high-performance query structure.
- It supports a particular line of business, such as sales or marketing.
ODS (Operational Data Store)
- It is defined as an integrated view of operational databases, designed to support operational monitoring.
- It is a collection of operational data sources designed to support transaction processing.
- Data is refreshed near real-time and used for business activity.
- It is an intermediate layer between OLTP and OLAP which helps to create instant reports.
11. Data Modeling Types:
- Conceptual Data Modeling
- Logical Data Modeling
- Physical Data Modeling
- Dimensional Data Modeling
1. Conceptual Data Modeling
- A conceptual data model includes all major entities and relationships, does not contain much detail about attributes, and is often used in the initial planning phase.
- A conceptual data model is created by gathering business requirements from various sources such as business documents, discussions with functional teams, business analysts, subject matter experts, and the end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for review.
- Conceptual data modeling gives the functional and technical teams an idea of how the business requirements will be projected into the logical data model.
2. Logical Data Modeling
- The logical data model is the actual implementation and extension of the conceptual data model. It includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
3. Physical Data Modeling
- The physical data model includes all required tables, columns, relationships, and database properties for the physical implementation of the database. Database performance, indexing strategy, physical storage, and denormalization are important parameters of a physical model.
Logical vs. Physical Data Modeling

| Logical Data Model | Physical Data Model |
| --- | --- |
| Represents business information and defines business rules | Represents the physical implementation of the model in a database |
| Entity | Table |
| Attribute | Column |
| Primary Key | Primary Key Constraint |
| Alternate Key | Unique Constraint or Unique Index |
| Inversion Key Entry | Non-Unique Index |
| Rule | Check Constraint, Default Value |
| Relationship | Foreign Key |
| Definition | Comment |
Dimensional Data Modeling
- A dimensional model consists of fact and dimension tables.
- It is an approach to developing schema designs for the database.
Types of Dimensional Modeling
- Star schema
- Snowflake schema
- Starflake schema (or hybrid schema)
- Multi-star schema
12. What are the important aspects of the star schema and the snowflake schema?
- In a star schema, every dimension has a primary key.
- In a star schema, a dimension table will not have any parent table.
- Whereas in a snowflake schema, a dimension table will have one or more parent tables.
- In a star schema, hierarchies for the dimensions are stored in the dimension table itself.
- Whereas in a snowflake schema, hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.
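A minimal Python sketch of the structural difference, using a hypothetical product hierarchy:

```python
# Star schema: the whole product hierarchy lives in one dimension row.
star_product_dim = [
    {"product_key": 1, "product": "Pen", "brand": "Acme", "category": "Stationery"},
]

# Snowflake schema: the hierarchy is normalized into parent tables.
snow_category = {10: {"category": "Stationery"}}
snow_brand    = {20: {"brand": "Acme", "category_key": 10}}
snow_product  = [
    {"product_key": 1, "product": "Pen", "brand_key": 20},
]

# Drilling up in the snowflake requires joins through the parent tables.
row = snow_product[0]
brand = snow_brand[row["brand_key"]]
category = snow_category[brand["category_key"]]
print(row["product"], brand["brand"], category["category"])  # Pen Acme Stationery
```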
13. What are the types of facts in a fact table?
- Additive: measures that can be summed up across all dimensions.
- Ex: sales revenue
- Semi-additive: measures that can be summed up across some dimensions but not others (typically not across time).
- Ex: current balance
- Non-additive: measures that cannot be summed up across any of the dimensions.
- Ex: student attendance
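A toy Python example of additivity (made-up numbers):

```python
# Summing revenue is valid across any dimension, but summing an account
# balance across time is not.
facts = [
    {"month": "Jan", "account": "A", "revenue": 100, "balance": 500},
    {"month": "Feb", "account": "A", "revenue": 120, "balance": 520},
]

total_revenue = sum(f["revenue"] for f in facts)   # 220 -- meaningful (additive)
summed_balance = sum(f["balance"] for f in facts)  # 1020 -- meaningless across time
latest_balance = facts[-1]["balance"]              # 520 -- use last/average instead
print(total_revenue, summed_balance, latest_balance)
```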
14. Why do we need a staging area for a DWH?
- The staging area is where operational data is cleaned before being loaded into the data warehouse.
- Cleansing here means merging data that comes from different sources.
- It is the area where most of the ETL work is done.
Data Cleansing
- It is used to remove duplicates.
- It is used to correct wrong email addresses.
- It is used to identify missing data.
- It is used to convert data types.
- It is used to capitalize names and addresses.
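A short Python sketch of these cleansing steps on hypothetical customer records (field names are assumptions):

```python
import re

raw = [
    {"name": "john smith", "email": "john@@example.com", "city": None},
    {"name": "john smith", "email": "john@@example.com", "city": None},  # duplicate
    {"name": "ann lee",    "email": "ann@example.com",   "city": "york"},
]

def cleanse(records):
    seen, clean = set(), []
    for r in records:
        key = (r["name"], r["email"])
        if key in seen:                               # remove duplicates
            continue
        seen.add(key)
        r["email"] = re.sub(r"@+", "@", r["email"])   # correct bad addresses
        if r["city"] is None:                         # identify missing data
            r["city"] = "UNKNOWN"
        r["name"] = r["name"].title()                 # capitalize names
        clean.append(r)
    return clean

print(cleanse(raw))
```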
Informatica Architecture
The PowerCenter Domain:
The domain is the primary unit of administration. An installation can have a single domain or multiple domains. A domain is a collection of nodes and services.
Nodes:
A node is the logical representation of a machine in a domain. One node in the domain acts as a gateway node to receive service requests from clients and route them to the appropriate service and node.
Integration Service:
The Integration Service does all the real work. It extracts data from sources, processes it according to the business logic, and loads data to targets.
Repository Service:
The Repository Service fetches data from the repository and sends it back to the requesting components (mostly the client tools and the Integration Service).
PowerCenter Repository:
The repository is a relational database which stores all the metadata created in PowerCenter.
PowerCenter Client Tools:
The PowerCenter Client consists of multiple tools.
PowerCenter Administration Console:
This is a web-based administration tool you can use to administer the PowerCenter installation.
Thanks,
Hari