ETL Interview Questions

1) What is ETL?

In data warehousing architechture, ETL is an important component, which manages the data for any business process. ETL stands for Extract, Transform and Load.  Extract does the process of reading data from a database.  Transform does the converting of data into a format that could be appropriate for reporting and analysis. While, load does the process of writing the data into the target database.

2) Explain what are the ETL testing operations includes?

ETL testing includes

  • Verify whether the data is transforming correctly according to business requirements
  • Verify that the projected data is loaded into the data warehouse without any truncation and data loss
  • Make sure that ETL application reports invalid data and replaces with default values
  • Make sure that data loads at expected time frame to improve scalability and performance

3) Mention what are the types of data warehouse applications and what is the difference between data mining and data warehousing?

The types of data warehouse applications are

  • Info Processing
  • Analytical Processing
  • Data Mining

Data mining can be define as the process of extracting hidden predictive information from large databases and interpret the data while data warehousing may make use of a data mine for analytical processing of the data in a faster way. Data warehousing is the process of aggregating data from multiple sources into one common repository
4) What are the benefits of data warehousing? 

Historical information for comparative and competitive analysis.

Enhanced data quality and completeness.

Supplementing disaster recovery plans with another data back up source.

5) What are the various tools used in ETL?

  • Cognos Decision Stream
  • Oracle Warehouse Builder
  • Business Objects XI
  • SAS business warehouse
  • SAS Enterprise ETL server

6) What is fact? What are the types of facts?

It is a central component of a multi-dimensional model which contains the measures to be analysed.  Facts are related to dimensions.

Types of facts are

  • Additive Facts
  • Semi-additive Facts
  • Non-additive Facts

7) Explain what are Cubes and OLAP Cubes?

Cubes are data processing units comprised of fact tables and dimensions from the data warehouse.  It provides multi-dimensional analysis.

OLAP stands for Online Analytics Processing, and OLAP cube stores large data in muti-dimensional form for reporting purposes.  It consists of facts called as measures categorized by dimensions.

8) Explain what is tracing level and what are the types?

Tracing level is the amount of data stored in the log files.  Tracing level can be classified in two Normal and Verbose. Normal level explains the tracing level in a detailed manner while verbose explains the tracing levels at each and every row.

9) Explain what is Grain of Fact?

Grain fact can be defined as the level at which the fact information is stored.  It is also known as Fact Granularity

10) Explain what factless fact schema is and what is Measures?

A fact table without measures is known as Factless fact table.  It can view the number of occurring events. For example, it is used to record an event such as employee count in a company.

The numeric data based on columns in a fact table is known as Measures

11) Explain what is transformation?

A transformation is a repository object which generates, modifies or passes data.  Transformation are of two types Active and Passive

12) Explain the use of Lookup Transformation?

The Lookup Transformation is useful for

  • Getting a related value from a table using a column value
  • Update slowly changing dimension table
  • Verify whether records already exist in the table

13) Why can’t we see the data in dataset?

  •  Dataset file is not a single file.
  •  It is a combination of 3 files, namely data file, descriptor file and control file.
  •  Data file contains the binary form data that is not readable
  •  Descriptor file has the configuration details like no. of partitions
  •  Control file – information about both data file and descriptor file will be controlled
  •  The data can be viewed by using  dataset management that is available in designer tools
    – orchestrate by executing the command ‘ $dump example.ds

14) What are snapshots? What are materialized views & where do we use them? What is a materialized view log?

  •  Snapshots are copies of read-only data of a master table.
  •  They are located on a remote node that is refreshed periodically to reflect the changes made to the master table.
  •  They are replica of tablesViews:
  •  Views are built by using attributes of one or more tables.
  •  View with single table can be updated, whereas view with multiple tables cannot be updated
  • Materialized View log:
  • A materialized view is a pre computed table that has aggregated or joined data from fact tables and dimension tables.
  •  To put it simple, a materialized view is an aggregate table.

15) What is Metadata?

Metadata is defined as the data about data. Metadata describes the entity and attributes description.