25+ IBM [ Netezza ] Interview Questions & Answers [ STEP-IN ]
Netezza Interview Questions and Answers

25+ IBM [ Netezza ] Interview Questions & Answers [ STEP-IN ]

Last updated on 04th Jul 2020, Blog, Interview Questions

About author

Ramki (Sr Associate Manager )

(5.0) | 16547 Ratings 1787

IBM Netezza (pronounced ne-teez-a) is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.

1) What is real-time processing?

Ans:

Data sources such as JMS, WebSphere MQ, TIBCO, webMethods, MSMQ, SQP, and webservices can publish data in real-time. These real-time sources can be leveraged by Informatica Power Centre to process data on-demand. A session can be specifically configured for real-time processing.

2) Explain how data gets stored in Netezza and how does SPU failover take place?

Ans:

Data is stored based on a selected field(s) which are used for distibution.

==Data (A)==> Hash Function (B) ==> Logical SPU identifier list (C) ==> Physical SPU list (D) ==> Storage (E)

When data arrives, it is hased based on field(s) and a hash function (B) is used for this purpose. For example, for a hypothetical 32 node system system, logical SPU identifier list has 32 unique entries. If there are 1000 hashed data items from (B), there are 1000 entries in (C), all having only 32 SPU entries (a number of data items go to the same SPU, thus multiple (B) entries map to same (C)). For instance, (C) has values [3,19,30,7,20,25,11,3,22,19….]. This way, 1000 data entries are mapped. (D) has physical IP address of both primary and failover SPU. If there is a failover, this is the only place where Netezza need to update its entries. Same goes for a system that has a new SPU added. It is a little complicated, in principle, this is the concept.

3) What is a real-time processing terminating condition?

Ans:

A real-time processing terminating condition determines when the Integration Service stops reading messages from a real-time source and ends the session.

4) What are 4 environment variables that are required. What are different states on Netezza?

Ans:

Environment variables: NZ_HOST, NZ_DATABASE, NZ_USER and NZ_PASSWORD

-Online: Normal or usual state.

-Stopped: Netezza will shutdown after completing current queries, no new queries allowed.

-Offline: Waits for completion of current queries, new or queries in queue receive error.

-Paused: Same as above, but no error displayed. Typically caused during Netezza boot or startup.

-Down: Just plain down, could be due to Netezza server problem or user initiated.

5) What is a pipeline partition and how does provide a session with higher performance?

Ans:

Within a mapping, a session can break apart different source qualifier to target pipelines into their own reader/transformation/writer thread(s). This allows the Integration Service to run the partition in parallel with other pipeline partitions in the same mapping. The parallelism creates a higher performing session.

6) What is the maximum number of partitions that can be defined for in a single pipeline?

Ans:

You can define up to 64 partitions at any partition point in a pipeline.

7) What is a dynamic session partition?

Ans:

A dynamic session partition is where the Integration Service scales the number of session partitions at runtime. The number of partitions is based on a number of factors including number of nodes in a grid or source database partitions.

8) List three dynamic partitioning configurations that cause a session to run with one partition?

Ans:

Set dynamic partitioning to the number of nodes in the grid, and the session does not run on a grid.

– Create a user-defined SQL statement or a user-defined source filter.

-Use dynamic partitioning with an Application Source Qualifier.

9) List the different types of pushdown optimization that can be configured?

Ans:

Source-side pushdown optimization – The Integration Service pushes as much transformation logic as possible to the source database.

Target-side pushdown optimization – The Integration Service pushes as much transformation logic as possible to the target database.

Full pushdown optimization – The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization

10) Does Netezza support concurrent update of the same record?

Ans:

In case of conflict in which the same record is set for modification, Netezza rolls back recent transaction that is attempted on the same record, in fact same table. This is generally acceptable in DW environments. Netezza does support serialization transactions and does not permit dirty reads.

11) What happens to records that are loaded during nzload process, but were not committed.

Ans:

They are logically deleted and administrator can run nzreclaim, we may also truncate table.

12) What is a control task?

Ans:

A control task is used to alter the normal processing of a workflow by stopping, aborting, or failing a workflow or worklet.

13) Can a group become a member of another group in Netezza user administration. Can we use same group name for databases?

Ans:

In Netezza, public group is created automatically and every one is a member of this group by default. We can create as many groups and any user can be member of any group(s). Group can not be a member of another group. Group names, user names and database names are unique. That is, we can not have a database called sales and a group also called sales.

14) How can we give a global permission to user joe so that he can create table in any database?

Ans:

Login into system database and give that permission to user by saying “grant create table to joe;”

15) What permission will you give to connect to a database?

Ans:

List.  Grand list, select on table to public (if logged into sales database, this allows all users to query tables in sales database).

16) Do we need to drop all tables and objects in the database, before dropping a database?

Ans:

No, drop database will take care of it.

17) What constraints on a table are enforced?

Ans:

Not null and default. Netezza does not apply PK and FK.

18) Why NOT NULL specification is better in Netezza?

Ans:

Specifying not null results in better performance as NULL values are tracked at rowheader level. Having NULL values results in storing references to NULL values in header. If all columns are NOT NULL, then there is no record header.

19) Why do you prefer truncate instead of drop table command?

Ans:

Just empties data from table, keeping table structure and permission intact.

20) When no distribution clause is used while creating a table, what distribution is used by Netezza.

Ans:

First column (same as in Teradata).

    Subscribe For Free Demo

    21) Can we update all columns in a Netezza table?

    Ans:

    No, the column that is used in distribution clause cannot be used for updates. Remember, up to four columns can be used for distribution of data on SPU. From practical sense, updating distribution columns result in redistribution of data; the single most performance hit when large table is involved. This restriction makes sense.

    22) What is data slice and SPU?

    Ans:

    For me, they are the same! Of course, this answer is not accurate reply in your interview(s).

    23) What data type works best for zone maps?

    Ans:

    Zone maps work best for integer data types.

    24) What feature in Netezza you do not like?

    Ans:

    Of course, a large list, especially when compared to Oracle. PK and FK enforcement is a big drawback though this is typically enforced at ETL or ELT process [ELT: Extract, Transform and Load. Note that ‘Transform’ and ‘Load’ can happen within Netezza].

    25) List three real-time processing terminating conditions?

    Ans:

    Idle time – Time Integration Service waits to receive messages before it stops reading from the source.

    Message count – Number of messages the Integration Service reads from a real-time source before it stops reading from the source.

    Reader time limit – Amount of time in seconds that the Integration Service reads source messages from the real-time source before it stops reading from the source.

    26) What are the data warehouse appliances you know?

    Ans:

    • Netezza
    • Greenplum
    • Vertica
    • Teradata
    • Dataupia
    • Oracle Exadata

    27) What are the environment variables that are required to connect to netezza?

    Ans:

    The environment variables required are: NZ_HOST, NZ_DATABASE, NZ_USER, NZ_PASSWORD

    28) What are the different states of Netezza?

    Ans:

    • Online: Normal or usual state.
    • Stopped: Netezza will shut down after completing current queries, no new queries allowed.
    • Offline: Waits for completion of current queries, new queries in queue receives error.
    • Paused: Same as above, but no error displayed. Typically caused during Netezza boot up or startup.
    • Down: Just plain down, could be due to Netezza server problem or user initiated.

    29) What are the constraints on a table are enforced?

    Ans:

    The only constraint netezza supports is Not null. There are no primary key and foreign key constraints in netezza.

    30) Can you insert duplicate rows in netezza table?

    Ans:

    Yes. As there are no primary key constraints in netezza you can insert duplicate rows.

    Course Curriculum

    Get Hands-On Practical Netezza Training to Advance Your Career

    • Instructor-led Sessions
    • Real-life Case Studies
    • Assignments
    Explore Curriculum

    31) How the NOT NULL specification on a column improves the netezza performance?

    Ans:

    Specifying Not Null on each column in table results in better performance. Netezza tracks the NULL values at rowheader level. Having NULL values results in storing references to NULL values in header. If all columns are NOT NULL, then there is no record header.

    32) How FPGA can be helpful in improving query performance?

    Ans:

    While reading data from the disk, the Field Programmable Gate Array (FPGA) on each SPU filters out unwanted data. This process of data elimination removes IO bottlenecks and frees up downstream components such as the CPU, memory and network from processing extra data.

    33) What is a snippet?

    Ans:

    A snippet is a small unit of work that are carried out in SPU.

    34).What are zonemaps?

    Ans:

    An extent is the smallest unit of disk allocation on a SPU. Zonemaps is internal mapping structures to the extents that take advantage of the internal ordering of data to eliminate extents that do not need to be scanned. Zonemaps transparently avoid scanning of unreferenced rows. Zonemaps are created for every column in the table and contain the minimum and maximum values for every extent.

    35) How the zonemaps are created and updated?

    Ans:

    Zonemaps are created and refreshed for every SPU when you Generate statistics, Nzload operation, Insert, update operations, Nzreclaim operation.

    36) What is generate statistics and generate express statistics OR what is the difference between generate statistics and generate express statistics?

    Ans:

    • Generate statistics is used to gather statistics about each table column’s proportion of duplicate values, minimum values, maximum values, null values, unique values and updates the system catalog tables.
    • The difference between ‘generate statistics’ and ‘generate express statistics’ is based on how the column uniqueness is calculates. The ‘generate express statistics’ calculates estimated dispersion values based on the sampling of rows in the table. ‘Generate express statistics’ uses approximation in generating the stats where as ‘generate statistics’ uses all the rows in the table.

    37) What is the use of creating materialized views?

    Ans:

    A materialized view reduces the width (number of columns) of data being scanned in the base table by creating a thin version (fewer columns) of the base table that contains a small subset of frequently queried columns.

    38) What is the distribution of materialized views?

    Ans:

    A materialized view has the same distribution key as the base table.

    39) What are the limitations of materialized views?

    Ans:

    • You cannot insert, update, delete or truncate a materialized view. Any changes on the base tables will reflect into materialized views.
    • You can specify only one base table in the from clause.
    • Base table can’t be a external table, system table or a temporary table.
    • You cannot use a where clause in the materialized view.
    • Expressions are not allowed as columns.

    40) What are the best practices of creating materialized views?

    Ans:

    • Create materialized views with few columns which are frequently queried.
    • Specify order by clause on the most restrictive columns (columns used in where clause).
    • Periodically or manually refresh the materialized views.

    41) Explain FPGA and how is it useful for query performance.

    Ans:

    FPGA: Field Programmable Gate Array (FPGA) is located on each SPU. Netezza is different from other architectures. Netezza can do a “hardware upgrade” through software by using FPGA. Hardware is reconfigured during install.

    While reading data from disk, FPGA on each SPU also helps in ‘filtering’ unnecessary data before getting loaded into memory on each SPU. This way, FPGA does not overwhelm with all the data from disk.

    42) What are different ways to load?

    Ans:

    1. nzload
    2. External tables
    3. Create table AS (aka, CTAS).
    4. Inserts (Eeeewee!!)

    43) Does everything gets cached in Netezza (or any other data appliance).

    Ans:

    Typically only schema and other database objects are cached in appliances. Data is not cached, in general. In most cases, data is not saved any where (in any cache or on host computer) and is streamed directly from SPU to client software.

    44) When are we likely to receive incorrect (aggregate) results.

    Ans:

    Very rarely a driver may return aggregated results that are still getting processed back to client. In this case, client may assume that calculation is complete, instead of updating with latest or final results. Obviously, driver has to wait for Netezza to complete operation on host computer, before delivering results.

    45) Explain how data gets stored in Netezza and how does SPU failover take place.

    Ans:

    Data is stored based on a selected field(s) which are used for distibution.

    ==Data (A)==> Hash Function (B) ==> Logical SPU identifier list (C) ==> Physical SPU list (D) ==> Storage (E)

    When data arrives, it is hased based on field(s) and a hash function (B) is used for this purpose. For example, for a hypothetical 32 node system system, logical SPU identifier list has 32 unique entries. If there are 1000 hashed data items from (B), there are 1000 entries in (C), all having only 32 SPU entries (a number of data items go to the same SPU, thus multiple (B) entries map to same (C)). For instance, (C) has values [3,19,30,7,20,25,11,3,22,19….]. This way, 1000 data entries are mapped. (D) has physical IP address of both primary and failover SPU. If there is a failover, this is the only place where Netezza need to update its entries. Same goes for a system that has a new SPU added. It is a little complicated, in principle, this is the concept.

    46) what are 4 environment variables that are required. What are different states on Netezza.

    Ans:

    Environment variables: NZ_HOST, NZ_DATABASE, NZ_USER and NZ_PASSWORD

    • Online: Normal or usual state.
    • Stopped: Netezza will shutdown after completing current queries, no new queries allowed.
    • Offline: Waits for completion of current queries, new or queries in queue receive error.
    • Paused: Same as above, but no error displayed. Typically caused during Netezza bootup or startup.
    • Down: Just plain down, could be due to Netezza server problem or user initiated.

    47) How Netezza updates records. Give an idea of how transactions are maintained and how read consistency is maintaned.

    Ans:

    Netezza does not update records in place, it marks records with delete flag. In fact, each record contains two slots, one for create xid another for delete xid. Delete xid allows us to mark a record with current transaction for deletion, up to 31 transactions are allowed in Netezza for all tables. As noted earlier, only one update at a time allowed on the same table though. Here update refers to transactions that are not committed yet. Coming back to delete xid, this is how Netezza maintains transaction roll back and recovery. Once a record is modified, it’s delete xid is given transaction id; this is changed from previous value of 0, all records when loaded will contain 0 for delete xid. Note that FPGA uses its intelligence to scan data before delivering them to host or applications.

    Sample data:

    [ROW id][Create xid][Delete xid]

    [R1][T1][0]                       // First time a record is loaded, record R1

    // After some time, updating the same record

    [R1][T1][T33]                  // Record R1 is updated; note T33

    [R33][T33][0]                 // New update record R33; similar to a new record this has zero for Delete Xid

    If the record is deleted, simply deletion xid will contain that transaction id.

    • Based on the above, how do you know a record is the latest. It has zero in delete xid flag.
    • Extending same logic, how do we know a record is deleted. It has non zero value in delete xid flag.
    • How do you roll back to transaction. Follow similar to above listing, we can roll back a transaction of our interest.

    Note that transaction id is located in create xid flag and that is our point of interest in this case. From what I know, row id and create id is never modified by Netezza.

    48) Can a group become a member of another group in Netezza user administration. Can we use same group name for databases.

    Ans:

    In Netezza, public group is created automatically and every one is a memeber of this group by default. We can create as many groups and any user can be member of any group(s). Group can not be a member of another group. Group names, user names and database names are unique. That is, we can not have a database called sales and a group also called sales.

    49) How can we give a global permission to user joe so that he can create table in any database.

    Ans:

    Login into system database and give that permission to user by saying “grant create table to joe;”

    50) What permission will you give to connect to a database.

    Ans:

    List.  Grant list, select on table to public (if logged into sales database, this allows all users to query tables in sales database).

    Course Curriculum

    Enroll in Best Netezza Certification Courses and Get Hired By TOP MNCs

    Weekday / Weekend BatchesSee Batch Details

    51) Do we need to drop all tables and objects in that database, before dropping a database.

    Ans:

    No, drop database will take care of it.

    52) What constraints on a table are enforced.

    Ans:

    Not null and default. Netezza does not apply PK and FK.

    53) Create Table AS (CTAS), does it distribute data randomly or based on table on which it received data.

    Ans:

    Response: Newly created table from CTAS gets distribution from the original table.

    54) Why do you prefer truncate instead of drop table command.

    Ans:

    Just empties data from table, keeping table structure and permission intact.

    55) Can we update all columns in a Netezza table.

    Ans:

    No, the column that is used in distribution clause cannot be used for updates. Remember, up to four columns can be used for distribution of data on SPU. From practical sense, updating distribution columns result in redistribution of data; the single most performance hit when large table is involved. This restriction makes sense.

    56) What is dataslice and SPU.

    Ans:

    For me, they are the same! Of course, this answer is not accurate reply in your interview(s).

    57) What data type works best for zone maps.

    Ans:

    Zone maps work best for integer data types.

    58) What is clustering in Netezza?

    Ans:

    Clustering is nothing but grouping of similar things.  In Netezza, we have 2 SMP hosts. These 2 hosts are part of a cluster. In this cluster, one host is active and another host is passive.

    59) List few back slash commands in Netezza

    Ans:

    Back slash commands: To avoid basic day to day SQL

    For example: \du is to display users. \l is to list the database users.

    60) What is the difference between “nzsql” and “nzsql -E”

    Ans:

    -E ==> To see the underlying query generated by back slash command.

    List 2 default databases in Netezza

    SYSTEM, MASTER_DB

    MASTER_DB: Is template database; helps us to create new databases faster.

    Netezza Sample Resumes! Download & Edit, Get Noticed by Top Employers! Download

    61) What is the extent and page size in Netezza?

    Ans:

    Extent size: 3MB

    Page size: 128 KB

    Note: The above are fixed. we can’t change them.

    62) What are various maximums in Netezza?

    Ans:

    Columns: 1600 per table or view

    Maximum columns per distribution: 4

    Max Length of database and column names: 128 bytes

    Max number of characters in a char/varchar field: 64,000

    Max connections to the server: 2000. Default: 500

    Row size: 65,535 bytes

    63)   What are cluster base tables (CBT’s) in Netezza? And what are benefits of CBTs?

    Ans:

    A clustered base table (CBT) is a user table that contains data which is organized using one to four organizing keys. An organizing key is a column of the table that you specify for clustering the table records.

    Netezza uses the organizing keys to group records within the table and save them in the same or nearby extents. Netezza also creates zone maps for the organizing columns to accelerate the performance of queries on that table that restrict using the organizing keys.

    Benefits of CBTs

    • CBTs support “multi-dimension” lookups where you can organize records by one, two, Three or four lookup keys. if your queries commonly restrict on transaction type and store ID, you can organize records using both of those keys to improve query performance.
    • CBTs improve query performance by adding more zone maps for a table because the Organizing key columns are also zone mapped (if the organizing column data type supports zone maps). 
    • CBTs increase the supported data types for zone-mapped columns, thus allowing you to Improve performance for queries that restrict along multiple dimensions.
    • CBTs allow you to incrementally organize data within your user tables in situations where data cannot easily be accumulated in staging areas for pre-ordering before insertions/loads. CBTs can help you to eliminate or reduce pre-sorting of new table records prior to a load/insert operation. 
    • CBTs save disk space. Unlike indexes, materialized views and other auxiliary data structures, CBTs do not replicate the base table data and do not allocate additional data structures.

    64)    What is FPGA and how it is useful for query performance in Netezza?

    Ans:

    Field Programmable Gate Array (FPGA) is Netezza proprietary hardware installed on each SPU. The FPGA on each SPU filters out unwanted data, when it reads data from the disk. This process of data elimination removes IO bottlenecks and frees up downstream components such as the CPU, memory and network from processing extra data and hence the query performance increases.

    65)    What is the use of materialized views?

    Ans:

    A materialized view reduces the width (number of columns) of data being scanned in the base table by creating a thin version (fewer columns) of the base table that contains a small subset of frequently queried columns.

    66)    What are the best ways of creating and using materialized views?

    Ans:

    There are some best way to make best use of materialized views

    • Create materialized views with few columns which are frequently queried.
    • Specify order by clause on the most restrictive columns i.e. columns used in where clause.
    • Periodically or manually refresh the materialized views.

    Are you looking training with Right Jobs?

    Contact Us
    Get Training Quote for Free