[50+] Big Data Greenplum DBA Interview Questions and Answers

Last updated on 17th Nov 2021

About author

Navarasu Velayutham (GreenPlum DBA )

Navarasu Velayutham is a GreenPlum DBA who has more than 6 years of experience in SQL Databases. He is also an expert in DataStage, Hadoop, Microsoft Power BI, MicroStrategy, OBIEE, and Cognos.


We have collected the most commonly asked Greenplum interview questions and answers to help you prepare for the questions an interviewer might ask during your interview. There are plenty of opportunities at reputed companies around the world; according to research, Greenplum holds a market share of about 3.25%, so you still have room to move ahead in your Greenplum career. ACTE offers advanced Greenplum interview questions that help you crack your interview and land your dream career as a Greenplum developer or DBA. This complete guide of Greenplum interview questions will help you clear your job interview with ease.



    1. How to check the distribution policy of test table sales?

    Ans:

      Describing the table shows the distribution details:

      • psql> \d sales
      •        Table "public.sales"
      •  Column |  Type   | Modifiers
      • --------+---------+-----------
      •  id     | integer |
      •  date   | date    |
      • Distributed by: (id)
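
      The same information can also be read from the gp_distribution_policy catalog table. A minimal sketch, assuming a GPDB 4.x/5.x catalog (the column is attrnums there; newer releases use distkey instead):

      • psql> select localoid::regclass as table_name, attrnums
      •       from gp_distribution_policy
      •       where localoid = 'public.sales'::regclass;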

    2. How many user schemas are there in the database?

    Ans:

        Use \dn at the psql prompt to list the schemas.

    3. When was my table last analyzed in the Greenplum database?

    Ans:

      In 4.x, check pg_stat_operations, which records the actions (CREATE, ANALYZE, VACUUM, ...) performed on any object.

      For example, a sales table:

      • gpdb=# select objname, actionname, statime from pg_stat_operations where objname like 'sales';
      • objname | actionname | statime
      • --------+------------+-------------------------------
      • sales   | CREATE     | 2010-10-01 12:18:41.996244-07
      • sales   | ANALYZE    | 2010-10-06 14:38:21.933281-07
      • sales   | VACUUM     | 2010-10-06 14:38:33.881245

    4. How to check the size of a table?

    Ans:

      Table only:

        psql> select pg_size_pretty(pg_relation_size('schema.tablename'));

      Table plus indexes:

        psql> select pg_size_pretty(pg_total_relation_size('schema.tablename'));

      Replace schema.tablename with the table you want to check.

    5. How to start/stop DB in admin mode?

    Ans:

      Admin (restricted) mode:

      gpstart with the -R option starts the database in restricted mode, where only superusers can connect while the database is open.

      Utility mode:

      Utility mode lets you connect to individual segment or master instances only, when the system is started with gpstart -m. For example, to connect to the master instance only:

      PGOPTIONS='-c gp_session_role=utility' psql

    6. How to check the Schema size?

    Ans:

      • psql> select schemaname, round(sum(pg_total_relation_size(schemaname||'.'||tablename))/1024/1024) "Size_MB"
      •       from pg_tables where schemaname='SCHEMANAME' group by 1;

    7. How to check the database size?

    Ans:

      To see the size of a specific database:

      • psql> select pg_size_pretty(pg_database_size('DATABASE_NAME'));

      Example:

      • gpdb=# select pg_size_pretty(pg_database_size('gpdb'));
      •  pg_size_pretty
      • ----------------
      •  24 MB
      • (1 row)

      To see all database sizes:

      • psql> select datname, pg_size_pretty(pg_database_size(datname)) from pg_database;

    8. How to check partitioned table size including indexes and partitions?

    Ans:

      Table size with partitions:

      The following SQL gives you the employee_daily table size, including all partitions:

      • select schemaname, tablename, round(sum(pg_total_relation_size(schemaname || '.' || partitiontablename))/1024/1024) "MB"
      •   from pg_partitions where tablename='employee_daily' group by 1,2;
      • schemaname |   tablename    | MB
      • -----------+----------------+-----
      • public     | employee_daily | 254

    9. How do I get help on the syntax to alter the table?

    Ans:

      In a psql session, type \h alter table, which displays the syntax:

        gpdb=# \h alter table

    10. Define greenplum architecture?

    Ans:

    Greenplum follows a shared-nothing MPP architecture: a master instance accepts client connections and builds the query plans, which are executed in parallel by multiple segment instances that each hold a portion of the data and communicate over the interconnect.

    [Diagram: Greenplum architecture]

    11. How to connect in utility mode?

    Ans:

      From the master host:

      PGOPTIONS='-c gp_session_role=utility' psql -p <port> -h <hostname>

      Where:

      <port> is the segment/master database port.

      <hostname> is the segment/master hostname.

    12. Where/How to find db logs?

    Ans:

      Master: The master gpdb log file is located in the $MASTER_DATA_DIRECTORY/pg_log/ directory and the file name depends on the database "log_filename" parameter.

      • $MASTER_DATA_DIRECTORY/pg_log/gpdb-yyyy-mm-dd_000000.csv --> log file format with the default installation.
      • ~gpadmin/gpAdminLogs/ --> gpstart, gpstop, gpstate and other utility logs.

      Segments:

      For primary segments, run the SQL below to see the log file location:

      • select dbid, hostname, datadir||'/pg_log' from gp_configuration where content not in (-1) and isprimary is true;

      For mirror segments, run the SQL below to see the log file location:

      • select dbid, hostname, datadir||'/pg_log' from gp_configuration where content not in (-1) and isprimary is false;

    13. How to see the list of available functions in Greenplum DB?

    Ans:

      \df schemaname.functionname (both schemaname and functionname support wildcard characters):

      • test=# \df pub*.*test*
      •                   List of functions
      •  Schema |  Name   | Result data type | Argument data types
      • --------+---------+------------------+---------------------
      •  public | bugtest | integer          |
      •  public | test    | boolean          | integer
      •  public | test    | void             |
      • (3 rows)

    14. How to check whether Greenplum server is up and running?

    Ans:

      gpstate is the utility to check GPDB status.

      Use gpstate -Q to show a quick status. Refer to gpstate --help for more options.

      Sample output:

      • [gpadmin@stinger2]/export/home/gpadmin>gpstate -Q
      • gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
      • gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
      • gpadmin-[INFO]:-Quick Greenplum database status from Master instance only
      • gpadmin-[INFO]:----------------------------------------------------------
      • gpadmin-[INFO]:-GPDB fault action value = readonly
      • gpadmin-[INFO]:-Valid count in status view = 4
      • gpadmin-[INFO]:-Invalid count in status view = 0
      • gpadmin-[INFO]:----------------------------------------------------------

    15. How to recover an invalid segment?

    Ans:

      Use the gprecoverseg utility. Its behavior differs between versions:

      3.3.x:
      • Without the -F option – the files are first compared, differences are found, and only the differing files are synched (this first stage can take a long time if there are many files in the data directory).
      • With the -F option – the entire data directory is resynched.

      4.0.x:
      • Without the -F option – the change-tracking log is sent and applied to the mirror.
      • With the -F option – the entire data directory is resynched.

    16. How to create a Database?

    Ans:

      There are two ways to create a gpdb database using psql session or the Greenplum createdb utility:

      Using psql session:

      gpdb=# \h CREATE DATABASE

      Command: CREATE DATABASE

      Description: create a new database

      Syntax:

        CREATE DATABASE name [ [ WITH ] [ OWNER [=] db_owner ] [ TEMPLATE [=] template ] [ ENCODING [=] encoding ] [ TABLESPACE [=] tablespace ] [ CONNECTION LIMIT [=] connlimit ] ]

      Using the createdb utility:

      Usage: $GPHOME/bin/createdb --help

      createdb [OPTION]... [DBNAME] [DESCRIPTION]

      Options:

      -D, --tablespace=TABLESPACE   default tablespace for the database

      -e, --echo                    show the commands being sent to the server

      -E, --encoding=ENCODING       encoding for the database

      -O, --owner=OWNER             database user to own the new database

      -T, --template=TEMPLATE       template database to copy

      --help                        show this help, then exit

      --version                     output version information, then exit

    17. How do I get a list of databases in a Greenplum cluster?

    Ans:

      gpdb=# \l (lowercase letter "l")

      List of databases

      Name      | Owner   | Encoding

      ----------+---------+----------

      gpdb      | gpadmin | UTF8

      gpperfmon | gpadmin | UTF8

      postgres  | gpadmin | UTF8

      template0 | gpadmin | UTF8

      template1 | gpadmin | UTF8

      Check the SQL below for more details on the databases:

      gpdb=# select * from pg_database;

    18. How to delete/drop an existing database in Greenplum?

    Ans:

      • gpdb=# \h DROP DATABASE
      • Command: DROP DATABASE
      • Description: remove a database
      • Syntax: DROP DATABASE [ IF EXISTS ] name

      Also check the dropdb utility:

      • $GPHOME/bin/dropdb --help
      • dropdb removes a PostgreSQL database.
      • Usage:
      • dropdb [OPTION]... DBNAME

    19. Where can I get help on Postgres psql commands?

    Ans:

      In a psql session:

      "\?" – help on all psql meta-commands.

      "\h <command>" – help on any SQL syntax.

    20. What is the greenplum database?

    Ans:

    Greenplum Database is a massively parallel processing (MPP) database server built on PostgreSQL, designed for large-scale analytic data warehousing and business intelligence workloads.

    [Diagram: Greenplum database]

    21. gpstart failed what should I do?

    Ans:

      Check the gpstart log file in ~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log.

      Also take a look at the startup log file for more details in:

      $MASTER_DATA_DIRECTORY/pg_log/startup.log

    22. Why do we need gpstop -m and gpstart -m?

    Ans:

      The gpstart -m command starts the master only (none of the data segments) and is used primarily by support to get system-level information and configuration; gpstop -m shuts that master-only instance back down. An end user would not regularly or even normally use it.

    23. What is the procedure to get rid of mirror segments?

    Ans:

      There are no utilities available to remove mirrors from Greenplum. You need to make sure all primary segments are healthy, and then you can remove the mirror configuration from gp_configuration (in 3.x).

    24. How to run gpcheckcat?

    Ans:

      The gpcheckcat tool is used to check catalog inconsistencies between master and segments. It can be found in the $GPHOME/bin/lib directory:

      • Usage: gpcheckcat
      • [dbname]
      • -?
      • -B parallel: number of worker threads
      • -g dir : generate SQL to rectify catalog corruption, put it in dir
      • -h host : DB hostname
      • -p port : DB port number
      • -P passwd : DB password
      • -o : check OID consistency
      • -U uname : DB User Name
      • -v : verbose
      • Example:
      • gpcheckcat gpdb >gpcheckcat_gpdb_logfile.

    25. What is the difference between pg_dump and gp_dump?

    Ans:

      pg_dump – a non-parallel backup utility; you need a large file system, because the backup is created on the master node only.

      gp_dump – a parallel backup utility; the backup is created across the master and segment file systems.

    26. What is gp detective and how do I run it in Greenplum?

    Ans:

      The gpdetective utility collects information from a running Greenplum Database system and creates a bzip2-compressed tar output file. This output file helps with the diagnosis of Greenplum Database errors or system failures. For more details check the help: gpdetective --help

    27. How to delete a standby?

    Ans:

      To remove the currently configured standby master host from your Greenplum Database system, run the following command on the master only: # gpinitstandby -r

    28. How to re-sync a standby?

    Ans:

      Use this option if you already have a standby master configured, and just want to resynchronize the data between the primary and backup master host. The Greenplum system catalog tables will not be updated. # gpinitstandby -n (resynchronize)

    29. How to add mirrors to the array?

    Ans:

      The gpaddmirrors utility configures mirror segment instances for an existing Greenplum Database system that was initially configured with primary segment instances only.

      For more details check help.

      # gpaddmirrors –help

    30. Define Greenplum endpoint architecture?

    Ans:

    [Diagram: Greenplum endpoint architecture]

    31. How to see primary to mirror mapping?

    Ans:

      The following catalog query lists the configuration ordered by content ID, from which you can work out the primary and mirror for each content:

      gpdb=# select * from gp_configuration order by content;

      Note: starting with GPDB 4.x, the gp_segment_configuration table is used instead: gpdb=# select * from gp_segment_configuration order by dbid;

    32. How to run gpcheckperf IO/netperf?

    Ans:

      Create a directory with free space that is common to all hosts.

      For the network test, for each NIC:

        gpcheckperf -f seg_host_file_nic-1 -r N -d /data/gpcheckperf > seg_host_file_nic_1.out
        gpcheckperf -f seg_host_file_nic-2 -r N -d /data/gpcheckperf > seg_host_file_nic_2.out

      For the disk I/O and stream test:

        gpcheckperf -f seg_host_file_nic-1 -r ds -D -d /data/gpdb_p1 -d /data/gpdb_p2 -d /data/gpdb_m1 -d /data/gpdb_m2

    33. How to update postgresql.conf and reload it?

    Ans:

      In GP 4.0, use the gpconfig utility to change postgresql.conf parameters. In 3.x, change the parameters manually in postgresql.conf; for more details, check the Greenplum Administrator's Guide.

    34. How to manage pg_hba.conf?

    Ans:

      The pg_hba.conf file of the master instance controls client access and authentication to your Greenplum system. Check Greenplum Administrator’s Guide for instructions to add/change contents of this file.

    35. How would you implement compression and possibly explain the compression types?

    Ans:

      Table level: compression is applied to an entire table (set in the WITH clause of CREATE TABLE on an append-optimized table).

      Column level: compression is applied to a specific column; you can apply different compression algorithms (for example zlib or RLE_TYPE) to different columns of a column-oriented table.
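
      A minimal sketch of both styles on an append-optimized, column-oriented table (table names, column names, and parameter values are illustrative):

      • -- Table-level compression, set in the WITH clause
      • CREATE TABLE sales_zlib (id int, note text)
      • WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5)
      • DISTRIBUTED BY (id);

      • -- Column-level compression: a different algorithm per column
      • CREATE TABLE sales_mixed (
      •     id   int  ENCODING (compresstype=RLE_TYPE),
      •     note text ENCODING (compresstype=zlib, compresslevel=5)
      • )
      • WITH (appendonly=true, orientation=column)
      • DISTRIBUTED BY (id);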

    36. How to add a new user to the database?

    Ans:

    • Use the createuser utility to create users. See createuser --help for more details.
    • You can also use SQL commands at the psql prompt, for example CREATE USER or CREATE ROLE (see the sketch below).
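
    A minimal sketch (the role name, password, and grant are illustrative):

    • CREATE ROLE analyst WITH LOGIN PASSWORD 'changeme';
    • GRANT SELECT ON public.sales TO analyst;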

    37. How to create a password-free trusted env b/w all the segment hosts?

    Ans:

      Use gpssh-exkeys:

        gpssh-exkeys -h hostname1 -h hostname2 .. -h hostnameN

    38. How to check DB version and version at init DB?

    Ans:

      To check the version:

      • psql> select version();
      • or
      • postgres --gp-version
      • To check the GP version at initdb:
      • psql> select * from gp_version_at_initdb;

    39. What is a vacuum and when should I run this?

    Ans:

      VACUUM reclaims storage occupied by deleted tuples. In normal GPDB operation, tuples that are deleted or obsoleted by an update are not physically removed from their table. They remain present on the disk until a VACUUM is done. Therefore, it is necessary to do VACUUM periodically, especially on a frequently updated table.

    40. An introduction to greenplum?

    Ans:

    [Diagram: introduction to Greenplum]

    41. How to find errors / fatal from log files?

    Ans:

      grep for ERROR, FATAL, and SIGSEGV in the pg_log directory.

    42. What is Analyze and how frequently should I run this?

    Ans:

      ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.

      It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the query planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy is to run VACUUM and ANALYZE once a day during a low-usage time of day.
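
      A minimal sketch of such a nightly maintenance statement (the table name is illustrative):

      • VACUUM ANALYZE public.sales;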

    43. What are resource queues?

    Ans:

      Resource queues are used for Greenplum Database workload management: all users and queries can be prioritized and limited through resource queues. Refer to the Admin Guide for more details.
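
      A minimal sketch of creating a queue and assigning a role to it (the queue name, limits, and role are illustrative):

      • CREATE RESOURCE QUEUE adhoc_queue WITH (ACTIVE_STATEMENTS=5, PRIORITY=LOW);
      • ALTER ROLE analyst RESOURCE QUEUE adhoc_queue;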

    44. What is gp_toolkit?

    Ans:

      gp_toolkit is a database schema with many tables, views, and functions that help you manage the Greenplum Database while it is up. In earlier 3.x versions it was referred to as gp_jetpack.
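
      For example, two commonly used gp_toolkit views (availability may vary slightly by version):

      • SELECT * FROM gp_toolkit.gp_bloat_diag;         -- tables with significant bloat
      • SELECT * FROM gp_toolkit.gp_skew_coefficients;  -- data skew per table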

    45. What are the major differences between Oracle and Greenplum?

    Ans:

      Oracle is a traditional relational database. Greenplum is MPP in nature and uses a shared-nothing architecture. There are many other differences in terms of functionality and behavior.

    46. Tell me some of the aspects/implementations/configurations you have done in Greenplum?

    Ans:

      Configurations like port changes, Active Directory authentication, pg_hba.conf changes, postgresql.conf changes, setting up gpfdist, etc.

    47. Which parameters can you use to manage workload in a Greenplum database?

    Ans:

      Workload management is done by creating resource queues and assigning various limits to them and to roles.

    48. How would you troubleshoot an issue/error/problem when there is no one available to help you or you are all by yourself?

    Ans:

      Look at the log files, and query the gpperfmon database and the gp_toolkit schema for various data and statistics.

    49. What is good and bad about Greenplum compared to Oracle?

    Ans:

      Greenplum is built on top of PostgreSQL. It is a shared-nothing, MPP architecture best suited for data warehousing environments and big data analytics. Oracle is an all-purpose database.

    50. What is greenplum text mining data?

    Ans:

    [Diagram: Greenplum text mining data]

    51. What would you do to gather statistics in the database? as well as reclaim the space?

    Ans:

      VACUUM FULL, CTAS.

      A VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table.

      A VACUUM FULL is not recommended in Greenplum Database.
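
      A minimal sketch of the CTAS approach (table and distribution key names are illustrative; remember to recreate indexes, grants, and partitions as needed):

      • CREATE TABLE sales_rebuilt AS SELECT * FROM sales DISTRIBUTED BY (id);
      • DROP TABLE sales;
      • ALTER TABLE sales_rebuilt RENAME TO sales;
      • ANALYZE sales;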

    52. What would you do when a user or users are complaining that a particular query is running slow?

    Ans:

    • Look at the query plan.
    • Look at the stats of the table(s) in the query.
    • Look at the table distribution keys and the joins in the query (see the sketch below).
    • Look at the network performance.
    • Look at the resource queues.
    • Look at the interconnect performance.
    • Look at the join order of tables in the query.
    • Look at the query itself, i.e. whether it can be written in a more efficient way.
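
    Two of these checks sketched as queries (the table name is illustrative):

    • EXPLAIN ANALYZE SELECT count(*) FROM sales;                             -- inspect the actual plan and timings
    • SELECT gp_segment_id, count(*) FROM sales GROUP BY 1 ORDER BY 2 DESC;   -- look for distribution skew across segments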

    53. Which option would you use to export the DDL of the database or table?

    Ans:

      Use the dump utility with the -s (--schema-only) option: dump only the object definitions (schema), not data.

    54. When you restore from a backup taken from gp_dump, can you import a table?

    Ans:

      No, not in general – only if the gp_dump backup was taken for that one table alone.

    55. What Is the Difference Between Vacuum And Vacuum Full?

    Ans:

      VACUUM: unless you need to return space to the OS so that other tables or other parts of the system can use that space, you should use plain VACUUM.

      VACUUM FULL: only needed when a table consists mostly of dead rows, that is, the vast majority of its contents have been deleted. Even then, there is no point using VACUUM FULL unless you urgently need that disk space back for other things, or you expect that the table will never again grow to its past size.

    56. Which command would you use to back up a database?

    Ans:

        gp_dump, gpcrondump, pg_dump, pg_dumpall, copy

    57. Can you explain the process of data migration from Oracle to Greenplum?

    Ans:

      There are many ways. The simplest steps: unload the Oracle data into CSV files, create the corresponding tables in the Greenplum database, start gpfdist pointing at the file location, create an external table over those files, and load the data into Greenplum, as sketched below. You can also use the gpload utility, which creates the external table at runtime.
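
      A minimal sketch of the external-table load step (host, port, file name, and columns are illustrative):

      • CREATE EXTERNAL TABLE ext_sales (id int, sale_date date, amount numeric)
      • LOCATION ('gpfdist://etl-host:8081/sales*.csv')
      • FORMAT 'CSV' (HEADER);

      • INSERT INTO sales SELECT * FROM ext_sales;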

    58. What is the Greenplum performance monitor and how to install it?

    Ans:

      It's a monitoring tool that collects statistics on system and query performance and builds historical data. It is typically set up with the gpperfmon_install utility, which creates the gpperfmon database and enables the data-collection agents (a database restart is required).

    59. How do I monitor user activity history in the Greenplum database?

    Ans:

      Use Greenplum performance monitor (gpperfmon), which has GUI to monitor and query performance history.

    60. What is Pivotal Greenplum?

    Ans:

    [Diagram: Pivotal Greenplum]

    61. What kind of locks should we focus on the MPP system when the system is slow /hung?

    Ans:

      Locks that are held for a very long time and multiple other queries are waiting for that lock also.

    62. How to check if my session queries are running or waiting on locks?

    Ans:

      Check the “waiting” column in pg_stat_activity and the “granted” column in pg_locks for any object-level locks.
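
      A minimal sketch of such a check, assuming a GPDB 4.x/5.x catalog (procpid/current_query/waiting; newer releases use pid/query instead):

      • SELECT a.procpid, a.waiting, l.relation::regclass, l.mode, l.granted, a.current_query
      •   FROM pg_stat_activity a
      •   JOIN pg_locks l ON l.pid = a.procpid
      •  WHERE a.waiting OR NOT l.granted;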

    63. How to trace child processes on the segment server?

    Ans:

      When a session starts on the master, all of its child processes on the segments can be identified by the master session id in their connection string (con + sess_id, e.g. con76134).

      For example:

      • gpdb=# select * from pg_stat_activity;
      • datid | datname | procpid | sess_id | .. ..
      • ------+---------+---------+---------+
      • 16986 | gpdb    | 18162   | 76134   | .. ..

      On all segments, the child processes for session 76134:

      • [gpadmin@stinger2]/export/home/gpadmin/gp40>gpssh -f host_file /usr/ucb/ps -auxww |grep con76134
      • [stinger2] gpadmin 18162 1.7 6.0 386000 124480 ? S 09:57:55 0:04 postgres: port 4000, gpadmin gpdb [local] con76134 [local] cmd3 CREATE DATABASE
      • [stinger2] gpadmin 18625 0.3 2.7 260564 55932 ? S 10:01:56 0:01 postgres: port 40000, gpadmin gpdb 10.5.202.12(18864) con76134 seg0 cmd4 MPPEXEC UTILITY
      • [stinger2] gpadmin 18669 0.0 0.1 3624 752 pts/2 S 10:02:36 0:00 grep con76134
      • [stinger3] gpadmin 22289 0.8 9.4 531860 196404 ...

    64. How to turn on timing, and check how much time a query takes to execute?

    Ans:

      You can turn on timing for the session, before you run your SQL, with the \timing command.

      You can also run EXPLAIN ANALYZE against your SQL statement to get the timing, as sketched below.
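
      A minimal sketch (the query is illustrative):

      • gpdb=# \timing on
      • gpdb=# SELECT count(*) FROM sales;                  -- psql prints "Time: ... ms"
      • gpdb=# EXPLAIN ANALYZE SELECT count(*) FROM sales;  -- shows actual run time per plan node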

    65. What is the difference between a Database Administrator and a Big Data analyst?

    Ans:

      Database Administrator: responsible for the integrity, security, and performance of the databases he or she owns.

      Big Data analyst: responsible for collecting data from different sources, cleaning the data, transforming it, visualizing it, and turning it into business decisions.


    66. My SQL query is running very slow. It was running fine yesterday. What should I do?

    Ans:

      Check that your connection to the Greenplum cluster is still good if you are using a remote client; you can verify this by running the SQL locally on the GP cluster.

      Check that the system tables and user tables involved are not bloated or skewed. Read the jetpack or Greenplum toolkit documentation on how to do this.

      Check with your DBA that the Greenplum interconnect is still performing correctly. This can be done by checking for dropped packets on the interconnect (netstat -i) and by running gpcheckperf.

      It is also possible that a segment is experiencing hardware problems, which can be found in the output of dmesg or in the database log files ($MASTER_DATA_DIRECTORY/pg_log on the master and the pg_log directories on the segments).

    67. What are the backup options available at OS level?

    Ans:

      Solaris: zfs snapshots at the file system level.

      All OS: gpcrondump / gp_dump.

    68. What is gpcrondump?

    Ans:

      A wrapper utility for gp_dump, which can be called directly or from a crontab entry.

      Example: gpcrondump -x <database_name>

    69. How do I clone my production database to the PreProd / QA environment?

    Ans:

      If Prod and QA are on the same GPDB cluster, use CREATE DATABASE with the production database as the template.

      If Prod and QA are on different clusters, use the backup and restore utilities.

    70. What is an introduction Greenplum ETL Tool?

    Ans:

    [Diagram: Greenplum ETL tool]

    71. What are the tools available in Greenplum to take backup and restores?

    Ans:

      For non-parallel backups:

    • Use the PostgreSQL utilities (pg_dump and pg_dumpall for backup, pg_restore for restore).
    • Another useful command for getting data out of the database is COPY ... TO.

      For parallel backups:

    • gp_dump and gpcrondump for backups, and gp_restore for the restore process.

    72. How to generate DDL for a table?

    Ans:

      Use the pg_dump utility to generate DDL.

      Example:

      • pg_dump -t njonna.accounts -s -f ddl_accounts.sql

      Where:

      • -f ddl_accounts.sql is the output file.
      • -t njonna.accounts is the table name, with schema njonna.
      • -s dumps only the schema, no data.

    73. What is MPP database?

    Ans:

      An MPP database is a database that is optimized to be processed in parallel for many operations to be performed by many processing units at a time. MPP (massively parallel processing) is the coordinated processing of a program by multiple processors working on different parts of the program.

    74. Is Hadoop a data lake?

    Ans:

      To put it simply, Hadoop is a technology that can be used to build data lakes. A data lake is an architecture, while Hadoop is a component of that architecture. In other words, Hadoop is the platform for data lakes.

    75. Which library is used by Greenplum for in database analytics?

    Ans:

      Apache MADlib is an open-source library for scalable in-database analytics. The Greenplum MADlib extension provides the ability to run machine learning and deep learning workloads in a Greenplum Database.

    76. Can Greenplum store unstructured data?

    Ans:

      Pivotal Greenplum 5.0 entails several improvements for semi-structured/unstructured data processing including the GPText extension, the JSON data type, and improved XML datatype support.

    77. What is data lake storage?

    Ans:

      A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications. While a traditional data warehouse stores data in hierarchical dimensions and tables, a data lake uses a flat architecture to store data, primarily in files or object storage.

    78. What type of Database is Greenplum?

    Ans:

      Greenplum Database is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads.

    79. What is Greenplum Hadoop?

    Ans:

      It is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads. It is based on PostgreSQL open-source technology.

    80. What is greenplum UAP components?

    Ans:

    [Diagram: Greenplum UAP components]

    81. What is the difference between PostgreSQL and Greenplum?

    Ans:

      Greenplum is both a data warehouse and a transactional/operational data store. As a database tool, Greenplum employs a shared-nothing MPP architecture, whereas PostgreSQL (the database it is built on) runs as a single instance.

    82. Where are pg_hba.conf, the log files, and the master data directory located?

    Ans:

      • cd $MASTER_DATA_DIRECTORY – the master data directory.
      • pg_hba.conf, postgresql.conf, and other GPDB internal directories are located here.
      • cd $MASTER_DATA_DIRECTORY/pg_log – the master log file location.

    83. What is the Greenplum Performance Monitor and how do you install it?

    Ans:

      It is a monitoring tool that collects statistics on system and query performance and builds historical data.

    84. Can you explain the process of data migration from Oracle to Greenplum?

    Ans:

      There are several ways. The simplest steps are: unload the data into CSV files, create tables in the Greenplum database corresponding to the Oracle tables, create an external table, start gpfdist pointing to the external table location, and load the data into Greenplum. You can also use the gpload utility, which creates the external table at runtime.

    85. What is Multi-version control?

    Ans:

      Multi-version concurrency control (MVCC) is used to avoid unnecessary locking of the database. It removes the waiting that would otherwise occur when someone else is accessing the same content: each transaction works against its own consistent snapshot, and the older row versions are kept until they are no longer needed.

    86. What will be the new characteristics of PostgreSQL 9.1?

    Ans:

      While a release is being developed, one can never be certain which features will go in and which ones won't make the cut. The project has precise and stringent quality standards, and some patches may or may not meet them before the set deadline.

    87. How To Turn On Timing, And Checking How Much Time A Query Takes To Execute?

    Ans:

      You can turn on timing for the session, before running your SQL, with the \timing command.

      You can also run EXPLAIN ANALYZE against your SQL statement to get the timing.

    88. Is Pivotal greenplum free?

    Ans:

      As mentioned throughout this post, Greenplum is an open source database so the community version is absolutely free to download and use.

    89. How To See The Value Of Guc?

    Ans:

      Connect to the GPDB database using psql and either query the catalog or use SHOW.

      Example:

      • gpdb=# select name, setting from pg_settings where name='<GUC_NAME>';
      • or
      • gpdb=# show <GUC_NAME>;

    90. What is greenplum physical architecture?

    Ans:

    [Diagram: Greenplum physical architecture]

    91. What are external tables in Greenplum?

    Ans:

      An external table is a Greenplum Database table backed with data that resides outside of the database. You create a readable external table to read data from the external data source and create a writable external table to write data to the external source.

    92. How do I create an external table in Greenplum?

    Ans:

      CREATE EXTERNAL WEB TABLE log_output (linenum int, message text) EXECUTE '/var/load_scripts/get_log_data.sh' ON HOST FORMAT 'TEXT' (DELIMITER '|');

      You can also create a writable external table, for example one named sales_out that uses gpfdist to write output data to a file named sales.out, as sketched below.
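
      The writable counterpart referenced above might look like this (host, port, and distribution column are illustrative):

      • CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
      • LOCATION ('gpfdist://etl1:8081/sales.out')
      • FORMAT 'TEXT' (DELIMITER '|')
      • DISTRIBUTED BY (txn_id);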


    93. What is Gpfdist in Greenplum?

    Ans:

      gpfdist is the Greenplum Database parallel file distribution program. It is used by readable external tables and by gpload to serve external table files to all Greenplum Database segments in parallel.
