15+ Must-Know SAS [DI ] Interview Questions & Answers
SAS DI Interview Questions and Answers

15+ Must-Know SAS [DI ] Interview Questions & Answers

Last updated on 04th Jul 2020, Blog, Interview Questions

About author

Salimkumar (Sr SAP Director )

He is Possessing 11+ Years Of Experience in SAS. His Passion lies in Developing Entrepreneurs & Activities. Also, Rendered his intelligence to the Enthusiastic JOB Seekers.

(5.0) | 15212 Ratings 6397

SAS Di is basically SAS Data Integration studio , a tool like base sas editor for coding and reporting purpose. You can call it a drag and drop tool which is equipped with many inbuilt functions, macros , transformation, loaders etc.

1) What is Data Integration?

Ans:

  • The process of combining data from different resources.
  • The combined data is provided to the users with unified view.
  • Information from different enterprise domains are integrated – known as Enterprise Information Integration.
  • Useful for merging information from different technologies among enterprises.

2) What is transformation in SAS data integration?

Ans:

It is a metadata object which determines how to extract data, transform data and load data into data stores.

3) What is the difference between unique key and primary key?

Ans:

Unique key is one or more columns that can be used to uniquely identify a row in a table. A table can have one or more unique keys. Unique keys can contain null values. While on the other hand table can have only one primary key. One or more columns in a primary key cannot contain null values.

4) Explain about Pivot – Columns to Rows?

Ans:

  • Data Integrator produces a row in the output data set for every value in the designated pivot column.
  • More than one pivot column can be set as per the need of application’s data integration.
  • Pivot Sequence Column – Data Integrator increments a sequence number for every row created from a pivot column.
  • Non-Pivot column – The columns that need to appear in the target.
  • Pivot Set – A group of pivot columns, unique data field and header column.
  • Data Field Column – It contains the pivot data along with pivot columns values.
  • Header Column – Lists the name of the columns.

5) What are the benefits of data integration?

Ans:

Following are the benefits of data integration:

  • -Makes reporting, monitoring, placing customer information across the enterprise flexible and convenient.
  • -Data usage is efficient.
  • -Cost Effective.
  • -Risk adjusted profitability management as it allows accurate data extraction.
  • -Allows timely and reliable reporting, as data quality is the prime technology for business challenges.

6) Describe how to adjust the performance of Data Integrator?

Ans:

Following are the ways to perform this:

  • Using array fetch size.
  • Ordering the joins.
  • Extracted data minimizing.
  • Locale conversion minimization.
  • Setting target-based options to optimize the performance.
  • Improving throughput.
  • Data type conversion minimization.

7) What do you mean by data staging area?

Ans:

Staging area of the data warehouse is both the storage area and set of process commonly referred as extract transformation load. The data staging area is everything between the operational source systems and the data presentation area.

8) What is data governance?

Ans:

It is the robust, reliable, repeatable and controlled process both at point of input and through subsequent downstream control checks. This process exists to manage updates of business rules to maintain a level of consistency.

9) What is data access?

Ans:

It is the access by selected business users to raw (untransformed) data loads.

10) What is slowly changing dimension?

Ans:

This is the technique for tracking changes to dimensional table values in order to analyze trends. For example, a dimension table named customers might have columns for customer id, home address and income. Each time the address or income changes for a customer, a new row could be created for that customer in the dimensional table and old row could be retained.

11) What is snow flake schema?

Ans:

Snow flake schema is defined in which a single fact table is connected to multiple dimension tables. The dimension are structured to minimize update anomalies and to address single themes.

12) How can we minimize the space requirement of a huge data set in SAS for window?

Ans:

When we are working with large data sets, we can do the following steps to reduce space requirements:

  • Split the huge data set into smaller data sets
  • Clean up our working space as much as possible at each step
  • Use data set options (keep= or drop=) or statements (keep or drop) to limit to only the variables needed
  • Use IF statement or OBS= to limit the number of observations
  • Use WHERE= or WHERE or index to optimize the WHERE expression to limit the number of observations in a PROC Step and a DATA Step
  • Use length to limit the bytes of variables
  • Use a _null_ data set name when we don’t need to create a data set
  • Compress the data set using system options or data set options (COMPRESS=yes or COMPRESS=binary)

13) What is star schema?

Ans:

Star schema is defined as database in which single fact table is connected to multiple dimension tables. This is represented in a star schema.

14) What is SAS application server, database server, SAS OLAP server and SAS metadata server?

Ans:

SAS application server provides SAS services to a client. On the other hand database server provides relational database service to a client. Oracle, DB2, and Teradata are examples of relational databases.  SAS OLAP server provides access to multidimensional data. SAS metadata server provides metadata management services to one or more client application.

15) What is operational data and operational system?

Ans:

Operational data is used as source data for a data warehouse. While operational system is one or more programs that provide source data for a data warehouse.

16) What Is Change Analysis In Sas Di?

Ans:

Change analysis is the process of comparing one set of metadata to another set of metadata and identifying the differences between the two sets of metadata.

17) What Is The Use Of Sas Management Console?

Ans:

SAS management console application provides a single user interface for performing SAS administrative tasks.

18) Name Some Data Transformation Used In Sas Di?

Ans:

Types of the data transformation are append, apply lookup standardization, create match code transformation, data transfer, data validation, extract, fact table lookup, key effective data transformation, lookup, SAS rank, SAS sort, SAS splitter, SCD type 2 loader, SQL join, standardize transformation, Surrogate key generator , Transpose transformation, User written code transformation.

19) Describe About Metadata Object?

Ans:

It is a set of attributes that describe a table, a server, a user and another resource on a network.

20) Name The Scheduler For Scheduling Job And Explain The Scheduler?

Ans:

The scheduler used for scheduling job is control m while CONTROL-m also user to view process flow and dependencies so that they can optimize business processes easily and efficiently, even in a data center that includes multiple platform types (for example, Unix, Microsoft Windows, and MVS)

    Subscribe For Free Demo

    21) What Is Change Analysis In Sas Di ?

    Ans:

    Change analysis is the process of comparing one set of metadata to another set of metadata and identifying the differences between the two sets of metadata.

    22) Describe The Interaction Table In Sas Di?

    Ans:

    Table that describes the relationships between two or more tables. For example, an intersection table could describe the many-to-many relationships between a table of users and a table of groups.

    23) What Are The Prime Responsibilities Of Data Integration Administrator?

    Ans:

    • Scheduling and executing the batch jobs.
    • Configuring, starting and stopping the real-time services
    • Adapters configuration and managing them.
    • Repository usage, Job Server configuration.
    • Access Server configuration.
    • Batch job publishing.
    • Real-time services publishing through web services.

    24) Explain The Difference Between Alternate Key, Business Key, Foreign Key , Generated Key , Primary Key, Retained Key And Surrogate Key ?

    Ans:

    • Alternate key is term also known as unique key.
    • Business key is one or more columns in a dimension table that comprise the primary key in a source table in an operational system.
    • Foreign key is one or more columns that are associated with a primary key or unique key in another table. A table can have one or more foreign keys. A foreign key is dependent upon its associated primary or unique key. In other words, a foreign key cannot exist without that primary or unique key.
    • Generated keys is used to implement surrogate keys and retained keys, one or more columns that are used to uniquely identify a row in a table. A table can have only one primary key. One or more columns in a primary key cannot contain null values. Retained key is a numeric column in a dimension table that is the primary key of that table.
    • Surrogate key is a column which contains unique integer values that are generated sequentially when rows are added and updated. In the associated fact table, the surrogate key is included as a foreign key in order to connect to specific dimensions.

    25) Explain About Data Integrator Metadata Reports?

    Ans:

    • Browser-based analysis and reporting capabilities are provided by Metadata reports.
    • The DI Metadata Reports are generated on metadata that associates with Data Integration jobs.
    • Other BO applications those are associated with Data Integration.

    Three modules are provided by Metadata Reports. They are:

    • Operational Dashboards.
    • Auto Documentation.
    • Impact and Lineage analysis.

    26) Explain About Various Caches Available In Data Integrator?

    Ans:

    • NO_CACHE – It is used for not caching values.
    • PRE_LOAD_CACHE – Result column preloads and compares the column into the memory, prior to executing the lookup.
    • PRE_LOAD_CACHE is used when the table can exactly fit in the memory space.
    • DEMAND_LOAD_CACHE – Result column loads and compares the column into the memory when a function performs the execution.
    • DEMAND_LOAD_CACHE is suitable while looking up the highly repetitive values with small subset of data.

    27) What Is Hierarchy Flattening?

    Ans:

    • Construction of parent/child relationships hierarchy is known as Hierarchy Flattening.
    • A description of hierarchy in the vertical or horizontal format is produced.
    • The hierarchy pattern includes Parent column, Child Column, Parent Attributes and Child Attributes.
    • Hierarchy Flattening allows to understand the basic hierarchy of BI in a lucid manner.
    • As the flattening is done in horizontal or vertical format, the sub elements are easily identified.

    28) Is Data Integration And Etl Programming Is Same?

    Ans:

    • No, Data Integration and ETL programming are different.
    • Passing of data to different systems from other systems is known as data integration.
    • It may integrate data within the same application.
    • ETL, on the other hand, is to extract the data from different sources.
    • The primary ETL tool job is to transform the data and loads into other objects or tables.

    29) Describe About Physical Data Integration?

    Ans:

    Physical Data Integration is all about creating new system that replicates data from the source systems. This process is done to manage the data independent of the original system. Data Warehouse is the example of Physical Data Integration. The benefits of PDI include data version management, combination of data from various sources, like mainframes, flat files, databases.

    30) Why Is Sas Data Integration Studio Important?

    Ans:

    Companies are realizing that in order to succeed they need an integrated view of their data and SAS Data Integration Studio is the single tool that provides the flexibility, reliability and agility needed to respond to new data integration challenges. Regardless of the project, SAS Data Integration Studio users can respond with speed and efficiency, reducing the overall cost of data integration.

    Course Curriculum

    Gain In-Depth Knowledge On SAS DI Certification Course to Build Your Skills

    Weekday / Weekend BatchesSee Batch Details

    31) For Whom Is Sas Data Integration Studio Designed?

    Ans:

    • SAS Data Integration Studio empowers data integration managers and designers to work more efficiently, manage change effectively and deliver high-quality results faster.
    • Our design of course tutorials and interview questions is practical and informative. At TekSlate, we offer resources to help you learn various IT courses. We avail both written material and demo video tutorials. For in-depth knowledge and practical experience explore Online SAS DI Training.

    32) What is data dimension?

    Ans:

    It is the definition of the customer, product and organization data to be held in central location and to be accessed by all the with governance around change. Common set of dimension are required to support all business views of data and self service reporting. Business view has the capability to segment customer, product and organization data across any dimension.

    33) What is data reconciliation?

    Ans:

    It Is the correction of the data inconsistency If data is loaded incorrectly from source system and if there is inconsistency between source system and loaded data then source system is treated authoritative.

    34) What do you mean by exception reporting?

    Ans:

    It is the reporting for data load completion to include any data issues.

    35) What is multi dimensional reporting?

    Ans:

    It allows users to efficiently aggregate and analyze business metrics across core business dimensions.

    36) What do you mean by fact table in dw?

    Ans:

    Fact table represents the subject oriented and the focus of analysis. it generally contains elements f analysis such as sales, cost, quantity sold etc. these attributes can be summed up or averaged in order to understand the particular aspects. Fact table can also be sum up as union of the dimension tables that allows the fact measures to be analyzed from different scenarios.

    37) What do you mean by dimension tables?

    Ans:

    Dimension table are integral companion to a fact table. It contains the textual descriptions of the business. In a well designed dimensional model, dimension tables have many columns or attributes. These attributes describe the row in the dimensional table. Dimension attributes serve as the primary source of query constraints, groupings and report labels.

    38) Explain about Manual Integration and Application Based Integration.

    Ans:

    1. Manual Integration:
    • Also known as Common User Interface.
    • All the relevant information to access form the source system or web page interface is operated by the users.
    • Unified view of the data does not exist.
    1. Application Based Integration:
    • ABI requires specific applications for implementing all the integration efforts.
    • When the number of applications is limited, this approach is well manageable.

    39) Difference between Data integration And ETL programming?

    Ans:

    • Passing of data to different systems from other systems is known as data integration. It may integrate data within the same application.
    • ETL, on the other hand, is to extract the data from different sources. The primary ETL tool job is to transform the data and loads into other objects or tables.

    40) What is Uniform Data Access Integration?

    Ans:

    • The generated consolidated data need not require separate storage space.
    • Data history and version management is limited and applied only to the similar type of data.
    • Accessing to the user data overloads on the source systems.
    • UDAI places the data in the source systems.
    • A set of views are defined for providing access the unified view to the clients / customers.
    • Zero latency of data can be propagated from the source system.

    41) Describe deployment of SAS Data Integration studio jobs as a SAS stored process.

    Ans:

    Job can be deploy as a SAS stored process in SAS Data Integration Studio. Code is generated for the stored process and the code is saved to a file in a source repository. Metadata about the stored process is saved to the current metadata server. The stored process can be executed as required by requesting applications. Stored processes can be used for Web reporting, analytics, building Web applications, delivering result packages to clients or the middle tier, and publishing results to channels or repositories. Stored processes can also access any SAS data source or external file and create new data sets, files, or other data targets supported by the SAS System

    42) What is History Preserving?

    Ans:

    • History Preserving is for providing new row in the target instead of updating the existing row.
    • The columns are indicated for transforming the changes that are to be preserved.
    • New rows are created when the value of certain column changes.
    • Each of these rows is flagged as UPDATE.
    • The UPDATE flag is applied for the input data set.

    43) What is the function of output statement in a SAS Program?

    Ans:

    You can use the OUTPUT statement to save summary statistics in a SAS data set. This information can then be used to create customized reports or to save historical information about a process. You can use options in the OUTPUT statement to

    1. Specify the statistics to save in the output data set,
    2. Specify the name of the output data set, and
    3. Compute and save percentiles not automatically computed by the CAPABILITY procedure.

    44) What is the function of Stop statement in a SAS Program?

    Ans:

    Stop statement causes SAS to stop processing the current data step immediately and resume processing statement after the end of current data step.

    45) What is the difference between using drop = data set option in data statement and set statement?

    Ans:

    If you don’t want to process certain variables and you do not want them to appear in the new data set, then specify drop = data set option in the set statement. Whereas If want to process certain variables and do not want them to appear in the new data set, then specify drop = data set option in the data statement.

    46) What is the difference between reading data from an external file and reading data from an existing data set?

    Ans:

    The main difference is that while reading an existing data set with the SET statement, SAS retains the values of the variables from one observation to the next. Whereas when reading the data from an external file, only the observations are read. The variables will have to re-declared if they need to be used.

    47) How many data types are there in SAS? 

    Ans:

    There are two data types in SAS. Character and Numeric. Apart from this, dates are also considered as characters although there are implicit functions to work upon dates.

    48) What are the differences between PROC MEANS and PROC SUMMARY?

    Ans:

    • PROC MEANS produces subgroup statistics only when a BY statement is used and the input data has been previously sorted (using PROC SORT) by the BY variables.
    • PROC SUMMARY automatically produces statistics for all subgroups, giving you all the information in one run that you would get by repeatedly sorting a data set by the variables that define each subgroup and running PROC MEANS. PROC SUMMARY does not produce any information in your output. So you will need to use the OUTPUT statement to create a new DATA SET and use PROC PRINT to see the computed statistics.

    49) How does PROC SQL work? 

    Ans:

    PROC SQL is a simultaneous process for all the observations. The following steps happen when PROC SQL is executed:

    1. SAS scans each statement in the SQL procedure and check syntax errors, such as missing semicolons and invalid statements.
    2. SQL optimizer scans the query inside the statement. The SQL Optimizer decides how the SQL query should be executed in order to minimize run time.
    3. Any tables in the FROM statement are loaded into the data engine where they can then be accessed in memory.
    4. Code and Calculations are executed.
    5. Final Table is created in memory.
    6. Final Table is sent to the output table described in the SQL statement.

    50) Briefly explain Input and Put function?

    Ans:

    • Input function – Character to numeric conversion- Input(source,informat)
    • put function –  Numeric to character conversion- put(source,format)
    Course Curriculum

    Get Trained with SAS DI Training from Top-Rated Industry Experts

    • Instructor-led Sessions
    • Real-life Case Studies
    • Assignments
    Explore Curriculum

    51) Name few SAS functions?

    Ans:

    Scan, Substr, trim, Catx, Index, tranwrd, find, Sum.

    52) What is the work of tranwrd function?

    Ans:

    TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.

    53) How do you use the do loop if you don’t know how many times you should execute the do loop?

    Ans:

    We can use ‘do until’ or ‘do while’ to specify the condition.

    54) What is the difference between do while and do until?

    Ans:

    An important difference between the DO UNTIL and DO WHILE statements is that the DO WHILE expression is evaluated at the top of the DO loop. If the expression is false the first time it is evaluated, then the DO loop never executes. Whereas DO UNTIL  executes at least once.

    55) If a variable contains only numbers, can it be a character data type?

    Ans:

    Yes, it depends on how you use the variable. There are some numbers we will want to use as a categorical value rather than a quantity. An example of this can be a variable called “Foreigner” where the observations have the value “0” or “1” representing not a foreigner and foreigner respectively. Similarly, the ID of a particular table can be in number but does not specifically represent any quantity. Phone numbers is another popular example.

    56) If a variable contains letters or special characters, can it be numeric data type?

    Ans:

    No, it must be character data type.

    57) What can be the size of largest dataset in SAS?

    Ans:

    • The number of observations is limited only by computer’s capacity to handle and store them.
    • Prior to SAS 9.1, SAS data sets could contain up to 32,767 variables. In SAS 9.1, the maximum number of variables in a SAS data set is limited by the resources available on your computer.

    58) Give some examples where PROC REPORT’s defaults are different than PROC PRINT’s defaults?

    Ans:

    • No Record Numbers in Proc Report
    • Labels (not var names) used as headers in Proc Report
    • REPORT needs NOWINDOWS option

    59) Give some examples where PROC REPORT’s defaults are same as PROC PRINT’s defaults?

    Ans:

    • Variables/Columns in position order.
    • Rows ordered as they appear in data set.

    60) What is the purpose of trailing @ and @@? How do you use them?

    Ans:

    The trailing @ is also known as a column pointer. By using the trailing @, in the Input statement gives you the ability to read a part of your raw data line, test it and then decide how to read additional data from the same record.

    • The single trailing @ tells the SAS system to “hold the line”.
    • The double trailing @@ tells the SAS system to “hold the line more strongly”.
    • An Input statement ending with @@ instructs the program to release the current raw data line only when there are no data values left to be read from that line. The @@, therefore, holds the input record even across multiple iteration of the data step. 

    61) What is the difference between Order and Group variable in proc report?

    Ans:

    • If the variable is used as group variable, rows that have the same values are collapsed.
    • Group variables produce list report whereas order variable produces summary report.

    62) What are the default statistics for means procedure?

    Ans:

    n-count, mean, standard deviation, minimum, and maximum

    63) How to limit decimal places for variable using PROC MEANS?

    Ans:

    By using MAXDEC= option

    64) What is the difference between CLASS statement and BY statement in proc means?

    Ans:

    • Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables.
    • BY group results have a layout that is different from the layout of CLASS group results.

    65) What is the difference between PROC MEANS and PROC Summary?

    Ans:

    The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement.

    66) How to specify variables to be processed by the FREQ procedure?

    Ans:

    By using TABLES Statement.

    67) Describe CROSSLIST option in TABLES statement?

    Ans:

    Adding the CROSSLIST option to TABLES statement displays crosstabulation tables in ODS column format.

    68)  How to create list output for crosstabulations in proc freq?

    Ans:

    • To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.
    • TABLES variable-1*variable-2 <* … variable-n> / LIST;

    69) Where do you use PROC MEANS over PROC FREQ?

    Ans:

    We will use PROC MEANS for numeric variables whereas we use PROC FREQ for categorical variables. 

    70) Explain Input and Put functions in SAS.

    Ans:

    • The INPUT function is used to convert the value of a variable into another value according to a specified format. They are used to change a character value to a numeric value.
    • The PUT function is also used to change a numeric value to a character value. The function has no effects on the formats specified in the PUT statement.
    SAS DI Sample Resumes! Download & Edit, Get Noticed by Top Employers! Download

    71) How to import SAS dataset into SAS?   

    Ans:

    A SAS dataset is placed inside a library where the libname statement connects the library name to a directory. The datasets are read through the PROC SQL or Merge statements in the data steps.

    72) How to comment in SAS?

    Ans:

    Steps to comment code in SAS are as follows:

    • Highlight the several lines of code using the cursor.
    • Press Ctrl + / to comment out the lines of code.
    • To uncomment press Ctrl + Shift+ /.

    73) What is SAS grid?

    Ans:

    The SAS grid is an environment where the tasks are distributed among multiple systems on a network, which is managed by the SAS Grid Manager. The workload is distributed across the system grid that helps in job scheduling, load balancing of the work and acceleration of the processing.

    74) What is the data step in SAS?

    Ans:

    A group of SAS language statements that start with a DATA statement is called a data step. It is the basic step of creating a data set with Base SAS software. It contains other statements that are used to manipulate the SAS datasets and develop new data sets from raw data files. A data step can be used for evaluating variable values, obtaining information, managing files, writing files, etc.

    75) How to find missing values in SAS?

    Ans:

    The missing values in SAS can be found out by the following steps:

    • Specify such a format for the variables so that the missing variables have a single value and the now missing value have a different value.
    • In the Tables statement, mention the Missing and Misprint
    • In the Tables statement, use the _char_ and _numeric_ keywords for mentioning that the FREQ procedure analyses the statistics for all numeric and character values.

    76) How to import xlsx file in SAS?

    Ans:

    The excel files having an extension of flux can be converted into SAS using the PROC Import and the SAS Studio point and click method.

    77) What is SAS Model?

    Ans:

    The SAS (Statistical Analysis System) is a software model that is used for business analytics, business intelligence, predictive analysis, and data management. The SAS software suite has a Model that is used for managing analytical models through a web interfaced repository. Candidate models are also developed and validated through SAS Model manager.

    78) How to create a permanent library in SAS?

    Ans:

    Steps to create a permanent library in SAS are as follows:

    • For a new library, the read/write/execute permissions are required. As rights to the server are needed, it is advisable that the administrator creates the library.
    • A permanent library is created in the SAS management console., where the metadata is stored.
    • In the stored process or program the meta-library has to be referenced by the statement

    79) What is clinical SAS?

    Ans:

    SAS clinical is an application used in the clinical domain for conducting research and experimentation by pharmaceutical companies. Clinicians and doctors use SAS programming to collect and examine clinical data.

    80) What do you mean by fact table in dw ?

    Ans:

    Fact table represents the subject oriented and the focus of analysis. it generally contains elements f analysis such as sales, cost, quantity sold etc. these attributes can be summed up or averaged in order to understand the particular aspects. Fact table can also be sum up as union of the dimension tables that allows the fact measures to be analyzed from different scenarios.

    Are you looking training with Right Jobs?

    Contact Us
    Get Training Quote for Free