35+ Talend Interview Questions & Answers [ ETL TRICKS ]
Last updated on 25th Jun 2020, Blog, Interview Questions
In today’s big data and cloud-centric world, it is increasingly important for organizations to harness their enterprise information. Talend is an open source software integration platform that helps you turn this data into business insights. The growing demand for Talend certification reflects its value in the market. In this blog, I will give you an introduction to the Talend ETL tool.
1. What is Talend?
Talend is an open source software integration platform and the vendor behind it.
- It offers data integration and data management solutions.
- The company provides integration software and services for big data, cloud storage, data integration, data management, master data management, data quality, data preparation, and enterprise applications.
- However, Talend’s first product, Talend Open Studio for Data Integration, is the one most commonly referred to as Talend.
2. What is Talend Open Studio?
Talend Open Studio is an open source project that is based on Eclipse RCP. It supports ETL oriented implementations and is generally provided for the on-premises deployment. This acts as a code generator which produces data transformation scripts and underlying programs in Java. It provides an interactive and user-friendly GUI which lets you access the metadata repository containing the definition and configurations for each process performed in Talend.
3. What is a project in Talend?
‘Project’ is the highest physical structure which bundles up and stores all types of Business Models, Jobs, metadata, routines, context variables or any other technical resources.
4. Describe a Job Design in Talend.
A Job is a basic executable unit of anything that is built using Talend. It is technically a single Java class which defines the working and scope of information available with the help of graphical representation. It implements the data flow by translating the business needs into code, routines, and programs.
5. What is a ‘Component’ in Talend?
A component is a functional piece used to perform a single operation in Talend. Everything you see on the palette is a graphical representation of a component. You can use components with a simple drag and drop. At the backend, a component is a snippet of Java code that is generated as part of a Job (which is basically a Java class). Talend compiles this Java code automatically when the Job is saved.
6. Explain the various types of connections available in Talend.
Connections in Talend define the data to be processed, the data output, or the logical sequence of a Job. The types of connections provided by Talend are:
Row: The Row connection deals with the actual data flow. Following are the types of Row connections supported by Talend:
- Multiple Input/Output
Iterate: The Iterate connection is used to perform a loop on files contained in a directory, on rows contained in a file or on the database entries.
Trigger: The Trigger connection is used to create a dependency between Jobs or Subjobs which are triggered one after the other according to the trigger’s nature. Trigger connections fall into two categories:
- Subjob triggers: On Subjob Ok, On Subjob Error, Run if
- Component triggers: On Component Ok, On Component Error, Run if
Link: The Link connection is used to transfer the table schema information to the ELT mapper component.
7. Why is Talend called a Code Generator?
Talend provides a user-friendly GUI where you can simply drag and drop the components to design a Job. When the Job is executed, Talend Studio automatically translates it into a Java class at the backend. Each component present in a Job is divided into three parts of Java code (begin, main and end). This is why Talend studio is called a code generator.
8. What are the various types of schemas supported by Talend?
Some of the major types of schemas supported by Talend are:
- Repository Schema: This schema can be reused across multiple jobs and any changes done will be automatically reflected to all the Jobs using it.
- Generic Schema: This schema is not tied to any particular source & is used as a shared resource across multiple types of data sources.
- Fixed Schema: These are the read-only schemas which will come predefined with some of the components.
9. Explain Routines.
Routines are the reusable pieces of Java code. Using routines you can write custom code in Java in order to optimize data processing, improve Job capacity, and extend Talend Studio features.
Talend supports two types of routines:
- System routines: These are the read-only codes which you can call directly in any Job.
- User routines: These are the routines which can be custom created by the users by either creating new ones or adapting the existing ones.
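As a minimal sketch of what a user routine might look like: a routine is an ordinary Java class with static methods. The class name `MyStringRoutines` and the method `maskEmail` below are hypothetical and chosen only for illustration; they are not part of Talend's system routines.

```java
// Hypothetical user routine: the class and method names are illustrative only.
class MyStringRoutines {

    // Masks the local part of an e-mail address, keeping its first character.
    // Returns the input unchanged when it is null or has no usable local part.
    static String maskEmail(String email) {
        if (email == null) {
            return null;
        }
        int at = email.indexOf('@');
        if (at <= 0) {
            return email;
        }
        return email.charAt(0) + "***" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(maskEmail("jane.doe@example.com"));
    }
}
```

In a Job, such a method would typically be called from an expression, for example `MyStringRoutines.maskEmail(row1.email)`, where `row1.email` is an assumed input column.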
10. Can you define schema at runtime in Talend?
Schemas can’t be defined at runtime. Because schemas define the movement of data, they must be defined while configuring the components.
11. What are Context Variables and why are they used in Talend?
Context variables are user-defined parameters that are passed into a Job at runtime. Their values may change as the Job is promoted from the Development to the Test and Production environments. Context variables can be defined in three ways:
- Embedded Context Variables
- Repository Context Variables
- External Context Variables
12. Can you define a variable which can be accessed from multiple Jobs?
Yes, you can do that by declaring a static variable within a routine. Then you need to add the setter/getter methods for this variable in the routine itself. Once done, this variable will be accessible from multiple Jobs.
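The approach described above can be sketched roughly as follows. The class name `SharedState` and the variable `batchId` are hypothetical. Note the assumption that the Jobs run in the same JVM (for example a parent Job and its child Jobs in one process); a static field is not shared between separate processes.

```java
// Hypothetical routine holding a value shared by Jobs running in the same JVM.
class SharedState {

    // The static field is the shared variable.
    private static String batchId;

    // synchronized keeps reads and writes consistent if Jobs run in parallel threads.
    static synchronized void setBatchId(String value) {
        batchId = value;
    }

    static synchronized String getBatchId() {
        return batchId;
    }

    public static void main(String[] args) {
        setBatchId("2020-06-25");
        System.out.println(getBatchId());
    }
}
```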
13. What is a Subjob and how can you pass data from parent Job to child Job?
A Subjob can be defined as a single component or a number of components joined by a data flow. A Job contains at least one Subjob. To pass a value from the parent Job to a child Job, you need to use context variables.
14. Define the use of ‘Outline View’ in TOS.
The Outline View in Talend Open Studio is used to keep track of the return values available in a component. This also includes the user-defined values configured in a tSetGlobal component.
15. Explain tMap component. List down the different functions that you can perform using it.
tMap is one of the core components which belongs to the ‘Processing’ family in Talend. It is primarily used for mapping the input data to the output data. tMap can perform following functions:
- Add or remove columns
- Apply transformation rules on any type of field
- Filter input and output data using constraints
- Reject data
- Multiplex and demultiplex data
- Concatenate and interchange the data
16. What is a scheduler?
A scheduler is software that selects processes from the queue and loads them into memory for execution. Talend Open Studio does not provide a built-in scheduler.
17. Describe the ETL Process.
ETL stands for Extract, Transform and Load. It refers to a trio of processes which are required to move the raw data from its source to a data warehouse, a business intelligence system, or a big data platform.
- Extract: This step involves accessing the data from all the Storage Systems like RDBMS, Excel files, XML files, flat files etc.
- Transform: In this step, entire data is analyzed and various functions are applied on it to transform that into the required format.
- Load: In this step, the processed data, i.e. the extracted and transformed data, is then loaded to a target data repository which usually is the database, by utilizing minimal resources.
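The three steps can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: in-memory lists stand in for the source system and the target repository, and the `name;amount` line format is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EtlSketch {

    // Extract: split raw "name;amount" lines (a stand-in for a file or table).
    static List<String[]> extract(List<String> rawLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : rawLines) {
            rows.add(line.split(";"));
        }
        return rows;
    }

    // Transform: normalize the name and drop rows whose amount is not positive.
    static List<String> transform(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            int amount = Integer.parseInt(row[1].trim());
            if (amount > 0) {
                out.add(row[0].trim().toUpperCase(Locale.ROOT) + "=" + amount);
            }
        }
        return out;
    }

    // Load: append the records to the target (a stand-in for a database).
    static void load(List<String> records, List<String> target) {
        target.addAll(records);
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        raw.add(" alice ;10");
        raw.add("bob;0");
        List<String> target = new ArrayList<>();
        load(transform(extract(raw)), target);
        System.out.println(target);
    }
}
```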
18. Can we use ASCII or Binary Transfer mode in SFTP connection?
No, the transfer modes can’t be used in SFTP connections. SFTP doesn’t support any kind of transfer modes as it is an extension to SSH and assumes an underlying secure channel.
19. How do you schedule a Job in Talend?
To schedule a Job in Talend, you first need to export the Job as a standalone program. Then you can schedule it using your OS’s native scheduling tools (Windows Task Scheduler, cron on Linux, etc.).
20. Explain the purpose of tDenormalizeSortedRow.
tDenormalizeSortedRow belongs to the ‘Processing’ family of the components. It helps in synthesizing sorted input flow in order to save memory. It combines all input sorted rows in a group where the distinct values are joined with item separators.
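The idea behind this can be sketched as follows. This is not the component's actual implementation, only an illustration of why sorted input saves memory: since rows with the same key arrive together, just the current group has to be held. The `{key, value}` row layout and the `key -> values` output format are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DenormalizeSortedSketch {

    // Input rows are {key, value} pairs that are already sorted by key.
    // Only the values of the current group are kept in memory at any time.
    static List<String> denormalize(List<String[]> sortedRows, String separator) {
        List<String> result = new ArrayList<>();
        String currentKey = null;
        Set<String> values = new LinkedHashSet<>(); // distinct values, in arrival order
        for (String[] row : sortedRows) {
            if (currentKey != null && !currentKey.equals(row[0])) {
                // Key changed: the previous group is complete, so emit it.
                result.add(currentKey + " -> " + String.join(separator, values));
                values.clear();
            }
            currentKey = row[0];
            values.add(row[1]);
        }
        if (currentKey != null) {
            // Emit the last group.
            result.add(currentKey + " -> " + String.join(separator, values));
        }
        return result;
    }
}
```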
21. Differentiate between “insert or update” and “update or insert”.
- insert or update: In this action, Talend first tries to insert a record, but if a record with a matching primary key already exists, then it updates that record.
- update or insert: In this action, Talend first tries to update a record with a matching primary key, but if there is none, then the record is inserted.
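The difference in ordering can be modelled with a small sketch. It uses an in-memory map as a stand-in for the table and a counter for simulated statements; it models only the order described above, not the SQL that Talend actually generates.

```java
import java.util.HashMap;
import java.util.Map;

class UpsertOrderSketch {

    final Map<Integer, String> table = new HashMap<>(); // primary key -> value
    int statements = 0;                                  // simulated statements issued

    // Simulated INSERT: fails (returns false) when the key already exists.
    boolean insert(int id, String value) {
        statements++;
        if (table.containsKey(id)) {
            return false;
        }
        table.put(id, value);
        return true;
    }

    // Simulated UPDATE: returns false when no row matched the key.
    boolean update(int id, String value) {
        statements++;
        if (!table.containsKey(id)) {
            return false;
        }
        table.put(id, value);
        return true;
    }

    // Try the insert first, fall back to an update.
    void insertOrUpdate(int id, String value) {
        if (!insert(id, value)) {
            update(id, value);
        }
    }

    // Try the update first, fall back to an insert.
    void updateOrInsert(int id, String value) {
        if (!update(id, value)) {
            insert(id, value);
        }
    }
}
```

Both actions leave the table in the same state; they differ in how many attempts are wasted. In this model, inserting first is cheaper when most incoming records are new, and updating first is cheaper when most records already exist.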
22. Explain the usage of tContextLoad.
tContextLoad belongs to the ‘Misc’ family of components. This component helps in modifying the values of the active context on the fly. Basically, it is used to load a context from a flow. It sends warnings if the parameters defined in the input are not defined in the context and also if the context is not initialized in the incoming data.
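The behaviour described above can be approximated with a short sketch. It is not tContextLoad's implementation: maps stand in for the active context and for the incoming key/value flow, and the warning texts are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ContextLoadSketch {

    // Applies incoming key/value pairs to the context and returns any warnings.
    static List<String> load(Map<String, String> context, Map<String, String> incoming) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            if (context.containsKey(entry.getKey())) {
                // Known variable: overwrite its value on the fly.
                context.put(entry.getKey(), entry.getValue());
            } else {
                warnings.add("Parameter not defined in the context: " + entry.getKey());
            }
        }
        for (String key : context.keySet()) {
            if (!incoming.containsKey(key)) {
                warnings.add("Context variable not set by the input: " + key);
            }
        }
        return warnings;
    }
}
```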
23. Discuss the difference between XMX and XMS parameters.
The -Xms parameter specifies the initial heap size of the JVM, whereas the -Xmx parameter specifies the maximum heap size.
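To see the effect of these two settings, a small program can print what the running JVM reports. This is a sketch using the standard `Runtime` API; the class name `HeapInfo` is arbitrary.

```java
class HeapInfo {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the upper bound the heap may grow to (governed by -Xmx).
        System.out.println("max heap (MiB):     " + toMiB(rt.maxMemory()));
        // totalMemory(): the heap currently reserved (initially governed by -Xms).
        System.out.println("current heap (MiB): " + toMiB(rt.totalMemory()));
    }
}
```

After compiling, running it as `java -Xms256m -Xmx1024m HeapInfo` and again with different values shows how the reported numbers follow the two options.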
24. What is the use of an Expression Editor in Talend?
In the Expression Editor, all expressions (Input, Var, or Output) and constraint statements can be viewed and edited easily. The Expression Editor comes with a dedicated view for writing any function or transformation. The expressions needed for a data transformation can be written directly in the Expression Editor, or you can open the Expression Builder dialog box and write them there.
25. Explain the error handling in Talend.
There are a few ways in which errors in Talend can be handled:
- For simple Jobs, one can rely on the exception throwing process of Talend Open Studio, which is displayed in the Run View as a red stack trace.
- Each Subjob and component returns a code that determines the additional processing. The Subjob Ok/Error and Component Ok/Error links can be used to direct the error towards an error handling routine.
- The basic way of handling an error is to define an error handling Subjob which executes whenever an error occurs.
27. How can you execute a Talend Job remotely?
You can execute a Talend Job remotely from the command line. All you need to do is, export the job along with its dependencies and then access its instructions files from the terminal.
28. Can you exclude headers and footers from the input files before loading the data?
Yes, the headers and footers can be excluded easily before loading the data from the input files.
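File input components such as tFileInputDelimited typically let you set how many header and footer lines to skip. The effect is roughly what this sketch does; the method name `body` and the in-memory list of lines are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderFooterSketch {

    // Returns the lines with the first `header` and last `footer` lines removed.
    // Both counts are assumed to be non-negative.
    static List<String> body(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());
        int to = Math.max(from, lines.size() - footer);
        return new ArrayList<>(lines.subList(from, to));
    }
}
```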
29. Explain the process of resolving ‘Heap Space Issue’.
‘Heap Space Issue’ occurs when JVM tries to add more data into the heap space area than the space available. To resolve this issue, you need to modify the memory allocated to the Talend Studio. Then you have to modify the relevant Studio .ini configuration file according to your system and need.
30. What is the purpose of the ‘tXMLMap’ component?
This component transforms and routes data from single or multiple sources to single or multiple destinations. It is an advanced component designed for transforming and routing XML data flows, especially when numerous XML data sources need to be processed.
31. Differentiate between TOS for Data Integration and TOS for Big Data.
Talend Open Studio for Big Data is a superset of Talend Open Studio for Data Integration. It contains all the functionality provided by TOS for DI along with additional functionality such as support for Big Data technologies. That is, TOS for DI generates only Java code, whereas TOS for BD generates MapReduce code along with Java code.
32. What are the various Big data technologies supported by Talend?
In TOS for BD, the Big Data family is very large. A few of the most used technologies are:
- Google Storage
- Sqoop etc.
33. How can you run multiple Jobs in parallel within Talend?
Because Talend is a Java code generator, Jobs and Subjobs can be executed in multiple threads to reduce the runtime of a Job. Basically, there are three ways for parallel execution in Talend Data Integration:
- tParallelize component
- Automatic parallelization
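Since the generated code is plain Java, the underlying idea can be illustrated with standard Java concurrency. The sketch below is not what Talend generates; it only shows independent tasks (stand-ins for Subjobs) running on separate threads, with the caller waiting until all of them finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSubjobsSketch {

    // Runs every task on its own thread, waits for all of them to finish,
    // and returns the results in submission order.
    static List<String> runInParallel(List<Callable<String>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "subjob A done");
        tasks.add(() -> "subjob B done");
        System.out.println(runInParallel(tasks));
    }
}
```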
34. What are the mandatory configurations needed in order to connect to HDFS?
In order to connect to HDFS you must provide the following details:
- NameNode URI
- User name
35. Which service is mandatory for coordinating transactions between Talend Studio and HBase?
Zookeeper service is mandatory for coordinating the transactions between TOS and HBase.
36. What is the name of the language used for Pig scripting?
Pig Latin is used for scripting in Pig.
37. When do you use tKafkaCreateTopic component?
This component creates a Kafka topic which the other Kafka components can use as well. It allows you to visually generate the command to create a topic with various properties at topic-level.
38. Explain the purpose of the tPigLoad component.
Once the data is validated, this component helps in loading the original input data to an output stream in just one single transaction. It sets up a connection to the data source for the current transaction.
39. What component do you need to use to automatically close a Hive connection as soon as the main Job finishes execution?
Using the tPostJob and tHiveClose components, you can close a Hive connection automatically.
40. In Talend Studio, where can you find the components needed to create a job?
- Run view
- Designer Workspace
41. In the component view, where can you change the name of a component from?
- Basic settings
- Advanced settings
42. The HDFS components can only be used with Big Data batch or Big Data streaming Jobs.
43. An analysis on Hive table content can be executed in which perspective of Talend Studio?
- Big Data
44. What does an asterisk next to the Job name signify in the design workspace?
- It is an active Job
- The Job contains unsaved changes
- The job is currently running
- The Job contains errors
2). The Job contains unsaved changes
45. Suppose you have designed a Big Data batch using the MapReduce framework. Now you want to execute it on a cluster using Map Reduce. Which configurations are mandatory in the Hadoop Configuration tab of the Run view?
- Name Node
- Data Node
- Resource Manager
- Job Tracker
4). Job Tracker
46. How to find configuration error messages for a component?
- Right-click the component and select “Show Problems”
- Hover over the error symbol within the Designer view
- Open the Errors view
- Open the Jobs view
2). Hover over the error symbol within the Designer view
47. What is the process of joining two input columns in the tMap configuration window?
- Dragging a column from the main input table to a column in another input table
- Right-clicking one column in the input table and selecting “Join”
- Selecting two columns in two distinct input tables, right-clicking, and selecting “Join”
- Selecting two columns in two distinct input tables dragging them to the output table
1). Dragging a column from the main input table to a column in another input table
48. To import a file from FTP, which of the following are the mandatory components?
- tFTPConnection, tFTPPut
- tFTPConnection, tFTPFileList, tFTPGet
- tFTPConnection, tFTPGet
- tFTPConnection, tFTPExists, tFTPGet
3). tFTPConnection, tFTPGet
49. Suppose you have three Jobs, of which Jobs 1 and 2 are executed in parallel. Job 3 executes only after Jobs 1 and 2 complete their execution. Which of the following components can be used to set this up?
2). tPostJob and 4). Parallelize
50. For a tFileInputDelimited component, what is the default field separator parameter?
51. While saving the changes to a tMap configuration, sometimes Talend asks you for confirmation to propagate changes. Why?
- Because your changes affect the output schema and the source component should have a matching schema
- Because your changes affect the output schema and the target component should have a matching schema
- Because your changes affect an input schema and the related source component should have a matching schema
- Because your changes have not been saved yet
2). Because your changes affect the output schema and the target component should have a matching schema
52. In Talend, how to add a Shape into a Business Model?
- Click and place it from the palette
- Drag it from the repository
- Click in the quick access toolbar
- Drag and drop it from the palette
4). Drag and drop it from the palette
53. How do you create a row link between two components?
- Drag the target component onto the source component
- Right-click the source component and then click on the target component
- Drag the source component onto the target component
- Right-click the source component, click the row followed by the row type and then the target component
4). Right-click the source component, click the row followed by the row type and then the target component
54. Talend Open Studio generates the Job documentation in which of the following formats?
55. We can directly change the generated code in Talend.
56. What is the default date pattern in Talend Open Studio?
57. MDM stands for
- Metadata Management
- Mobile Device Management
- Master Data Management
- Mock Data Management
3) Master Data Management
58. In order to encapsulate and pass the collected log data to the output, which components must be used along with tLogCatcher?
- tDie [Ans]
59. Which component do you need to use in order to read data line by line from an input flow and store the data entries into iterative global variables?
60. tMemorizeRows belongs to which component family in Talend?
61. _________ is a powerful input component which holds the ability to replace a number of other components of the File family.
62. Which component do you need in order to prevent an unwanted commit in MySQL database?
- tMysqlRollback
1) tMysqlRollback
63. A database connection defined in Repository can be reused by any Job within the project.
64. Using which component can you integrate personalized Pig code with a Talend program?
- tPigCode
4) tPigCode
65. tKafkaOutput component receives messages serialized into which data type?
66. To which two component families does the tHDFSProperties component belong?
- Big Data and Misc
- Orchestration and Big Data
- File and Big Data
- Big Data and Internet
File and Big Data
67. This component is used to read data from cache memory for high-speed data access
68. Using which component can you calculate the processing time of one or more Subjobs in the main Job?
- tChronometerStart
2) tChronometerStart
69. tUnite component belongs to which of the following two families?
- File and Processing
- Misc and Messaging
- Orchestration and Messaging
- Orchestration and Processing
4) Orchestration and Processing
70. Using tJavaFlex how many parts of java-code can you add in your Job?
71. In which language is Talend written?
The Talend application is developed in Java.
72. When was the Talend tool launched?
Talend Open Studio (TOS) was launched in 2006.
73. Can we save our personal settings in the DQ Portal?
No, it is not possible to save personal settings in the DQ Portal.
74. Give some advantages of using Talend.
- Talend Open Studio can automate tasks and offers faster development and deployment.
- Talend provides what you need to meet today’s market needs as well as future ones.
- It is free and backed by a large online community, mostly professionals and learners who share information, experiences, queries, etc.
75. Define a component in the context of Talend Open Studio.
A component is a functional unit used to perform a single operation in Talend. You can use components with simple drag and drop. A component is a snippet of Java code which is generated as part of a Job.
76. What is a Code Generator in Talend?
Talend offers a GUI, which allows you to drag and drop the components to design a Job. Simply, it translates these jobs into a Java class. That’s why it is known as a code generator.
77. Can we call a Job within a Job?
Yes, using the tRunJob component. In the properties of the tRunJob component, the Job name can be selected.
78. What is tMap?
tMap is an advanced component which can be integrated as a plugin to Talend Studio. This component can transform and route data from single or multiple sources to single or multiple destinations.
79. What Are the Operations of tMap?
tMap performs following operations:
- Data transformation on any fields
- Data multiplexing and demultiplexing
- Fields concatenation and interchange
- Rejection of Data
- Filtering of fields using constraints
80. What is the meaning of MDM with reference to Talend?
MDM stands for Master Data Management. With the help of MDM, an organization can build and manage a single, consistent and accurate view of its enterprise data. MDM helps to increase business value by improving operational efficiency, marketing effectiveness, planning and regulatory compliance.
81. How can we write customized Java code in Talend?
- The first way is to use routines. Routines can be used in multiple Jobs.
- The second way is to use components like tJava, tJavaRow and tJavaFlex in a Talend Job. Code written in these components applies only to that specific Job.
82. What is the Migration Task in Talend?
A Migration Task ensures that a project developed with an older version of Talend remains compatible with the current version.
83. What is the process of scheduling the Job in Talend?
First of all, you need to export the Job as a separate program. Then you should use OS’ native scheduling tools like Windows Task Scheduler, Cron, etc. to schedule your tasks.
84. What is the difference between the tJava, tJavaRow and tJavaFlex components in Talend?
- tJava: executes only once.
- tJavaRow: executes for each row.
- tJavaFlex: used when code must run at specific points of the row processing. tJavaFlex contains three sections: Start (code here executes before any row is processed), Main (code executes for each row), and End (code executes once after all rows have been processed).
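The three tJavaFlex sections map naturally onto the parts of an ordinary loop. The following sketch is a plain Java analogy, not code generated by Talend; the row type (a list of integers) and the summary string are assumptions for illustration.

```java
import java.util.List;

class JavaFlexSketch {

    static String process(List<Integer> rows) {
        // Start: runs once, before the first row is processed.
        int count = 0;
        long sum = 0;

        for (int row : rows) {
            // Main: runs once for every row.
            count++;
            sum += row;
        }

        // End: runs once, after the last row has been processed.
        return "rows=" + count + ", sum=" + sum;
    }
}
```

In the same analogy, tJavaRow corresponds to the loop body alone, and tJava to a block that runs once without any loop.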
85. What is the use of the tLoqateAddressRow component in Talend?
This component helps us to rectify mailing addresses associated with customer data. It helps to ensure a single customer view and better delivery of their customer mailings.
86. How can you access global and context variables?
Press Ctrl+Space to access both global and context variables.
87. Which components can we use for sorting data in Talend?
To sort data in Talend Open Studio, the tSortRow and tExternalSortRow components are used.
88. Is it possible to change the background color of a job designer in Talend?
Yes. Open the preferences from the Window menu, then use the color settings to choose the background color.
89. What is the importance of tLoqateAddressRow component in Talend?
It corrects the mailing addresses associated with customer data, which helps ensure a single customer view and better delivery of customer mailings.
90. Why use the Palette settings in Talend?
The Palette settings allow Talend Open Studio to launch more quickly, because only the components needed in the project have to be loaded.
91. Is it possible to change the generated code directly in Talend?
No, the generated code cannot be changed directly in Talend Open Studio.
92. Why are String Handling routines used in Talend?
String Handling routines allow us to perform operations and tests on alphanumeric expressions, based on Java methods.
93. What are Configuration Tabs?
They are located in the bottom half of the design workspace. Each tab displays the properties of the element selected in the design workspace.
94. How can you improve the performance of a Talend Job which has a complex design?
To improve the performance of a Talend Job, we can do the following:
- Remove redundant fields/columns using tFilterColumns component
- Remove Unwanted data/records using tFilterRows component
- Use Select Query to retrieve data from the database
- Use Database Bulk components
- Use Talend ELT Components when needed
- Split Talend Job into the smaller Subjobs
95. How can you export a job?
Select the Job in the Repository view, right-click it, and then choose to export the Job. There you can specify whether to export only this Job or its dependencies as well.
96. Can you define a schema at run time?
No, schemas should be defined at design time, not at run time.
97. How can you add components to your Talend Job?
- Components can be selected from the Palette view and then dragged and dropped.
- In the Designer view, you can type the component name and then select it.
98. Where do you implement the integration logic?
In the Designer view of the Talend Job.
99. Can we modify the code of a Talend Job in the Code view?
100. Can we write customized Java code in Talend?