Apache Solr Interview Questions & Answers [GUIDE TO CRACK]
Last updated on 3rd Jul 2020, Blog, Interview Questions
Solr (pronounced “solar”) is an open-source enterprise-search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.
Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr’s external configuration allows it to be tailored to many types of applications without Java coding, and it has a plugin architecture to support more advanced customization.
1) What is Apache Solr?
Apache Solr is a standalone full-text search platform that indexes documents and serves searches over HTTP, using XML and JSON. Built on the Java library Lucene, Solr supports a rich schema specification that offers flexibility in dealing with different document fields, and it provides an extensive search plugin API for developing custom search behavior.
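Since Solr exposes a REST-like HTTP API, a query is just a URL. A minimal sketch in Python (the host, port, and `techproducts` core name are assumptions for illustration):

```python
from urllib.parse import urlencode

def solr_select_url(base, core, params):
    """Build the URL for a Solr /select query from a dict of parameters."""
    return f"{base}/solr/{core}/select?{urlencode(params)}"

# Ask the (hypothetical) techproducts core for up to 10 JSON-formatted matches.
url = solr_select_url("http://localhost:8983", "techproducts",
                      {"q": "name:solr", "wt": "json", "rows": 10})
print(url)
```

Any HTTP client in any language can then fetch that URL, which is what makes Solr usable from most popular programming languages.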
2) What is Schema in Elasticsearch?
- A schema is a structure that describes the fields of a document, their data types, and how each field should be handled. In Elasticsearch this is called a mapping: it describes the fields of a JSON document along with their data types. This process is called schema mapping in Elasticsearch. An Elasticsearch server usually contains zero or more indexes; an index contains multiple types, each of which holds multiple documents. Elasticsearch can also operate schema-less, indexing documents without an explicitly provided schema.
- If a mapping is not provided explicitly, Elasticsearch generates a default mapping automatically by detecting field types during indexing; this is called dynamic mapping. Mappings are expressed as hierarchically structured JSON, and each level of the hierarchy carries its own properties configuration, so settings can be applied flexibly down to the leaf fields.
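Dynamic mapping can be sketched with a toy type-detection function (a simplification — real Elasticsearch maps strings to `text` with a `keyword` sub-field, and also detects dates and arrays; the sample document is made up):

```python
def infer_es_type(value):
    """Rough sketch of Elasticsearch's dynamic field type detection."""
    if isinstance(value, bool):   # bool must be checked before int (bool subclasses int)
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"unhandled value type: {type(value)!r}")

doc = {"title": "Solr vs Elasticsearch", "views": 42, "score": 4.5, "published": True}
mapping = {field: infer_es_type(v) for field, v in doc.items()}
print(mapping)
```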
3) What are the features of Apache Solr?
- Scalable, high-performance indexing
- Near real-time indexing
- Standards-based open interfaces like XML, JSON and HTTP
- Flexible and adaptable faceting
- Advanced and Accurate full-text search
- Linearly scalable, auto index replication, auto fail over and recovery
- Allows concurrent searching and updating
- Comprehensive HTML administration interfaces
- Provides cross-platform solutions that are index-compatible
4) What is Apache Lucene?
Supported by the Apache Software Foundation, Apache Lucene is a free, open-source, high-performance text search engine library written in Java by Doug Cutting. Lucene facilitates full-featured searching, highlighting, indexing and spellchecking of documents in various formats like MS Office docs, HTML, PDF, text docs and others.
5) What is request handler?
When a user runs a search in Solr, the search query is processed by a request handler. SolrRequestHandler is a Solr plugin that defines the logic to be executed for any request. The solrconfig.xml file declares several handlers, and may contain multiple instances of the same SolrRequestHandler class with different configurations.
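For illustration, a solrconfig.xml fragment might register two handlers backed by the same SearchHandler class, each with its own defaults (the handler names and default values here are assumptions):

```xml
<!-- Two instances of the same class, registered under different names -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="rows">10</str>
  </lst>
</requestHandler>
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
  </lst>
</requestHandler>
```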
6) What are the advantages and disadvantages of Standard Query Parser?
- Also known as the Lucene Parser, the Solr standard query parser lets users specify precise queries through a robust syntax. However, its strict syntax makes it less tolerant of malformed queries than more forgiving query parsers such as the DisMax parser.
7) What file contains configuration for the data directory?
The solrconfig.xml file contains the configuration for the data directory.
8) What is Highlighting?
Highlighting is the fragmentation of documents matching the user's query, included in the query response. These fragments are then highlighted and placed in a special section that clients and users use to present snippets. Solr contains a number of highlighting utilities that offer control over various fields. The highlighting utilities can be invoked by request handlers and reused with the standard query parsers.
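Highlighting is requested through query parameters. A sketch of the common ones, assembled with Python's standard library (the query and field name `features` are assumptions):

```python
from urllib.parse import urlencode

params = {
    "q": "features:solr",  # the user's query
    "hl": "true",          # turn highlighting on
    "hl.fl": "features",   # which field(s) to fragment and highlight
    "hl.snippets": 2,      # max fragments returned per field
}
query_string = urlencode(params)
print(query_string)
```

The response then carries a highlighting section alongside the regular results, from which clients build the snippets.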
9) What is SolrCloud?
Apache Solr provides highly scalable search capabilities, allowing users to run a highly available, fault-tolerant cluster of Solr servers. These capabilities of Apache Solr are known as SolrCloud.
10) Explain Faceting in Solr?
Faceting refers to the categorization and arrangement of search results into categories based on index terms. Faceting makes searching more fluent, as users can drill down toward the exact results they want.
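Conceptually, facet counts are per-term tallies over the matching documents. A toy sketch (the documents and the `category` field are made up for illustration):

```python
from collections import Counter

# Toy result set: each matching document carries a "category" index term.
results = [
    {"id": 1, "category": "electronics"},
    {"id": 2, "category": "books"},
    {"id": 3, "category": "electronics"},
    {"id": 4, "category": "music"},
]

# Facet counts: how many matches fall under each category value.
facets = Counter(doc["category"] for doc in results)
print(facets.most_common())
```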
11) What are the types of token filters in an Elasticsearch analyzer?
Elasticsearch has a number of built-in token filters (for example lowercase, stop, and synonym) that can be used in custom analyzers.
12) What is the use of Tokenizer?
The tokenizer is used to break a stream of text into a series of tokens, where each token is a subsequence of the characters in the text. The tokens produced are then passed to token filters, which can update, remove, and add tokens. Afterwards, the field is indexed by the resulting token stream.
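The pipeline described above — tokenizer first, then a chain of token filters — can be sketched in a few lines (a whitespace tokenizer plus lowercase and stop-word filters, all simplified):

```python
def whitespace_tokenize(text):
    """Tokenizer: break a stream of text into tokens on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Token filter: rewrite each token to lowercase."""
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=frozenset({"a", "the", "of"})):
    """Token filter: remove common stop words."""
    return [t for t in tokens if t not in stop_words]

tokens = stop_filter(lowercase_filter(whitespace_tokenize("The Power of Apache Solr")))
print(tokens)
```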
13) What data is declared by Schema?
The data declared by a Schema:
- What fields are required.
- What types of fields are available.
- What field must be used as the primary/unique key.
- How to search and index each field.
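In a classic schema.xml those declarations look roughly like this (field and type names are illustrative):

```xml
<!-- Available field types are declared elsewhere as <fieldType> entries -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<!-- The field that serves as the primary/unique key -->
<uniqueKey>id</uniqueKey>
```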
14) What are the advantages and disadvantages of the Standard Query Parser?
The Solr standard query parser, also known as the Lucene Parser, helps users formulate precise queries with the help of a robust syntax. However, that strict syntax makes it more prone to syntax errors than lenient query parsers such as the DisMax parser.
15) Define Dynamic Fields?
Dynamic fields are a useful feature when the user has not defined one or more fields. They offer excellent flexibility by indexing fields that are not explicitly defined in the schema.
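A schema.xml sketch: any incoming field whose name matches the pattern is accepted at index time without an explicit declaration (the `*_s` naming convention shown is illustrative):

```xml
<!-- Any field ending in "_s" is indexed as a string without being declared -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```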
16) How to install Solr?
The three steps of Installation are:
- Server-related files, e.g. Tomcat or start.jar (Jetty)
- Solr webapp as a .war
- Solr Home which comprises the data directory and configuration files
17) What are the important configuration files of Solr?
Solr has two important configuration files: solrconfig.xml and schema.xml.
18) What are the most common elements in solrconfig.xml?
The most common elements in solrconfig.xml are:
- Search components
- Cache parameters
- Data directory location
- Request handlers
19) How to shut down Apache Solr?
- Solr is shut down from the same terminal where it was launched. Press Ctrl+C to shut it down.
- To start the server, use: $ bin/solr start
20) Which file contains configuration for data directory?
solrconfig.xml file contains configuration for data directory.
21) Which file contains definition of the field types and fields of documents?
schema.xml file contains definition of the field types and fields of documents.
22) What is Character Filter in Elasticsearch Analyzer?
- A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like `<b>` from the stream.
- An analyzer may have zero or more character filters, which are applied in order.
23) What is Token filters in Elasticsearch Analyzer?
- A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like the from the token stream, and a synonym token filter introduces synonyms into the token stream.
- Token filters are not allowed to change the position or character offsets of each token.
- An analyzer may have zero or more token filters, which are applied in order.
24) What is Shard?
In distributed environment, the data is partitioned between multiple Solr instances, where each chunk of that data can be called as a Shard. It contains a subset of the whole index.
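Partitioning can be pictured as hash-based routing of document ids to shards (a simplification — SolrCloud's default router hashes ids into per-shard hash ranges):

```python
def route_to_shard(doc_id, num_shards):
    """Map a document id to one of num_shards partitions by hashing."""
    return hash(doc_id) % num_shards

# The same id always routes to the same shard within a process,
# so each shard ends up holding a stable subset of the whole index.
shard = route_to_shard("doc-42", 4)
print(shard)
```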
25) What is zookeeper in Solr cloud?
Zookeeper is an Apache project that Solr Cloud uses for centralized configuration, coordination, to manage the cluster and to elect a leader.
26) What is Replica in Solr cloud?
In Solr Core, a copy of shard that runs in a node is known as a replica.
27) What is Leader in Solr cloud?
Leader is also a replica of shard, which distributes the requests of the Solr Cloud to the remaining replicas.
28) What is collection in Solr cloud?
A cluster has a logical index that is known as a collection.
29) What is node in Solr cloud?
In Solr cloud, each single instance of Solr is regarded as a node.
30) Which are the main configuration files in Apache Solr?
Following are the main configuration files in Apache Solr:
- Solr.xml – This file is in $SOLR_HOME directory and contains Solr Cloud related information.
- Schema.xml – It contains the whole schema.
- Solrconfig.xml – It contains the definitions and core-specific configurations related to request handling and response formatting.
- Core.properties – This file contains the configurations specific to the core.
31) How to start Solr using command prompt?
Following commands need to be used to start Solr:
[Hadoop@localhost ~]$ cd
[Hadoop@localhost ~]$ cd Solr/
[Hadoop@localhost Solr]$ cd bin/
[Hadoop@localhost bin]$ ./solr start
32) What is a Tokenizer in ElasticSearch ?
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. Inverted indexes are created and updated using these token values, by recording the order or position of each term and the start and end character offsets of the original word which the term represents.
An analyzer must have exactly one Tokenizer.
33) What is the use of field type?
Field type defines how Solr would interpret data in a field and how that field can be queried.
34) What all information is specified in field type?
A field type includes four types of information:
- The name of the field type
- Field attributes
- An implementation class name
- If the field type is TextField, a description of the field analysis for the field type
35) What is Field Analyzer?
When working with textual data in Solr, the field analyzer reviews the field text and generates a token stream. This pre-processing of the input text is performed at index time and at query time. Most Solr applications use custom analyzers defined by the user. Remember, each analyzer has only one tokenizer.
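An analyzer is typically defined inside a fieldType in schema.xml — one tokenizer followed by any number of filters (the type name is illustrative; the factory classes are standard Solr ones):

```xml
<fieldType name="text_general" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
  </analyzer>
</fieldType>
```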
36) What is copying field?
It is used to describe how to populate fields with data copied from another field.
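In schema.xml this is the copyField directive (source and destination field names are illustrative):

```xml
<!-- Populate a catch-all "text" field from two source fields -->
<copyField source="title" dest="text"/>
<copyField source="body" dest="text"/>
```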
37) Name different types of highlighters?
There are 3 highlighters in Solr:
- Standard Highlighter: provides precise matches even for advanced query parsers.
- Fast Vector Highlighter: though less advanced than the Standard Highlighter, it works better for more languages and supports Unicode break iterators.
- Postings Highlighter: more precise, efficient, and compact than the Fast Vector Highlighter, but inappropriate for queries with a large number of terms.
38) What is the use of stats.field?
It is used to generate statistics over the results of arbitrary numeric functions.
39) What command is used to see how to use the bin/Solr script?
Execute $ bin/solr -help to see how to use the bin/solr script.
40) Which syntax is used to start & stop Solr?
$ bin/solr start
This will start Solr in the background, listening on port 8983.
Script Help
To see how to use the bin/solr script, execute:
$ bin/solr -help
For specific usage instructions for the start command, do
$ bin/solr start -help
Start Solr in the Foreground
Since Solr is a server, it is more common to run it in the background, especially on Unix/Linux. However, to start Solr in the foreground, simply do:
$ bin/solr start -f
If you are running Windows, you can run:
bin\solr.cmd start -f
Start Solr with a Different Port
To change the port Solr listens on, you can use the -p parameter when starting, such as:
$ bin/solr start -p 8984
When running Solr in the foreground (using -f), then you can stop it using Ctrl-c. However, when running in the background, you should use the stop command, such as:
$ bin/solr stop -p 8983
The stop command requires you to specify the port Solr is listening on or you can use the -all parameter to stop all running Solr instances.
Check if Solr is Running
If you’re not sure if Solr is running locally, you can use the status command:
$ bin/solr status
42) Name the basic field types in Solr?
The basic field types in Solr include string, boolean, int, long, float, double, date, and text.
43) What file contains configuration for the data directory?
The solrconfig.xml file contains the configuration for the data directory.
44) What file contains the definition of the field types and fields of documents?
The schema.xml file contains the definition of the field types and fields of documents.
45) What is the use of a tokenizer?
It is used to split a stream of text into a series of tokens, where each token is a subsequence of the characters in the text. The tokens produced are then passed through token filters that may add, remove, or update them. Afterwards, the field is indexed by the resulting token stream.
46) What is the phonetic filter?
The phonetic filter creates tokens using one of the phonetic encoding algorithms in the org.apache.commons.codec.language package.
47) What is Highlighting?
Highlighting refers to the fragmentation of documents matching the user's query, included in the query response. These fragments are then highlighted and placed in a special section that clients and users use to present the snippets. Solr includes a number of highlighting utilities with control over different fields. The highlighting utilities can be invoked by request handlers and reused with the standard query parsers.
48) Name different types of highlighters?
There are three highlighters in Solr:
- Standard Highlighter: provides precise matches even for advanced query parsers.
- FastVector Highlighter: though less advanced than the Standard Highlighter, it works better for more languages and supports Unicode break iterators.
- Postings Highlighter: more precise, efficient, and compact than the FastVector Highlighter, but inappropriate for queries with a large number of terms.
49) What are the most common elements in solrconfig.xml?
- Search components
- Cache parameters
- Data directory location
- Request handlers
50) Explain the internal architecture of Apache Solr?
- The architecture of Apache Solr contains the following components.
- Request Handler – Handles all requests sent to Apache Solr; these are mainly search queries and index-update requests.
- Search Component – Provides search features such as spell checking, querying, hit highlighting, and faceting.
- Query Parser – Verifies queries for syntactical errors and translates them into a format Lucene understands.
- Response Writer – Generates the required formatted output (e.g., XML or JSON) for user queries.
- Analyzer – Examines the text of fields and generates a token stream.
- Update Request Processor – A set of plugins (e.g., signature, logging, indexing) through which all update requests pass; modifications such as dropping or adding a field are performed here.
51) List few difference between Apache Solr and Lucene?
Apache Solr is a standalone, predefined web application built on top of Lucene, while Lucene is a low-level Java library used for implementing searching, indexing, etc.
Functions of Solr are:-
- Hit highlighting
- XML/HTTP and JSON APIs
- Web administration interface
- Faceted Search and Filtering etc
Functions of Lucene are:-
- Efficient Search Algorithm
- Cross-Platform Solution
52) What are streaming expressions in Apache Solr?
Streaming expressions provide a simple yet powerful stream-processing language for SolrCloud.
Some capabilities are:
- Fast interactive MapReduce
- Streaming NLP
- Aggregations
- Publish/subscribe messaging
- Request/response stream processing
- Anomaly detection
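A streaming expression is written in Solr's own function syntax and sent to the /stream handler. A sketch of a simple `search` source (the collection, query, and field names are assumptions):

```
search(techproducts,
       q="cat:electronics",
       fl="id,price",
       sort="price asc",
       qt="/export")
```

Such sources can then be wrapped in decorators (e.g., rollups or joins) to build stream-processing pipelines.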
53) List few difference between Apache Solr and Apache Drill?
A few differences between Apache Solr and Apache Drill:
- Apache Drill is a schema-free SQL query engine for big data, while Solr is a widely used search engine based on Apache Lucene.
- Both are open source.
54) Do we need a different server to run Solr?
Not necessarily. When running Solr locally, it works on a single machine and no separate server is needed if high availability is not required. In other cases the number of servers varies: improving response times or providing high availability calls for more servers.
55) Do you know how to shut down Apache Solr correctly?
It is easy to shut down Solr correctly. Shut it down from the same terminal where it was started, using the shortcut Ctrl+C; this stops Solr properly without any loss of data.
56) Which type of data is generally declared by the schema?
The schema generally declares how to index each field, which field types are available, which fields are required to be defined, and which field is to be used as the unique/primary key.
57) What are the most common component elements in Apache Solr?
They are: search components, cache parameters, request handlers, and the location of the data directory.
58) Explain Faceting?
It is the arrangement of search results into categories based on indexed terms.
59) Define Dynamic Field?
It is used to allow Solr to index fields that you did not explicitly define in the schema.
60) To see how to use the bin/solr script which command is used?
Execute $ bin/solr -help to see how to use the bin/solr script.
61) What is an Analyzer in ElasticSearch ?
While indexing data in Elasticsearch, data is transformed internally by the analyzer defined for the index, and then indexed. An analyzer is built from character filters, a tokenizer, and token filters. The following built-in analyzers are available in Elasticsearch 5.6:
| Analyzer | Description |
| --- | --- |
| Standard Analyzer | Divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words. |
| Simple Analyzer | Divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms. |
| Whitespace Analyzer | Divides text into terms whenever it encounters any whitespace character. It does not lowercase terms. |
| Stop Analyzer | Like the Simple Analyzer, but also supports removal of stop words. |
| Keyword Analyzer | A "noop" analyzer that accepts whatever text it is given and outputs the exact same text as a single term. |
| Pattern Analyzer | Uses a regular expression to split the text into terms. It supports lowercasing and stop words. |
| Language Analyzers | Elasticsearch provides many language-specific analyzers like English or French. |
| Fingerprint Analyzer | A specialist analyzer which creates a fingerprint that can be used for duplicate detection. |
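The difference between the Simple and Whitespace analyzers can be sketched in Python (rough approximations of the real behavior, for illustration only):

```python
import re

def simple_analyzer(text):
    """Approximation of the Simple analyzer: split on non-letters, lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

def whitespace_analyzer(text):
    """Approximation of the Whitespace analyzer: split on whitespace, keep case."""
    return text.split()

text = "Quick-Brown Fox"
print(simple_analyzer(text))      # the hyphen is a split point, terms lowercased
print(whitespace_analyzer(text))  # "Quick-Brown" stays one term, case kept
```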
62) What is inverted index in Elasticsearch ?
Inverted Index is backbone of Elasticsearch which make full-text search fast. Inverted index consists of a list of all unique words that occurs in documents and for each word, maintain a list of documents number and positions in which it appears.
For Example : There are two documents and having content as :
1: FacingIssuesOnIT is for ELK.
2: If ELK check FacingIssuesOnIT.
To make inverted index each document will split in words (also called as terms or token) and create below sorted index .
| Term | Doc_1 | Doc_2 |
| --- | --- | --- |
| FacingIssuesOnIT | X | X |
| is | X | |
| for | X | |
| ELK | X | X |
| If | | X |
| check | | X |
Now, when we run a full-text search for a string, documents are ranked based on the existence and number of matching terms.
This works like the index at the back of a book: given a word, we can find the pages on which that word appears.
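The table above can be reproduced with a few lines of Python (tokenization here is a naive split, unlike a real analyzer):

```python
from collections import defaultdict

docs = {
    1: "FacingIssuesOnIT is for ELK.",
    2: "If ELK check FacingIssuesOnIT.",
}

inverted = defaultdict(set)  # term -> ids of the documents containing it
for doc_id, text in docs.items():
    for term in text.replace(".", "").split():
        inverted[term].add(doc_id)

print(sorted(inverted["ELK"]))  # appears in both documents
```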
63) What is a Type in Elasticsearch ?
A type is a logical category/grouping/partition of an index whose semantics are completely up to the user; documents of the same type generally share the same set of fields.
ElasticSearch => Indices => Types => Documents with Fields/Properties
64) What is a Document Type in Elaticsearch?
A document type can be seen as the document schema / mapping definition, which has the mapping of all the fields in the document along with its data types.
65) What is indexing in ElasticSearch ?
The process of storing data in an index is called indexing in Elasticsearch. Data in Elasticsearch is divided into write-once, read-many segments. Whenever an update or modification is attempted, a new version of the document is written to the index.
66) What is Replica in Elasticsearch?
A replica is a copy of a shard, stored on a different node. A shard can have zero or more replicas. When a shard is on one node, its replica is stored on another node.
67) What are Benefits of Shards and Replica in Elasticsearch?
- Shards split an index into horizontal partitions, to handle high volumes of data.
- Operations run in parallel across shards and replicas on multiple nodes, which increases system performance and throughput.
- The cluster recovers easily from node failure, because each shard's replica is always kept on a different node from the shard itself.
Some important points:
- By default, Elasticsearch creates an index with 5 shards and 1 replica, but this can be configured in the config/elasticsearch.yml file or by passing shard and replica values in the settings when the index is created.
- Once an index is created, its shard count cannot be changed, though the replica count can be modified. To change the shard count, the only option is re-indexing.
- Each shard is itself a Lucene index and can hold at most 2,147,483,519 (= Integer.MAX_VALUE − 128) documents. Merging of search results and failover are taken care of by the Elasticsearch cluster.
68) What is Document in Elasticsearch?
Each record stored in an index is called a document, and it is stored as a JSON object. A document is similar to a row in RDBMS terms; the difference is that each document can have a different number of fields and a different structure, but fields common to several documents should have the same data type.
69) What is Elasticsearch Node?
- A node is an Elasticsearch server that participates in a cluster. It stores data and helps the cluster with indexing and search queries. It is identified by a unique name within the cluster; if a name is not provided, Elasticsearch generates a random Universally Unique Identifier (UUID) at server start.
- A cluster can have one or more nodes. When the first node starts, it forms a cluster with a single node; as other nodes start, they join that cluster.
[Figure: data-node document storage. Two indexes, I1 and I2: index I1 holds document types T1 and T2, while index I2 holds only type T2, and their shards are distributed over all nodes in the cluster. The data node shown holds shard S1 of index I1 and shard S3 of index I2, and also keeps replicas of shard S2 of indexes I1 and I2, whose primaries are stored on other nodes.]
70) What is Elasticsearch Cluster ?
- A cluster is a collection of one or more nodes that together provide search across the data scattered over those nodes. It is identified by a unique name within the network, so that all associated nodes join together under that cluster name.
- Operation persistence: the cluster also keeps records of all transaction-level schema changes for an index, and tracks the availability of nodes in the cluster, so that data remains easily available if any node fails.
[Figure: an Elasticsearch cluster named "FACING_ISSUE_IN_IT" with three master nodes and four data nodes.]
71) What are the advantages of Elasticsearch?
- Elasticsearch is implemented in Java, which makes it compatible with almost every platform.
- Elasticsearch is Near Real Time (NRT), in other words after one second the added document is searchable in this engine.
- Elasticsearch cluster is distributed, which makes it easy to scale and integrate in any big organizations.
- Creating full backups of data is easy using the concept of the gateway, which is present in Elasticsearch.
- Elasticsearch REST uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.
- Elasticsearch supports almost every document type except those that do not support text rendering.
- Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.
72) What are the Disadvantages of Elasticsearch?
- Elasticsearch handles request and response data only in JSON, while Apache Solr supports CSV, XML, and JSON formats.
- Elasticsearch can experience split-brain situations, though only in rare cases.