Monday, November 27, 2017

Apache Solr introduction & installation

In this post, we are going to discuss Apache Solr and how to install it step by step.

Why Solr?



Lucene:

  • A search and storage library
  • Used to index & search text with high performance
  • Widely used across many projects; Solr is one of them
  • Solr uses Lucene as its backend and builds on concepts derived from it
Solr:
  • A search server
  • Document oriented
  • Stores data & indexes it
  • Supports searches that native databases cannot easily do, including full-text search, stemming, hit highlighting, faceted search, etc.
  • Vertically and horizontally scalable
  • Replication for high availability
  • Sharding for distributed search
  • Performs in-memory grouping, counting, and "similar products" lookups in a single shot
  • Exposed over an HTTP, REST-like API
  • The DataImportHandler provides a configuration-driven way to import data from relational databases or XML files into Solr, in both "full import" and "incremental delta import" modes.

INVERTED INDEX:

  • Maps each unique word to the documents that contain it
  • Similar to the index at the back of a book

D1- I like Apache services.
D2- They include all kind of database & services support.
D3- I would recommend their services to my clients too.
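As a sketch of the idea, a tiny inverted index for the three documents above (punctuation dropped) can be built with standard Unix tools. This only illustrates the data structure; it is not how Lucene or Solr actually store their index:

```shell
# A toy inverted index (token -> list of documents) for D1-D3.
printf 'D1 I like Apache services\nD2 They include all kind of database and services support\nD3 I would recommend their services to my clients too\n' |
while read doc rest; do
  for w in $rest; do echo "$w $doc"; done    # emit one "token doc" pair per line
done |
sort |
awk '{a[$1] = a[$1] " " $2} END {for (t in a) print t ":" a[t]}' |
sort
```

The output contains a line like `services: D1 D2 D3`, showing at a glance that the word "services" occurs in all three documents; a search for that word never has to scan the document bodies.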

Working:
  • Define a schema. 
  • Deploy Solr.
  • Feed Solr documents for which your users will search.
  • Expose search functionality in your application.
Solr schema:
  • Can run in schemaless mode, or with an explicitly defined schema
  • An index contains documents
  • Fields are used to index, search & store

Define a schema
The schema tells Solr about the contents of documents it will be indexing. In the online store example, the schema would define fields for the product name, description, price, manufacturer, and so on. 

Defining fields:
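For the online store example, field definitions in a classic schema.xml might look like the following sketch. The field names here are illustrative, and the field types assume the stock types shipped with the Solr 6.x default configset:

```xml
<!-- Hypothetical field definitions for the online-store example (schema.xml style) -->
<field name="product_name" type="text_general" indexed="true" stored="true"/>
<field name="description"  type="text_general" indexed="true" stored="true"/>
<field name="price"        type="float"        indexed="true" stored="true"/>
<field name="manufacturer" type="string"       indexed="true" stored="true"/>
```

`indexed="true"` makes a field searchable; `stored="true"` makes its original value retrievable in query results.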

Indexing:
Indexing is the technique of adding a document's content to the Solr index so that it can be searched easily. Apache Solr uses Apache Lucene's inverted-index technique to index its documents, which is why Solr provides very fast searching.

Field analyzers: (Analyzer=tokenizer+filters)

Analyzers are used both during ingestion, when a document is indexed, and at query time. An analyzer may be a single class or a series of tokenizer and filter classes.


They help match alternative words (finish, complete) and misspellings (google, gogle).

Tokenizers:
Break field data into lexical units, or tokens.
  Pre-tokenization: stripping HTML tags
  Post-tokenization: stemming (reducing words to a common root, e.g. "tables" to "table")
          Stop-word filtering (the, is, and)

Filters: (applied during indexing and querying)
  • Examine a stream of tokens and keep them, transform or discard them, or create new ones. 
  • Tokenizers and filters may be combined to form pipelines, or chains, where the output of one is input to the next. 
  • Such a sequence of tokenizers and filters is called an analyzer and the resulting output of an analyzer is used to match query results or build indices.
e.g. ram, RAM, Ram: a lowercase filter maps all three to the same token.
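The analyzer chain can be sketched in shell terms: a whitespace tokenizer followed by a lowercase filter and a stop-word filter. This is only an analogy; real Solr analyzers are configured per field type in the schema, not built from Unix pipes:

```shell
# Analyzer analogy: tokenizer + two filters chained as a pipeline.
echo "The RAM and the Ram tables" |
tr ' ' '\n' |                  # tokenizer: split on whitespace, one token per line
tr '[:upper:]' '[:lower:]' |   # filter 1: lowercase, so RAM/Ram/ram all match
grep -vwE 'the|is|and'         # filter 2: drop stop words (the, is, and)
```

The surviving tokens (`ram`, `ram`, `tables`) are what would be matched against queries or written into the index.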


Deploying Solr

Prerequisites:

yum update
yum install java-1.8.0-openjdk.x86_64
java -version

Installation:

tar zxvf solr-6.6.1.tgz
cp /opt/solr-6.6.1/bin/install_solr_service.sh .
rm -rf solr-6.6.1
./install_solr_service.sh solr-6.6.1.tgz
ps -ef | grep solr



Feed Solr documents for which your users will search

Creating a project (core):

A core is an index of texts and fields available in all documents. One Solr instance can contain one or more Solr cores.

/opt/solr-6.6.1/bin/solr create -c jerwin

Create a new document:

We can add a new document with 3 fields and values to the core using curl on the terminal:

curl http://localhost:8983/solr/jerwin/update -d '
[
 {"id" : "db1",
  "company_name" : "Mafiree",
  "location" : "Nagercoil"
 }
]'



View:

The inserted data can be viewed with the id specified.
curl http://localhost:8983/solr/jerwin/get?id=db1



Output can be viewed in the browser:



Hope this gives you a simple introduction to Solr. Let me know if you have any concerns via the comments.



