Monday, November 27, 2017

Apache Solr introduction & installation

In this post, we are going to discuss Apache Solr and how to install it step by step.

Why Solr?



Lucene:

  • A search and storage library
  • Used to index & search text with high performance
  • Widely used across many projects; Solr is one of them
  • Solr uses Lucene as its backend and builds on concepts derived from it
Solr:
  • A search server
  • Document oriented
  • Stores data & indexes it
  • Supports searches that native databases cannot easily do, including full-text search, stemming, hit highlighting, faceted search, etc.
  • Vertically and horizontally scalable
  • Replication for high availability
  • Sharding for distributed search
  • Performs in-memory grouping, counting, and "similar products" lookups in a single shot
  • Exposed over an HTTP, REST-like API
  • The DataImportHandler provides a configuration-driven way to import data from relational databases or XML files into Solr, in both "full import" and "incremental delta import" modes.

INVERTED INDEX:

  • Maps each unique word to the documents that contain it
  • Similar to the index at the back of a book

D1- I like Apache services.
D2- They include all kind of database & services support.
D3- I would recommend their services to my clients too.
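As a sketch of the idea, a tiny inverted index for the three documents above (punctuation dropped) can be built with standard Unix tools. This only illustrates the data structure; it is not how Lucene or Solr actually store their index:

```shell
# A toy inverted index (token -> list of documents) for D1-D3.
printf 'D1 I like Apache services\nD2 They include all kind of database and services support\nD3 I would recommend their services to my clients too\n' |
while read doc rest; do
  for w in $rest; do echo "$w $doc"; done    # emit one "token doc" pair per line
done |
sort |
awk '{a[$1] = a[$1] " " $2} END {for (t in a) print t ":" a[t]}' |
sort
```

The output contains a line like `services: D1 D2 D3`, showing at a glance that the word "services" occurs in all three documents; a search for that word never has to scan the document bodies.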

Working:
  • Define a schema. 
  • Deploy Solr.
  • Feed Solr documents for which your users will search.
  • Expose search functionality in your application.
Solr schema:
  • Can run in schemaless mode, or with an explicitly defined schema
  • An index contains documents
  • Fields are used to index, search & store

Define a schema
The schema tells Solr about the contents of documents it will be indexing. In the online store example, the schema would define fields for the product name, description, price, manufacturer, and so on. 

Defining fields:
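For the online store example, field definitions in a classic schema.xml might look like the following sketch. The field names here are illustrative, and the field types assume the stock types shipped with the Solr 6.x default configset:

```xml
<!-- Hypothetical field definitions for the online-store example (schema.xml style) -->
<field name="product_name" type="text_general" indexed="true" stored="true"/>
<field name="description"  type="text_general" indexed="true" stored="true"/>
<field name="price"        type="float"        indexed="true" stored="true"/>
<field name="manufacturer" type="string"       indexed="true" stored="true"/>
```

`indexed="true"` makes a field searchable; `stored="true"` makes its original value retrievable in query results.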

Indexing:
Indexing is the technique of adding a document's content to the Solr index so that it can be searched easily. Apache Solr uses Apache Lucene's inverted-index technique to index its documents, which is why Solr provides very fast searching.

Field analyzers: (Analyzer=tokenizer+filters)

Analyzers are used both during ingestion, when a document is indexed, and at query time. An analyzer may be a single class or a series of tokenizer and filter classes.


They help match alternative words (finish, complete) and misspellings (google, gogle).

Tokenizers:
Break field data into lexical units, or tokens.
  Pre-tokenization: stripping HTML tags
  Post-tokenization: stemming (reducing words to a common root, e.g. "tables" to "table")
          Stop-word filtering (the, is, and)

Filters: (applied during indexing and querying)
  • Examine a stream of tokens and keep them, transform or discard them, or create new ones. 
  • Tokenizers and filters may be combined to form pipelines, or chains, where the output of one is input to the next. 
  • Such a sequence of tokenizers and filters is called an analyzer and the resulting output of an analyzer is used to match query results or build indices.
e.g. ram, RAM, Ram: a lowercase filter maps all three to the same token.
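The analyzer chain can be sketched in shell terms: a whitespace tokenizer followed by a lowercase filter and a stop-word filter. This is only an analogy; real Solr analyzers are configured per field type in the schema, not built from Unix pipes:

```shell
# Analyzer analogy: tokenizer + two filters chained as a pipeline.
echo "The RAM and the Ram tables" |
tr ' ' '\n' |                  # tokenizer: split on whitespace, one token per line
tr '[:upper:]' '[:lower:]' |   # filter 1: lowercase, so RAM/Ram/ram all match
grep -vwE 'the|is|and'         # filter 2: drop stop words (the, is, and)
```

The surviving tokens (`ram`, `ram`, `tables`) are what would be matched against queries or written into the index.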


Deploying Solr

Prerequisites:

yum update
yum install java-1.8.0-openjdk.x86_64
java -version

Installation:

tar zxvf solr-6.6.1.tgz
cp /opt/solr-6.6.1/bin/install_solr_service.sh .
rm -rf solr-6.6.1
./install_solr_service.sh solr-6.6.1.tgz
ps -ef | grep solr



Feed Solr documents for which your users will search

Creating a project (core):

A core is an index of texts and fields available in all documents. One Solr instance can contain one or more Solr cores.

/opt/solr-6.6.1/bin/solr create -c jerwin

Create a new document:

We can add a new document with 3 fields and values to the core using curl on the terminal:

curl http://localhost:8983/solr/jerwin/update -d '
[
 {"id" : "db1",
  "company_name" : "Mafiree",
  "location" : "Nagercoil"
 }
]'



View:

The inserted data can be viewed with the id specified.
curl http://localhost:8983/solr/jerwin/get?id=db1



Output can be viewed in the browser:



Hope this gives you a simple introduction to Solr. Let me know if you have any concerns via the comments.



