Thursday, February 15, 2018

Apache Solr replication step by step


Need for Replication:
  • When the search volume is too large for a single machine to handle, you need to distribute searches across multiple read-only copies of the index.
  • When heavy indexing consumes machine resources and reduces search performance on the indexing machine, you need to separate indexing from searching.
  • When you want to make a backup of the index.

MASTER-SLAVE

  • Distributes complete copies of a master index to one or more slave servers.
  • The master server continues to manage all updates to the index.
  • All querying is handled by the slaves.
  • This lets Solr scale so that queries stay responsive even under large search volumes.

Replication Terminology:

Index
A Lucene index is a directory of files. These files make up the searchable and returnable data of a Solr Core.

Distribution
The copying of an index from the master server to all slaves. 

Inserts and Deletes
  • As inserts and deletes occur in the index, the directory remains unchanged. Documents are always inserted into newly created files. 
  • Documents that are deleted are not removed from the files right away.
  • They are flagged in the file as deleted and are physically removed only when their segment is merged, for example when the index is optimized.

Master and Slave
  • A Solr replication master is a single node that receives all updates initially and keeps everything organized.
  • Solr replication slave nodes receive no updates directly; instead, all changes (such as inserts, updates, and deletes) are made against the single master node.
  • Changes made on the master are distributed to all the slave nodes, which serve all query requests from the clients.

Repeater
A node that acts as both a master and a slave.

Optimization
  • A process that compacts the index and merges segments in order to improve query performance.
  • Optimization should only be run on the master nodes. An optimized index may give query performance gains compared to an index that has become fragmented over a period of time with many updates. 
  • Distributing an optimized index takes much longer than distributing new segments to an un-optimized index, because optimizing rewrites the index into new segment files and every slave must then copy the whole index.

Snapshot
A directory containing hard links to the data files of an index. Snapshots are distributed from the master node when the slaves pull them: each slave "smart copies" only the segments it does not already have, taking them from the snapshot directory, which holds hard links to the most recent index data files.
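To see why a hard-link snapshot costs almost no extra disk space and stays valid while the live index changes, here is a plain-filesystem illustration. It is not Solr's own snapshot code: the temporary directory is throwaway and the file name `_0.cfs` only imitates a Lucene segment file.

```shell
#!/bin/sh
# Illustration only: why a snapshot made of hard links is cheap and stable.
# Uses a throwaway temp directory; "_0.cfs" merely imitates a Lucene segment file name.
set -e

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/index" "$demo/snapshot"

# Segment files are written once and never modified afterwards.
echo "segment data" > "$demo/index/_0.cfs"

# The snapshot gets a hard link: a second name for the same data, no extra disk space.
ln "$demo/index/_0.cfs" "$demo/snapshot/_0.cfs"

# Both names now refer to the same inode, so the link count is 2.
links=$(ls -l "$demo/snapshot/_0.cfs" | awk '{print $2}')
echo "link count: $links"

# Removing the file from the live index (as a merge eventually does) leaves the snapshot readable.
rm "$demo/index/_0.cfs"
cat "$demo/snapshot/_0.cfs"
```

Because segment files are never edited in place, a link taken at snapshot time keeps pointing at exactly the data that existed then.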

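Configuring the Replication RequestHandler on a Master Server:
On the master, the replicateAfter parameter decides when a new index version is made available to the slaves. It accepts these values:
  • commit: triggers replication whenever a commit is performed on the master index.
  • optimize: triggers replication whenever the master index is optimized.
  • startup: triggers replication whenever the master index starts up.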
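The commit and optimize triggers fire on ordinary update requests sent to the master, while the startup trigger fires when the master core starts (for example after a Solr restart). A minimal sketch of sending those update requests with curl, assuming the master listens on localhost:8983 and the core is named fortis; `SOLR_MASTER`, `CORE`, and the helper function names are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: send commit / optimize requests to the master core with curl.
# SOLR_MASTER and CORE are placeholders for your master's base URL and core name.
SOLR_MASTER="${SOLR_MASTER:-http://localhost:8983/solr}"
CORE="${CORE:-fortis}"

# Build the /update URL for an action such as "commit" or "optimize".
update_url() {
  echo "$SOLR_MASTER/$CORE/update?$1=true"
}

# -s hides the progress meter; -f makes curl fail on HTTP errors.
solr_commit()   { curl -sf "$(update_url commit)"; }    # fires replicateAfter=commit, if configured
solr_optimize() { curl -sf "$(update_url optimize)"; }  # fires replicateAfter=optimize, if configured

# Usage on the master:
#   solr_commit
#   solr_optimize
```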

MASTER SLAVE CONFIGURATION

MASTER

The master is configured in the following file:

vim /var/solr/data/fortis/conf/solrconfig.xml
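The replication settings themselves go into solrconfig.xml as a ReplicationHandler. The sketch below follows the property-driven layout described in the Solr Reference Guide: one handler with a master section and a slave section, switched on by `${enable.master:false}` and `${enable.slave:false}`, so the same snippet works on both nodes and core.properties decides the role. The helper name `add_replication_handler`, the master URL argument, and the 60-second pollInterval are assumptions; the `/replication` suffix on masterUrl follows the older Reference Guide examples, so check the guide for your Solr version.

```shell
#!/bin/sh
# Sketch only: insert a property-driven ReplicationHandler into solrconfig.xml.
#   $1 = path to solrconfig.xml
#   $2 = base URL of the master core (placeholder, e.g. http://MASTER_HOST:8983/solr/fortis)
# The same snippet can go on the master and the slave; core.properties picks the role.

add_replication_handler() {
  conf="$1"
  master_url="$2"

  # Leave the file alone if it already mentions a ReplicationHandler
  # (note: this also matches a commented-out example, so inspect the file in that case).
  if grep -q 'solr.ReplicationHandler' "$conf"; then
    echo "ReplicationHandler already configured in $conf"
    return 0
  fi

  cp "$conf" "$conf.bak"   # keep an untouched copy of the original
  snippet=$(mktemp)

  # \${...} is escaped so the shell leaves it alone and Solr resolves the property.
  cat > "$snippet" <<EOF
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">\${enable.master:false}</str>
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">optimize</str>
    </lst>
    <lst name="slave">
      <str name="enable">\${enable.slave:false}</str>
      <str name="masterUrl">$master_url/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>
EOF

  # Copy the file line by line, emitting the snippet just before the closing </config>.
  awk -v snip="$snippet" '
    /<\/config>/ { while ((getline line < snip) > 0) print line }
    { print }
  ' "$conf.bak" > "$conf"

  rm -f "$snippet"
}

# Usage (MASTER_HOST is a placeholder):
#   add_replication_handler /var/solr/data/fortis/conf/solrconfig.xml http://MASTER_HOST:8983/solr/fortis
```

Solr only loads the new handler after the core is reloaded or Solr is restarted.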


Make the following changes in the core.properties file:

vim /var/solr/data/fortis/core.properties

enable.master=true
enable.slave=false
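If you prefer to script the core.properties edit instead of opening vim, a small helper can set a key without creating duplicate entries. This is a sketch: `set_core_prop` is a hypothetical name, and it assumes a POSIX shell with awk available.

```shell
#!/bin/sh
# Sketch: set key=value in a core.properties file, replacing an existing
# entry for that key or appending one. set_core_prop is a hypothetical helper.
set_core_prop() {
  file="$1"; key="$2"; value="$3"
  touch "$file"
  # Rewrite lines that start with "key=", keep everything else, append if the key was missing.
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$file.tmp"
  # Write back through the original file so its owner and permissions are kept.
  cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# On the master:
#   set_core_prop /var/solr/data/fortis/core.properties enable.master true
#   set_core_prop /var/solr/data/fortis/core.properties enable.slave false
# On the slave, swap the two values.
```

Solr reads core.properties when it discovers the core at startup, so restart Solr after changing these flags.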


SLAVE

To create a new core on the slave server, use the following command:

/usr/local/solr-6.6.2/bin/solr create -c prod

The slave is configured in the following file:

vim /var/solr/data/fortis/conf/solrconfig.xml
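On the slave, the setting that must differ from the master is masterUrl, which has to point at the master core. A sketch of updating it from the shell, assuming the file already contains a `<str name="masterUrl">` element inside the ReplicationHandler's slave section; `set_master_url` is a hypothetical helper and the host name in the usage line is a placeholder.

```shell
#!/bin/sh
# Sketch: point the slave's ReplicationHandler at the master by rewriting the
# text of an existing <str name="masterUrl"> element. set_master_url is hypothetical.
#   $1 = path to the slave's solrconfig.xml
#   $2 = master URL (must not contain '|' or '&', which are special in this sed command)
set_master_url() {
  conf="$1"; url="$2"
  sed "s|<str name=\"masterUrl\">[^<]*</str>|<str name=\"masterUrl\">$url</str>|" "$conf" > "$conf.tmp" \
    && cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
}

# Usage on the slave (MASTER_HOST is a placeholder):
#   set_master_url /var/solr/data/fortis/conf/solrconfig.xml http://MASTER_HOST:8983/solr/fortis/replication
```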


Make the following changes in the core.properties file:

vim /var/solr/data/fortis/core.properties

enable.master=false
enable.slave=true

Master-slave replication is now set up in Apache Solr.
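After both nodes have been restarted, the ReplicationHandler's own HTTP commands are a quick way to check that the slave is polling the master. The sketch below wraps the `details`, `indexversion`, and `fetchindex` commands with curl; `CORE_URL`, its default value, and the function names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: thin curl wrappers around ReplicationHandler commands.
# CORE_URL is a placeholder for the core to query (by default a local slave core named fortis).
CORE_URL="${CORE_URL:-http://localhost:8983/solr/fortis}"

# Build the /replication URL for a given command, asking for a JSON response.
replication_url() {
  echo "$CORE_URL/replication?command=$1&wt=json"
}

# Replication status: masterUrl, poll interval, last replication time, and so on.
replication_details() { curl -sf "$(replication_url details)"; }

# Index version and generation of the latest replicable index on this core.
index_version() { curl -sf "$(replication_url indexversion)"; }

# Ask a slave to pull the latest index from its master now instead of waiting for the next poll.
fetch_index() { curl -sf "$(replication_url fetchindex)"; }

# Usage:
#   replication_details
#   index_version
#   ( CORE_URL=http://MASTER_HOST:8983/solr/fortis; index_version )   # same query against the master
#   fetch_index
```

When the `indexversion` output on the slave matches the master's, the slave has caught up; `fetchindex` forces an immediate pull if it has not.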
