New Replica Type for SolrCloud – Perfect for Large Scale Sitecore Web Sites – Part I

This blog post describes a small test setup to verify an awesome new feature in Solr 7 that seems like a perfect fit for Sitecore: the PULL replica type. Using this feature with SolrCloud, we can create collections for our Sitecore indexes that are dedicated to handling queries and do not spend resources on receiving updates instantly. These replicas are not updated as quickly as the default NRT replica type; they only receive updated index files from the leader. My hypothesis is that a SolrCloud setup where we use TLOG replica types to handle the updates from the Sitecore CM servers, and a few PULL replica types for the Sitecore CD servers, must be optimal for high-performance web sites.

The test setup

For this small test I have it all set up locally on a single machine, including Sitecore, ZooKeeper and Solr.

Step 1: Set up a ZooKeeper Ensemble

Set up a ZooKeeper cluster. This is not strictly necessary for this test, but a dedicated ZooKeeper ensemble is highly recommended for a real production environment.

Link to good help here: https://lucene.apache.org/solr/guide/6_6/setting-up-an-external-zookeeper-ensemble.html#setting-up-an-external-zookeeper-ensemble

So I have a minimal ZooKeeper ensemble running:

Zookeeper Ensemble
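For reference, a minimal zoo.cfg sketch for one of the three local instances. The ports, paths and server list are my local assumptions; each instance also needs its own dataDir containing a myid file (1, 2 or 3):

```properties
tickTime=2000
initLimit=10
syncLimit=5
# unique per instance:
dataDir=C:/zookeeper/node1/data
clientPort=2181
# same server list in all three configs (quorum port : election port)
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
```

The other two instances get clientPort 2182 and 2183 respectively, matching the connection string used when starting Solr later.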

Step 2: Set up SolrCloud with a Sitecore Index

I prepared a Sitecore schema following the usual guides for the previous Solr versions, fixing minor complaints such as the “long” type used for _version_ no longer existing. All in all, minor stuff. I created a new folder for the new Solr configuration prepared for Sitecore 8.2:

My sitecore82 configset structure
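The “long” fix mentioned above looks roughly like this in the Solr 7 managed-schema; the old Trie-based long type is gone, and the point-based plong takes its place (these definitions match the default Solr 7 schema):

```xml
<!-- Solr 7: trie types are deprecated, so _version_ uses a point-based long -->
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<field name="_version_" type="plong" indexed="false" stored="false"/>
```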

Step 3: Start SolrCloud and prepare it for Sitecore Indexes

Before starting SolrCloud I want to upload my default Sitecore schema to ZooKeeper:

Command Prompt: Uploading Sitecore Config to Zookeeper
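The upload shown above can be done with the zk upconfig tool bundled with Solr. A sketch of the command, built as a string here so the pieces are clear; the configset name and path are my own choices, shown with forward slashes:

```shell
# ZooKeeper ensemble, configset name and path are assumptions from my local setup.
ZK="localhost:2181,localhost:2182,localhost:2183"
UPCONFIG="bin/solr zk upconfig -z $ZK -n sitecore82 -d sitecore82/conf"
echo "$UPCONFIG"
# Run the command from the Solr install folder to upload the configset.
```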

When starting SolrCloud I point to my ZooKeeper ensemble using the -z parameter. Before I do that, I want to have all my nodes structured nicely on my local drive, so I created a folder “sitecore82\node1” under my Solr 7 folder and copied the default solr.xml into it. Then I can start Solr in cloud mode using:

Command Prompt: Start SolrCloud
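The start command in the screenshot follows the same pattern as the node2/node3 commands shown below; for node1 I assume port 9001 (my own choice), with -s pointing at the node folder:

```shell
# Port 9001 for node1 is my assumption; node2/node3 run on 9002/9003.
ZK="localhost:2181,localhost:2182,localhost:2183"
START="bin/solr start -c -z $ZK -s sitecore82/node1 -p 9001 -noprompt"
echo "$START"
```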

Step 4: Create the Collections

I decided to add two more nodes to my SolrCloud, simply by copying the current node folder, renaming it, and modifying some configs to get the logs into the right folders:

SolrCloud: 3 nodes local

With this I can start node2 and node3 locally as well, using the same command as with the first node, just with another port and pointing to each node's folder:

>bin\solr start -c -z "localhost:2181,localhost:2182,localhost:2183" -s sitecore82\node2 -p 9002 -noprompt

>bin\solr start -c -z "localhost:2181,localhost:2182,localhost:2183" -s sitecore82\node3 -p 9003 -noprompt

Some errors showed up as the Solr nodes tried writing to the same log files. I ignored that at first for this test, but then configured each node to keep its own log file.

Creating the collections

Since I uploaded the configset “sitecore82” to ZooKeeper, it is now visible in the Solr Admin UI when creating collections:

Create Collection Dialog

But I did not use this dialog for creating the collections, because it does not let me set the replica type. That is pretty important, since it is exactly what I set out to test. So I made a small script using the Solr Collections API, which lets me specify all the parameters as part of a URL (for details see https://lucene.apache.org/solr/guide/7_0/collections-api.html, or let me know if you need my simple script for creating all the Sitecore collections needed).

Note: As I wanted to have 5 nodes running and to know exactly what type of replica resides on each node, I created scripts that also handle that part.

Finally, I got all the Sitecore indexes up as collections, each with 5 replicas on 5 nodes: 3 TLOG replicas and 2 PULL replicas:
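My script essentially builds CREATE URLs like the sketch below. The collection name is just an example, and nrtReplicas=0 makes sure the three initial replicas become TLOGs rather than the NRT default (parameters per the Solr 7 Collections API):

```shell
# Example collection name and host; tlogReplicas creates the 3 TLOG replicas,
# the 2 PULL replicas are added afterwards with ADDREPLICA.
SOLR="http://localhost:9001/solr"
CREATE_URL="$SOLR/admin/collections?action=CREATE&name=sitecore_web_index&numShards=1&nrtReplicas=0&tlogReplicas=3&maxShardsPerNode=1&collection.configName=sitecore82"
echo "$CREATE_URL"
# curl "$CREATE_URL"   # run against a live SolrCloud node
```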

SolrCloud: 5 replicas all indexes (showing only 2…)

Step 5: Document routing and Query routing in Solr

By default, SolrCloud decides on which nodes the different types of replicas will be created. As mentioned, I handled that by creating the 3 TLOG replicas first, and then used the ADDREPLICA command afterward.
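The ADDREPLICA step can pin each PULL replica to a specific node via the node parameter. The host, ports and collection name below are assumptions from my local setup:

```shell
# Solr node names have the form host:port_solr; localhost:9004 is assumed here.
SOLR="http://localhost:9001/solr"
ADD_URL="$SOLR/admin/collections?action=ADDREPLICA&collection=sitecore_web_index&shard=shard1&type=PULL&node=localhost:9004_solr"
echo "$ADD_URL"
# curl "$ADD_URL"   # repeat with node=localhost:9005_solr for the second PULL replica
```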

Question: So how do I set up an architecture where my Sitecore CD servers always query the PULL replicas? (recommended reading: https://lucene.apache.org/solr/guide/7_0/distributed-requests.html)

By default, Solr distributes queries around the cluster. But in this case we want to query only the specific nodes where we placed our PULL replicas; that is what we want for our Sitecore CD servers. Querying specific nodes in the Solr cluster will also be good for performance.

Two options are available:

  • Use parameter “preferLocalShards”.

This will make Solr look only for shard replicas that are local to the node we query. So this should be used with a load balancer, so that we only query the nodes where our PULL replicas are running.

  • Search specific shards.

As part of the query, we can tell Solr to query specific shards or nodes; that could also be an option.
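The two options above translate into query URLs roughly like this; host names, ports and the collection name are my local assumptions:

```shell
# Option 1: sent to a node hosting a PULL replica; preferLocalShards keeps
# the query on that node's local replicas.
OPT1="http://localhost:9004/solr/sitecore_web_index/select?q=*:*&preferLocalShards=true"

# Option 2: explicitly list the replicas to query via the shards parameter;
# the pipe separates alternative replicas of the same (single) shard.
OPT2="http://localhost:9001/solr/sitecore_web_index/select?q=*:*&shards=localhost:9004/solr/sitecore_web_index|localhost:9005/solr/sitecore_web_index"

echo "$OPT1"
echo "$OPT2"
```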

The challenge now is to have Sitecore do one of the above. Without having to do special things on the Sitecore end, it looks like option 1 will work: adding “preferLocalShards” as a default parameter to the query handler that Sitecore uses. This adds the complexity of an additional load balancer, but that is part of any production setup anyway.

Step 6: The final test setup

The small test setup looks like this:

Sitecore & Solr 7 Architecture

In the next part of this post I will connect the Sitecore instances, and share the tests and results from using this new replica type.

 
