Coveo for Sitecore – Part 3 : Full text search and Sitecore items

Since Sitecore separates content from presentation, enabling full text search in Sitecore is a challenge, regardless of the search tool being used.

By default, Coveo for Sitecore searches for keywords in all free-text fields and in the document’s body (a Coveo index field). To have free text search look in more fields, Coveo provides three options:

  • Indexing Documents with Basic HTML Content: This is the easiest way to add more fields to free text search. By adding BasicHtmlContentInBodyProcessor to the coveoPostItemProcessingPipeline, you can add fields that will be used by free text search, the excerpt and the quick view. However, this technique limits you to the fields defined on the current item, which means it still does not index data sources.
  • Indexing Documents with HTML Content Processor: This technique executes a page request and downloads the page content. This solves the problem of indexing data sources, but it comes at a cost. If a Sitecore item is secured (cannot be accessed anonymously), Coveo will not be able to get the quick view for that item. This leaves us with the last option.
  • Configure more fields for free-text search in fieldMap, by setting the includeForFreeTextSearch attribute to true (see fieldMap section of Understanding the Coveo Search Provider’s Configuration File): As you might have guessed, it has the same limitations as the BasicHtmlContentInBodyProcessor.
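
For the third option, a fieldMap entry could look like the sketch below. The field name "heading" is hypothetical, and the element layout and settingType value are written from memory of the default configuration file, so check them against the file shipped with your version of Coveo for Sitecore.

```xml
<fieldMap>
  <fieldNames hint="raw:AddFieldByFieldName">
    <!-- "heading" is a hypothetical Sitecore field name used for illustration -->
    <fieldType fieldName="heading" includeForFreeTextSearch="true" settingType="Coveo.Framework.Configuration.FieldConfiguration, Coveo.Framework" />
  </fieldNames>
</fieldMap>
```

As with the basic HTML processor, this only covers fields that exist on the indexed item itself.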

This leaves us with one last option: custom code, i.e. extending Coveo. Here is my solution to the problem.

Step 1: Set the IndexReferrerItemsOnUpdate value to true, so that updating a data source item also re-indexes the items that refer to it. This setting is usually defined in Coveo.SearchProvider.config.

<!-- Set this to true if you want referrer items to be indexed on an item update -->
<IndexReferrerItemsOnUpdate>true</IndexReferrerItemsOnUpdate>

 

Step 2: Create a custom coveoPostItemProcessingPipeline processor that includes all the field values from the data sources in the Coveo item's binary data.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Text;
using Coveo.Framework.Log;
using Coveo.Framework.Processor;
using Coveo.SearchProvider.Pipelines;
using Coveo.SearchProvider.Processors;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Links;

namespace CoveoForSitecore.Extensions.Pipelines.CoveoPostItemProcessing
{
    /// <summary>
    /// Builds the body of a Coveo item from the fields of the indexed item
    /// and of the data source items referenced by its presentation details.
    /// </summary>
    public class DataSourceHtmlSourceContentInBodyProcessor : IProcessor<CoveoPostItemProcessingPipelineArgs>
    {
        private static readonly ILogger _sLogger = CoveoLogManager.GetLogger(MethodBase.GetCurrentMethod().DeclaringType);
        private const string SitecoreSystemFieldBeginning = "_";
        public const string BasicHtmlDateFormat = "dddd, MMMM d, yyyy";

        public bool IncludeFieldNames { get; set; }

        public bool IncludeTextFieldsOnly { get; set; }

        /// <summary>
        /// Sitecore field names: comma-separated names of the fields that should be added to the document's body.
        /// Also include the field names of the data source items.
        /// </summary>
        public string FieldsToInclude { get; set; }

        /// <summary>
        /// Sitecore template names: comma-separated list of the data source templates that should be processed.
        /// </summary>
        public string TemplatesToInclude { get; set; }

        private IEnumerable<string> FieldsToIncludeValues
        {
            get { return SplitValues(FieldsToInclude); }
        }

        private IEnumerable<string> TemplatesToIncludeValues
        {
            get { return SplitValues(TemplatesToInclude); }
        }

        public DataSourceHtmlSourceContentInBodyProcessor()
        {
            IncludeFieldNames = true;
            IncludeTextFieldsOnly = false;
        }

        public void Process(CoveoPostItemProcessingPipelineArgs args)
        {
            _sLogger.TraceEntering("Process");
            _sLogger.Debug("Entering the DataSourceHtmlSourceContentInBodyProcessor.");
            try
            {
                // Get the current item being indexed.
                var item = (Item)(args.Item as SitecoreIndexableItem);

                // Only build the body when no earlier processor has already set the binary data.
                if (item != null && args.CoveoItem.BinaryData == null)
                {
                    // Prepare the HTML content builder and add the fields of the item itself.
                    var htmlContentBuilder = new HtmlContentBuilder();
                    AddFields(htmlContentBuilder, args.Item.Fields);

                    // Get all items linked through presentation components (data sources).
                    var dataSources =
                        Globals.LinkDatabase.GetReferences(item)
                            .Where(link => IsLayoutLink(link, item))
                            .Select(link => link.GetTargetItem())
                            .Where(targetItem => targetItem != null)
                            .Distinct();

                    // Read the field values from the data source items and add them to the HTML content.
                    foreach (var dsItem in dataSources)
                    {
                        if (!IsTemplateIncluded(dsItem.TemplateName)) continue;
                        var indexableItem = (SitecoreIndexableItem)dsItem;
                        AddFields(htmlContentBuilder, indexableItem.Fields);
                    }

                    args.CoveoItem.BinaryData = Encoding.UTF8.GetBytes(htmlContentBuilder.GetHtml());
                }
            }
            catch (Exception ex)
            {
                _sLogger.Error(ex.Message, args);
            }
            _sLogger.TraceExiting("Process");
        }

        // Adds every included field to the HTML content. Date fields are written in a readable format.
        private void AddFields(HtmlContentBuilder pBuilder, IEnumerable<IIndexableDataField> pFields)
        {
            foreach (var pField in GetIncludedFields(pFields))
            {
                if (pField.FieldType == typeof(DateField))
                {
                    // Convert.ToString returns an empty string for a null value,
                    // so an empty date field is skipped instead of throwing.
                    var s = Convert.ToString(pField.Value);
                    DateTime result;
                    if (DateTime.TryParse(s, CultureInfo.CurrentCulture, DateTimeStyles.None, out result))
                    {
                        pBuilder.AddElement(pField.Name, result.ToString(BasicHtmlDateFormat), IncludeFieldNames);
                    }
                    else
                    {
                        _sLogger.Debug(
                            "The date field with the value {0} will be ignored since it's an unknown date format.",
                            s);
                    }
                }
                else
                {
                    pBuilder.AddField(pField, IncludeFieldNames);
                }
            }
        }

        private bool IsTemplateIncluded(string pTemplateName)
        {
            _sLogger.TraceEntering("IsTemplateIncluded");
            bool flag =
                TemplatesToIncludeValues.Any(x => x.Equals(pTemplateName, StringComparison.OrdinalIgnoreCase));
            _sLogger.TraceExiting("IsTemplateIncluded");
            return flag;
        }

        private IEnumerable<IIndexableDataField> GetIncludedFields(IEnumerable<IIndexableDataField> pFields)
        {
            _sLogger.TraceEntering("GetIncludedFields");
            // Skip Sitecore system fields, whose names start with an underscore.
            var source = pFields.Where(x => !x.Name.StartsWith(SitecoreSystemFieldBeginning));
            // An empty FieldsToInclude list means that all remaining fields are included.
            if (FieldsToIncludeValues.Any())
                source = source.Where(x => FieldsToIncludeValues.Contains(x.Name.ToLowerInvariant()));
            if (IncludeTextFieldsOnly)
                source = source.Where(x => x.FieldType == typeof(TextField));
            _sLogger.TraceExiting("GetIncludedFields");
            return source;
        }

        // A link is a data source link when it comes from the layout field of the item in the same database.
        protected virtual bool IsLayoutLink(ItemLink link, Item sourceItem)
        {
            return link.SourceFieldID == FieldIDs.LayoutField && link.SourceDatabaseName == sourceItem.Database.Name;
        }

        // Splits a comma-separated list into lowercased, trimmed values.
        private IEnumerable<string> SplitValues(string pValues)
        {
            _sLogger.TraceEntering("SplitValues");
            IEnumerable<string> enumerable = Enumerable.Empty<string>();
            if (!string.IsNullOrEmpty(pValues))
                enumerable =
                    pValues.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(x => x.ToLowerInvariant().Trim());
            _sLogger.TraceExiting("SplitValues");
            return enumerable;
        }
    }
}

Step 3: Add the processor to coveoPostItemProcessingPipeline

<coveoPostItemProcessingPipeline>
  <!-- This processor replaces the Coveo basic HTML processor so that data source fields are included -->
  <processor type="CoveoForSitecore.Extensions.Pipelines.CoveoPostItemProcessing.DataSourceHtmlSourceContentInBodyProcessor, CoveoForSitecore.Extensions">
    <IncludeFieldNames>false</IncludeFieldNames>
    <IncludeTextFieldsOnly>false</IncludeTextFieldsOnly>
    <TemplatesToInclude>Base Page,Content Page,Resource Page,Account Page,Home Page,News Page,Callout,FAQ,Base Layer,Callouts Grid Layer,Image Layer,Image Layer With Orb,Latest News Layer</TemplatesToInclude>
    <FieldsToInclude>Name,DisplayName,Heading,Body,Description,Question,Answer</FieldsToInclude>
  </processor>
</coveoPostItemProcessingPipeline>
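
The comma-separated lists above are normalized before they are compared, so casing and spaces around the commas do not matter. Here is a minimal, standalone sketch of that behaviour with no Coveo or Sitecore dependency. The class name IncludeListSketch and the helper IsFieldIncluded are made up for illustration, and the sketch leaves out the system-field filter that the real processor also applies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IncludeListSketch
{
    // Mirrors SplitValues in the processor: split on commas, drop empty entries,
    // then lowercase and trim each entry.
    public static List<string> SplitValues(string values)
    {
        if (string.IsNullOrEmpty(values))
            return new List<string>();

        return values.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries)
                     .Select(x => x.ToLowerInvariant().Trim())
                     .ToList();
    }

    // Hypothetical helper: an empty list includes every field, otherwise the
    // lowercased field name must be in the list (as in GetIncludedFields).
    public static bool IsFieldIncluded(string fieldName, List<string> included)
    {
        return included.Count == 0 || included.Contains(fieldName.ToLowerInvariant());
    }

    public static void Main()
    {
        var fields = SplitValues("Name, DisplayName,,Heading");
        Console.WriteLine(string.Join("|", fields));            // name|displayname|heading
        Console.WriteLine(IsFieldIncluded("HEADING", fields));  // True
        Console.WriteLine(IsFieldIncluded("Body", fields));     // False
    }
}
```

In practice this means a field that is missing from FieldsToInclude is silently skipped, so if a data source field does not show up in search results, check the list first.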

Step 4: Re-index and test your search page.

 


4 thoughts on “Coveo for Sitecore – Part 3 : Full text search and Sitecore items”

  • December 8, 2015 at 8:56 am

    I’ve tried the code but it’s never hit because the args.CoveoItem.BinaryData property is never null. Why is that?

  • December 10, 2015 at 3:25 pm

    Hey Daniel,

    It could be because of another processor in coveoPostItemProcessingPipeline before this one. You may need to tweak the code a little to append to binary data if that is the case.

    Thanks,
    Niket

  • December 11, 2015 at 6:45 pm

    I’ve implemented this code, and it is indexing the content in the related items (the Data Sources), but not on the page that is pulling in the data source. In other words, when I do a search for text that is in a data source it is returning the template that has that data and not my item that has the rendering. Is there something I’m missing?

  • December 15, 2015 at 5:50 pm

    Hey Nick,

    Can you share your configuration file and the search query you are running?

