Tuesday, September 12, 2017

Indexing from external sources - Part 1

Introduction

I am sure most of you have used indexes in Sitecore and configured them yourselves. It can be problematic the first time, but it gets easier every time, and eventually you do it automatically, like a robot!
Recently I was given the task of preparing a search mechanism (AGAIN!), so my first thought was that I needed to index the content and prepare a search service - piece of cake! But after reading the acceptance criteria, I noticed that I had to search not only Sitecore content but also a huge XML file with items provided by a 3rd party service. I realised this would be something new, so I started looking for the best solution.
I knew I had to create a new crawler, but I didn't know how. This article was very helpful to me - many thanks to the author! (If you read this - I owe you a beer! :D )

Input source

Let's say we have an XML file that we want to index, and it looks like this:
<?xml version="1.0"?>
<Products>
  <Product>
      <Id>1</Id>
      <Description>Lorem Ipsum</Description>
  </Product>
  <Product>
      <Id>2</Id>
      <Description>Dolor Sit Etem</Description>
  </Product>
  <Product>
      <Id>3</Id>
      <Description>Sed do eiusmod tempor</Description>
  </Product>
</Products>

Custom Crawler Configuration

To be able to index data from outside Sitecore, we must create a custom crawler. Let's start by adding it to our index configuration. For demo purposes I've created a new index configuration.
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="custom_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="name">$(id)</param>
            <param desc="core">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
              <indexAllFields>true</indexAllFields>
              <fieldMap ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration/fieldMap"/>
              <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
              </documentOptions>
            </configuration>
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>
            <locations hint="list:AddCrawler">
              <!--here we have to add our custom crawler-->
              <crawler type="SitecoreBlog.Search.Crawlers.CustomCrawler, SitecoreBlog.Search">
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
As you can see, I've added the crawler to our configuration, so now it is time to add the implementation.

Custom Crawler Implementation

using System.Collections.Generic;
using Sitecore.ContentSearch;
using SitecoreBlog.Search.Model;

namespace SitecoreBlog.Search.Crawlers
{
    public class CustomCrawler : FlatDataCrawler<IndexableProduct>
    {
        protected override IndexableProduct GetIndexableAndCheckDeletes(IIndexableUniqueId indexableUniqueId)
        {
            return null;
        }

        protected override IndexableProduct GetIndexable(IIndexableUniqueId indexableUniqueId)
        {
            return null;
        }

        protected override bool IndexUpdateNeedDelete(IndexableProduct indexable)
        {
            return false;
        }

        protected override IEnumerable<IIndexableUniqueId> GetIndexablesToUpdateOnDelete(IIndexableUniqueId indexableUniqueId)
        {
            return null;
        }

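        protected override IEnumerable<IndexableProduct> GetItemsToIndex()
        {
            // Hard-coded sample item for the demo; a real crawler would
            // return the products read from the XML file here.
            var list = new List<IndexableProduct>
            {
                new IndexableProduct(new Product
                {
                    Description = "lorem ipsum"
                })
            };

            return list;
        }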
    }
}
To achieve our goal, the crawler has to inherit from FlatDataCrawler, with the type of the indexable item as its generic parameter. In our case that is IndexableProduct. The GetItemsToIndex method has to return the collection of items we want to index, so it is the perfect place to return the elements from the provided XML.
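As a rough sketch of that last step, the XML from the "Input source" section could be turned into Product objects with LINQ to XML. ProductXmlReader is a hypothetical helper of mine, not part of Sitecore, and the Product class below is only a stand-in for the model from this post so that the snippet compiles on its own. How the XML string is obtained (file, HTTP call, etc.) is left open.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Stand-in for the Product model from this post (attributes omitted),
// repeated here only so the sketch is self-contained.
public class Product
{
    public int Id { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper, not a Sitecore API.
public static class ProductXmlReader
{
    // Parses a <Products> document into a list of Product objects.
    public static List<Product> Parse(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Product")
            .Select(p => new Product
            {
                // The explicit XElement conversions read the element's text content.
                Id = (int)p.Element("Id"),
                Description = (string)p.Element("Description")
            })
            .ToList();
    }
}
```

Under these assumptions, GetItemsToIndex could return ProductXmlReader.Parse(xml).Select(p => new IndexableProduct(p)) instead of a hard-coded list.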

Indexable Product

Here we collect the properties of the Product model and check whether they have the IndexInfo attribute.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;
using Sitecore.ContentSearch;
using SitecoreBlog.Search.Attributes;
using SitecoreBlog.Search.Model;

namespace SitecoreBlog.Search.Crawlers
{
    public class IndexableProduct : IIndexable
    {
        private readonly Product _product;

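        public IndexableProduct(Product product)
        {
            _product = product;

            // Populate Fields up front so it is never null, even if
            // Fields is read before LoadAllFields() has been called.
            LoadAllFields();
        }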

        public void LoadAllFields()
        {
            Fields = _product.GetType()
                .GetProperties()
                .Where(fi => fi.GetCustomAttribute<IndexInfo>() != null)
                .Select(fi => new IndexableProductDataField(_product, fi));
        }

        public IIndexableDataField GetFieldById(object fieldId)
        {
            return Fields.FirstOrDefault(f => f.Id.Equals(fieldId));
        }

        public IIndexableDataField GetFieldByName(string fieldName)
        {
            return Fields.FirstOrDefault(f => f.Name.Equals(fieldName));
        }

        // Generated once per instance, so Id and UniqueId stay consistent across reads.
        public IIndexableId Id { get; } = new IndexableId<string>(Guid.NewGuid().ToString());

        public IIndexableUniqueId UniqueId => new IndexableUniqueId<IIndexableId>(Id);

        public string DataSource => "Product";

        public string AbsolutePath => "/";

        public CultureInfo Culture => new CultureInfo("en");

        public IEnumerable<IIndexableDataField> Fields { get; private set; }
    }
}

Indexable Product Data Field

Here the properties are prepared for indexing. Each property of the Product model gets its name from the IndexInfo attribute, and that name is used as the field name of the indexable data field. This means that the field in the indexed document is named after the attribute on the model.
using System;
using System.Reflection;
using Sitecore.ContentSearch;
using SitecoreBlog.Search.Attributes;
using SitecoreBlog.Search.Model;

namespace SitecoreBlog.Search.Crawlers
{
    public class IndexableProductDataField : IIndexableDataField
    {
        private readonly Product _product;
        private readonly PropertyInfo _fieldInfo;

        public IndexableProductDataField(Product concreteObject, PropertyInfo fieldInfo)
        {
            _product = concreteObject;
            _fieldInfo = fieldInfo;
        }

        public Type FieldType => _fieldInfo.PropertyType;

        public object Id => _fieldInfo.Name.ToLower();

        public string Name
        {
            get
            {
                var info = _fieldInfo.GetCustomAttribute<IndexInfo>();
                return info.Name;
            }
        }

        public string TypeKey => string.Empty;

        public object Value => _fieldInfo.GetValue(_product);
    }
}

Index Info Attribute

This attribute helps us determine what a property will be called in the index.

using System;

namespace SitecoreBlog.Search.Attributes
{
    [AttributeUsage(AttributeTargets.Property)]
    public class IndexInfo : Attribute
    {
        public string Name { get; private set; }

        public IndexInfo(string name)
        {
            Name = name;
        }
    }
}
Using this attribute on our model's properties lets us add those properties to the index and define their names. A property without this attribute will not be indexed. Let's see how the attribute is used in the model.

Product Model

In this step we define the model that will be used to prepare and store the documents to be indexed.
using SitecoreBlog.Search.Attributes;

namespace SitecoreBlog.Search.Model
{
    public class Product
    {
        [IndexInfo("productid")]
        public int Id { get; set; }

        [IndexInfo("description")]
        public string Description { get; set; }
    }
}
As you can see, we have used the attribute presented earlier. As a result, the documents in our Solr core will have two fields named after the attribute values: "productid" and "description".
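To see the effect of the attribute in isolation, here is a small sketch that runs the same reflection filter outside Sitecore. IndexFieldInspector, SampleProduct and its InternalNote property are made up for illustration, and IndexInfo is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class IndexInfo : Attribute
{
    public string Name { get; }

    public IndexInfo(string name)
    {
        Name = name;
    }
}

// Hypothetical model: InternalNote deliberately has no IndexInfo attribute.
public class SampleProduct
{
    [IndexInfo("productid")]
    public int Id { get; set; }

    [IndexInfo("description")]
    public string Description { get; set; }

    public string InternalNote { get; set; }
}

// Hypothetical helper, not a Sitecore API.
public static class IndexFieldInspector
{
    // Maps index field name -> value for every property marked with IndexInfo.
    public static Dictionary<string, object> GetIndexedFields(object model)
    {
        return model.GetType()
            .GetProperties()
            .Where(p => p.GetCustomAttribute<IndexInfo>() != null)
            .ToDictionary(
                p => p.GetCustomAttribute<IndexInfo>().Name,
                p => p.GetValue(model));
    }
}
```

Calling IndexFieldInspector.GetIndexedFields on a SampleProduct yields only the "productid" and "description" entries; InternalNote is skipped because it has no attribute.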

Results

Once all the steps above are done, rebuild the index and the items returned by the crawler should be indexed in the Solr core. As proof, I am attaching a screenshot from my Solr panel.



In the next post I am going to show you how to index data from several sources into one core and configure the search mechanism to work with all of that data.

Stay tuned! :)





5 comments:

  1. Hi,
    I have an issue with reading the product.xml file.

    Could you explain the use of these properties:
    (1) public string DataSource => "Product";
    (2) public string AbsolutePath => "/";

    Also, while indexing, I'm getting these errors:

    (1)
    Crawler: Add failed - 7ff5c9d5-ecaa-4df7-9bbd-2c8edc421676
    Exception: System.ArgumentNullException
    Message: Value cannot be null.
    Parameter name: source
    Source: System.Core
       at System.Linq.Enumerable.Select[TSource,TResult](IEnumerable`1 source, Func`2 selector)
       at Sitecore.ContentSearch.AbstractDocumentBuilder`1.AddItemFields()
       at Sitecore.ContentSearch.AbstractDocumentBuilder`1.BuildDocument()
       at Sitecore.ContentSearch.SolrProvider.SolrIndexOperations.IndexVersion(IIndexable indexable, IProviderUpdateContext context)
       at Sitecore.ContentSearch.SolrProvider.SolrIndexOperations.ApplyPermissionsThenIndex(IProviderUpdateContext context, IIndexable version)
       at Sitecore.ContentSearch.Crawler`1.DoAdd(IProviderUpdateContext context, T indexable)
       at Sitecore.ContentSearch.FlatDataCrawler`1.<>c__DisplayClass11_1.b__0(T indexable, ParallelLoopState loopState)


    (2)
     ERROR Object reference not set to an instance of an object.
    Exception: System.NullReferenceException
    Message: Object reference not set to an instance of an object.
    Source: Sitecore.ContentSearch.SolrProvider
       at Sitecore.ContentSearch.SolrProvider.SolrIndexSummary.get_NumberOfDocuments()
       at Sitecore.ContentSearch.Client.Forms.IndexingManagerWizard.BuildIndexCheckbox(String name, String header, ListString selected, ListString indexMap)

    Replies
    1. Hello, sorry, but I didn't notice your message earlier. Regarding reading the XML file, could you debug your code and check whether the file entries were collected correctly?

  2. Could you please explain how we could index nested objects?

    Replies
    1. I am sorry for such a late response. Index documents must be flat, so in my opinion you should avoid that approach. But if you really want to index something as a nested property of the main document, you can index it as another flat document and link the two using ids.
