SharePoint 2013: Search Architecture in SPC202

http://social.technet.microsoft.com/wiki/contents/articles/15989.sharepoint-2013-search-architecture-in-spc202.aspx

The presentation is broken down into four parts:

Feeding Chain (crawler and content processing)
Index Core (index components)
Query Chain
Analytics Component

A step back: SharePoint 2010 had a built-in search service, and FAST search was shipped as a separate service in addition to it. In SharePoint 2013, these components are merged into a single, cohesive search service.
Beyond SharePoint, the search component is also shared with Exchange. Search built on FAST technologies now spans everything from Outlook Web Access to eDiscovery to intranet search scenarios. There have also been large investments in Internet search, with features such as Product Catalogue and Cross-Site Publishing. We now use search, often without thinking about it, in many areas that were not traditionally search-driven. It's much more than the original "Search Box".

Feeding Chain

Crawl Component: The crawler in SP2013 is much more specialized. It is extensible through BCS, uses a local disk cache, ships with out-of-the-box (OOB) connectors, and stores its configuration in the Admin database. Big change: a new crawl mode called continuous crawl. It can run continuously because crawls are processed in parallel. Crawl components now scale individually; scaling of crawl components is no longer interlocked with crawl databases.
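
To illustrate why parallel processing lets a crawl run continuously, here is a toy timing model in Python. It is not SharePoint code: the function names, the 15-unit interval, and the session durations are all made-up assumptions. It contrasts a schedule where each session must wait for the previous one to finish with a continuous-style schedule where sessions start on the interval and may overlap.

```python
from typing import List


def crawl_start_times(durations: List[int], interval: int, continuous: bool) -> List[int]:
    """Start time of each crawl session under a simple, hypothetical model.

    durations: how long each session runs (illustrative units, e.g. minutes).
    interval: scheduled gap between session starts.
    continuous: True starts a new session every `interval`, overlapping any
    session still running; False also waits for the previous session to end.
    """
    starts = []
    next_start = 0
    for duration in durations:
        starts.append(next_start)
        if continuous:
            next_start += interval
        else:
            next_start += max(interval, duration)
    return starts


def max_concurrent(starts: List[int], durations: List[int]) -> int:
    """Largest number of sessions running at the same moment."""
    events = []
    for start, duration in zip(starts, durations):
        events.append((start, 1))
        events.append((start + duration, -1))
    # At equal timestamps, ends (-1) sort before starts (+1).
    running = peak = 0
    for _, delta in sorted(events):
        running += delta
        peak = max(peak, running)
    return peak


durations = [40, 10, 10]  # one slow session followed by two fast ones
print(crawl_start_times(durations, 15, continuous=True))   # [0, 15, 30]
print(crawl_start_times(durations, 15, continuous=False))  # [0, 40, 55]
```

In the continuous case the slow first session no longer delays the fast ones; the cost is that up to two sessions run at once.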

Content Processing Component: A stateless node that analyzes content for indexing. It uses "Processing flows" and schema mappings, stores links and anchors in the Links database (analytics), is extensible via web service call-outs, and stores its configuration in the Admin database. Each CrawlerFlow processes one document at a time. Note: security crawls should now be faster because they use update groups.
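
A minimal sketch of what "stateless, one document at a time" means in practice. The schema mapping, the crawled property names (`crawled_title`, `crawled_author`), and the `links` field are hypothetical; real processing flows are internal to SharePoint. Because the function keeps no state between calls, any number of identical workers could process documents in parallel.

```python
from typing import Dict, List, Tuple

# Hypothetical schema mapping: crawled property -> managed property.
SCHEMA_MAPPING: Dict[str, str] = {
    "crawled_title": "Title",
    "crawled_author": "Author",
}


def process_document(crawled: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Process ONE document; nothing is remembered between calls.

    Returns (managed properties for the index, outgoing links for a links store).
    """
    managed = {
        SCHEMA_MAPPING[name]: value
        for name, value in crawled.items()
        if name in SCHEMA_MAPPING  # unmapped crawled properties are dropped
    }
    # Links are split out so they can be stored separately (cf. the Links database).
    links = crawled.get("links", "").split()
    return managed, links


doc = {"crawled_title": "Budget 2013", "crawled_ext": "docx",
       "links": "http://intranet/a http://intranet/b"}
managed, links = process_document(doc)
print(managed)  # {'Title': 'Budget 2013'}
print(links)    # ['http://intranet/a', 'http://intranet/b']
```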

Index Core

Index Component: The index is a stateful component.

New concepts: partitions and replicas; no more columns and rows. All nodes perform indexing. Changes are journal-shipped from the primary to the replicas, and each partition can have many replicas. Unlike SharePoint 2010 (but typical for FAST), the index is stored on local disk rather than in a database, as SharePoint 2010 did with its Property database.
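
The partition/replica idea can be sketched as a toy model. Everything here is an assumption for illustration: the class names, the in-memory dictionaries standing in for local-disk index files, and hash-based routing of documents to partitions. It only shows the concept that each partition holds a disjoint slice of the index and that replicas catch up by replaying the primary's journal.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (document id, indexed text)


@dataclass
class Replica:
    docs: Dict[str, str] = field(default_factory=dict)

    def apply(self, entry: Entry) -> None:
        doc_id, text = entry
        self.docs[doc_id] = text


@dataclass
class Partition:
    primary: Replica
    replicas: List[Replica]
    journal: List[Entry] = field(default_factory=list)

    def index(self, doc_id: str, text: str) -> None:
        """Write to the primary and record the change in the journal."""
        entry = (doc_id, text)
        self.primary.apply(entry)
        self.journal.append(entry)

    def ship_journal(self) -> None:
        """Replay journaled changes on every replica, then truncate the journal."""
        for entry in self.journal:
            for replica in self.replicas:
                replica.apply(entry)
        self.journal.clear()


def partition_for(doc_id: str, partition_count: int) -> int:
    """Assumed routing rule: a stable hash of the document id."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


partitions = [Partition(Replica(), [Replica(), Replica()]) for _ in range(2)]
p = partitions[partition_for("doc-1", len(partitions))]
p.index("doc-1", "quarterly report")
print(p.replicas[0].docs)  # {} (journal not shipped yet)
p.ship_journal()
print(p.replicas[0].docs)  # {'doc-1': 'quarterly report'}
```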

Improvements to index freshness: as content comes in, it goes straight to memory and is immediately searchable. It is eventually flushed to disk. In previous FAST products, documents had to be stored on disk before they became searchable.
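
A minimal sketch of the freshness behavior described above, assuming a simple inverted index with an in-memory buffer and a dictionary standing in for on-disk segments. The class name and the flush threshold are hypothetical. The key property is that `search` consults both the memory buffer and the flushed data, so a document is findable as soon as `add` returns.

```python
from typing import Dict, Set


class FreshIndex:
    def __init__(self, flush_threshold: int = 3):
        self.memory: Dict[str, Set[str]] = {}  # term -> doc ids not yet flushed
        self.disk: Dict[str, Set[str]] = {}    # stands in for on-disk index files
        self.pending = 0
        self.flush_threshold = flush_threshold

    def add(self, doc_id: str, text: str) -> None:
        """Index into memory; the document is searchable immediately."""
        for term in text.lower().split():
            self.memory.setdefault(term, set()).add(doc_id)
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Move buffered postings to 'disk' and empty the buffer."""
        for term, ids in self.memory.items():
            self.disk.setdefault(term, set()).update(ids)
        self.memory.clear()
        self.pending = 0

    def search(self, term: str) -> Set[str]:
        term = term.lower()
        return self.memory.get(term, set()) | self.disk.get(term, set())


idx = FreshIndex(flush_threshold=2)
idx.add("d1", "SharePoint search")
print(idx.search("search"))  # {'d1'} even though nothing is on disk yet
```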

Understanding index schemas: crawled properties are mapped to managed properties. Schema administration can now be done at the site collection level. Managed properties gain the "Searchable" and "Retrievable" settings.
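
A sketch of how the two settings differ, under stated assumptions: "Searchable" means the property's content can be matched by free-text queries, and "Retrievable" means its value can be returned in results. The dataclass, the crawled property names, and the rule of taking the first non-empty mapped crawled property are illustrative choices, not SharePoint's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ManagedProperty:
    name: str
    crawled_sources: List[str]  # crawled properties mapped to this managed property
    searchable: bool = False    # content goes into the full-text searchable text
    retrievable: bool = False   # value can be returned in search results


def build_item(crawled: Dict[str, str],
               schema: List[ManagedProperty]) -> Tuple[str, Dict[str, str]]:
    """Return (searchable full text, retrievable fields) for one document."""
    searchable_text: List[str] = []
    retrievable: Dict[str, str] = {}
    for mp in schema:
        # Assumption: use the first mapped crawled property that has a value.
        value = next((crawled[c] for c in mp.crawled_sources if crawled.get(c)), None)
        if value is None:
            continue
        if mp.searchable:
            searchable_text.append(value)
        if mp.retrievable:
            retrievable[mp.name] = value
    return " ".join(searchable_text), retrievable


schema = [
    ManagedProperty("Title", ["crawled_title"], searchable=True, retrievable=True),
    ManagedProperty("Notes", ["crawled_notes"], searchable=True, retrievable=False),
]
print(build_item({"crawled_title": "Budget", "crawled_notes": "draft"}, schema))
# ('Budget draft', {'Title': 'Budget'})
```

Here "Notes" can be matched by a query for "draft" but is never shown in results.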

Query Chain

Web Front-End: REST/OData API, CSOM and SSOM, Portals and Publishing, Search Center, ContentWebPart, RefinerWebPart, Result Templates. The same programming model serves both cloud and on-premises deployments. You can use these APIs to develop applications that run on Surface, phones, and tablets.
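
As an example of the REST side, a client builds a GET request against the site's `/_api/search/query` endpoint, passing the query as an OData string literal in `querytext` (with optional parameters such as `selectproperties` and `rowlimit`). The sketch below only constructs the URL; the site URL is hypothetical, and authentication and sending the request are left out.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def search_query_url(site_url: str, query_text: str,
                     select_properties: Optional[Sequence[str]] = None,
                     row_limit: Optional[int] = None) -> str:
    def literal(value: str) -> str:
        # OData string literal: wrap in single quotes, double any embedded quote,
        # then percent-encode everything except the quotes.
        return quote("'" + value.replace("'", "''") + "'", safe="'")

    params = ["querytext=" + literal(query_text)]
    if select_properties:
        params.append("selectproperties=" + literal(",".join(select_properties)))
    if row_limit is not None:
        params.append(f"rowlimit={row_limit}")
    return site_url.rstrip("/") + "/_api/search/query?" + "&".join(params)


print(search_query_url("https://contoso/sites/hr", "budget 2013", ["Title", "Path"], 10))
# https://contoso/sites/hr/_api/search/query?querytext='budget%202013'&selectproperties='Title%2CPath'&rowlimit=10
```

To get JSON back from SharePoint 2013, the request typically carries an `Accept: application/json;odata=verbose` header.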

Query Processing Component: A stateless node that processes query flows. It includes the query analyzer, linguistics/dictionaries, result sources, schema mappings, query rules, and query federation, and stores its configuration in the Admin database.
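
A query flow can be pictured as a chain of stateless stages, each taking a query and returning a new one. This sketch is a hypothetical illustration: the two stages (a tokenizing query analyzer and a dictionary-based synonym expansion standing in for linguistics/dictionaries) and the synonym table are assumptions, not SharePoint's actual flow.

```python
from typing import Any, Callable, Dict, List

Query = Dict[str, Any]
Stage = Callable[[Query], Query]

# Hypothetical linguistics dictionary: term -> extra terms to search for.
SYNONYMS: Dict[str, List[str]] = {"hr": ["human resources"]}


def analyze(query: Query) -> Query:
    """Query analyzer stage: normalize and tokenize the raw text."""
    return {**query, "terms": str(query["text"]).lower().split()}


def expand_synonyms(query: Query) -> Query:
    """Dictionary stage: append known synonyms after each term."""
    terms: List[str] = []
    for term in query["terms"]:
        terms.append(term)
        terms.extend(SYNONYMS.get(term, []))
    return {**query, "terms": terms}


def run_flow(query: Query, stages: List[Stage]) -> Query:
    # Each stage returns a new dict, so no stage mutates shared state.
    for stage in stages:
        query = stage(query)
    return query


print(run_flow({"text": "HR policy"}, [analyze, expand_synonyms])["terms"])
# ['hr', 'human resources', 'policy']
```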

Notes

Query federation: federation between on-premises and cloud deployments is possible.

The query router uses various search provider flows (e.g. Best Bets, People Search, Exchange Search, Local SharePoint Search, Remote SharePoint Search, Personal Favorites). These are exposed to end users and administrators through result sources, query rules, and similar features.
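
A toy version of the routing idea: a registry of provider flows keyed by name, and a router that sends the same query to each selected source and keeps results grouped per source. The provider functions and their canned results are hypothetical stand-ins for real search backends.

```python
from typing import Callable, Dict, List

Provider = Callable[[str], List[str]]


# Hypothetical stand-ins for provider flows; real ones would call search backends.
def local_sharepoint(query: str) -> List[str]:
    return [f"intranet result for '{query}'"]


def people(query: str) -> List[str]:
    return [f"profile matching '{query}'"]


PROVIDERS: Dict[str, Provider] = {
    "Local SharePoint Search": local_sharepoint,
    "People Search": people,
}


def route(query: str, result_sources: List[str]) -> Dict[str, List[str]]:
    """Send the query to each selected provider flow; collect results per source."""
    results: Dict[str, List[str]] = {}
    for source in result_sources:
        provider = PROVIDERS.get(source)
        if provider is None:
            raise KeyError(f"no provider flow registered for {source!r}")
        results[source] = provider(query)
    return results


print(route("budget", ["Local SharePoint Search", "People Search"]))
```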

Query Rules: capture search intent. A rule is composed of three top-level elements:

Query Conditions: What queries should be handled
Query Actions: What happens when a rule matches
Publishing Options: Whether the rule is active, and for how long.
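
The three elements map naturally onto a small data structure. This is a hypothetical sketch, not the SharePoint object model: the condition and action are plain functions, the action returns a list of promoted results, and publishing is modeled as an active flag plus an optional date window.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class QueryRule:
    # Query Conditions: which queries the rule handles.
    condition: Callable[[str], bool]
    # Query Actions: what happens on a match (here: results to promote).
    action: Callable[[str], List[str]]
    # Publishing Options: whether the rule is active and for how long.
    active: bool = True
    start: Optional[date] = None
    end: Optional[date] = None

    def is_published(self, today: date) -> bool:
        if not self.active:
            return False
        if self.start is not None and today < self.start:
            return False
        if self.end is not None and today > self.end:
            return False
        return True

    def apply(self, query: str, today: date) -> Optional[List[str]]:
        """Return the action's output if the rule is live and matches, else None."""
        if self.is_published(today) and self.condition(query):
            return self.action(query)
        return None


holiday_rule = QueryRule(
    condition=lambda q: "holiday" in q.lower(),
    action=lambda q: ["Promoted: Holiday calendar"],
    start=date(2013, 12, 1),
    end=date(2013, 12, 31),
)
print(holiday_rule.apply("Holiday schedule", date(2013, 12, 15)))  # ['Promoted: Holiday calendar']
print(holiday_rule.apply("Holiday schedule", date(2014, 1, 5)))    # None
```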

Analytics Service

Analytics Processing Component: uses map/reduce and learns from usage. It performs search analytics and usage analytics, enriches the index by updating indexed items, and stores usage reports in the Analytics database.
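
A single-process sketch of the map/reduce pattern applied to usage events: the map phase emits `(item, 1)` per view, the reduce phase sums per item, and the totals are written back onto indexed items. The event tuple layout and the `view_count` field are hypothetical; a real analytics component distributes these phases across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, str, str]  # (user, item, event type)


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Emit (item, 1) for every view event; other event types are ignored."""
    for _user, item, event_type in events:
        if event_type == "view":
            yield item, 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the values emitted for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def enrich_index(index: Dict[str, Dict[str, object]], counts: Dict[str, int]) -> None:
    """Write usage counts back onto indexed items (hypothetical field name)."""
    for item, count in counts.items():
        if item in index:
            index[item]["view_count"] = count


events = [("u1", "doc-a", "view"), ("u2", "doc-a", "view"),
          ("u2", "doc-b", "view"), ("u1", "doc-b", "share")]
index = {"doc-a": {"Title": "Budget"}, "doc-b": {"Title": "Roadmap"}}
enrich_index(index, reduce_phase(map_phase(events)))
print(index)
# {'doc-a': {'Title': 'Budget', 'view_count': 2}, 'doc-b': {'Title': 'Roadmap', 'view_count': 1}}
```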

Making recommendations is easy: recommendations are driven by user behavior.
Built-in recommendations:
Event stream analysis
Item-to-item recommendations
Stored in the "recommendedfor" managed property. Up to 12 configurable, weighted events.
Note: For example, if I'm building an application in Java, can I feed my own events into the analytics engine? Yes, through the REST API or CSOM.
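
To make "item-to-item recommendations from weighted events" concrete, here is a sketch using simple co-occurrence: items the same user engaged with are considered related, and each event type carries a weight. This is a swapped-in, commonly used technique, not the algorithm SharePoint uses; the event names and weights are hypothetical, and the output only resembles the kind of list a "recommendedfor"-style property might hold.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Hypothetical event weights (the notes mention up to 12 configurable events).
EVENT_WEIGHTS: Dict[str, float] = {"view": 1.0, "like": 3.0}


def item_to_item(events: Iterable[Tuple[str, str, str]],
                 top_n: int = 3) -> Dict[str, List[str]]:
    # 1. Weighted interest of each user in each item.
    user_items: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for user, item, event_type in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is None:
            continue  # unconfigured event types are ignored
        user_items[user][item] += weight
    # 2. Co-occurrence: each pair of items a user engaged with gains
    #    the weaker of the two interests.
    scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            shared = min(items[a], items[b])
            scores[a][b] += shared
            scores[b][a] += shared
    # 3. Keep the top_n related items per item (ties broken by name).
    return {
        item: [other for other, _ in
               sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, related in scores.items()
    }


events = [("u1", "A", "view"), ("u1", "B", "like"), ("u2", "A", "view"),
          ("u2", "C", "view"), ("u3", "A", "like"), ("u3", "B", "like")]
print(item_to_item(events))  # {'A': ['B', 'C'], 'B': ['A'], 'C': ['A']}
```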