The Primo ScholarRank Technology: Bringing the Most Relevant Results to the Top of the List

By Tamar Sadeh, Director of Marketing

In today’s world, users’ expectations for a quick and easy search process, combined with an information landscape as large and complex as that covered by the Primo Central Index, render sophisticated relevance-ranking algorithms crucial to the success of the discovery process. In addition to the traditional assessment of the degree to which a retrieved item matches a user’s query, relevance-ranking algorithms need to take into account factors that relate to the academic significance of the retrieved item and to the context of the query: who submitted the query and what information need led the user to submit that query.

In March 2011, Ex Libris initiated a relevance-ranking project to enrich and optimize the original Primo® relevance-ranking algorithms. The algorithms that have thus far resulted from this project constitute the Ex Libris ScholarRank™ technology.

The project team includes members of the Ex Libris research and development staff and information-retrieval specialists. In addition, input from researchers who are located all over the world and work in various disciplines has helped the team establish metrics for the evaluation of the improvements that are made to the algorithms. In-depth information about the relevance-ranking project is available in a white paper, which you can obtain from your account manager.

What Is Primo ScholarRank?

Although relevance-ranking algorithms are not new in the context of information retrieval (IR) systems, the Ex Libris R&D team realized early on in the development of the Primo discovery and delivery solution that for optimal application to scholarly data, traditional IR algorithms would have to be adjusted and enhanced considerably. The current Primo relevance-ranking project is equipping the algorithms with new capabilities, which take into account a user’s background and information needs as well as the global scholarly significance of materials. The latter aspect is expressed as a measure of various factors, such as the number of citations that a publication has generated and usage information that reflects scholars’ interest in the publication. Along with these enhancements, the project team is adding a self-learning mechanism that feeds data back to the algorithms and helps the system constantly improve the order of search results over time. Together, all these features constitute the ScholarRank relevance-ranking technology; some of the features are already deployed by the Primo solution, and others will be implemented in 2012.

To determine the position of an item on a result list, ScholarRank is designed to take into account the following three elements:

  • The degree to which the item matches the query
  • A score representing the item’s scholarly value (referred to as the ScholarRank value score)
  • Information about the user and the user’s research need at the specific point in time
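
As a rough illustration of how three such elements could be blended into a single ranking score, here is a minimal sketch. The article does not describe the actual ScholarRank formula; the weighted sum, the weights, and the names `RankingSignals`, `combined_score`, and `rank` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RankingSignals:
    """Hypothetical per-item signals, each normalized to the range 0..1."""
    query_match: float   # degree to which the item matches the query
    value_score: float   # scholarly value of the item, independent of the query
    user_context: float  # fit to the user and the user's current research need


def combined_score(s, w_match=0.6, w_value=0.25, w_user=0.15):
    # Assumed linear blend; the weights are illustrative, not taken from the article.
    return w_match * s.query_match + w_value * s.value_score + w_user * s.user_context


def rank(items):
    """Sort (item_id, RankingSignals) pairs from highest to lowest combined score."""
    return sorted(items, key=lambda pair: combined_score(pair[1]), reverse=True)


results = rank([
    ("a", RankingSignals(query_match=0.9, value_score=0.1, user_context=0.1)),
    ("b", RankingSignals(query_match=0.8, value_score=0.9, user_context=0.9)),
])
```

In this sketch, item "b" outranks item "a" even though it matches the query slightly less well, because its value and user-context signals are much stronger.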

The match between a query and an item is calculated according to IR methods that have been adapted to the structure of the specific type of information (metadata, abstract, or full text). Not only do the proximity and order of the query terms in a result record have an impact on the ranking, but the field in which the query terms appear also has an effect; for example, if the terms appear in an item’s title, the item is likely to be more relevant to the user than an item for which the query words appear only in the full text. Furthermore, specific types of materials are typically more likely to satisfy user needs; for example, when all else is equal, a journal article is ranked higher than a newspaper article and a recent publication is ranked higher than an older one.
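
The factors in the paragraph above (field, term order, material type, and recency) can be sketched as a simple scoring function. This is only an illustration under stated assumptions: the field weights, type boosts, age decay, and the names `FIELD_WEIGHTS`, `TYPE_BOOST`, `contains_phrase`, and `match_score` are hypothetical and do not reflect the actual Primo implementation.

```python
# Hypothetical weights: a hit in the title counts more than a hit in the full text.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.5, "full_text": 1.0}
# Hypothetical material-type boosts: journal articles rank above newspaper articles.
TYPE_BOOST = {"journal_article": 1.2, "newspaper_article": 1.0}


def contains_phrase(words, terms):
    """True if the query terms appear adjacent and in order within the field."""
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


def match_score(query, record, current_year):
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = record.get(field, "").lower().split()
        # Count the query terms that occur in this field.
        score += weight * sum(1 for t in terms if t in words)
        # Proximity and order bonus when the terms occur as a phrase.
        if len(terms) > 1 and contains_phrase(words, terms):
            score += weight
    score *= TYPE_BOOST.get(record.get("type"), 1.0)
    # Assumed recency decay: older publications score slightly lower.
    age = max(0, current_year - record.get("year", current_year))
    return score / (1.0 + 0.05 * age)


title_hit = {"title": "climate change policy", "type": "journal_article", "year": 2011}
body_hit = {"title": "annual report", "full_text": "a note on climate change policy",
            "type": "journal_article", "year": 2011}
```

With the query "climate change", `title_hit` scores higher than `body_hit` because the same terms carry a larger weight in the title field.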

The ScholarRank value score represents an evaluation of an item’s academic significance regardless of the degree to which the item matches the query. To calculate the value score, the Primo ScholarRank technology relies on usage metrics derived from the bX article recommender database and other data, such as the item’s citation information.
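
One plausible way to turn citation counts and usage counts into a single query-independent score is to log-scale each signal and average them. The sketch below assumes exactly that; the function `value_score`, its caps, and its weighting are hypothetical, and `usage_count` merely stands in for whatever usage metrics are available.

```python
import math


def value_score(citations, usage_count, max_citations=1000, max_usage=10000, w_citations=0.5):
    """Return an assumed scholarly-value score in the range 0..1.

    Log scaling keeps a handful of very highly cited items from dominating;
    the caps and the 50/50 weighting are illustrative assumptions.
    """
    cited = min(1.0, math.log1p(citations) / math.log1p(max_citations))
    used = min(1.0, math.log1p(usage_count) / math.log1p(max_usage))
    return w_citations * cited + (1.0 - w_citations) * used
```

Because the score does not depend on the query, it could be computed ahead of time and stored alongside each record.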

The Primo ScholarRank technology also considers certain characteristics of a user to provide personalized ranking. Applying information about the user’s area of research, ScholarRank boosts materials related to the user’s discipline when the topic that is inferred from the query is ambiguous. Information about the user’s academic degree enables ScholarRank to boost materials that would be considered appropriate for that level; for instance, for a query submitted by a researcher who holds a Ph.D., in‑depth items would be among the highest ranked.
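
The discipline boost described above can be pictured as a conditional multiplier that applies only when the query topic is ambiguous. This is a minimal sketch; the function `personalized_score`, the `boost` factor, and the way ambiguity is passed in as a flag are assumptions for illustration.

```python
def personalized_score(base_score, item_disciplines, user_discipline,
                       query_is_ambiguous, boost=1.25):
    """Boost items from the user's discipline, but only for ambiguous queries."""
    if query_is_ambiguous and user_discipline in item_disciplines:
        return base_score * boost
    return base_score


# For an ambiguous query such as "mercury", an astronomer's results
# would favor astronomy items over chemistry items.
astronomy_item = personalized_score(1.0, {"astronomy"}, "astronomy", True)
chemistry_item = personalized_score(1.0, {"chemistry"}, "astronomy", True)
```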

Finally, a user’s specific information need (a particular item or materials on a particular subject) is factored into the relevance-ranking equation. By analyzing a query, the Primo ScholarRank technology “infers” the user’s need and adapts to the type of search (a known-item search, narrow-topic search, broad-topic search, or author-related search). For example, in a broad-topic search, reference materials or review articles are likely to be more relevant to the user than an article dealing with a specific aspect of the subject matter.
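
To make the idea of inferring a search type concrete, the sketch below classifies a query with a few crude heuristics and then boosts material types that suit the inferred need. The rules, the labels, and the names `infer_search_type`, `NEED_BOOST`, and `adjust_for_need` are hypothetical; the article does not say how the analysis is actually performed.

```python
import re

# Hypothetical boosts: overviews suit broad-topic searches.
NEED_BOOST = {
    "broad_topic": {"reference_entry": 1.3, "review_article": 1.3},
}


def infer_search_type(query):
    """Guess the search type from the surface form of the query (crude heuristics)."""
    q = query.strip()
    words = q.split()
    # A long quoted string looks like the title of a known item.
    if q.startswith('"') and q.endswith('"') and len(words) >= 4:
        return "known_item"
    # "Surname, I." or "Surname, I. J." looks like an author search.
    if re.fullmatch(r"[A-Z][a-z]+, [A-Z]\.?( [A-Z]\.?)?", q):
        return "author"
    # Very short queries are treated as broad topics, longer ones as narrow topics.
    return "broad_topic" if len(words) <= 2 else "narrow_topic"


def adjust_for_need(score, item_type, search_type):
    """Apply the assumed boost for this material type and search type, if any."""
    return score * NEED_BOOST.get(search_type, {}).get(item_type, 1.0)
```

For example, `infer_search_type("photosynthesis")` returns `"broad_topic"`, so a review article on the subject would receive the assumed boost while a narrowly focused article would not.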

Looking Ahead

Awareness of the huge impact of relevance ranking on the success of the discovery process has brought the ScholarRank technology to the forefront of research at Ex Libris.

The goal of the work invested in the Primo relevance-ranking algorithms is to enable academic users to find the exact scholarly materials that they need, and to find them quickly. By shortening users’ discovery time, Primo improves their productivity, draws more traffic to the library site, and helps achieve optimal use of library collections. As a result, Primo enables libraries to better serve their community and their institution’s mission and to gain the prominence that they deserve in the provision of scholarly information.

The research and development work on the ScholarRank technology is an ongoing effort and will continue to introduce enhancements. Additional methods of personalizing relevance ranking will be added to the algorithms, as well as more features drawn from relationships between researchers, authors, and scholarly materials.
