Query multiple collections with different fields in Solr

Posted by k10s72fa 5 months ago in Solr

Given the following (single-core) queries:

http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b/select?indent=true&q=*:*&rows=100&start=0&wt=json

The first query returns "numFound":40000 and the second returns "numFound":10000.
I tried putting these together:

http://localhost/solr/a/select?indent=true&shards=localhost/solr/a,localhost/solr/b&q=*:*&rows=100&start=0&wt=json

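Now I get "numFound":50000. The only problem is that "a" has more columns than "b", so the multi-collection request only returns the values of a.
Is it possible to query multiple collections with different fields, or do they have to be the same? And how should I change my third URL to get this result?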

Source: https://stackoverflow.com/questions/19313910/query-multiple-collections-with-different-fields-in-solr

3 Answers

xqkwcwgp1#

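What you need is what I would call a *unification core*. This core has no content of its own; it only serves as a kind of wrapper that unifies the fields you want to display from both cores. For it you need:

- a schema.xml that contains all the fields you want to have in the unified result
- a query handler that combines the two different cores for you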

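An important restriction, quoted up front from the Solr Wiki page about DistributedSearch:
"Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml). The unique key field must be unique across all shards. If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic."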
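To make that restriction concrete, here is a minimal Python sketch that reports unique keys occurring in more than one shard. The function name `find_duplicate_keys` and the sample id lists are hypothetical; in practice the ids would have to be collected from each core separately (for example by querying each core with fl=id).

```python
from collections import defaultdict


def find_duplicate_keys(ids_by_shard):
    """Return {key: sorted shard names} for keys found in more than one shard.

    ids_by_shard is a hypothetical mapping of shard name -> iterable of
    unique-key values, gathered from each core on its own.
    """
    shards_by_key = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for key in ids:
            shards_by_key[key].add(shard)
    return {
        key: sorted(shards)
        for key, shards in shards_by_key.items()
        if len(shards) > 1
    }


# Illustrative data only: id 2 exists in both shards, which breaks the rule.
sample = {"shard-1": [1, 2], "shard-2": [2, 3]}
print(find_duplicate_keys(sample))  # {2: ['shard-1', 'shard-2']}
```

An empty result means the ids are disjoint, which is the situation the wiki asks for.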
As a running example, I have shard-1 with the fields id, title, description, and shard-2 with the fields id, title, abstractText.

shard-1 schema

<schema name="shard-1" version="1.5">

  <fields>
    <field name="id"
          type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description"
          type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

shard-2 schema

<schema name="shard-2" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

To unify these schemas I created a third schema, which I called shard-unification. It contains all four fields.

<schema name="shard-unification" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>
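The field list of the unified schema is just the union of the field lists of the shards. As a rough sketch, assuming the schemas are available as XML strings, it could be derived in Python like this. The helper names `field_names` and `unified_fields` are made up for illustration, and the schemas are abbreviated copies of the ones above. Only field names are compared here; checking that a shared field such as title has a compatible type in every shard is left out.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the two shard schemas shown above.
SHARD_1 = """<schema name="shard-1" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" />
    <field name="title" type="text" indexed="true" stored="true" />
    <field name="description" type="text" indexed="true" stored="true" />
  </fields>
</schema>"""

SHARD_2 = """<schema name="shard-2" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" />
    <field name="title" type="text" indexed="true" stored="true" />
    <field name="abstractText" type="text" indexed="true" stored="true" />
  </fields>
</schema>"""


def field_names(schema_xml):
    """Return the names of all <field> elements in document order."""
    root = ET.fromstring(schema_xml)
    return [field.get("name") for field in root.iter("field")]


def unified_fields(*schemas):
    """Return the union of field names, keeping first-seen order."""
    seen = []
    for schema_xml in schemas:
        for name in field_names(schema_xml):
            if name not in seen:
                seen.append(name)
    return seen


print(unified_fields(SHARD_1, SHARD_2))
# ['id', 'title', 'description', 'abstractText']
```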

Now I need to make use of this combined schema, so I created a query handler in the solrconfig.xml of the shard-unification core:

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />

That's it. Now some index data is needed in shard-1 and shard-2. To query the unified result, just query shard-unification with the appropriate shards parameter.

http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
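If the URL is built in code rather than typed by hand, a small helper keeps the shards list readable. This is only a sketch using the Python standard library; the function name `sharded_select_url` and its parameters are my own, not part of Solr.

```python
from urllib.parse import urlencode


def sharded_select_url(base, core, shards, **params):
    """Build a /select URL on `core` that fans out to the given shards.

    `shards` is a list of host/path strings without a scheme, as in the
    URL above. Extra keyword arguments become query parameters.
    """
    query = {"q": "*:*", "wt": "json"}
    query.update(params)
    query["shards"] = ",".join(shards)
    # Keep *, :, , and / unescaped so the URL stays readable.
    return f"{base}/{core}/select?" + urlencode(query, safe="*:,/")


url = sharded_select_url(
    "http://localhost/solr",
    "shard-unification",
    ["localhost/solr/shard-1", "localhost/solr/shard-2"],
    rows=100,
    start=0,
)
print(url)
```

The printed URL contains the same parameters as the one above, only in a different order.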

Querying shard-unification this way returns a result like the following:

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":1,
        "title":"title 1",
        "description":"description 1",
        "score":1.0},
      {
        "id":2,
        "title":"title 2",
        "abstractText":"abstract 2",
        "score":1.0}]
  }}
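Since the documents in the merged result do not all carry the same fields, client code has to tolerate missing keys. A minimal Python sketch, assuming the response above: the combined "text" key and the function name `summarize` are hypothetical and only there for illustration.

```python
import json

# The response shown above, as a JSON string.
RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 10},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
    {"id": 1, "title": "title 1", "description": "description 1", "score": 1.0},
    {"id": 2, "title": "title 2", "abstractText": "abstract 2", "score": 1.0}
  ]}
}
"""


def summarize(doc):
    """Map a document from either shard onto one common shape.

    description only exists on shard-1 documents and abstractText only on
    shard-2 documents, so both are read with dict.get().
    """
    text = doc.get("description") or doc.get("abstractText") or ""
    return {"id": doc["id"], "title": doc["title"], "text": text}


docs = json.loads(RESPONSE)["response"]["docs"]
rows = [summarize(doc) for doc in docs]
for row in rows:
    print(row)
```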

Fetching the origin shard of a document

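If you want each document to carry the shard it came from, you just need to specify [shard] in fl, either as a query parameter or in the defaults of the request handler, see below. The brackets are mandatory; they will also appear in the response.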

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score,[shard]</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
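With [shard] in fl, each returned document carries a "[shard]" key, so the client can split the merged result by origin. A short Python sketch: the sample documents and the exact shard values are assumptions on my part (I would expect the addresses passed in the shards parameter), and `group_by_shard` is a made-up helper name.

```python
from collections import defaultdict


def group_by_shard(docs):
    """Return {shard: [document ids]} based on the "[shard]" key."""
    groups = defaultdict(list)
    for doc in docs:
        # Documents without the key end up under "unknown".
        groups[doc.get("[shard]", "unknown")].append(doc["id"])
    return dict(groups)


# Illustrative documents; the "[shard]" values are assumed, not captured output.
sample_docs = [
    {"id": 1, "title": "title 1", "[shard]": "localhost/solr/shard-1"},
    {"id": 2, "title": "title 2", "[shard]": "localhost/solr/shard-2"},
]
print(group_by_shard(sample_docs))
```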

Working sample

If you want to see a running example, check out my solrsample project on GitHub and execute the ShardUnificationTest. I have now also included the shard fetching.


ergxz8rk2#

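Shards in Solr are meant to be used "when an index becomes too large to fit on a single system, or when a single query takes too long to execute",
so the number and names of the columns should always be the same. This is specified in this document (the quote above also comes from it): http://wiki.apache.org/solr/DistributedSearch
If you leave your query as it is and give both shards the same fields, this should work as expected.
If you want more information on how sharding works in Solr, also have a look at this document: http://wiki.apache.org/solr/SolrCloud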


x4shl7ld3#

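The collection parameter lets you specify a collection or several collections on which the query should be executed. This lets you query multiple collections at once, and the features of Solr that work in a distributed manner will work across collections.
The documentation is for 9.4, but I tested it on 6.6 and it works.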
https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#collection-parameter
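For the collections a and b from the question, such a request could be built as in the following Python sketch. It assumes Solr is running in SolrCloud mode, which is what the linked page covers, and reuses the base URL from the question. The helper name `multi_collection_url` is hypothetical.

```python
from urllib.parse import urlencode


def multi_collection_url(base, collection, collections, q="*:*"):
    """Build a /select URL that searches several collections at once.

    The request is sent to `collection`; the collection parameter lists
    every collection the query should run on.
    """
    params = {"q": q, "collection": ",".join(collections)}
    # Keep *, : and , unescaped so the URL stays readable.
    return f"{base}/{collection}/select?" + urlencode(params, safe="*:,")


url = multi_collection_url("http://localhost/solr", "a", ["a", "b"])
print(url)  # http://localhost/solr/a/select?q=*:*&collection=a,b
```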
