Search
Let's have a look at a typical search request written directly as a dict:
from elasticsearch import Elasticsearch
client = Elasticsearch("https://127.0.0.1:9200")
response = client.search(
    index="my-index",
    body={
        "query": {
            "bool": {
                "must": [{"match": {"title": "python"}}],
                "must_not": [{"match": {"description": "beta"}}],
                "filter": [{"term": {"category": "search"}}]
            }
        },
        "aggs": {
            "per_tag": {
                "terms": {"field": "tags"},
                "aggs": {
                    "max_lines": {"max": {"field": "lines"}}
                }
            }
        }
    }
)
for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])
for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])
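The problem with this approach is that it is very verbose, prone to syntax mistakes such as incorrect nesting, hard to modify (for example, to add another filter), and definitely not fun to write.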
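To make the "hard to modify" point concrete, here is a minimal sketch in plain Python (no client or cluster needed) of adding one more filter to the raw dict. The extra term filter on `tags` is an illustrative assumption, not part of the original request.

```python
import copy

body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "python"}}],
            "must_not": [{"match": {"description": "beta"}}],
            "filter": [{"term": {"category": "search"}}],
        }
    }
}

# Copy first so the original request body is left untouched
extended = copy.deepcopy(body)
# The caller must know the exact nesting: query -> bool -> filter
# (illustrative extra filter on the "tags" field)
extended["query"]["bool"]["filter"].append({"term": {"tags": "python"}})

print(extended["query"]["bool"]["filter"])
```

A typo in any of those keys only surfaces as a `KeyError` at run time, and nothing checks that the appended clause is a valid query.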
Let's rewrite the example using the Python DSL:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch("https://127.0.0.1:9200")
s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python") \
    .exclude("match", description="beta")
s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')
response = s.execute()
for hit in response:
    print(hit.meta.score, hit.title)
for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)
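As you can see, the library took care of:
creating the appropriate Query objects by name (e.g. "match")
composing the queries into a compound bool query
putting the term query in the filter context of the bool query
providing convenient access to the response data
removing the need for curly or square brackets everywhere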
Persistence
Let's have a simple Python class representing an article in a blogging system:
from datetime import datetime
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text, connections
# Define a default Elasticsearch client
connections.create_connection(hosts="https://127.0.0.1:9200")
class Article(Document):
    title = Text(analyzer='snowball', fields={'raw': Keyword()})
    body = Text(analyzer='snowball')
    tags = Keyword()
    published_from = Date()
    lines = Integer()
    class Index:
        name = 'blog'
        settings = {
            "number_of_shards": 2,
        }
    def save(self, **kwargs):
        # Store the word count before persisting the document
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)
    def is_published(self):
        return datetime.now() > self.published_from
# create the mappings in elasticsearch
Article.init()
# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()
article = Article.get(id=42)
print(article.is_published())
# Display cluster health
print(connections.get_connection().cluster.health())
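In this example you can see:
providing a default connection
defining fields with mapping configuration
setting the index name
defining custom methods
overriding the built-in .save() method to hook into the persistence life cycle
retrieving and saving the object into Elasticsearch
accessing the underlying client for other APIs
You can see more in the Persistence chapter.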
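Pre-built Faceted Search
If you have your Documents defined, you can very easily create a faceted search class to simplify searching and filtering.
Note
This feature is experimental and may be subject to change.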
from elasticsearch_dsl import FacetedSearch, TermsFacet, DateHistogramFacet
class BlogSearch(FacetedSearch):
    doc_types = [Article]
    # fields that should be searched
    fields = ['tags', 'title', 'body']
    facets = {
        # use bucket aggregations to define facets
        'tags': TermsFacet(field='tags'),
        'publishing_frequency': DateHistogramFacet(field='published_from', interval='month')
    }
# empty search
bs = BlogSearch()
response = bs.execute()
for hit in response:
    print(hit.meta.score, hit.title)
for (tag, count, selected) in response.facets.tags:
    print(tag, ' (SELECTED):' if selected else ':', count)
for (month, count, selected) in response.facets.publishing_frequency:
    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count)
You can find more details in the Faceted Search chapter.
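Update By Query
Let's resume the simple example of articles on a blog, and assume that each article has a number of likes. For this example, imagine we want to increment the number of likes by 1 for all articles that match a certain tag and do not match a certain description. Writing this as a dict, we would have the following code: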
from elasticsearch import Elasticsearch
client = Elasticsearch("https://127.0.0.1:9200")
response = client.update_by_query(
    index="my-index",
    body={
        "query": {
            "bool": {
                "must": [{"match": {"tag": "python"}}],
                "must_not": [{"match": {"description": "beta"}}]
            }
        },
        "script": {
            "source": "ctx._source.likes++",
            "lang": "painless"
        }
    },
)
Using the DSL, we can now express this query as follows:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import UpdateByQuery
client = Elasticsearch("https://127.0.0.1:9200")
ubq = UpdateByQuery(using=client, index="my-index") \
    .query("match", tag="python") \
    .exclude("match", description="beta") \
    .script(source="ctx._source.likes++", lang="painless")
response = ubq.execute()
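As you can see, the UpdateByQuery object provides many of the conveniences of the Search object, and additionally lets you update the documents matched by the search with a script, which is assigned in the same chained manner.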
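Migration from elasticsearch-py
You don't have to port your entire application to get the benefits of the Python DSL. You can start gradually by creating a Search object from your existing dict, modifying it using the API, and serializing it back to a dict: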
from elasticsearch_dsl import Search
body = {...}  # insert complicated query here
# Convert to Search object
s = Search.from_dict(body)
# Add some filters, aggregations, queries, ...
# (Search methods return a modified copy, so assign the result back)
s = s.filter("term", tags="python")
# Convert back to dict to plug back into existing code
body = s.to_dict()