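Elasticsearch Installation and Introduction
Installing Elasticsearch
Installing Elasticsearch is straightforward: download the official package, unpack it, and run it.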
curl -L -O http://download.elasticsearch.org/PATH/TO/VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
./bin/elasticsearch
Once the node is running, query it with curl 'http://localhost:9200/?pretty'. It returns basic information about the node:
{
"name" : "node-0",
"cluster_name" : "dongtai-es",
"cluster_uuid" : "aIah5IcqS1y9qiJ4LphAbA",
"version" : {
"number" : "6.1.2",
"build_hash" : "5b1fea5",
"build_date" : "2018-01-10T02:35:59.208Z",
"build_snapshot" : false,
"lucene_version" : "7.1.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
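Several rules described later in this article depend on the major version of the node, so it can be handy to read it from this response. The following is a minimal Python sketch; the helper name major_version and the shortened sample string are assumptions made for illustration, and the JSON is parsed locally rather than fetched from a live node.

```python
import json


def major_version(root_response: str) -> int:
    """Extract the major version from the JSON returned by GET /."""
    info = json.loads(root_response)
    # "6.1.2" -> ["6", "1", "2"] -> 6
    return int(info["version"]["number"].split(".")[0])


# Shortened copy of the response shown above.
sample = '{"name": "node-0", "version": {"number": "6.1.2"}, "tagline": "You Know, for Search"}'
print(major_version(sample))
```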
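Basic Concepts
Elasticsearch is essentially a distributed database. Each server runs an Elasticsearch instance (a node), and a group of these nodes forms an ES cluster.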
Document
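A data record stored in ES is called a Document and is represented as JSON. Documents in the same Index are not required to share a structure, but keeping them consistent improves efficiency. For example: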
{
"first_name": "Jane2",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": "music"
}
Index
The Index a document belongs to determines where the document is stored. It is roughly analogous to a database name.
Type
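The category of object a document represents, roughly analogous to a table in a database. For example, flight-ticket data could be grouped by cabin class (first class, economy). Elasticsearch 6.x allows only one Type per Index; 7.x deprecates Types and 8.x removes them entirely.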
ID
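The unique identifier of a document, comparable to the key of a single row in a table. Together, _index, _type, and _id identify exactly one document.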
Basic Usage
Adding a document
PUT index/employee/6
{
"first_name": "chuan",
"last_name": "zhang",
"age": 32,
"about": "hi I'm hai na bai chuan",
"interests": "read book"
}
The three path segments after PUT are the Index, the Type, and the ID, in that order.
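To make the path layout concrete, here is a small Python sketch that assembles the method, path, and JSON body of such a request. The helper name index_request is hypothetical and nothing is sent over the network; it only shows how the three identifiers map onto the URL.

```python
import json


def index_request(index, doc_type, doc_id, document):
    """Return (method, path, body) for storing a document under a fixed ID."""
    path = f"{index}/{doc_type}/{doc_id}"
    return "PUT", path, json.dumps(document)


method, path, body = index_request(
    "index", "employee", 6, {"first_name": "chuan", "last_name": "zhang", "age": 32}
)
print(method, path)
print(body)
```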
Retrieving documents
Retrieving all documents
By default ES returns 10 hits; set size to change that.
GET index/employee/_search
{
"size": 3
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "index",
"_type": "employee",
"_id": "5",
"_score": 1,
"_source": {
"first_name": "Jane1",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": "music"
}
},
{
"_index": "index",
"_type": "employee",
"_id": "10",
"_score": 1,
"_source": {
"first_name": "chuan",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": "music"
}
}
]
}
}
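The documents themselves sit under hits.hits[]._source, while hits.total counts every match and can be larger than the number of hits returned. A minimal Python sketch of pulling the documents out of such a response follows; the helper name hit_sources is an assumption, and the response dict is a trimmed copy of the one above.

```python
def hit_sources(response):
    """Return (_id, _source) pairs from a search response body."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


# Trimmed copy of the response above: 9 matches in total, 2 hits listed.
response = {
    "hits": {
        "total": 9,
        "hits": [
            {"_id": "5", "_source": {"first_name": "Jane1", "last_name": "Smith"}},
            {"_id": "10", "_source": {"first_name": "chuan", "last_name": "Smith"}},
        ],
    }
}

for doc_id, source in hit_sources(response):
    print(doc_id, source["first_name"])
```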
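Matching by condition
For example, the query below finds all documents whose first_name is 'chuan'. To search for several keywords in first_name, separate them with spaces: "first_name": "chuan smith" matches documents whose first_name contains chuan or smith.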
GET index/employee/_search
{
"query": {
"match": {
"first_name": "chuan"
}
}
}
Response:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "index",
"_type": "employee",
"_id": "6",
"_score": 0.6931472,
"_source": {
"first_name": "chuan",
"last_name": "zhang",
"age": 32,
"about": "hi I'm hai na bai chuan",
"interests": "read book"
}
}
]
}
}
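The "or" behaviour for space-separated keywords is the default of the match query; its operator parameter can be set to "and" when every keyword must be present. The Python sketch below builds both forms of the request body. The helper name match_query is hypothetical, and the code only constructs the JSON without contacting a cluster.

```python
import json


def match_query(field, text, operator="or"):
    """Build a search body containing a single match query."""
    if operator == "or":
        clause = text  # short form, relies on the default operator
    else:
        clause = {"query": text, "operator": operator}
    return {"query": {"match": {field: clause}}}


any_keyword = match_query("first_name", "chuan smith")
all_keywords = match_query("first_name", "chuan smith", operator="and")
print(json.dumps(any_keyword))
print(json.dumps(all_keywords))
```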
For an AND query, list the conditions under bool/must. For example, to find documents whose first_name is chuan and whose last_name is zhang:
GET index/employee/_search
{
"query": {
"bool": {
"must": [
{ "match": { "first_name": "chuan" } },
{ "match": { "last_name": "zhang" } }
]
}
}
}
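To query or filter by range, similar to a comparison such as '>' in MySQL (a '!=' condition would go under must_not instead), add a filter clause. For example, to restrict the query above to documents with an age greater than 30:
GET index/employee/_search
{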
"query": {
"bool": {
"must": [
{ "match": { "first_name": "chuan" } },
{ "match": { "last_name": "zhang" } }
],
"filter": {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}
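Nested bodies like this are easy to get wrong by hand, so one option is to build them in code and serialize the result, which always yields valid JSON. The following Python sketch assumes a hypothetical helper bool_query that takes the must conditions as a dict and an optional range filter; it is an illustration, not part of any Elasticsearch client.

```python
import json


def bool_query(must, range_filter=None):
    """Build a bool query: every must field has to match, plus an optional range filter."""
    bool_clause = {"must": [{"match": {field: value}} for field, value in must.items()]}
    if range_filter is not None:
        field, bounds = range_filter
        bool_clause["filter"] = {"range": {field: bounds}}
    return {"query": {"bool": bool_clause}}


body = bool_query({"first_name": "chuan", "last_name": "zhang"}, ("age", {"gt": 30}))
print(json.dumps(body, indent=2))
```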
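Updating a document
To update a record, send a PUT request with the complete document again; ES replaces the stored document that has the same ID. For example, to change the about field of the document with ID 6: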
PUT index/employee/6
{
"first_name": "chuan",
"last_name": "zhang",
"age": 32,
"about": "i modify my about",
"interests": "read book"
}
Response:
{
"_index": "index",
"_type": "employee",
"_id": "6",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 12,
"_primary_term": 2
}
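_version is the version number, which went from 1 to 2, and result is "updated". The request above is a full update. For a partial update, send a POST request to the _update endpoint: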
POST index/employee/6/_update
{
"doc":{
"about": "i modify my about"
}
}
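The difference between the two kinds of update can be modelled with plain dictionaries. This Python sketch is a simplified local model, not Elasticsearch code: full_update and partial_update are hypothetical names, and the merge here is shallow, which is enough for flat documents like the ones in this article.

```python
def full_update(stored, new_document):
    """PUT semantics: the stored document is replaced as a whole."""
    return dict(new_document)


def partial_update(stored, partial_doc):
    """POST _update semantics (simplified): fields in "doc" are merged into the stored document."""
    merged = dict(stored)
    merged.update(partial_doc)
    return merged


stored = {"first_name": "chuan", "last_name": "zhang", "age": 32, "about": "hi"}
change = {"about": "i modify my about"}

print(full_update(stored, change))     # only the about field remains
print(partial_update(stored, change))  # the other fields are kept
```

This is why the PUT example resends every field: anything left out of a full update is lost.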
Deleting a document
The following request deletes the document with ID 6:
DELETE index/employee/6
Response:
{
"found" : true,
"_index" : "index",
"_type" : "employee",
"_id" : "6",
"_version" : 3
}
The version is again incremented by 1. If no document with the given ID is found, found is false.
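Data Analysis
ES has an equivalent of MySQL's GROUP BY, which it calls aggregations. For example, the requests below count the most popular interests among employees. Note: since 5.x, sorting and aggregating on a text field rely on a separate in-memory data structure (fielddata) that is disabled by default and has to be enabled explicitly.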
PUT index/_mapping/employee/
{
"properties": {
"interests": {
"type": "text",
"fielddata": true
}
}
}
GET index/employee/_search
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
The aggregations section of the response:
"aggregations": {
"all_interests": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "music",
"doc_count": 3
},
{
"key": "forestry",
"doc_count": 1
},
{
"key": "read",
"doc_count": 1
}
]
}
}
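Notice that one bucket key is "read" rather than "read book": interests is a text field, so it is split into terms before being counted. The Python sketch below imitates that behaviour locally. It assumes lowercasing and whitespace splitting as a stand-in for the real analyzer, the sample documents are made up to resemble the data above, and terms_buckets is a hypothetical helper rather than an Elasticsearch API.

```python
from collections import Counter


def terms_buckets(documents, field):
    """Count, for each term, how many documents contain it in the given field."""
    counts = Counter()
    for doc in documents:
        # A set, so a term repeated inside one document is counted once.
        terms = set(str(doc.get(field, "")).lower().split())
        counts.update(terms)
    # Highest count first; ties are ordered by key to keep the output stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


employees = [
    {"interests": "music"},
    {"interests": "music"},
    {"interests": "music"},
    {"interests": "forestry"},
    {"interests": "read book"},
]

for bucket in terms_buckets(employees, "interests"):
    print(bucket["key"], bucket["doc_count"])
```

In this model "read book" contributes one document to the read bucket and one to the book bucket, which is why multi-word values never show up as a single key when aggregating on an analyzed text field.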