Elasticsearch: Installation and Introduction
Installing Elasticsearch
Installing Elasticsearch is simple: download the official package and run it:
curl -L -O http://download.elasticsearch.org/PATH/TO/VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
./bin/elasticsearch
Check that it is running with curl 'http://localhost:9200/?pretty', which returns information about the node:
{
  "name" : "node-0",
  "cluster_name" : "dongtai-es",
  "cluster_uuid" : "aIah5IcqS1y9qiJ4LphAbA",
  "version" : {
    "number" : "6.1.2",
    "build_hash" : "5b1fea5",
    "build_date" : "2018-01-10T02:35:59.208Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
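The same check can be scripted. The sketch below is a minimal Python illustration using only the standard library; the helper names `fetch_node_info` and `summarize` are hypothetical, and the default URL assumes a node listening on localhost:9200 as above.

```python
import json
import urllib.request


def fetch_node_info(base_url="http://localhost:9200"):
    """GET the root endpoint and decode the JSON body (needs a running node)."""
    with urllib.request.urlopen(base_url + "/?pretty") as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info):
    """Pull the node name, cluster name and version number out of the root response."""
    return info["name"], info["cluster_name"], info["version"]["number"]
```

With a node running, `summarize(fetch_node_info())` should return the node name, cluster name and version, e.g. ('node-0', 'dongtai-es', '6.1.2') for the response above.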
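Basic Concepts
Elasticsearch is essentially a distributed database: each server runs an Elasticsearch instance (a node), and a group of such instances forms an ES cluster.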
Document
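A data record stored in ES is called a document, and documents are represented as JSON. Documents in the same index are not required to share the same structure, but keeping them consistent improves efficiency. For example: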
{
  "first_name": "Jane2",
  "last_name": "Smith",
  "age": 32,
  "about": "I like to collect rock albums",
  "interests": "music"
}
Index
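The index a document belongs to; it defines where the document is stored. You can think of it as the counterpart of a database name.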
Type
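The category of object a document represents; you can think of it as a table in a database. For example, flight-ticket data could be grouped by cabin class (first class, economy). Note that Elasticsearch 6.x allows only one type per index; 7.x deprecates types, and they are removed entirely in 8.x.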
ID
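The unique identifier of a document; it corresponds roughly to a row in a table. The combination of index, type, and _id identifies exactly one document.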
Basic Usage
Adding a document
PUT index/employee/6
{
  "first_name": "chuan",
  "last_name": "zhang",
  "age": 32,
  "about": "hi I'm hai na bai chuan",
  "interests": "read book"
}
The path segments after PUT are the index, the type, and the ID, in that order.
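For reference, the same request can be built from Python with the standard library. This is a sketch, not an official client: `put_document` is a hypothetical helper, the base URL assumes a local node, and the request is only constructed, not sent. The Content-Type header is included because ES 6.x rejects request bodies that lack it.

```python
import json
import urllib.request


def put_document(base_url, index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT request that indexes `doc` at index/type/id."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


doc = {
    "first_name": "chuan",
    "last_name": "zhang",
    "age": 32,
    "about": "hi I'm hai na bai chuan",
    "interests": "read book",
}
req = put_document("http://localhost:9200", "index", "employee", 6, doc)
# urllib.request.urlopen(req) would actually send it to a running node.
```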
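Retrieving documents
Retrieving all documents
By default ES returns 10 hits; use the size parameter to change this. For example, to return 3: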
GET index/employee/_search
{
  "size": 3
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "employee",
        "_id": "5",
        "_score": 1,
        "_source": {
          "first_name": "Jane1",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": "music"
        }
      },
      {
        "_index": "index",
        "_type": "employee",
        "_id": "10",
        "_score": 1,
        "_source": {
          "first_name": "chuan",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": "music"
        }
      }
    ]
  }
}
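Note that hits.total (9 here) counts every matching document, while hits.hits holds only the page that was returned. Below is a minimal Python sketch for paging and reading the results; `search_body` and `hit_sources` are hypothetical helper names, and `from` is the standard search parameter for skipping hits.

```python
def search_body(size=10, from_=0):
    """Request body for a match-all search; ES returns 10 hits unless size is set."""
    return {"from": from_, "size": size}


def hit_sources(response):
    """Return (_id, _source) pairs for the hits on the returned page."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]
```

For example, `search_body(3, 3)` asks for the second page of three hits.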
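Matching on a condition
For example, to find all documents whose first_name is "chuan". To search first_name for several keywords, separate them with spaces: "first_name": "chuan smith" matches documents whose first_name contains chuan or smith.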
GET index/employee/_search
{
  "query": {
    "match": {
      "first_name": "chuan"
    }
  }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "index",
        "_type": "employee",
        "_id": "6",
        "_score": 0.6931472,
        "_source": {
          "first_name": "chuan",
          "last_name": "zhang",
          "age": 32,
          "about": "hi I'm hai na bai chuan",
          "interests": "read book"
        }
      }
    ]
  }
}
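To make the OR behaviour concrete, here is a deliberately simplified Python model of a match query. It assumes a crude tokenizer (lowercase, split on non-word characters) in place of the real standard analyzer, so it only approximates ES; `tokenize` and `match_or` are hypothetical names.

```python
import re


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def match_or(docs, field, query):
    """Keep docs whose field shares at least one token with the query (default OR)."""
    wanted = set(tokenize(query))
    return [d for d in docs if wanted & set(tokenize(d.get(field, "")))]


docs = [
    {"first_name": "chuan", "last_name": "zhang"},
    {"first_name": "Jane1", "last_name": "Smith"},
    {"first_name": "Smith", "last_name": "Lee"},
]
```

To require all terms instead, the match query accepts an operator option: `{"match": {"first_name": {"query": "chuan smith", "operator": "and"}}}`.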
To AND several conditions together, for example to find documents whose first_name is chuan and whose last_name is zhang, use a bool query with must:
GET index/employee/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "first_name": "chuan" } },
        { "match": { "last_name": "zhang" } }
      ]
    }
  }
}
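When the conditions come from code, the must list can be generated rather than typed by hand. A small Python sketch; `bool_must` is a hypothetical helper that emits one match clause per field=value pair.

```python
def bool_must(**conditions):
    """Turn field=value pairs into a bool query whose match clauses are ANDed via must."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: value}} for field, value in conditions.items()]
            }
        }
    }
```

`bool_must(first_name="chuan", last_name="zhang")` produces the same body as the request above.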
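To query or filter by range, similar to MySQL's '>' operator, add a filter clause (a '!=' condition corresponds to a must_not clause instead). For example, to find documents that meet the conditions above and whose age is greater than 30: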
GET index/employee/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "first_name": "chuan" } },
        { "match": { "last_name": "zhang" } }
      ],
      "filter": {
        "range": {
          "age": { "gt": 30 }
        }
      }
    }
  }
}
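Building the body in code also avoids syntax slips such as a missing comma between must and filter. A minimal Python sketch with a hypothetical helper `with_range_filter`; clauses under filter restrict the results without contributing to the relevance score.

```python
import json


def with_range_filter(must, field, **bounds):
    """bool query: `must` clauses are scored, the range clause only filters."""
    return {"query": {"bool": {"must": must, "filter": {"range": {field: bounds}}}}}


must = [{"match": {"first_name": "chuan"}}, {"match": {"last_name": "zhang"}}]
query = with_range_filter(must, "age", gt=30)
payload = json.dumps(query)  # json.dumps always emits well-formed JSON
```

For a '!=' style condition, add a must_not list next to must, e.g. `"must_not": [{"match": {"interests": "music"}}]`.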
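Updating a document
Updating a record also uses a PUT request: send the whole document again and ES replaces the document with the same ID. For example, to update the about field of the document with ID 6: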
PUT index/employee/6
{
  "first_name": "chuan",
  "last_name": "zhang",
  "age": 32,
  "about": "i modify my about",
  "interests": "read book"
}
{
  "_index": "index",
  "_type": "employee",
  "_id": "6",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 12,
  "_primary_term": 2
}
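_version is the version number, which went from 1 to 2, and result is "updated". The request above replaces the whole document; for a partial update, send a POST request to the _update endpoint: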
POST index/employee/6/_update
{
  "doc": {
    "about": "i modify my about"
  }
}
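The difference between the two update styles can be modelled on plain dictionaries. This is a simplified Python sketch, not ES code: `full_replace` and `partial_update` are hypothetical names, and the partial update here is a shallow merge, whereas ES merges nested objects recursively.

```python
def full_replace(stored, new_doc):
    """PUT semantics: the stored _source is replaced wholesale by the new document."""
    return dict(new_doc)


def partial_update(stored, doc):
    """_update with "doc": the given fields are merged into the existing _source.
    Shallow merge only, for flat documents like the employee example."""
    merged = dict(stored)
    merged.update(doc)
    return merged
```

With PUT, any field you leave out of the new body is gone; with _update, untouched fields survive.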
Deleting a document
For example, to delete the document with ID 6:
DELETE index/employee/6
{
  "found" : true,
  "_index" : "index",
  "_type" : "employee",
  "_id" : "6",
  "_version" : 3
}
The version has again increased by 1. If no document with the given ID exists, found is false.
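Which field to check depends on the version: the response above carries found, while 6.x responses report the outcome in a result field ("deleted" or "not_found") instead. A hedged Python sketch that accepts either shape; `delete_outcome` is a hypothetical helper.

```python
def delete_outcome(response):
    """True if the DELETE removed a document; handles both response shapes."""
    if "result" in response:
        # Newer responses: "deleted" on success, "not_found" when the ID is missing.
        return response["result"] == "deleted"
    # Older responses: a boolean found field.
    return bool(response.get("found", False))
```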
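Data Analysis
ES supports something similar to MySQL's GROUP BY, called aggregations. For example, suppose I want to find the most popular interests among employees. Note: since 5.x, sorting and aggregating on a text field relies on fielddata, a separate data structure held in memory, which is disabled by default and must be enabled explicitly in the mapping: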
PUT index/_mapping/employee/
{
  "properties": {
    "interests": {
      "type": "text",
      "fielddata": true
    }
  }
}

GET index/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}
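Because interests is an analyzed text field, the aggregation counts individual tokens rather than whole values, which is why a value like "read book" shows up as a "read" bucket. The Python model below approximates this; it assumes a crude lowercase, non-word-split tokenizer instead of the real analyzer, and `terms_buckets` is a hypothetical name. If the index was dynamically mapped, aggregating on the interests.keyword subfield is a common alternative that counts whole values without enabling fielddata.

```python
import re
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Rough model of a terms aggregation on an analyzed text field:
    each doc counts once per distinct token; sorted by count, then key."""
    counts = Counter()
    for doc in docs:
        tokens = {t for t in re.split(r"\W+", doc.get(field, "").lower()) if t}
        counts.update(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ranked[:size]]
```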
"aggregations": { "all_interests": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "music", "doc_count": 3 }, { "key": "forestry", "doc_count": 1 }, { "key": "read", "doc_count": 1 } ] } }