Dynamic mapping template issues when Logstash outputs data to Elasticsearch

2023-07-28

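When the logstash-input-jdbc plugin is used to sync MySQL data into Elasticsearch, Logstash applies a default dynamic mapping template named logstash. While Logstash starts up, you will see messages like the following: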

Using mapping template from {:path=>nil}

Attempting to install template{:manage_template=>{"template"=>"logstash-*","version"=>50001,"settings"=>{"index.refresh_interval"=>"5s"},"mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true,"norms"=>false},"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message","match_mapping_type"=>"string", "mapping"=>{"type"=>"text","norms"=>false}}},{"string_fields"=>{"match"=>"*","match_mapping_type"=>"string","mapping"=>{"type"=>"text","norms"=>false,"fields"=>{"keyword"=>{"type"=>"keyword"}}}}}],"properties"=>{"@timestamp"=>{"type"=>"date","include_in_all"=>false},"@version"=>{"type"=>"keyword","include_in_all"=>false},"geoip"=>{"dynamic"=>true,"properties"=>{"ip"=>{"type"=>"ip"},"location"=>{"type"=>"geo_point"},"latitude"=>{"type"=>"half_float"},"longitude"=>{"type"=>"half_float"}}}}}}}}

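Installing elasticsearch template to _template/logstash

The path=>nil on the first line means that no custom template was found, so the default template is used. It is then stored under Elasticsearch's template path with the name logstash. The template content: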

{
  "template": "logstash-*",
  "version": 50001,
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": false
      },
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": false
        },
        "@version": {
          "type": "keyword",
          "include_in_all": false
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  }
}
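To make the effect of the two dynamic_templates entries more concrete, here is a rough Python sketch of how a newly seen field might be matched against them. It is only an approximation for illustration: the function name resolve_dynamic_template is made up, fnmatchcase stands in for Elasticsearch's own wildcard matching, and the sketch assumes the first matching rule wins.

```python
from fnmatch import fnmatchcase

# Dynamic templates copied from the default "logstash" template.
DYNAMIC_TEMPLATES = [
    {"message_field": {"path_match": "message", "match_mapping_type": "string",
                       "mapping": {"type": "text", "norms": False}}},
    {"string_fields": {"match": "*", "match_mapping_type": "string",
                       "mapping": {"type": "text", "norms": False,
                                   "fields": {"keyword": {"type": "keyword"}}}}},
]


def resolve_dynamic_template(field_path, detected_type, templates=DYNAMIC_TEMPLATES):
    """Return (rule_name, mapping) for the first rule that matches, else None.

    Hypothetical helper: fnmatchcase only approximates Elasticsearch's
    simple "*" wildcard patterns.
    """
    field_name = field_path.rsplit(".", 1)[-1]
    for entry in templates:
        for rule_name, rule in entry.items():
            wanted_type = rule.get("match_mapping_type")
            if wanted_type not in (None, "*", detected_type):
                continue  # rule is restricted to a different detected type
            if "match" in rule and not fnmatchcase(field_name, rule["match"]):
                continue  # "match" is tested against the field name
            if "path_match" in rule and not fnmatchcase(field_path, rule["path_match"]):
                continue  # "path_match" is tested against the full dotted path
            return rule_name, rule["mapping"]
    return None


if __name__ == "__main__":
    for path, detected in [("message", "string"), ("title", "string"), ("age", "long")]:
        print(path, detected, resolve_dynamic_template(path, detected))
```

Under these assumptions, any string column synced from MySQL other than message falls through to string_fields and is mapped as an analyzed text field with a keyword sub-field.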

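It automatically maps the fields that are synced over, but the drawback is that most string fields are mapped as text and therefore analyzed (tokenized), whereas most of my use cases need them not to be analyzed, so a custom mapping is required. At first I was not aware of template precedence. I left the template configuration untouched, with everything at its defaults, and before starting Logstash I used curl -XPUT to create a non-analyzed mapping on the ES cluster. After the sync finished, that mapping had not taken effect, which is when I realized that the Logstash output plugin takes precedence over the mapping created on the cluster. So the next step is to modify the template and override the default one.

First, delete the default template:

curl -XDELETE -u elastic '192.168.11.31:8011/_template/logstash'

Then create a new file, es-template.json (the name is arbitrary), start from the content of the default template, modify it, and paste it in. Here I make all string fields non-analyzed. The template content is as follows: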

{
  "template": "my_index",
  "settings" : {
    "index.refresh_interval" :"5s"
  },
  "mappings" : {  
    "_default_" : {  
      "_all" : {"enabled":false, "omit_norms" : true},  
      "dynamic_templates" : [ {  
        "message_field" : {  
          "match" :"message",  
          "match_mapping_type" :"string",  
          "mapping" : {  
            "type" :"string", "index" : "not_analyzed","omit_norms" : true,  
            "fielddata" : {"format" : "disabled" } 
          } 
        } 
      }, { 
        "string_fields" : {  
          "match" :"*",  
          "match_mapping_type" :"string",  
          "mapping" : {  
            "type" :"string", "index" : "not_analyzed","omit_norms" : true,  
            "fielddata" : {"format" : "disabled" }, 
            "fields" : {  
              "raw" :{"type": "string", "index" :"not_analyzed", "ignore_above" : 256}  
            } 
          } 
        } 
      } ] 
    }  
  }  
}
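The template above uses the older string / not_analyzed notation. As a hedged alternative, the sketch below builds a similar template in Python using the keyword type, which is the non-analyzed string type in Elasticsearch 5.x and later. The function name build_keyword_template is hypothetical, the sketch assumes a 5.x cluster (matching the Logstash 5.5.3 path used in this post), and it has not been run against the author's cluster.

```python
import json


def build_keyword_template(index_pattern):
    """Build a template dict that maps every dynamically detected string to keyword.

    Hypothetical helper; assumes Elasticsearch 5.x, where "_default_" mappings
    and the "_all" field still exist.
    """
    return {
        "template": index_pattern,
        "settings": {"index.refresh_interval": "5s"},
        "mappings": {
            "_default_": {
                "_all": {"enabled": False},
                "dynamic_templates": [
                    {
                        "string_fields": {
                            "match": "*",
                            "match_mapping_type": "string",
                            # keyword fields are indexed as a single, unmodified term
                            "mapping": {"type": "keyword"},
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    # Print JSON that could be saved as es-template.json.
    print(json.dumps(build_keyword_template("my_index"), indent=2))
```

Generating the file from code is optional; the point is only that one catch-all rule with type keyword expresses the same "do not analyze any string" intent.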

Next, the output section of the Logstash startup file jdbc.conf:

if [type] == "my_type" {
    elasticsearch {
        hosts => ["192.168.110.31:8011", "192.168.110.31:8012", "192.168.110.31:8013"]
        user => "elastic"
        password => "abc123qwer"
        index => "my_index"
        document_id => "%{id}"
        # manage_template => "false"
        template => "/home/lvyuan/elasticsearch/logstash-5.5.3/template/es-template.json"
        template_name => "my_index"
        template_overwrite => "true"
    }
}

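Before starting, delete the previously created index and template. If the change does not take effect after startup, be sure to delete the index and the template first (meaning the template stored under _template, not the physical template file), then adjust the configuration and run again.

curl -XDELETE -u elastic 'http://192.168.110.31:8011/_template/my_index'

curl -XDELETE -u elastic 'http://192.168.110.31:8011/my_index'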
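Since this cleanup tends to be repeated on every attempt, it can be scripted. The following is a minimal sketch using only the Python standard library. The helper names build_delete_request and cleanup_requests are made up for illustration, the password shown is a placeholder, and the requests are only built here, not sent.

```python
import base64
import urllib.request


def build_delete_request(base_url, path, user, password):
    """Build (but do not send) an HTTP DELETE request with basic auth."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="DELETE")
    request.add_header("Authorization", "Basic " + credentials)
    return request


def cleanup_requests(base_url, name, user, password):
    """Requests that remove the stored template, then the index of the same name."""
    return [
        build_delete_request(base_url, "_template/" + name, user, password),
        build_delete_request(base_url, name, user, password),
    ]


if __name__ == "__main__":
    # "changeme" is a placeholder, not a real credential.
    for req in cleanup_requests("http://192.168.110.31:8011", "my_index", "elastic", "changeme"):
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req)
```

Deleting the index discards the data that has already been synced, so the sketch leaves the actual urlopen call commented out.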

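My original goal was to use Elasticsearch as a replacement for MySQL SQL queries, not for full-text search, so analysis can actually get in the way. For example, one field in MySQL stores a sequence of mixed upper- and lowercase letters (e.g. HTZG5jjhffdwe). With the default mapping template (which analyzes the field), the sequence is converted entirely to lowercase before it is stored as a token, so a termQuery (not analyzed, exact match) cannot find it. Some will point out that matchPhraseQuery works, and it does. But a prefixQuery (also not analyzed) fails with the original casing: a prefix starting with the lowercase "htzg5" matches, while the uppercase form does not, because the token table holds only lowercase tokens. So it depends on the situation; not every field should be analyzed. You can try it yourself:

http://localhost:8011/_analyze?pretty&analyzer=standard&text=HTZG5jjhffdEX7w52r37880

The whole string is lowercased and stored as a single token:

{"tokens":[{"token":"htzg5jjhffdex7w52r37880","start_offset":0,"end_offset":22,"type":"<ALPHANUM>","position":0}]}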
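The mismatch can be reproduced without a cluster. The sketch below is a toy model in Python, not Elasticsearch itself: standard_like_tokens is a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and term_match / prefix_match mimic queries that compare the query string with the indexed terms as-is. All function names are invented for this illustration.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[0-9A-Za-z]+", text)]


def keyword_tokens(text):
    """A non-analyzed (keyword) field indexes the value unchanged as one term."""
    return [text]


def term_match(indexed_terms, query):
    """Exact comparison against indexed terms; the query string is not analyzed."""
    return query in indexed_terms


def prefix_match(indexed_terms, prefix):
    """Prefix comparison against indexed terms; the prefix is not analyzed."""
    return any(term.startswith(prefix) for term in indexed_terms)


if __name__ == "__main__":
    value = "HTZG5jjhffdwe"
    for label, terms in [("analyzed", standard_like_tokens(value)),
                         ("keyword", keyword_tokens(value))]:
        print(label, terms,
              "term:", term_match(terms, value),
              "prefix HTZG5:", prefix_match(terms, "HTZG5"),
              "prefix htzg5:", prefix_match(terms, "htzg5"))
```

In this model the analyzed field only answers to lowercase input, while the non-analyzed field answers to the value exactly as it was stored in MySQL, which is the behavior the custom template is meant to provide.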

References:

http://blog.csdn.net/asia_kobe/article/details/51192848

http://www.cnblogs.com/NextNight/p/6860283.html

http://www.cnblogs.com/cocowool/p/elk_dynamic_templates.html

https://elasticsearch.cn/article/21

http://m.blog.csdn.net/u012516166/article/details/75106184
