ELK 6.2.4 Study Notes for Busy People: Logstash and Filebeat Explained (jav...|江阴雨辰互联

Published July 17, 2023 (author: )

This post continues from the previous one in the ELK series. The latest official Logstash documentation is at /guide/en/logstash/current/.

Suppose you have dozens of servers, and on each one you need to watch syslog, Tomcat logs, nginx logs, MySQL logs and so on, looking for OOM errors, processes killed under low memory, nginx errors, MySQL exceptions and the like. Doing that by hand, server by server, is obviously time-consuming and laborious.

Logstash has a plugin-based architecture: almost every concrete feature is implemented as a plugin. The installed plugins can be listed with bin/logstash-plugin list --verbose, or you can browse /guide/en/logstash/current/ and /guide/en/logstash/current/.

A Logstash configuration file has three sections: input, filter, and output. Outside of POC work, practically every real deployment needs filters to preprocess the logs, whether they are nginx logs or log4j logs. The same reasoning applies to stdout in the output section, which is mostly useful for POC work.

    input {
      log4j {
        port => "5400"
      }
      beats {
        port => "5044"
      }
    }
    filter {
      # multiple filters run in the order in which they are declared
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      geoip {
        source => "clientip"
      }
    }
    output {
      elasticsearch {
        action => "index"
        # or ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"];
        # writes are balanced across the listed ES nodes, usually non-master nodes
        hosts => "127.0.0.1:9200"
        index => "logstash-%{+YYYY-MM}"
      }
      stdout {
        codec => rubydebug
      }
      file {
        path => "/path/to/target/file"
      }
    }

Commonly used Logstash inputs include syslog (see RFC3164), the console, files, redis, and beats.
Commonly used Logstash outputs include es, the console, and files.
Commonly used Logstash filters include grok, mutate, drop, clone, and geoip.

To see all of Logstash's command-line options:

    [root@elk1 bin]# ./logstash --help
    OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
    Usage:
        bin/logstash [OPTIONS]

    Options:
        -n, --node.name NAME
            Specify the name of this logstash instance, if no value is given
            it will default to the current hostname. (default: "elk1")
        -f, --path.config CONFIG_PATH
            Load the logstash config from a specific file or directory.
            If a directory is given, all files in that directory will be
            concatenated in lexicographical order and then parsed as a single
            config file. You can also specify wildcards (globs) and any
            matched files will be loaded in the order described above.
        -e, --config.string CONFIG_STRING
            Use the given string as the configuration data. Same syntax as
            the config file. If no input is specified, then the following is
            used as the default input: "input { stdin { type => stdin } }"
            and if no output is specified, then the following is used as the
            default output: "output { stdout { codec => rubydebug } }"
            If you wish to use both defaults, please use the empty string for
            the '-e' flag. (default: nil)
        --modules MODULES
            Load Logstash modules. Modules can be defined using multiple
            instances '--modules module1 --modules module2', or
            comma-separated syntax '--modules=module1,module2'
            Cannot be used in conjunction with '-e' or '-f'
            Use of '--modules' will override modules declared in the
            'logstash.yml' file.
        -M, --modules.variable MODULES_VARIABLE
            Load variables for module template. Multiple instances of '-M' or
            '--modules.variable' are supported. Ignored if '--modules' flag
            is not used. Should be in the format of
            '-M "MODULE_NAME.var.PLUGIN_TYPE.PLUGIN_NAME.VARIABLE_NAME=VALUE"'
            as in '-M "example.var.filter.mutate.fieldname=fieldvalue"'
        --setup
            Load index template into Elasticsearch, and saved searches,
            index-pattern, visualizations, and dashboards into Kibana when
            running modules. (default: false)
        --cloud.id CLOUD_ID
            Sets the elasticsearch and kibana host settings for module
            connections in Elastic Cloud. Your Elastic Cloud User interface
            or the Cloud support team should provide this. Add an optional
            label prefix '<label>:' to help you identify multiple cloud.ids.
            e.g. 'staging:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy'
        --cloud.auth CLOUD_AUTH
            Sets the elasticsearch and kibana username and password for
            module connections in Elastic Cloud
            e.g. 'username:<password>'
        --pipeline.id ID
            Sets the ID of the pipeline. (default: "main")
        -w, --pipeline.workers COUNT
            Sets the number of pipeline workers to run. (default: 1)
        --experimental-java-execution
            (Experimental) Use new Java execution engine. (default: false)
        -b, --pipeline.batch.size SIZE
            Size of batches the pipeline is to work in. (default: 125)
        -u, --pipeline.batch.delay DELAY_IN_MS
            When creating pipeline batches, how long to wait while polling
            for the next event. (default: 50)
        --pipeline.unsafe_shutdown
            Force logstash to exit during shutdown even if there are still
            inflight events in memory. By default, logstash will refuse to
            quit until all received events have been pushed to the outputs.
            (default: false)
        --path.data PATH
            This should point to a writable directory. Logstash will use this
            directory whenever it needs to store data. Plugins will also have
            access to this path.
            (default: "/usr/local/app/logstash-6.2.4/data")
        -p, --path.plugins PATH
            A path of where to find plugins. This flag can be given multiple
            times to include multiple paths. Plugins are expected to be in a
            specific directory hierarchy: 'PATH/logstash/TYPE/NAME.rb' where
            TYPE is 'inputs' 'filters', 'outputs' or 'codecs' and NAME is the
            name of the plugin. (default: [])
        -l, --path.logs PATH
            Write logstash internal logs to the given file. Without this
            flag, logstash will emit logs to standard output.
            (default: "/usr/local/app/logstash-6.2.4/logs")
        --log.level LEVEL
            Set the log level for logstash. Possible values are:
              - fatal
              - error
              - warn
              - info
              - debug
              - trace
            (default: "info")
        --config.debug
            Print the compiled config ruby code out as a debug log (you must
            also have --log.level=debug enabled). WARNING: This will include
            any 'password' options passed to plugin configs as plaintext, and
            may result in plaintext passwords appearing in your logs!
            (default: false)
        -i, --interactive SHELL
            Drop to shell instead of running as normal. Valid shells are
            "irb" and "pry"
        -V, --version
            Emit the version of logstash and its friends, then exit.
        -t, --config.test_and_exit
            Check configuration for valid syntax and then exit.
            (default: false)
        -r, --config.reload.automatic
            Monitor configuration changes and reload whenever it is changed.
            NOTE: use SIGHUP to manually reload the config (default: false)
        --config.reload.interval RELOAD_INTERVAL
            How frequently to poll the configuration location for changes,
            in seconds. (default: 3000000000)
        --http.host HTTP_HOST
            Web API binding host (default: "127.0.0.1")
        --http.port HTTP_PORT
            Web API http port (default: 9600..9700)
        --log.format FORMAT
            Specify if Logstash should write its own logs in JSON form (one
            event per line) or in plain text (using Ruby's Object#inspect)
            (default: "plain")
        --path.settings SETTINGS_DIR
            Directory containing logstash.yml file. This can also be set
            through the LS_SETTINGS_DIR environment variable.
            (default: "/usr/local/app/logstash-6.2.4/config")
        --verbose
            Set the log level to info.
            DEPRECATED: use --log.level=info instead.
        --debug
            Set the log level to debug.
            DEPRECATED: use --log.level=debug instead.
        --quiet
            Set the log level to info.
            DEPRECATED: use --log.level=quiet instead.
        -h, --help
            print help

The meaning of each setting is also described at /guide/en/logstash/current/. The most useful ones in practice are:

    -f                          specify the configuration file
    --config.test_and_exit      check that the configuration file parses correctly
    --config.reload.automatic   watch for configuration changes and reload without a restart, much like nginx -s reload; very handy

The ELK components all use YAML (/item/YAML/1067697?fr=aladdin) for their settings files, such as logstash.yml; the pipeline configuration shown earlier uses Logstash's own syntax rather than YAML. YAML has the following basic rules:

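1. It is case sensitive.

2. Indentation expresses hierarchy.

3. Tabs are not allowed for indentation; only spaces may be used.

4. The indentation width is not fixed; elements aligned at the same column belong to the same level.

5. A # starts a comment.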

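6. Strings do not have to be quoted.

JVM parameters are set in config/jvm.options.

The output and filter sections of the configuration file both support the usual logical expressions such as if/else if, along with the various comparison and regular-expression match operators. The configuration file can also read environment variables using the ${HOME} form; see /guide/en/logstash/current/ for details.

Beats Input plugin

Before looking at specific input plugins, note which options every plugin supports. The main one is id: if a single Logstash instance runs several plugins of the same type, id can be used to tell them apart.

Loading data through the Beats plugin is the main recommended approach in ELK 6.x, so let's look at its configuration in detail (/guide/en/logstash/current/).

    input {
      beats {
        port => 5044
      }
    }

The port parameter is required and has no default value. Apart from the SSL settings, which matter once SSL is enabled, almost everything else is optional. host defaults to "0.0.0.0", meaning it listens on all network interfaces; unless you have special security requirements, this is also the recommended setting.

The core parsing plugin: Grok Filter

Log formats tend to be flexible and complex (nginx access logs, for example), or are not strictly one event per line (Java exception stack traces, for example), and they are not necessarily friendly to most developers or operators. If logs can be parsed and sorted into separate fields before they are finally displayed, usability improves a great deal. The grok filter plugin exists to do exactly this. Like the beats plugin, grok is available by default. Leaving the log sources themselves aside, how good a logging system is depends largely on how well the filter rules at this step are written, so although it is tedious, it has to be mastered, much like nginx rewrite rules.

Logstash ships with about 120 patterns; see /logstash-plugins/logstash-patterns-core/tree/master/patterns. The grok syntax is:

    %{SYNTAX:SEMANTIC}

This is similar to the following Java:

    // requires: import java.util.regex.Pattern;
    String pattern = ".*runoob.*";
    boolean isMatch = Pattern.matches(pattern, content);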

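Here pattern plays the role of SYNTAX and content that of SEMANTIC. The difference is that raw log text carries no field names when it is parsed, so SEMANTIC is the field name assigned to the text matched by the pattern, and these fields are added to the event. For example, take this HTTP request log line:

    55.3.244.1 GET / 15824 0.043

Matching it with

    %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

adds the following fields to the event, in addition to the original message:

    client: 55.3.244.1
    method: GET
    request: /
    bytes: 15824
    duration: 0.043

A complete grok example:

    input {
      file {
        path => "/var/log/"
      }
    }
    filter {
      grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
      }
    }

Note: how Logstash knows where it stopped reading after a restart is covered in the Filebeat part.

The main grok options are match and overwrite. The former parses message into the corresponding fields; the latter rewrites message so that the original message is overwritten. For many logs, storing a second copy of the original message serves no purpose. See /guide/en/logstash/6.2/#plugins-filters-grok-overwrite.

Although the grok filter can be used for formatting, multiline events are not well suited to being handled in a Logstash filter or input (the multiline codec; if you do want to handle multiline events in Logstash, see /guide/en/logstash/current/). Platforms built on ELK usually receive logs through the beats input plugin, and handling multiline events inside Logstash at that point mixes up the data streams. They therefore need to be dealt with before the events are sent to Logstash, that is, preprocessed in Filebeat.

For data coming from Filebeat modules, Logstash ships with parsing patterns for them; see /guide/en/logstash/current/. The details are covered in the Filebeat part.

ES Output plugin

The main options are:

    action   defaults to index, which indexes a document (a Logstash event); for background, refer to material on ES architecture and core concepts
    hosts    the address and port of the ES servers
    index    the ES index that events are written to; defaults to logstash-%{+YYYY.MM.dd}, one index per day. Indices are generally split by time; for the date format see /joda-time/apidocs/org/joda/time/format/

Filebeat

Starting with ELK 6.x, the log4j input plugin is no longer recommended; the recommended replacement is Filebeat.

For how Filebeat works, see /guide/en/beats/filebeat/6.2/. Filebeat consists of two main components, prospectors and harvesters, which work together to tail files and send events to the configured output.

A harvester reads a file line by line and sends the content to the output; each file is read by its own harvester.

A prospector manages the harvesters and finds the files to read.

Filebeat currently supports two prospector types, log and stdin, and each type can be defined multiple times.

Filebeat records the state of every file in a registry (set with filebeat.registry_file, default ${path.data}/registry). The state records the offset that the harvester last read up to. The prospector in turn keeps the state of every file it has found.

Filebeat guarantees that every event is delivered at least once.

Filebeat's configuration file also uses YAML.

    filebeat.prospectors:
    - type: log
      paths:
        - /var/log/*.log  # absolute path of the log files
      fields:
        type: syslog  # adds a type field with the value syslog to each event
    output.logstash:
      hosts: ["localhost:5044"]

Filebeat can send output to Elasticsearch or to Logstash. The usual practice is to send to Logstash, so the ES-related configuration is skipped here.

For Filebeat's command-line options, see /guide/en/beats/filebeat/6.2/; for all configuration file settings, see /guide/en/beats/filebeat/6.2/.

By default Filebeat runs in the background. To start it in the foreground, run ./filebeat -e, which also sends its log output to stderr.

To use Filebeat, declare prospectors under filebeat.prospectors in the configuration file. There can be more than one prospector. For example:

    filebeat.prospectors:
    - type: log
      paths:
        - /var/log/apache/httpd-*.log
    - type: log
      paths:
        - /var/log/messages
        - /var/log/*.log

Other useful options include include_lines (read only matching lines), exclude_lines (skip matching lines), exclude_files (exclude certain files), tags, fields, fields_under_root, close_inactive (how long a log file may go unchanged before its harvester is closed automatically; default 5 minutes), and scan_frequency (how often the prospector scans for new files to harvest; files whose harvester was closed by close_inactive also count as new; default 10s, do not set it below 1s). See /guide/en/beats/filebeat/6.2/ for details.

Parsing multiline messages

When ELK is used for application logs, readable display of multiline messages is essential; without it much of ELK's value is lost. To handle multiline messages correctly, set multiline rules in filebeat.yml to declare which lines belong to one event. This is mainly controlled by three parameters: multiline.pattern, multiline.negate, and multiline.match.

For Java logs, for example, you can use:

    multiline.pattern: '^\['
    multiline.negate: true
    multiline.match: after

or:

    multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:'
    multiline.negate: false
    multiline.match: after

With either of these, the following log is treated as a single event.

    [beat-logstash-some-name-832-2015.11.28] IndexNotFoundException[no such index]
        at ameExpressionResolver$e(:566)
        at teIndices(:133)
        at teIndices(:77)
        at lock(:75)

For the detailed settings, see /guide/en/beats/filebeat/6.2/.

The outputs Filebeat supports include Elasticsearch, Logstash, Kafka, Redis, File, and Console. They are all quite simple; see /guide/en/beats/filebeat/6.2/.

Filebeat modules offer a more convenient way to handle common log formats such as apache2 and mysql. In spirit they resemble Spring Boot: convention over configuration. See /guide/en/beats/filebeat/6.2/ for details. Filebeat modules require Elasticsearch 5.2 or later.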
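To make the multiline rule concrete, here is a minimal Python sketch of how a pattern with match: after groups lines into events. It is an illustration only, not Filebeat's implementation: the function name group_multiline and the sample log lines are hypothetical, and Python's re module stands in for the Go regular expressions that Filebeat uses (so \s replaces [[:space:]]).

```python
import re


def group_multiline(lines, pattern, negate=True):
    """Group lines into events, imitating multiline.match: after.

    negate=True:  lines that do NOT match the pattern are appended to the
                  preceding line that does match.
    negate=False: lines that DO match the pattern are appended to the
                  preceding line that does not match.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        matched = regex.search(line) is not None
        is_continuation = (not matched) if negate else matched
        if is_continuation and events:
            events[-1].append(line)
        else:
            events.append([line])
    return ["\n".join(event) for event in events]


# Hypothetical sample input: one Java stack trace followed by a new entry.
sample_lines = [
    "[2015-11-28 10:00:00] IndexNotFoundException[no such index]",
    "    at com.example.Resolver.resolve(Resolver.java:566)",
    "    at com.example.Service.run(Service.java:133)",
    "[2015-11-28 10:00:01] next event starts here",
]

# Rule 1: every event starts with '[', everything else is a continuation.
events = group_multiline(sample_lines, r"^\[", negate=True)

# Rule 2: lines starting with whitespace + "at"/"..." or with "Caused by:"
# are continuations of the previous line.
events_alt = group_multiline(
    sample_lines, r"^\s+(at|\.{3})\b|^Caused by:", negate=False
)

for event in events:
    print(event)
    print("---")
```

Both rules yield two events for this input: the stack trace stays attached to the line that opened it, and the second bracketed line starts a new event.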

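Returning to the grok example from the Grok Filter part: the %{SYNTAX:SEMANTIC} idea maps naturally onto named capture groups. The Python sketch below is a rough equivalent under stated assumptions. The regular expressions are simplified stand-ins written for this illustration, not the actual IP, WORD, URIPATHPARAM, and NUMBER definitions from logstash-patterns-core, which are considerably more thorough.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions for illustration):
#   IP           -> four dot-separated groups of 1-3 digits
#   WORD         -> \w+
#   URIPATHPARAM -> any run of non-space characters
#   NUMBER       -> integer or decimal
GROK_SKETCH = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>\w+) "
    r"(?P<request>\S+) "
    r"(?P<bytes>\d+(?:\.\d+)?) "
    r"(?P<duration>\d+(?:\.\d+)?)"
)

line = "55.3.244.1 GET / 15824 0.043"
match = GROK_SKETCH.match(line)

# Each named group plays the role of SEMANTIC: it becomes a field name.
fields = match.groupdict() if match else {}
print(fields)
```

As with grok's default behaviour, every captured value here is a string; converting bytes or duration to numbers would be a separate step.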
Published by: admin. Please credit the source when reposting: http://www.yc00.com/xiaochengxu/1689545999a265078.html