Published July 17, 2023 (author: )
Parsing ELK Logs with Grok
Grok is by far the best way to turn crappy, unstructured logs into something structured and queryable. Grok handles arbitrary formats perfectly: syslog logs, Apache and other web server logs, MySQL logs, and so on.
2. Getting-Started Example
Below is a Tomcat access log line:
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/ HTTP/1.1" 200 203023 "http:///presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
It is shipped from Filebeat into Logstash with the following configuration:
input {
  beats {
    port => "5043"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  stdout { codec => rubydebug }
}
{
        "request" => "/presentations/logstash-monitorama-2013/images/",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
         "offset" => 325,
           "auth" => "-",
          "ident" => "-",
     "input_type" => "log",
           "verb" => "GET",
         "source" => "/path/to/file/",
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/ HTTP/1.1\" 200 203023 \"/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
           "type" => "log",
           "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "referrer" => "\"/presentations/logstash-monitorama-2013/\"",
     "@timestamp" => 2016-10-11T21:04:36.167Z,
       "response" => "200",
          "bytes" => "203023",
       "clientip" => "83.149.9.216",
       "@version" => "1",
           "beat" => {
        "hostname" => "",
            "name" => ""
    },
           "host" => "",
    "httpversion" => "1.1",
      "timestamp" => "04/Jan/2015:05:13:42 +0000"
}
Here is another example log line:
55.3.244.1 GET /index.html 15824 0.043
This line splits into five parts: the client IP (55.3.244.1), the method (GET), the request path (/index.html), the byte count (15824), and the request duration (0.043). The parsing pattern (a regex match) for it is:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
Written into the filter:
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
After parsing:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
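The five-field pattern above can be mimicked with plain Python named groups. The sketch below is a simplified stand-in for the real IP, WORD, URIPATHPARAM, and NUMBER grok definitions, just to show how each `%{PATTERN:name}` pair becomes a named capture:

```python
import re

# Simplified equivalents of:
#   %{IP:client} %{WORD:method} %{URIPATHPARAM:request}
#   %{NUMBER:bytes} %{NUMBER:duration}
PATTERN = re.compile(
    r'(?P<client>\d{1,3}(?:\.\d{1,3}){3}) '  # IP (IPv4 only here)
    r'(?P<method>\w+) '                      # WORD
    r'(?P<request>\S+) '                     # URIPATHPARAM (loosened)
    r'(?P<bytes>\d+) '                       # NUMBER
    r'(?P<duration>[\d.]+)'                  # NUMBER
)

fields = PATTERN.match('55.3.244.1 GET /index.html 15824 0.043').groupdict()
print(fields)
```

Each named group lands in the result dict under the same key grok would use for the event field.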
3. Parsing Logs of Arbitrary Format
The steps for parsing a log of arbitrary format:
- First decide how to split the log, i.e. how many parts one log line breaks into.
- Analyze each part: if an existing Grok pattern fits, use it directly; if Grok has nothing ready-made, define a custom pattern.
- Learn to debug in the Grok Debugger.
Here is an example with two log lines:
2017-03-07 00:03:44,373 4191949560 [ :330:DEBUG] entering doFilter()
2017-03-16 00:00:01,641 133383049 [ :234:INFO ] 上报内容准备写入文件
Splitting principle:
2017-03-16 00:00:01,641: timestamp
133383049: sequence number
UploadFileModel.java: Java class name
234: source line number
INFO: log level
entering doFilter(): log message
The first five fields use patterns Grok already ships with: TIMESTAMP_ISO8601, NUMBER, JAVAFILE, NUMBER, and LOGLEVEL. The last field uses a custom regex: everything after the ] that follows the log level, whether Chinese or English, is treated as the log message. The syntax for a custom pattern is:
(?<field_name>the pattern here)
Capturing the last field as info, the regex is:
(?<info>([\s\S]*))
The complete pattern for the two log lines above is shown next; the \s* occurrences soak up stray whitespace.
\s*%{TIMESTAMP_ISO8601:time}\s*%{NUMBER:num} \[\s*%{JAVAFILE:class}\s*:\s*%{NUMBER:lineNumber}\s*:%{LOGLEVEL:level}\s*\]\s*(?<info>([\s\S]*))
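This full expression can also be checked outside Logstash. The Python sketch below translates it into standard regex syntax; note that Python spells named groups (?P&lt;name&gt;...) where grok's Oniguruma engine uses (?&lt;name&gt;...). The sample line is reassembled from the field breakdown above (the class name UploadFileModel.java is taken from that list, since it is stripped from the raw log lines shown earlier).

```python
import re

# Python translation of the grok expression above: \s* around each field,
# escaped literal brackets, and (?P<info>...) for the free-form tail.
LOG = re.compile(
    r'\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})'   # TIMESTAMP_ISO8601
    r'\s*(?P<num>\d+)\s*\['                                      # NUMBER, literal [
    r'\s*(?P<cls>[^\s:]+)\s*:\s*(?P<lineNumber>\d+)\s*:'         # JAVAFILE : NUMBER :
    r'(?P<level>\w+)\s*\]\s*(?P<info>[\s\S]*)'                   # LOGLEVEL ] message
)

# Sample line reassembled from the field breakdown above.
line = '2017-03-16 00:00:01,641 133383049 [UploadFileModel.java:234:INFO ] 上报内容准备写入文件'
fields = LOG.match(line).groupdict()
print(fields)
```

The [\s\S]* tail deliberately matches anything, including newlines and Chinese text, which is exactly why the article uses it for the log message.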
Regex parsing is easy to get wrong, so debugging with the Grok Debugger is strongly recommended.