Collecting Apache Access Logs with ELK: A Hands-On Case


There are two common ways to collect logs with ELK:

1. Leave the source log format untouched and use Logstash's grok filter to parse and clean the raw, unstructured logs into structured events.
2. Change the log output format at the source so that the application emits logs in the required structure; Logstash then only collects and forwards the data and does no filtering or cleaning at all.

Each approach has trade-offs. The first one needs no change to the original log output, everything is parsed with grok inside Logstash, so there is no impact on the production systems; the drawback is that grok becomes a performance bottleneck under heavy load, and when the log volume is very large, parsing can block normal log delivery. For that reason, avoid grok filtering in Logstash whenever possible. The second approach requires defining the log output format up front, which takes some work, but the benefit is clear: because the required format is already produced at the source, Logstash only collects and forwards the data, which greatly reduces its load and makes collection and transport much more efficient. In addition, common web servers such as Apache and Nginx all support custom log output formats. In production, the second approach is therefore the preferred one.

Collection runs on the ELK + Filebeat + Kafka + ZooKeeper cluster architecture deployed previously:

- Filebeat is deployed on the Apache business servers (there can be several of them) to collect the log data.
- The collected data is shipped to the Kafka cluster, which provides buffering and short-term storage.
- Logstash pulls and consumes the data from Kafka, performs some light processing and analysis, and hands it to the Elasticsearch cluster for storage and indexing; the result is finally displayed in Kibana.

All hosts run CentOS 7.4. The server roles are listed in the following table:

IP address       Hostname        Role                                Cluster                  Hardware
192.168.126.90   filebeatserver  Apache business server + Filebeat   Business server cluster  2 CPU, 0.5 GB RAM, 20 GB disk
192.168.126.91   kafkazk1        Kafka + ZooKeeper                   Kafka broker cluster     2 CPU, 1.5 GB RAM, 20 GB disk
192.168.126.92   kafkazk2        Kafka + ZooKeeper                   Kafka broker cluster     2 CPU, 1.5 GB RAM, 20 GB disk
192.168.126.93   kafkazk3        Kafka + ZooKeeper                   Kafka broker cluster     2 CPU, 1.5 GB RAM, 20 GB disk
192.168.126.94   logstashserver  Logstash                            Logstash cluster         2 CPU, 1.5 GB RAM, 20 GB disk
192.168.126.95   es1             ES Master, ES DataNode              Elasticsearch cluster    2 CPU, 2 GB RAM, 20 GB disk
192.168.126.96   es2             ES Master, Kibana                   Elasticsearch cluster    2 CPU, 2 GB RAM, 20 GB disk
192.168.126.97   es3             ES Master, ES DataNode              Elasticsearch cluster    2 CPU, 2 GB RAM, 20 GB disk

The table below lists the software and the versions used. The three ELK components should be kept at the same version; 6.5.4 is used here.

Software       Version               Purpose
JDK            JDK 1.8.0_161         Java runtime environment
filebeat       filebeat-6.5.4        Log shipper on the front-end servers
logstash       logstash-6.5.4        Log collection, filtering and forwarding
zookeeper      zookeeper-3.4.12      Resource scheduling and coordination
kafka          kafka_2.10-0.10.0.1   Messaging middleware
elasticsearch  elasticsearch-6.5.4   Log storage and indexing
kibana         kibana-6.5.4          Log display and analysis

The ELK + Filebeat + Kafka + ZooKeeper environment is assumed to be already deployed.

Apache log format and log variables

Apache supports custom log output formats, but it also has a large number of log variables, so before collecting anything we first have to decide which log fields we actually need and pin the format down. To do that you need to know how Apache log fields are defined and what the log variables mean. In the Apache configuration file, the directive that defines the log format is LogFormat; the default field definition looks like this:

LogFormat "%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" combined CustomLog "logs/access_log" combinedapache的⽇志格式与⽇志变量⾃定义apache⽇志格式这⾥定义将apache⽇志输出为json格式,下⾯仅列出apache配置⽂件中⽇志格式和⽇志⽂件定义部分,定义好的⽇志格式与⽇志⽂件如下:[root@filebeatserver ~]# vim /etc/httpd/conf/Format "{"@timestamp":"%{%Y-%m-%dT%H:%M:%S%z}t","client_ip":"%{X-Forwarded-For}i","direct_ip": "%a","request_time":%T,"status":%>s,"url":"%U%q","method":"%m","http_host":"%{Host}i","server_ip":"%A","http_referer":"%{Referer}i","http_user_agent":"%{User-agent}i","body_bytes_sent":"%B","total_bytes_sent":"%O"}" access_log_jsonCustomLog logs/ access_log_json这⾥通过LogFormat指令定义了⽇志输出格式,在这个⾃定义⽇志输出中,定义了13个字段,定义⽅式为:字段名称:字段内容,字段名称是随意指定的,能代表其含义即可,字段名称和字段内容都通过双引号括起来,⽽双引号是特殊字符,需要转移,因此,使⽤了转移字符“”,每个字段之间通过逗号分隔。此外,还定义了⼀个时间字段 @timestamp,这个字段的时间格式也是⾃定义的,此字段记录⽇志的⽣成时间,⾮常有⽤。CustomLog指令⽤来指定⽇志⽂件的名称和路径。需要注意的是,上⾯⽇志输出字段中⽤到了body_bytes_sent和total_bytes_sent发送字节数统计字段,这个功能需要apache加载mod_模块,如果没有加载这个模块的话,需要安装此模块并在⽂件中加载⼀下即可。验证⽇志输出[root@filebeatserver ~]# ifconfig ens32 | awk 'NR==2 {print $2}'192.168.126.90[root@filebeatserver ~]# tailf /etc/httpd/logs/

{"@timestamp":"2021-08-13T13:13:05+0800","client_ip":"-","direct_ip": "192.168.126.1","request_time":0,"status":403,"url":"/","method":"GET","http_host":"192.168.126.90","server_ip":"192.168.126.90","http_referer":"-","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","body_bytes_sent":"4897","total_bytes_sent":"5205"}这是没有经过反向代理进⾏访问的结果配置⼀层反向代理服务器进⾏访问[root@kafkazk1 ~]# ifconfig ens32 | awk 'NR==2 {print $2}'192.168.126.91[root@kafkazk1 ~]# vim /etc/httpd/conf/

Accessing through one layer of reverse proxy

[root@kafkazk1 ~]# ifconfig ens32 | awk 'NR==2 {print $2}'
192.168.126.91
[root@kafkazk1 ~]# vim /etc/httpd/conf/httpd.conf
ProxyPass / http://192.168.126.90/
ProxyPassReverse / http://192.168.126.90/
[root@kafkazk1 ~]# systemctl restart httpd
[root@filebeatserver ~]# tailf /etc/httpd/logs/access.log

{"@timestamp":"2021-08-13T13:22:48+0800","client_ip":"192.168.126.1","direct_ip": "192.168.126.91","request_time":0,"status":403,"url":"/","method":"GET","http_host":"192.168.126.90","server_ip":"192.168.126.90","http_referer":"-","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","body_bytes_sent":"4897","total_bytes_sent":"5205"}配置两层代理服务器进⾏访问[root@kafkazk2 ~]# ifconfig ens32 | awk 'NR==2 {print $2}'192.168.126.92[root@kafkazk2 ~]# vim /etc/httpd/conf/oxyPass / 192.168.126.91ProxyPassReverse / 192.168.126.91/[root@kafkazk2 ~]# systemctl restart httpd[root@filebeatserver ~]# tailf /etc/httpd/logs/

{"@timestamp":"2021-08-13T14:04:20+0800","client_ip":"192.168.126.1, 192.168.126.92","direct_ip": "192.168.126.91","request_time":0,"status":403,"url":"/","method":"GET","http_host":"192.168.126.90","server_ip":"192.168.126.90","http_referer":"-","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","body_bytes_sent":"4897","total_bytes_sent":"5205"}对⽐这三次的⽇志,可以看到,client_ip和direct_ip输出的异同,client_ip字段对应的变量为“%{X-Forwarded-For}i”,它的输出是代理叠加⽽成的IP列表,⽽direct_ip对应的变量为%a,表⽰不经过代理访问的直连IP,当⽤户不经过任何代理直接访问apache时,client_ip和direct_ip应该是同⼀个IP。配置filebeatfilebeat是安装在apache服务器上的,配置好的⽂件的内容:[root@filebeatserver ~]# cd /usr/local/filebeat/[root@filebeatserver filebeat]# vim :- type: log enabled: true paths: - /etc/httpd/logs/ fields: log_topic: apachelogsname: "192.168.126.90": enabled: true hosts: ["192.168.126.91:9092", "192.168.126.92:9092", "192.168.126.93:9092"] version: "0.10" topic: '%{[fields][log_topic]}' _robin: reachable_only: true worker: 2 required_acks: 1 compression: gzip max_message_bytes: : debug#

With this configuration, Filebeat tails the Apache access log /etc/httpd/logs/access.log and ships its content in real time to the Kafka cluster, into the topic apachelogs. Pay particular attention to how the Kafka output is written in the Filebeat configuration file.
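Before starting the service it can be useful to let Filebeat validate the file itself; Filebeat 6.x ships a test subcommand for this (the output check additionally probes the configured output, although not every output type supports it):

[root@filebeatserver filebeat]# ./filebeat test config -c filebeat.yml    # checks filebeat.yml syntax, prints "Config OK"
[root@filebeatserver filebeat]# ./filebeat test output -c filebeat.yml    # probes connectivity to the configured output, where supported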

Starting Filebeat

[root@filebeatserver filebeat]# nohup ./filebeat -e -c filebeat.yml &
[1] 4591
nohup: ignoring input and appending output to 'nohup.out'

Testing

[root@filebeatserver filebeat]# tailf nohup.out
2021-08-13T14:21:57.368+0800 DEBUG [publish] pipeline/processor.go:308 Publish event: { "@timestamp": "2021-08-13T06:21:57.368Z", "@metadata": { "beat": "filebeat", "type": "doc", "version": "6.5.4" }, "prospector": { "type": "log" }, "input": { "type": "log" }, "beat": { "name": "192.168.126.90", "hostname": "filebeatserver", "version": "6.5.4" }, "host": { "name": "192.168.126.90" }, "source": "/etc/httpd/logs/access.log", "offset": 21983, "message": "{\"@timestamp\":\"2021-08-13T14:21:54+0800\",\"client_ip\":\"-\",\"direct_ip\": \"192.168.126.1\",\"request_time\":0,\"status\":404,\"url\":\"/noindex/css/fonts/Bold/\",\"method\":\"GET\",\"http_host\":\"192.168.126.90\",\"server_ip\":\"192.168.126.90\",\"http_referer\":\"192.168.126.90/noindex/css/\",\"http_user_agent\":\"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0\",\"body_bytes_sent\":\"238\",\"total_bytes_sent\":\"453\"}", "fields": { "log_topic": "apachelogs" }}
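Before attaching a consumer, you can also confirm on the Kafka side that the topic was created; kafka-topics.sh is part of the standard Kafka distribution, and the path and ZooKeeper addresses below follow the deployment table above:

[root@kafkazk1 ~]# /usr/local/kafka/bin/kafka-topics.sh --zookeeper 192.168.126.91:2181,192.168.126.92:2181,192.168.126.93:2181 --list

The topic apachelogs should appear in the list once Filebeat has published at least one event.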

Next, verify that the Kafka cluster is actually receiving and able to consume this log data:

[root@kafkazk1 ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.126.91:2181,192.168.126.92:2181,192.168.126.93:2181 --topic apachelogs
{"@timestamp":"2021-08-13T06:21:57.368Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.5.4","topic":"apachelogs"},"source":"/etc/httpd/logs/access.log","offset":21983,"message":"{\"@timestamp\":\"2021-08-13T14:21:54+0800\",\"client_ip\":\"-\",\"direct_ip\": \"192.168.126.1\",\"request_time\":0,\"status\":404,\"url\":\"/noindex/css/fonts/Bold/\",\"method\":\"GET\",\"http_host\":\"192.168.126.90\",\"server_ip\":\"192.168.126.90\",\"http_referer\":\"192.168.126.90/noindex/css/\",\"http_user_agent\":\"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0\",\"body_bytes_sent\":\"238\",\"total_bytes_sent\":\"453\"}","fields":{"log_topic":"apachelogs"},"prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"192.168.126.90","hostname":"filebeatserver","version":"6.5.4"},"host":{"name":"192.168.126.90"}}

The log data is received successfully.

Configuring Logstash

The content of the Logstash pipeline configuration file kafka_apache_into_es.conf is given directly below:

[root@logstashserver ~]# cat /usr/local/logstash/config/kafka_apache_into_es.conf
input {
  kafka {
    bootstrap_servers => "192.168.126.91:9092,192.168.126.92:9092,192.168.126.93:9092"
    topics => ["apachelogs"]
    codec => json { charset => "UTF-8" }
    add_field => { "[@metadata][tagid]" => "apacheaccess_log" }
  }
}
filter {
  if [@metadata][tagid] == "apacheaccess_log" {
    mutate {
      gsub => ["message", "\\x", "\\\\x"]   # "message" is the message field, i.e. the log line itself; this rewrites UTF-8 single-byte escapes in its content, to cope with URLs that contain Chinese characters
    }
    if ( 'method":"HEAD' in [message] ) {   # if the message contains a HEAD request, drop the event
      drop {}
    }
    json {                                  # enable the json decode plugin, because only part of the record is JSON
      source => "message"                   # the field that holds the JSON, i.e. the message field
      add_field => { "[@metadata][direct_ip]" => "%{direct_ip}" }   # add a metadata field used in the condition further below
      remove_field => "@version"            # from here to the end, remove fields that are not needed; they are added by Filebeat while shipping the logs
      remove_field => "prospector"
      remove_field => "beat"
      remove_field => "source"
      remove_field => "input"
      remove_field => "offset"
      remove_field => "fields"
      remove_field => "host"
      remove_field => "message"             # the JSON already defines every field and the output is produced per field, so the raw message field is no longer needed
    }
    mutate {
      split => ["client_ip", ","]           # split client_ip on commas; behind multiple proxies it can be a list of IPs, a single IP simply becomes a one-element list
    }
    mutate {
      replace => { "client_ip" => "%{client_ip[0]}" }   # keep only the first element; when client_ip is an IP list, the first entry is the real client IP
    }
    if [client_ip] == "-" {                 # when client_ip is "-" and direct_ip is not "-", copy direct_ip into client_ip: requests that did not pass through a proxy carry the real client address in direct_ip
      if [@metadata][direct_ip] not in ["%{direct_ip}","-"] {   # i.e. direct_ip is non-empty
        mutate {
          replace => { "client_ip" => "%{direct_ip}" }
        }
      } else {
        drop {}
      }
    }
    mutate {
      remove_field => "direct_ip"           # direct_ip is only a transition field used to pass a value to client_ip, so it can be removed once that is done
    }
  }
}

If you need to debug, you can first print the events to the screen and only send them to the backend ES cluster once the output looks correct:

#output {
#  if [@metadata][tagid] == "apacheaccess_log" {
#    stdout {
#      codec => "rubydebug"
#    }
#  }
#}

output {
  if [@metadata][tagid] == "apacheaccess_log" {      # matches the [@metadata][tagid] set in the input; with several input sources, different tags can be routed to different outputs
    elasticsearch {
      hosts => ["192.168.126.95:9200","192.168.126.96:9200","192.168.126.97:9200"]
      index => "logstash_apachelogs-%{+YYYY.MM.dd}"  # name of the Elasticsearch index for the Apache logs; it is used again in Kibana. Starting the name with "logstash" and appending an identifier plus the date is recommended.
    }
  }
}
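Before starting the pipeline it is worth letting Logstash check the file for syntax errors; Logstash supports a test-and-exit flag for this (the config path matches the file created above):

[root@logstashserver ~]# /usr/local/logstash/bin/logstash -f /usr/local/logstash/config/kafka_apache_into_es.conf --config.test_and_exit

Logstash should report that the configuration is OK and then exit if the pipeline file parses cleanly.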

Starting Logstash

[root@logstashserver ~]# nohup /usr/local/logstash/bin/logstash -f /usr/local/logstash/config/kafka_apache_into_es.conf &

Configuring Kibana

Filebeat collects the data into Kafka, and Logstash pulls it from Kafka; once the data is correctly delivered to Elasticsearch, the index can be configured in Kibana.

[root@es2 ~]# ifconfig | awk 'NR==2 {print $2}'
192.168.126.96
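The article does not show the Kibana configuration file itself. For completeness, the two settings that matter for this topology would normally live in kibana.yml; the install path and values below are assumptions based on the deployment table (Kibana 6.x uses the elasticsearch.url setting):

[root@es2 ~]# vim /usr/local/kibana/config/kibana.yml
server.host: "192.168.126.96"                      # address Kibana listens on (assumed value for this host)
elasticsearch.url: "http://192.168.126.95:9200"    # any node of the Elasticsearch cluster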

Starting Kibana

[root@es2 ~]# nohup /usr/local/kibana/bin/kibana &

Access the Apache service in a browser so that it produces some logs; it usually takes a short while before the index appears in the ES cluster. You can also use the ES-HEAD plugin to check whether the index has been created. Once the corresponding index shows up in the ES cluster, you can continue with the remaining steps and view the collected log data in Kibana.
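As an alternative to the ES-HEAD plugin, a plain curl against any of the ES nodes also shows whether the index was created; the node address below is taken from the deployment table and the index name matches the pattern configured in the Logstash output:

[root@es1 ~]# curl -s 'http://192.168.126.95:9200/_cat/indices?v' | grep apachelogs

An entry such as logstash_apachelogs-2021.08.13 with a green or yellow health state means Logstash is writing into Elasticsearch and the Kibana index pattern can be created.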
