Posted on July 17, 2023
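ELK in Practice (3): Analysis of Beijing's Historical Air Quality Data

I. Understanding the data and modeling

The data set is Beijing's historical air quality data.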
Data model:

PUT air_quality
{
  "mappings": {
    "doc": {
      "dynamic": false,
      "properties": {
        "@timestamp": { "type": "date" },
        "city":       { "type": "keyword", "ignore_above": 256 },
        "parameter":  { "type": "keyword", "ignore_above": 256 },
        "status":     { "type": "keyword", "ignore_above": 256 },
        "value":      { "type": "long" }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  }
}

II. Importing the data

This experiment uses a Filebeat + Ingest Node architecture.
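One consequence of "dynamic": false in the air_quality mapping is worth spelling out: fields that are not listed under properties stay in _source but are not indexed, so they cannot be searched or aggregated. The sketch below imitates that rule in plain Python. The sample document and the helper name split_by_mapping are hypothetical, and the code only illustrates the behavior; it does not call Elasticsearch.

```python
# Field names copied from the air_quality mapping above.
MAPPED_FIELDS = {"@timestamp", "city", "parameter", "status", "value"}


def split_by_mapping(doc, mapped_fields=MAPPED_FIELDS):
    """Split a document into (indexed, source_only) parts.

    Mimics "dynamic": false: unmapped fields are kept in _source
    but get no index entry.
    """
    indexed = {k: v for k, v in doc.items() if k in mapped_fields}
    source_only = {k: v for k, v in doc.items() if k not in mapped_fields}
    return indexed, source_only


# Hypothetical document, for illustration only.
sample = {
    "@timestamp": "2016-12-01T08:00:00+08:00",
    "city": "Beijing",
    "parameter": "PM2.5",
    "status": "Valid",
    "value": 180,
    "hour": 8,  # not in the mapping
}
indexed, source_only = split_by_mapping(sample)
print(sorted(indexed))
print(source_only)
```

In this sketch the hour field ends up in source_only, which is what would happen to any extra field sent to an index created with this mapping.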
Filebeat configuration:

# The key names below were truncated in the source (ctors, csearch, n, d);
# they are restored here as the standard Filebeat 6.x setting names.
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /home/wfs/data/*.csv
  exclude_lines: ["^A ","^The","^Site","^,"]
output.elasticsearch:
  hosts: ["192.168.20.20:9200"]
  pipeline: "airquality"
  index: "air_quality"
  username: "elastic"
  password: "123456"
setup.template.name: "airquality"
setup.template.pattern: "airquality*"
setup.template.enabled: false

Data (the file names are truncated in the source):

# ls /home/wfs/data/
Beijing_2008_HourlyPM2.5_   Beijing_2014_HourlyPM25_
Beijing_2009_HourlyPM25_    Beijing_2015_HourlyPM25_
Beijing_2010_HourlyPM25_    Beijing_2016_HourlyPM25_
Beijing_2011_HourlyPM25_    Beijing_2017_HourlyPM25_
Beijing_2012_HourlyPM2.5_
Beijing_2013_HourlyPM2.5_

Pipeline configuration:

PUT /_ingest/pipeline/airquality
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{DATA:city},%{DATA:parameter},%{DATA:date},%{NUMBER:year},%{NUMBER:month},%{NUMBER:day},%{NUMBER:hour},%{NUMBER:value},%
        ]
      }
    },
    { "set": { "field": "_id", "value": "{{city}}-{{date}}" } },
    {
      "date": {
        "field": "date",
        "target_field": "@timestamp",
        "formats": [ "MM/dd/yyyy HH:mm", "yyyy-MM-dd HH:mm" ],
        "timezone": "Asia/Shanghai"
      }
    },
    { "remove": { "field": "message" } },
    { "remove": { "field": "beat" } },
    { "remove": { "field": "offset" } },
    { "remove": { "field": "source" } },
    { "remove": { "field": "date" } },
    { "convert": { "field": "year", "type": "integer" } },
    { "convert": { "field": "month", "type": "integer" } },
    { "convert": { "field": "day", "type": "integer" } },
    { "convert": { "field": "hour", "type": "integer" } },
    { "remove": { "field": "duration" } },
    { "remove": { "field": "unit" } },
    { "convert": { "field": "value", "type": "integer" } }
  ],
  "on_failure": [
    { "set": { "field": "e", "value": "{{ _ingest.on_failure_message }}" } }
  ]
}

The grok pattern is cut off after %{NUMBER:value},% in the source. The later remove processors and the index mapping suggest that it went on to capture unit, duration and status.

Once the import finishes, the data looks like the figure below:
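To make the import path easier to follow, here is a minimal Python sketch of what the Filebeat exclude_lines filter and the airquality pipeline do to a single CSV line. It is an approximation, not the pipeline itself: the sample lines are hypothetical, the column names after value (unit, duration, status) are an assumption based on the fields the pipeline removes and the mapping keeps, and a fixed +08:00 offset stands in for Asia/Shanghai.

```python
import re
from datetime import datetime, timedelta, timezone

# Same regular expressions as exclude_lines in the Filebeat configuration.
EXCLUDE = [re.compile(p) for p in (r"^A ", r"^The", r"^Site", r"^,")]
# Fixed +08:00 offset standing in for the Asia/Shanghai timezone.
CST = timezone(timedelta(hours=8))
# Python equivalents of the two "formats" of the date processor.
DATE_FORMATS = ("%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M")
# ASSUMPTION: the three columns after "value" are unit, duration, status.
COLUMNS = ("city", "parameter", "date", "year", "month", "day",
           "hour", "value", "unit", "duration", "status")


def parse_date(text):
    """Try each supported format in turn, like the date processor."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=CST)
        except ValueError:
            continue
    raise ValueError("unsupported date: " + text)


def process(line):
    """Return (doc_id, doc) for a data line, or None for an excluded line."""
    if any(p.search(line) for p in EXCLUDE):
        return None
    fields = dict(zip(COLUMNS, line.strip().split(",")))
    doc_id = "{city}-{date}".format(**fields)  # the "set _id" processor
    doc = {
        "@timestamp": parse_date(fields["date"]).isoformat(),
        "city": fields["city"],
        "parameter": fields["parameter"],
        "status": fields["status"],
    }
    # The "convert" processors: numeric columns become integers.
    for name in ("year", "month", "day", "hour", "value"):
        doc[name] = int(fields[name])
    return doc_id, doc


# Hypothetical input lines, for illustration only.
header = "Site,Parameter,Date (LST),Year,Month,Day,Hour,Value,Unit,Duration,QC Name"
row = "Beijing,PM2.5,04/08/2008 15:00,2008,4,8,15,207,ug/m3,1 Hr,Valid"
print(process(header))
print(process(row))
```

Because _id is built from city and date, importing the same file twice overwrites documents instead of duplicating them.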
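The data is collected at hourly intervals. To make the analysis easier, Python can be used to aggregate the hourly data to daily granularity.

# Call names that were lost in the source (es.search, datetime.strptime,
# cur_date.year/month/day, es.index) are restored from context.
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch(['192.168.20.20:9200'])

search_query = {
    "query": {"range": {"value": {"gte": 1}}},
    "aggs": {
        "days": {
            "date_histogram": {
                "field": "@timestamp",
                "interval": "day",
                "time_zone": "+08:00"
            },
            "aggs": {"pm25": {"stats": {"field": "value"}}}
        }
    },
    "size": 0
}
res = es.search(index='air_quality', body=search_query)

index_name = 'air_quality_days'
index_type = 'doc'
# The call name is missing in the source; ignore=[400, 404] suggests it
# deleted any previous copy of the daily index (assumption).
es.indices.delete(index=index_name, ignore=[400, 404])

for info in res['aggregations']['days']['buckets']:
    cur_date = datetime.strptime(info['key_as_string'], '%Y-%m-%dT%H:%M:%S.%f+08:00')
    new_doc = {
        "@timestamp": info['key_as_string'],
        'year': cur_date.year,
        'month': cur_date.month,
        'day': cur_date.day,
        "value_max": info['pm25']['max'],
        "value_avg": info['pm25']['avg'],
        "value_min": info['pm25']['min'],
    }
    es.index(index=index_name, doc_type=index_type, id=new_doc['@timestamp'], body=new_doc)
    print(new_doc)

The search_query above keeps documents whose value is at least 1, buckets them by day, and uses a stats aggregation to return each day's PM2.5 statistics. It corresponds to the following DSL, which differs only in naming the aggregation PM25 and in using gt rather than gte for the range bound:

GET air_quality/_search
{
  "size": 0,
  "query": {
    "range": { "value": { "gt": 1 } }
  },
  "aggs": {
    "days": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day",
        "time_zone": "+08:00"
      },
      "aggs": {
        "PM25": { "stats": { "field": "value" } }
      }
    }
  }
}

The script then loops over the aggregation buckets and writes them to a new day-level index, air_quality_days: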
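For readers who want to see what the date_histogram plus stats combination computes, the following plain-Python sketch reproduces the same daily roll-up on a handful of in-memory readings. The readings are hypothetical and the function name daily_stats is introduced here for illustration; nothing in it talks to Elasticsearch.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # matches "time_zone": "+08:00"


def daily_stats(readings):
    """Group (aware datetime, value) pairs by local day and summarize them."""
    buckets = defaultdict(list)
    for ts, value in readings:
        if value >= 1:  # same filter as the range query in the script ("gte": 1)
            day = ts.astimezone(CST).date().isoformat()
            buckets[day].append(value)
    return {
        day: {
            "value_max": max(vals),
            "value_avg": sum(vals) / len(vals),
            "value_min": min(vals),
        }
        for day, vals in sorted(buckets.items())
    }


# Hypothetical hourly readings; the negative value stands for an invalid sample.
readings = [
    (datetime(2016, 12, 1, 0, 0, tzinfo=CST), 120),
    (datetime(2016, 12, 1, 12, 0, tzinfo=CST), 180),
    (datetime(2016, 12, 1, 13, 0, tzinfo=CST), -999),
    (datetime(2016, 12, 2, 9, 0, tzinfo=CST), 40),
]
result = daily_stats(readings)
for day, stats in result.items():
    print(day, stats)
```

The range filter matters here: without it, invalid negative readings would drag value_min and value_avg down for that day.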
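III. Hands-on data analysis

With the data in place, we can first look at the big picture: has air quality improved over the ten years?

1. Air quality analysis – share of blue-sky days per year (pie chart):

This chart uses a scripted field, rate_level, that is generated dynamically. It is configured under Management -> Index Patterns: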
Script:

double val = doc['value_max'].value;
String rtn = "";
if (val < 50) {
  rtn = "1-Good";
} else if (val < 100) {
  rtn = "2-Moderate";
} else if (val < 150) {
  rtn = "3-Unhealthy for Sensitive Groups";
} else if (val < 200) {
  rtn = "4-Unhealthy";
} else if (val < 300) {
  rtn = "5-Very Unhealthy";
} else {
  rtn = "6-Hazardous";
}
return rtn;

2. Air quality analysis – share of each AQI level over time:
The configuration is simple. In Options, set Chart Type: Bar and Stacked: Percent.
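The rate_level scripted field and the percent-stacked bar chart can be mirrored in a few lines of Python, which makes the level boundaries easy to check. This is a sketch under stated assumptions: rate_level copies the thresholds of the Painless script, while level_share_by_year and the sample (year, value_max) pairs are hypothetical and only imitate what the chart displays.

```python
from collections import Counter, defaultdict


def rate_level(value_max):
    """Python counterpart of the rate_level scripted field."""
    if value_max < 50:
        return "1-Good"
    if value_max < 100:
        return "2-Moderate"
    if value_max < 150:
        return "3-Unhealthy for Sensitive Groups"
    if value_max < 200:
        return "4-Unhealthy"
    if value_max < 300:
        return "5-Very Unhealthy"
    return "6-Hazardous"


def level_share_by_year(days):
    """days: iterable of (year, value_max). Returns {year: {level: percent}}."""
    counts = defaultdict(Counter)
    for year, value_max in days:
        counts[year][rate_level(value_max)] += 1
    return {
        year: {level: 100.0 * n / sum(c.values()) for level, n in c.items()}
        for year, c in counts.items()
    }


# Hypothetical daily maxima, for illustration only.
days = [(2015, 30), (2015, 120), (2015, 250), (2015, 40),
        (2016, 80), (2016, 310)]
shares = level_share_by_year(days)
for year in sorted(shares):
    print(year, shares[year])
```

The numeric prefix in each label ("1-", "2-", ...) keeps the levels sorted from best to worst when Kibana orders the terms alphabetically.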
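3. Air quality analysis – share of blue-sky days per year (VB, Visual Builder):

A value_max of 100 is the cutoff: days below 100 count as Good. In Panel Options, set the time interval (Interval) to 1y.

4. Air quality analysis – share of blue-sky days per month (VB):

Likewise, change the time interval to 1M: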
Next, we narrow down to a specific period and check whether air quality improved, for example by comparing the winter of 2016 with the same period in 2015.
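The year-over-year comparison can be sketched in plain Python before looking at the Kibana panels: compute the share of haze days in a window, then shift the same window back one year, which is the role offset=-1y plays in the Timelion expression of item 1 below. The 150 cutoff comes from that expression; the window dates, the sample values and the helper names are hypothetical.

```python
from datetime import date

HAZE_THRESHOLD = 150  # same cutoff as q='value_max:>150'


def haze_share(daily_max, start, end):
    """Percent of days in [start, end] whose value_max exceeds the cutoff."""
    window = [v for d, v in daily_max.items() if start <= d <= end]
    if not window:
        return None
    return 100.0 * sum(v > HAZE_THRESHOLD for v in window) / len(window)


def one_year_earlier(d):
    """Shift a date back one year (not valid for Feb 29; fine for this sketch)."""
    return d.replace(year=d.year - 1)


# Hypothetical daily maxima from air_quality_days, for illustration only.
daily_max = {
    date(2015, 12, 1): 200, date(2015, 12, 2): 90,
    date(2015, 12, 3): 300, date(2015, 12, 4): 100,
    date(2016, 12, 1): 160, date(2016, 12, 2): 80,
    date(2016, 12, 3): 60, date(2016, 12, 4): 40,
}
start, end = date(2016, 12, 1), date(2016, 12, 4)
share_2016 = haze_share(daily_max, start, end)
share_2015 = haze_share(daily_max, one_year_earlier(start), one_year_earlier(end))
print("2016:", share_2016, "2015:", share_2015)
```

Dividing by the number of days in the window, rather than comparing raw counts, keeps the two years comparable even if one of them has missing days.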
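1. Air quality analysis – 2016 vs 2015 share of winter haze days (TL, Timelion)

The expression is cut off after width= in the source:

.es(index=air_quality_days,q='value_max:>150',offset=-1y).divide(.es(index=air_quality_days,offset=-1y)).multiply(100).label(2015).lines(fill=1,width=

2. Air quality analysis – 2016 vs 2015 comparison of PM2.5 maximum values (VB)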
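Setting Offset series time to 1y yields the 2015 data. Fill (0-1) and Line Width control the opacity and style of the line.

3. Air quality analysis – number of winter haze days in 2016 and 2015 (VB Metric)

4. Air quality analysis – daily air quality in the winters of 2016 and 2015

Finally, here are the two dashboards: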
Posted by admin. Please credit the source when reposting: http://www.yc00.com/news/1689545263a264983.html