How can I perform a SUM window function with a time range but handle duplicate timestamps row-wise in SQL - Stack Overflow

I have a scenario where I need to calculate a running total using the SUM window function in SQL. The issue arises because some rows have duplicate timestamps, and the RANGE clause in the window function groups all rows with the same timestamp together, causing incorrect calculations.

Here’s an example of the SQL I’m trying to use:

SUM(volume) OVER (
    PARTITION BY ID
    ORDER BY td.timestamp
    RANGE BETWEEN INTERVAL '60' SECOND PRECEDING AND CURRENT ROW
) AS total_volume

Problem:

  • When there are duplicate timestamps, the RANGE frame treats all rows with the same timestamp as peers and puts them in the same window, leading to unexpected results.
  • I need to process the rows individually (row-wise) within the same timestamp.

Constraints:

  • I can't add slight noise to the timestamp column, because that would change my time window. The calculation has to be exact.

Is there a way to adjust the SQL to process rows correctly within the same timestamp range while adhering to the time window logic?

Input

timestamp           | Volume
--------------------+-------
2024-11-16 08:00:00 | 10
2024-11-16 08:00:00 | 20
2024-11-16 08:01:00 | 30
2024-11-16 08:02:00 | 40
2024-11-16 08:02:00 | 50

Current Result (Using RANGE and Grouping by Timestamp)

timestamp           | RollVolume
--------------------+-----------
2024-11-16 08:00:00 | 30
2024-11-16 08:00:00 | 30
2024-11-16 08:01:00 | 30
2024-11-16 08:02:00 | 90
2024-11-16 08:02:00 | 90

Expected Output

timestamp           | RollVolume
--------------------+-----------
2024-11-16 08:00:00 | 10
2024-11-16 08:00:00 | 30
2024-11-16 08:01:00 | 30
2024-11-16 08:02:00 | 40
2024-11-16 08:02:00 | 90

Here, RollVolume is calculated row by row within each timestamp, instead of grouping rows with identical timestamps.
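
The peer grouping described above can be reproduced without the 60-second window. The following is a minimal sketch, assuming PostgreSQL syntax; the CTE name events and the tie-breaker column seq are hypothetical and not part of the original data. With RANGE, CURRENT ROW extends to every row that shares the current timestamp, whereas with ROWS it stops at the current row itself.

```sql
-- Illustrative data only: "events" and "seq" are assumed names.
WITH events (seq, ts, volume) AS (
  VALUES
    (1, TIMESTAMP '2024-11-16 08:00:00', 10),
    (2, TIMESTAMP '2024-11-16 08:00:00', 20),
    (3, TIMESTAMP '2024-11-16 08:01:00', 30),
    (4, TIMESTAMP '2024-11-16 08:02:00', 40),
    (5, TIMESTAMP '2024-11-16 08:02:00', 50)
)
SELECT
  ts,
  volume,
  -- RANGE: rows with an equal ts are peers, so they all get the same total.
  SUM(volume) OVER (
    ORDER BY ts
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS range_total,
  -- ROWS: the frame ends at the current row; seq makes the order deterministic.
  SUM(volume) OVER (
    ORDER BY ts, seq
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS rows_total
FROM events
ORDER BY ts, seq;
-- range_total: 30, 30, 60, 150, 150
-- rows_total:  10, 30, 60, 100, 150
```

These are unbounded running totals rather than the 60-second rolling total, so the numbers differ from the tables above; the sketch only isolates how duplicate timestamps are treated.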

asked Nov 16, 2024 at 16:25 by Saurabh Ghadge

  • In your query you use PARTITION BY ID, but your input data has no ID column. Can you correct the input data format and also check whether the expected output is correct? – samhita Commented Nov 16, 2024 at 18:40
  • Please tag which DBMS you're using. (SQL Server, MySQL, PostgreSQL, Oracle, etc) – MatBailie Commented Nov 16, 2024 at 20:53
  • Note: if two rows have the same timestamp, they both happen within 0s of each other. SQL data sets have no implicit ordering, which means that in your data the volume=10 row doesn't occur "before" the volume=20 row (or vice versa). You'd have to assert something like "the lowest volume row happens first". – MatBailie Commented Nov 16, 2024 at 20:57
  • Hey @samhita, the input data I provided intentionally does not include an ID column, as the operation is meant to be carried out within a specific partition (e.g., ID) and the data given is for only one specific ID. This means that the rolling calculation should respect both the time-based window (last 60 seconds) and the implicit partitioning by ID. – Saurabh Ghadge Commented Nov 17, 2024 at 2:44
  • Hey @MatBailie, the data represents events that occur at the exact same timestamp, and their order cannot be determined inherently or from any secondary attribute; rather, we assume that the rows themselves have some order. Introducing an assumption (e.g., that the lowest volume occurs first) would be arbitrary and could lead to inaccurate calculations. My requirement is to calculate the rolling total strictly based on timestamps within a 60-second window, processing the rows one by one as they appear in the dataset without introducing assumptions about implicit ordering. – Saurabh Ghadge Commented Nov 17, 2024 at 2:48

2 Answers

The simplest option is to run a cumulative sum from start to finish without the time RANGE, then use a second cumulative sum, with the time RANGE, to deduct the rows that have fallen out of the window.

This lets you use an id column to enforce an ordering without running into errors when trying to use RANGE BETWEEN, which accepts only a single ORDER BY column when an offset is given.

It also ensures that you only include rows whose timestamp is later than 60 seconds ago ("> 60s ago"), rather than also including rows that are exactly 60 seconds old (">= 60s ago").

Performance-wise, it only scans the data once, avoiding the cost of correlated sub-queries.

-- Setup: id gives every row a unique, ordered tie-breaker.
CREATE TABLE example (
  id    BIGINT GENERATED ALWAYS AS IDENTITY,
  x     INT,
  ts    TIMESTAMP,
  val   INT
);

INSERT INTO
  example (x, ts, val)
VALUES
  (1, '2024-11-16 08:00:00',    10),
  (1, '2024-11-16 08:00:00',    20),
  (1, '2024-11-16 08:01:00',    30),
  (1, '2024-11-16 08:02:00',    40),
  (1, '2024-11-16 08:02:00',    50);

SELECT
  *,
  -- Running total up to and including this row; id breaks timestamp ties.
  SUM(val)
    OVER (
      PARTITION BY x
          ORDER BY ts, id
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  )
  -
  -- Total of the rows that are 60 seconds old or older (NULL when there are none).
  COALESCE(
    SUM(val)
      OVER (
         PARTITION BY x
             ORDER BY ts
        RANGE BETWEEN UNBOUNDED PRECEDING AND INTERVAL '60' SECOND PRECEDING
    )
    ,
    0
  )
    AS rolling_total
FROM
  example;
id | x | ts                  | val | rolling_total
---+---+---------------------+-----+--------------
1  | 1 | 2024-11-16 08:00:00 | 10  | 10
2  | 1 | 2024-11-16 08:00:00 | 20  | 30
3  | 1 | 2024-11-16 08:01:00 | 30  | 30
4  | 1 | 2024-11-16 08:02:00 | 40  | 40
5  | 1 | 2024-11-16 08:02:00 | 50  | 90
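
To see how the subtraction produces those totals, the two cumulative sums can be shown as separate columns. This is a sketch under the same assumptions as the query above (PostgreSQL syntax); the column names running_total and expired_total are illustrative, and the rows are inlined so the query runs without the example table.

```sql
-- Inline copy of the sample rows; the CTE stands in for the example table.
WITH example (id, x, ts, val) AS (
  VALUES
    (1, 1, TIMESTAMP '2024-11-16 08:00:00', 10),
    (2, 1, TIMESTAMP '2024-11-16 08:00:00', 20),
    (3, 1, TIMESTAMP '2024-11-16 08:01:00', 30),
    (4, 1, TIMESTAMP '2024-11-16 08:02:00', 40),
    (5, 1, TIMESTAMP '2024-11-16 08:02:00', 50)
),
parts AS (
  SELECT
    id, ts, val,
    -- Everything up to and including this row, ties broken by id.
    SUM(val) OVER (
      PARTITION BY x ORDER BY ts, id
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    -- Everything at least 60 seconds older than this row's timestamp.
    COALESCE(
      SUM(val) OVER (
        PARTITION BY x ORDER BY ts
        RANGE BETWEEN UNBOUNDED PRECEDING AND INTERVAL '60' SECOND PRECEDING
      ),
      0
    ) AS expired_total
  FROM example
)
SELECT
  id, ts, val, running_total, expired_total,
  running_total - expired_total AS rolling_total
FROM parts
ORDER BY ts, id;
-- running_total: 10, 30, 60, 100, 150
-- expired_total:  0,  0, 30,  60,  60
-- rolling_total: 10, 30, 30,  40,  90
```

For the 08:01:00 row, expired_total is 30 because both 08:00:00 rows are exactly 60 seconds old and are therefore deducted, which is the strict boundary described above.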


As mentioned in the comments, deterministic row ordering is required for accurate results. The query below uses PostgreSQL's ctid, which represents the physical location of each row in a table but can change when the table is updated.

https://dbfiddle.uk/xxm_Ujpm

WITH ordered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY timestamp, Volume, ctid) AS rn
    FROM input
)
SELECT
    o1.timestamp,
    (
        SELECT SUM(o2.Volume)
        FROM ordered o2
        WHERE o2.timestamp > o1.timestamp - INTERVAL '60 seconds'
          AND o2.timestamp <= o1.timestamp
          AND ( 
              o2.timestamp < o1.timestamp 
              OR (o2.timestamp = o1.timestamp AND o2.rn <= o1.rn) 
          )
    ) AS RollVolume
FROM ordered o1
ORDER BY o1.timestamp, o1.rn;
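
Because ctid can change, one possible way to keep the tie-break stable is to store it once in a real column. The following is only a sketch, assuming PostgreSQL 10 or later and the input table used above; the column name seq is hypothetical. Existing rows are numbered in whatever order PostgreSQL visits them while adding the column, so this freezes an order rather than recovering the true arrival order.

```sql
-- Hypothetical one-time step: persist a tie-breaker instead of reading ctid
-- at query time. "seq" is an illustrative name, not part of the original schema.
ALTER TABLE input
  ADD COLUMN seq BIGINT GENERATED ALWAYS AS IDENTITY;

-- Later queries can order ties by the stored value instead of ctid.
SELECT
  timestamp,
  Volume,
  ROW_NUMBER() OVER (ORDER BY timestamp, seq) AS rn
FROM input;
```

Note that this sketch breaks ties by seq alone, whereas the query above orders by Volume first and only then by ctid; which rule is appropriate depends on what ordering the data is assumed to have.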
