如何在sql server中循环数据并将上一批与当前批进行比较？

hsvhsicv 于 2021-08-09 发布在 Java

关注(0)|答案(1)|浏览(212)

每次我们收到一批新的数据时，我们都会将其加载到一个表中，该表存储我们以前收到的每个数据集。每批新产品都有一个新的 batch_id . 只有另外两列是 item_id 以及 cost . 我想建立一个报告，它将描述添加了多少新项（基于项的id）、删除了多少项以及最新批处理中项的净变化。考虑到这一点，我想比较所有批次，但只有这样，更高的 batch_id 比较 batch_id 就在那之前。澄清一下，如果我有3个 batch_id 然后我要第3批和第2批比较，第2批和第1批比较。第一批没有什么可比性。我可以用下面的脚本来实现这一点，但只针对我指定的批处理（结果中的第二行），对于成百上千的批处理，这会让人麻木。有没有办法在SQLServer中使其更具动态性？我在下面提供了一个示例数据集和预期结果。

WITH b1
     AS (SELECT *
         FROM example_cumulative
         WHERE batch_id = 2),
     b2
     AS (SELECT *
         FROM example_cumulative
         WHERE batch_id = 3)
     SELECT SUM(CASE
                    WHEN b1.item_id IS NULL
                    THEN 1
                    ELSE 0
                END) [new items], 
            SUM(CASE
                    WHEN b2.item_id IS NULL
                    THEN 1
                    ELSE 0
                END) [removed items], 
            COUNT(b2.item_id) - COUNT(b1.item_id) [net change in item volume]
     FROM b1
          FULL OUTER JOIN b2 ON b1.item_id = b2.item_id;

预期结果

CREATE TABLE example_cumulative
(batch_id INT NOT NULL, 
 item_id  INT NOT NULL, 
 cost     FLOAT NOT NULL
);
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,10,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,11,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,12,106)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,13,142)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,14,152)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,15,154)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,10,110)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,11,140)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,12,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,16,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,10,110)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,11,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,16,195)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,17,102)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,18,101)

sql Join sql-server tsql full-outer-join

来源：https://stackoverflow.com/questions/61902364/how-to-loop-through-data-and-compare-previous-batch-to-current-batch-in-sql-serv

1条答案

按热度按时间

56lgkhnf1#

你可以延长你的电流 full join 逻辑如下，通过在连续 batch_id s、像这样：

select
    coalesce(e1.batch_id, e0.batch_id + 1) batch_id,
    sum(case when e0.item_id is null then 1 else 0 end) new_items,
    sum(case when e1.item_id is null then 1 else 0 end) removed_items,
    count(e1.item_id) - count(e0.item_id) net_change
from example_cumulative e1
full join example_cumulative e0 
    on  e1.item_id  = e0.item_id
    and e1.batch_id = e0.batch_id + 1
where coalesce(e1.batch_id, e0.batch_id + 1) <= (select max(batch_id) from example_cumulative)
group by coalesce(e1.batch_id, e0.batch_id + 1)

这个 where 子句的作用是防止查询返回一个额外的、不需要的记录 batch_id 大于最大可用值。
对于示例数据，返回：

batch_id | new_items | removed_items | net_change
-------: | --------: | ------------: | ---------:
       1 |         6 |             0 |          6
       2 |         1 |             3 |         -2
       3 |         2 |             1 |          1

赞(0）回复(0）举报 2021-08-09

我来回答

如何在sql server中循环数据并将上一批与当前批进行比较？

1条答案

相关问题

热门标签

最新问答