SQL: aggregating two tables into a time series

w3nuxt5m posted on 2021-07-26 in Java

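Suppose I have two tables that store time-oriented data as (date, value) pairs. One holds my daily expenses, the other my daily income. Each date is the day from which that expense or income rate applies.
For example, if the income table has two records, (2020-01-01, 50) and (2020-02-14, 100), then from 2020-01-01 up to (but not including) 2020-02-14 my income is 50 units per day, and from 2020-02-14 onward it is 100 units per day. Expenses work the same way.
I want to generate, in Postgres, a time series for a given interval (probably using the generate_series(date, date, interval) function), where each row looks like (day, income, expense).
For example, if my income table looks like this: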

+------------+--------+
|    date    | income |
+------------+--------+
| 2020-02-12 |     50 |
| 2020-02-14 |    100 |
+------------+--------+

and my expenses table looks like this:

+------------+--------------+
|    date    | expenses     |
+------------+--------------+
| 2020-02-12 |           70 |
| 2020-02-13 |           50 |
+------------+--------------+

then I would like the result for the interval from 2020-02-12 to 2020-02-15 to look like this:

+------------+--------+--------------+
|    day     | income | expenses     |
+------------+--------+--------------+
| 2020-02-12 |     50 |           70 |
| 2020-02-13 |     50 |           50 |
| 2020-02-14 |    100 |           50 |
| 2020-02-15 |    100 |           50 |
+------------+--------+--------------+

That way I can later compute my profit, loss, and other statistics. How can I do this?

ivqmmu1c #1

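This is a bit tricky. You can full join the two tables, but you then need to fill the gaps. Postgres does not support ignore nulls in window functions, so one option is to build groups with a conditional running count and then take first_value() within each group: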

select
    date,
    first_value(income)  over(partition by grp_i order by date) income,
    first_value(expense) over(partition by grp_e order by date) expense
from (
    select
        date, 
        i.income,
        e.expense,
        count(*) filter(where i.income  is not null) over(order by date) grp_i,
        count(*) filter(where e.expense is not null) over(order by date) grp_e
    from incomes i
    full join expenses e using(date)
) t
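
To see how the running count forms the groups, the inner query can be run on its own. The following is a minimal sketch that inlines the question's sample data as CTEs; the table names incomes and expenses and the column name expense follow the query above and are assumptions about the real schema.

```sql
-- Sample data from the question, inlined as CTEs (names are assumptions).
with incomes(date, income) as (
    values (date '2020-02-12', 50), (date '2020-02-14', 100)
),
expenses(date, expense) as (
    values (date '2020-02-12', 70), (date '2020-02-13', 50)
)
select
    date,
    i.income,
    e.expense,
    -- the counter increases only on rows that carry a value,
    -- so each null row inherits the group of the last non-null row
    count(*) filter(where i.income  is not null) over(order by date) as grp_i,
    count(*) filter(where e.expense is not null) over(order by date) as grp_e
from incomes i
full join expenses e using(date)
order by date;
```

With this data, grp_i is 1, 1, 2 and grp_e is 1, 2, 2 for 2020-02-12, 2020-02-13 and 2020-02-14. The first row of each group is the one holding the value, which is why first_value() can fill the rest of the group. Note that the full join only yields dates that appear in at least one table, so 2020-02-15 is absent here.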

Alternatively, you can start from the selected date range (using generate_series()) and bring in the tables with left joins. The rest of the logic stays the same:

select
    date,
    first_value(income)  over(partition by grp_i order by date) income,
    first_value(expense) over(partition by grp_e order by date) expense
from (
    select
        d.date, 
        i.income,
        e.expense,
        count(*) filter(where i.income  is not null) over(order by d.date) grp_i,
        count(*) filter(where e.expense is not null) over(order by d.date) grp_e
    from generate_series(date '2020-02-12', date '2020-02-15', interval '1' day) d(date)
    left join incomes  i on i.date = d.date
    left join expenses e on e.date = d.date
) t
order by date
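
One caveat with the query above: a rate that took effect before the first day of the range is never joined in, so that column stays null until the first change inside the range. A possible workaround, sketched below with the sample data inlined as CTEs, is to generate the series from the earliest recorded date and filter to the requested range only at the end. The bounds and filled CTE names are hypothetical, and the requested range here (2020-02-14 to 2020-02-15) is chosen to trigger the problem.

```sql
with incomes(date, income) as (
    values (date '2020-02-12', 50), (date '2020-02-14', 100)
),
expenses(date, expense) as (
    values (date '2020-02-12', 70), (date '2020-02-13', 50)
),
bounds(first_day) as (
    -- earliest rate change in either table
    select least((select min(date) from incomes),
                 (select min(date) from expenses))
),
t as (
    select
        d.date::date as date,
        i.income,
        e.expense,
        count(*) filter(where i.income  is not null) over(order by d.date) as grp_i,
        count(*) filter(where e.expense is not null) over(order by d.date) as grp_e
    from bounds b
    cross join lateral generate_series(b.first_day, date '2020-02-15', interval '1 day') d(date)
    left join incomes  i on i.date = d.date
    left join expenses e on e.date = d.date
),
filled as (
    select
        date,
        first_value(income)  over(partition by grp_i order by date) as income,
        first_value(expense) over(partition by grp_e order by date) as expense
    from t
)
-- filter only after the gaps are filled
select * from filled
where date >= date '2020-02-14'
order by date;
```

For this data both returned days carry income 100 and expense 50, whereas generating the series from 2020-02-14 directly would leave expense null on both days.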

vddsk6oq #2

One approach, if the data is not too large, is a lateral join:

select gs.dte, i.income, e.expense
from generate_series('2020-02-12'::date, '2020-02-15'::date, interval '1 day'
                   ) gs(dte) left join lateral
     (select i.*
      from income i
      where i.date <= gs.dte
      order by i.date desc
      limit 1
     ) i
     on true left join lateral
     (select e.*
      from expense e
      where e.date <= gs.dte
      order by e.date desc
      limit 1
     ) e
     on true;
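
Each lateral subquery above looks up the latest row on or before the given day, once per generated day. An index on the date column typically lets Postgres answer each lookup with a short index scan instead of sorting the whole table. The sketch below assumes tables named income and expense as in the query above; the column types and index names are hypothetical.

```sql
-- Assumed schema, created only if the tables do not already exist.
create table if not exists income  (date date, income  int);
create table if not exists expense (date date, expense int);

-- Supports "where date <= ... order by date desc limit 1".
create index if not exists income_date_idx  on income  (date);
create index if not exists expense_date_idx on expense (date);
```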

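Another option is to use generate_series() on each table independently to expand the values into one row per day. Both series start on the same earliest date, so this works: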

select i.date, i.income, e.expense
from (select gs.date, i.income
      from (select i.*, lead(i.date) over (order by i.date) as next_date
            from income i
           ) i cross join lateral
           -- one row per day from this rate's start to the day before the next rate (or the end of the range)
           generate_series(i.date, coalesce(i.next_date - interval '1 day', '2020-02-15'::date), interval '1 day') gs(date)
     ) i join
     (select gs.date, e.expense
      from (select e.*, lead(e.date) over (order by e.date) as next_date
            from expense e
           ) e cross join lateral
           -- same expansion for expenses
           generate_series(e.date, coalesce(e.next_date - interval '1 day', '2020-02-15'::date), interval '1 day') gs(date)
     ) e 
     on i.date = e.date;

This is a mix of the two approaches.
It is easily modified to handle the case where the two series do not start on the same date.
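
One possible reading of that modification is to replace the inner join with a full join, so that days covered by only one series are kept, with null in the other column. The sketch below inlines sample data as CTEs; the expense data is hypothetical and deliberately starts one day after the income data, and the CTE names i and e are illustrative.

```sql
with income(date, income) as (
    values (date '2020-02-12', 50), (date '2020-02-14', 100)
),
expense(date, expense) as (
    -- hypothetical: expenses start one day later than income
    values (date '2020-02-13', 50)
),
i as (
    -- expand each income rate into one row per day
    select gs.date::date as date, s.income
    from (select income.*, lead(date) over (order by date) as next_date
          from income) s
    cross join lateral generate_series(
        s.date, coalesce(s.next_date - 1, date '2020-02-15'), interval '1 day') gs(date)
),
e as (
    -- same expansion for expenses
    select gs.date::date as date, s.expense
    from (select expense.*, lead(date) over (order by date) as next_date
          from expense) s
    cross join lateral generate_series(
        s.date, coalesce(s.next_date - 1, date '2020-02-15'), interval '1 day') gs(date)
)
select date, i.income, e.expense
from i
full join e using (date)
order by date;
```

With this data, 2020-02-12 has income 50 and a null expense, and the remaining three days have both values filled in. Whether null or 0 is the right value before a series starts depends on how the result is used later.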
