Python: index a DataFrame by a string column rather than a datetime column

Posted by pprl5pva on 2021-09-29 in Java

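I have a DataFrame (df2) that holds 30 years of daily weather data. The data is repeated for multiple runs (see the run_file_year column). Here is a sample of the DataFrame: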

                                    Date  DHI  ...      WD    run_file_year
Date                                           ...                         
1991-01-01 00:00:00  01/01/1991 00:00:00  0.0  ...  281.70  1991_r1_r10i2p1
1991-01-01 01:00:00  01/01/1991 01:00:00  0.0  ...  281.01  1991_r1_r10i2p1
1991-01-01 02:00:00  01/01/1991 02:00:00  0.0  ...  274.43  1991_r1_r10i2p1
1991-01-01 03:00:00  01/01/1991 03:00:00  0.0  ...  280.94  1991_r1_r10i2p1
1991-01-01 04:00:00  01/01/1991 04:00:00  0.0  ...  272.53  1991_r1_r10i2p1
...                                  ...  ...  ...     ...              ...
2021-12-31 19:00:00  31/12/2021 19:00:00  0.0  ...  289.06   2021_r5_r9i2p1
2021-12-31 20:00:00  31/12/2021 20:00:00  0.0  ...  301.39   2021_r5_r9i2p1
2021-12-31 21:00:00  31/12/2021 21:00:00  0.0  ...  301.30   2021_r5_r9i2p1
2021-12-31 22:00:00  31/12/2021 22:00:00  0.0  ...  313.21   2021_r5_r9i2p1
2021-12-31 23:00:00  31/12/2021 23:00:00  0.0  ...  313.29   2021_r5_r9i2p1

My current code is below (the lines marked with >>>> are the ones that need attention):

import numpy as np
import pandas as pd

# Restrict df2 to the listed columns
df2 = pd.DataFrame(df2, columns=['dry_bulb_temp', 'dew_point_temp', 'WS', 'GIR',
                                 'max_temp', 'min_temp', 'max_dew_point',
                                 'min_dew_point', 'max_wind'])

# Evaluate each calendar month in turn (selectYear is defined below)
for i in range(12):
    c, Q = selectYear(df2, i + 1, config)

def selectYear(d, m, config):
    """
    Use the Sandia method, to select the most typical year of data
    for the given month
    """
    d = d[d.index.month == m]  # >>>> keep only the rows for calendar month m
    n_bins = config['cdf_bins']
    weights = dict(config['weights'])
    total = weights.pop('total')

    score = dict.fromkeys(d.index.year, 0)
    fs = dict.fromkeys(weights)
    cdfs = dict.fromkeys(weights)
    i = 0
    x2 = np.zeros((len(weights), 30))

    for w in weights:
        cdfs[w] = dict([])
        fs[w] = dict([])

        # Calculate the long term CDF for this weight
        cdfs[w]['Long-Term'], bin_edges = cdf(d, w, n_bins)

        # Bin midpoints: left edge plus half the bin width
        x = bin_edges[:-1] + np.diff(bin_edges) / 2
        x2[i, :] = x
        i += 1

        for yr in set(d.index.year):  # >>>> loops over calendar years, so runs sharing a year are merged
            dy = d[d.index.year == yr]
            #print(dy)

            # calculate the CDF for this weight for specific year
            cdfs[w][yr], b = cdf(dy, w, bin_edges)

            # Finkelstein-Schafer statistic (mean absolute difference
            # between the long-term CDF and this year's CDF)
            fs[w][yr] = np.mean(np.abs(cdfs[w]['Long-Term'] - cdfs[w][yr]))

            # Add weighted FS value to score for this year
            score[yr] += fs[w][yr] * weights[w] / total

    # select the top 5 years ordered by their weighted scores
    top5 = sorted(score, key=score.get)[:5]
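
The question does not show the cdf() helper that selectYear calls. Purely as an illustration, a minimal sketch of what such a helper could look like is given here, assuming it wraps np.histogram and returns the cumulative fraction per bin together with the bin edges. The function body and the synthetic data are assumptions, not the asker's code.

```python
import numpy as np
import pandas as pd


def cdf(df, column, bins):
    """Hypothetical stand-in for the cdf() helper used in the question.

    `bins` is either a bin count or an array of bin edges, as in np.histogram.
    Returns (cumulative fraction per bin, bin edges).
    """
    values = df[column].dropna().to_numpy()
    counts, bin_edges = np.histogram(values, bins=bins)
    total = counts.sum()
    if total == 0:
        # No data inside the bins: return an all-zero CDF instead of dividing by zero
        return np.zeros(len(counts)), bin_edges
    return np.cumsum(counts) / total, bin_edges


# Tiny synthetic example (made-up numbers, not the asker's data)
rng = np.random.default_rng(0)
long_term = pd.DataFrame({"dry_bulb_temp": rng.normal(10.0, 5.0, 1000)})
one_year = long_term.iloc[:100]

# Long-term CDF fixes the bin edges; the single-year CDF reuses them
lt_cdf, edges = cdf(long_term, "dry_bulb_temp", 30)
yr_cdf, _ = cdf(one_year, "dry_bulb_temp", edges)

# Finkelstein-Schafer statistic: mean absolute difference between the two CDFs
fs_stat = np.mean(np.abs(lt_cdf - yr_cdf))
```

Reusing the long-term bin edges for the per-year call mirrors the cdf(dy, w, bin_edges) call in selectYear, so both CDFs are evaluated on the same bins before they are subtracted.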

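At the moment my code filters the data by month and then compares the years with each other. In other words, January of every year is evaluated (its CDF is computed) and the years are then ranked.
The problem is that, because there are multiple runs, there are several Januaries of 2001. My code currently merges their data instead of treating January 2001 of run 1 and January 2001 of run 2 as separate entities to compare. My question: is there a way to index by my run_file_year column (a string) and have the code loop over every run_file_year value, without listing them all?
Currently the DataFrame d (already filtered by month) is then indexed by year. Is there some way to index by the run_file_year column instead of by year, without having to spell out every item to iterate over?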
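
One possible direction, offered only as a sketch: pandas groupby can iterate over the distinct values of a string column without listing them. The example below uses a small synthetic frame (made-up values and shortened run labels) and assumes run_file_year is still a column of the frame passed to selectYear. Note that the column list in the question's pd.DataFrame(...) call does not include it, so it would have to be kept.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df2: two runs that both cover January 2001,
# plus one run for January 2002 (values are made up)
idx_2001 = pd.date_range("2001-01-01", periods=48, freq="60min")
idx_2002 = pd.date_range("2002-01-01", periods=48, freq="60min")
run1 = pd.DataFrame({"dry_bulb_temp": np.arange(48.0), "run_file_year": "2001_r1"}, index=idx_2001)
run2 = pd.DataFrame({"dry_bulb_temp": np.arange(48.0) + 5, "run_file_year": "2001_r2"}, index=idx_2001)
run3 = pd.DataFrame({"dry_bulb_temp": np.arange(48.0) - 5, "run_file_year": "2002_r1"}, index=idx_2002)
d = pd.concat([run1, run2, run3])

m = 1
d = d[d.index.month == m]

# Grouping on the calendar year merges the two 2001 runs ...
by_year = {yr: len(dy) for yr, dy in d.groupby(d.index.year)}
# ... while grouping on the string column keeps every run separate
by_run = {run: len(dy) for run, dy in d.groupby("run_file_year")}

# Scores keyed by run label instead of by year
score = dict.fromkeys(d["run_file_year"].unique(), 0.0)
for run, dy in d.groupby("run_file_year"):
    # dy plays the role of `dy = d[d.index.year == yr]` in selectYear
    score[run] += dy["dry_bulb_temp"].mean()  # placeholder for the weighted FS value
```

With this shape, the dictionaries score, fs[w] and cdfs[w] would be keyed by the run_file_year string rather than by the integer year, and the final ranking would return run labels.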
