对多行 Dataframe 应用自定义函数

2guxujil 于 2021-09-29 发布在 Java

关注(0)|答案(0)|浏览(244)

我目前正在开发一个函数，该函数将根据多个条件（平方米、图像和价格）检测行是否重复。它工作得非常好，直到找到重复项，从 Dataframe 中删除该行，然后我的for循环被打乱。这就产生了 IndexError: single positional indexer is out-of-bounds .

def image_duplicate(df):
    # Detecting duplicates based on the publications' images, m2 and price.
    for index1 in df.index:
        for index2 in df.index:
            if index1 == index2:
                continue
            print('index1: {} \t index2: {}'.format(index1, index2))

            img1 = Image.open(requests.get(df['img_url'].iloc[index1], stream=True).raw).resize((213, 160))
            img2 = Image.open(requests.get(df['img_url'].iloc[index2], stream=True).raw).resize((213, 160))
            img1 = np.array(img1).astype(float)
            img2 = np.array(img2).astype(float)

            ssim_result = ssim(img1, img2, multichannel=True)

            ssim_result_percentage = (1+ssim_result)/2

            if ssim_result_percentage > 0.80 and df['m2'].iloc[index1] == df['m2'].iloc[index2] \
                    and df['Price'].iloc[index1] == df['Price'].iloc[index2]:
                df.drop(df.iloc[index2], inplace=True).reindex()

image_duplicate(full_df)

解决这个问题的好办法是什么？
编辑：示例：

预期输出：从数据框中删除一行[2]。