Post-processing a CSV matrix

jum4pzuy  posted 8 months ago  in  Other

Suppose this is my array:

csv_arr_output = [
    ["",        "",       "Average", "",    "Red"     ],
    ["",        "",       "",        "",    "Red eyes"],
    ["",        "height", "weight",  "",    ""        ],
    ["Males",   "1.9",    "0.003",   "40%", ""        ],
    ["Females", "1.7",    "0.002",   "43%", ""        ],
]

It will later be converted to a CSV file, but first I post-process it with this function:

def post_processing(csv_array, common_percent_threshold=0.5):
    # Replace longer next cell if they have a certain percent of common string
    for row in range(len(csv_array) - 1):
        for col in range(len(csv_array[row])):
            current_cell = csv_array[row][col]
            next_cell = csv_array[row + 1][col]

            if calculate_common_percent(current_cell, next_cell) >= common_percent_threshold:
                if len(current_cell) < len(next_cell):
                    csv_array[row][col] = next_cell
                    csv_array[row + 1][col] = ""

    # Merge empty cells in the same column
    for col in range(len(csv_array[0])):
        column_cells = [row[col] for row in csv_array]
        non_empty_cells = [cell for cell in column_cells if cell != ""]
        empty_cells = [""] * (len(column_cells) - len(non_empty_cells))

        for row in range(len(csv_array)):
            if csv_array[row][col] == "":
                csv_array[row][col] = empty_cells.pop(0)

    # Remove empty rows
    csv_array = [row for row in csv_array if any(cell != "" for cell in row)]

    return csv_array


def calculate_common_percent(cell1, cell2):
    set1 = set(cell1.split())
    set2 = set(cell2.split())

    # Check if both sets are empty
    if not set1 and not set2:
        return 0.0

    common_words = set1.intersection(set2)
    total_words = set1.union(set2)

    # Avoid division by zero
    if not total_words:
        return 0.0

    return len(common_words) / len(total_words)


This is the output I get:

[
    ["",        "",       "Average", "",    "Red eyes"],
    ["",        "height", "weight",  "",    ""        ],
    ["Males",   "1.9",    "0.003",   "40%", ""        ],
    ["Females", "1.7",    "0.002",   "43%", ""        ],
]


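As you can see, every cell below the "Red eyes" header is empty. I want to shift "Red eyes" one column to the left and then delete the empty column. Could you please help me integrate this logic into the function?
I expect the post-processing function to keep everything it already does and, in addition, shift any column header that has only empty cells below it, then remove all empty columns from the matrix.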

pgx2nnw8 #1

Starting from a two-dimensional list that looks like this:

l = [
    ["",        "",       "Average", "",    "Red"     ],
    ["",        "",       "",        "",    "Red eyes"],
    ["",        "height", "weight",  "",    ""        ],
    ["Males",   "1.9",    "0.003",   "40%", ""        ],
    ["Females", "1.7",    "0.002",   "43%", ""        ],
]

you can:

  • Move "Red eyes" up and one column to the left
  • Delete the now-empty row
  • Delete the last (5th) column

For example:

# move 'Red eyes' up and over
l[0][3] = l[1][4]

# delete unnecessary row
del l[1]

# trim 5th cell from each row (effectively deleting 5th column)
l = [x[0:4] for x in l]

print(l)


This gives me:

[
    ["",        "",       "Average", "Red eyes"],
    ["",        "height", "weight",  ""        ],
    ["Males",   "1.9",    "0.003",   "40%"     ],
    ["Females", "1.7",    "0.002",   "43%"     ],
]
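
The steps above use hard-coded indices, so they only fit this exact matrix. A more general version might look like the sketch below. It is not part of the original answer: the helper name shift_orphan_headers is hypothetical, and it assumes a rectangular list of lists of strings in which "" means empty. It also treats any column with exactly one non-empty cell as a header with nothing below it, which may be too broad if a real data column can hold a single value. The header is moved only when the cell immediately to its left is empty; completely empty columns are then dropped.

```python
def shift_orphan_headers(matrix):
    """Shift lone headers one column left, then drop empty columns.

    Hypothetical helper (an assumption, not from the original post).
    Expects a rectangular list of lists of strings; "" means empty.
    """
    if not matrix:
        return []

    rows = [list(row) for row in matrix]  # copy so the input is not mutated
    n_cols = len(rows[0])

    for col in range(1, n_cols):
        # Row indices of the non-empty cells in this column
        filled = [r for r in range(len(rows)) if rows[r][col] != ""]
        if len(filled) != 1:
            continue  # column is empty or holds real data
        r = filled[0]
        # Move the lone cell left only if the neighbouring cell is free
        if rows[r][col - 1] == "":
            rows[r][col - 1] = rows[r][col]
            rows[r][col] = ""

    # Keep only the columns that still contain at least one non-empty cell
    keep = [c for c in range(n_cols) if any(row[c] != "" for row in rows)]
    return [[row[c] for c in keep] for row in rows]


# The matrix as it comes out of post_processing in the question
processed = [
    ["",        "",       "Average", "",    "Red eyes"],
    ["",        "height", "weight",  "",    ""        ],
    ["Males",   "1.9",    "0.003",   "40%", ""        ],
    ["Females", "1.7",    "0.002",   "43%", ""        ],
]

result = shift_orphan_headers(processed)
print(result)
```

To integrate it into the function from the question, one option is to call csv_array = shift_orphan_headers(csv_array) just before the final return csv_array in post_processing, so the row clean-up runs first and the column clean-up runs last.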
