Extracting tables from a CSV file using Python

jfgube3f  posted 5 months ago in Python

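I am trying to use Python to extract specific tables from a report and compile the data into a single table. However, once the code finds the row containing the required column names, it copies everything below that row, including all the non-machine-readable data that is not needed. The output should be in table format. I am running it.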

Code:

import csv
import pandas as pd

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    data_start = False
    headers = []
    rows = []

    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        
        for row in csv_reader:
            if not data_start:
                if row and row[0] == "SUBSCRIBER NAME":
                    headers = row
                    data_start = True
            elif data_start and row:
                if row[0].strip().upper() == "PRODUCT TOTAL":
                    break
                rows.append(row)

    return headers, rows

# Specify the path to your CSV file
csv_file_path = 'Testing/sample report.csv'

# Extract data
headers, rows = extract_data_from_csv(csv_file_path)

# Create a DataFrame using pandas
df = pd.DataFrame(rows, columns=headers)

# Display the DataFrame as a table
display(df)

# Save the DataFrame to a new CSV file
output_csv_file_path = 'mark_table_1.csv'
df.to_csv(output_csv_file_path, index=False)

Please see the file at the Google link: Link

k10s72fa  #1

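You might want to consider a custom generator function that loops over the file, starts capturing when one condition is met, then consumes and yields the following rows until another condition is met (or the file ends), and then breaks out and goes back to looking for the first condition. This is not very different from what you are doing already. For example: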

import csv
import pandas as pd

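def extract(filename):
    # newline='' is what the csv module documentation recommends for files passed to csv.reader
    with open(filename, newline='') as fin:
        csvin = csv.reader(fin)
        for row in csvin:
            # A header row marks the start of a table
            if row and row[0] == 'SUBSCRIBER NAME':
                # Keep consuming the same reader so the outer loop resumes after this table
                for subrow in csvin:
                    if subrow and subrow[0]: # change me perhaps?
                        yield dict(zip(row, subrow))
                    else:
                        # An empty row or empty first cell ends the current table
                        break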

Then get the DataFrame with:

df = pd.DataFrame(extract('your_file_name'))
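
To see how the nested loops behave, here is a minimal sketch that runs the same idea on an in-memory report using only the standard library. The sample data, the column names other than SUBSCRIBER NAME, and the function name extract_rows are hypothetical, and the sketch assumes that a row with an empty first cell follows each table. If a PRODUCT TOTAL row came directly after the data it would be yielded too, which is what the "change me perhaps?" condition is for.

```python
import csv
import io

# Hypothetical sample report: two tables surrounded by notes, blank rows and totals.
SAMPLE_REPORT = """\
Report generated for testing,,
,,
SUBSCRIBER NAME,ACCOUNT,AMOUNT
Alice,1001,50.00
Bob,1002,75.50
,,
PRODUCT TOTAL,,125.50
Some free-text note,,
SUBSCRIBER NAME,ACCOUNT,AMOUNT
Carol,1003,20.00
,,
PRODUCT TOTAL,,20.00
"""


def extract_rows(lines, header_marker="SUBSCRIBER NAME"):
    """Yield one dict per data row found under each header row."""
    reader = csv.reader(lines)
    for row in reader:
        if row and row[0] == header_marker:
            # Consume the same reader until an empty first cell ends the table.
            for subrow in reader:
                if subrow and subrow[0]:
                    yield dict(zip(row, subrow))
                else:
                    break


records = list(extract_rows(io.StringIO(SAMPLE_REPORT)))
for record in records:
    print(record)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as the generator in the line above.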

btqmn9zl  #2

I haven't tested this, but you could try replacing extract_data_from_csv with the following:

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    data_start = False
    headers = []
    rows = []

    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        
        for row in csv_reader:
            if not data_start:
                if row and row[0] == "SUBSCRIBER NAME":
                    headers = row
                    data_start = True
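            elif row:
                # Guard the index so rows with fewer than six columns do not raise IndexError
                if len(row) > 5 and row[5].strip().upper() == "PRODUCT TOTAL":  # changed row index to 5
                    data_start = False  # in place of break; the total row itself is not stored
                else:
                    rows.append(row)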

    return headers, rows

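Two changes were made:
1. The index was changed from row[0].strip().upper() == "PRODUCT TOTAL" to row[5].strip().upper() == "PRODUCT TOTAL".
2. break was replaced with data_start = False, which pauses row collection until a new "SUBSCRIBER NAME" table appears. Note that headers is overwritten in the if statement each time a table starts, but that should not be a problem because the headers are always the same. You could also modify the function further to avoid this.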
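
The last point above mentions modifying the function so that headers is not reassigned for every table. One possible way to do that is sketched below with only the standard library. The sample report, its column layout, the function name extract_tables and its parameters are hypothetical. The sketch assumes the total marker sits in the sixth column (index 5), as this answer suggests.

```python
import csv
import io

# Hypothetical report: the "PRODUCT TOTAL" marker sits in the sixth column (index 5).
SAMPLE_REPORT = """\
SUBSCRIBER NAME,ACCOUNT,DATE,REF,BRANCH,PRODUCT,AMOUNT
Alice,1001,2023-11-14,R1,B1,Water,50.00
Bob,1002,2023-11-14,R2,B1,Water,75.50
,,,,,PRODUCT TOTAL,125.50
Footer text
SUBSCRIBER NAME,ACCOUNT,DATE,REF,BRANCH,PRODUCT,AMOUNT
Carol,1003,2023-11-14,R3,B2,Power,20.00
,,,,,PRODUCT TOTAL,20.00
"""


def extract_tables(lines, header_marker="SUBSCRIBER NAME",
                   total_marker="PRODUCT TOTAL", total_col=5):
    """Collect the data rows of every table, keeping the first header row only."""
    headers = None
    rows = []
    in_table = False
    for row in csv.reader(lines):
        if not row:
            continue  # skip completely empty lines
        if row[0] == header_marker:
            if headers is None:
                headers = row  # remember the header once, ignore later repeats
            in_table = True
        elif in_table:
            # Short rows cannot contain the marker, so check the length first.
            is_total = (len(row) > total_col
                        and row[total_col].strip().upper() == total_marker)
            if is_total:
                in_table = False  # pause until the next header row
            else:
                rows.append(row)
    return headers, rows


headers, rows = extract_tables(io.StringIO(SAMPLE_REPORT))
print(headers)
for row in rows:
    print(row)
```

The total rows and the footer line are skipped because collection is paused between a total row and the next header row.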

qaxu7uf2  #3

I figured it out.
Sharing the whole code for future reference.

import csv
import pandas as pd

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    headers = None
    rows = []

    with open(file_path, newline='') as fin:
        csvin = csv.reader(fin)
        for row in csvin:
            if row and row[0] == 'SUBSCRIBER NAME':
                headers = row
                for subrow in csvin:
                    if subrow and subrow[0].strip().upper() == 'PRODUCT TOTAL':
                        break
                    elif subrow and subrow[0]:
                        rows.append(subrow)
                    else:
                        break

    return headers, rows

# Specify the path to your CSV file
csv_file_path = 'Testing/0976_Bills_Payment_Abstract_of_Collection_11142023_115828.csv'

# Extract data using the new extract function
headers, rows = extract_data_from_csv(csv_file_path)

# Create a DataFrame using pandas
df = pd.DataFrame(rows, columns=headers)

# Display the DataFrame as a table
# (display() is provided by Jupyter/IPython; in a plain script use print(df) instead)
display(df)

# Save the DataFrame to a new CSV file
output_csv_file_path = 'mark_table_2.csv'
df.to_csv(output_csv_file_path, index=False)
