Python: problem writing data to a correctly formatted CSV

Posted by djp7away 5 months ago in Python

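I'm having trouble writing data to a correctly formatted CSV. The data is anime TV shows, with fields such as title, genres, and synopsis. After parsing the data from an API call, everything is written to the CSV correctly except the genres and the synopsis. If an anime has several genres, only the first one seems to be wrapped in "", and the remaining genres are merged into the synopsis field, in front of the synopsis text.
Parse function: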

import re

def parse_data(data):
    try:
        # Genre parse
        genres_list = data['genres']
        genres = ', '.join(genre['name'] for genre in genres_list)
        print(genres)

        # Studio parse
        studio_name = "unknown"
        studio_parse = str(data.get('studios'))
        match = re.search(r"'name':\s*'([^']*)'", studio_parse)
        if match:
            studio_name = match.group(1)

        # Synopsis parse: drop "(Source: ...)" and "[Written by MAL Rewrite]"
        synopsis_dirty = data['synopsis']
        synopsis = re.sub(r"\(Source: [^\)]+\)", "", synopsis_dirty).strip()
        synopsis = re.sub(r'\[Written by MAL Rewrite\]', '', synopsis).strip()

        details = str(data['id']) + ',' + data['title'].encode('utf-8').decode('cp1252', 'replace') + ',' + data['start_date'] + ',' + str(data['mean']) + ',' + str(data['rank']) + ',' + str(data['popularity']) + ',' + str(data['num_episodes']) + ',' + data['rating'] +  ',' + studio_name + ',' + genres + ',' + synopsis.encode('utf-8').decode('cp1252', 'replace')
        split_data = re.split(r"[,]", details)

        return split_data
    except KeyError as e:
        print(f"Missing key in API response: {e}")
        return None

Dictionary formatting:

def parsed_data_to_dict(data):
    try:
        row = {
            "id" :  data[0],
            "title" : data[1].encode('utf-8').decode('cp1252', 'replace'),
            "start-date" : data[2],
            "mean"  : data[3],
            "rank"  : data[4],
            "popularity" : data[5],
            "num_episodes" : data[6],
            "rating" : data[7],
            "studio" : data[8],
            "genres" : data[9],
            "synopsis" : ''.join(data[10:]).encode('utf-8').decode('cp1251', 'replace')
        }
        return row
    except IndexError as e:
        print(f"Unexpected number of fields: {e}")
        return None
Expected CSV:

"8","Bouken Ou Beet","2004-09-30","6.93","4426","5274","52","pg","Toei Animation","Adventure, Fantasy, Shounen, Supernatural", "It is the dark century and the people are suffering under the rule of the devil Vandel who is able to manipulate monsters."
Actual CSV:

"8","Bouken Ou Beet","2004-09-30","6.93","4426","5274","52","pg","Toei Animation","Adventure"," Fantasy Shounen SupernaturalIt is the dark century and the people are suffering under the rule of the devil Vandel who is able to manipulate monsters."
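The difference between the two rows can be reproduced with just the join-and-split steps from parse_data. The sketch below uses the field values of the expected row as hypothetical input. Because re.split(r"[,]", ...) cannot tell a separator comma from a comma inside a value, the genre string breaks into four pieces, and ''.join(data[10:]) in parsed_data_to_dict then glues the leftover genres onto the synopsis.

```python
import re

# Hypothetical input: the field values from the expected row above.
fields = ["8", "Bouken Ou Beet", "2004-09-30", "6.93", "4426", "5274", "52",
          "pg", "Toei Animation", "Adventure, Fantasy, Shounen, Supernatural",
          "It is the dark century and the people are suffering under the rule "
          "of the devil Vandel who is able to manipulate monsters."]

# What parse_data does: join everything with ',' and split it again.
details = ",".join(fields)
split_data = re.split(r"[,]", details)

print(len(fields), len(split_data))   # 11 14
print(split_data[9])                  # Adventure

# parsed_data_to_dict joins everything from index 10 onward into "synopsis",
# which starts with " Fantasy Shounen SupernaturalIt is the dark century".
print("".join(split_data[10:]))
```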
Answer 1:

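You are building the comma-separated string by hand and using it as raw CSV. This approach is error-prone, especially when the strings themselves contain commas: you always have to take care of wrapping each field in " and escaping any " inside it yourself.
I suggest using the csv module from the standard library instead.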
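A minimal sketch with csv.writer, using a hypothetical, shortened row from the question. With quoting=csv.QUOTE_ALL every field is wrapped in double quotes (as in the expected CSV), and the commas inside the genre field stay inside that field when the text is read back with csv.reader.

```python
import csv
import io

# Hypothetical, shortened row from the question; the last field contains commas.
row = ["8", "Bouken Ou Beet", "Adventure, Fantasy, Shounen, Supernatural"]

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)  # quote every field
writer.writerow(row)
text = buf.getvalue()
print(text)  # "8","Bouken Ou Beet","Adventure, Fantasy, Shounen, Supernatural"

# Reading it back yields the original three fields, commas intact.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```

Note that csv.writer ends each row with "\r\n" by default, which is why files should be opened with newline="".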

Answer 2:

split_data = re.split(r"[,]", details)

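Before this line you already have data, which is a dict, so it is well suited to csv.DictWriter (part of the standard library). DictWriter allows strings to contain , and also handles strings that contain ". Consider the following example: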

import csv
with open('names.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['uno', 'dos', 'tres'])
    writer.writeheader()
    writer.writerow({'uno': 'simple', 'dos': 'with , comma', 'tres': 'with " quote'})


produces a file names.csv with the following content:

uno,dos,tres
simple,"with , comma","with "" quote"
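Applied to the question, a sketch could build the row dict directly from the API response and pass it to DictWriter, with no string joining or splitting at all. It assumes the response uses the keys seen in the question, with genres and studios as lists of {"name": ...} dicts (as the question's own parsing suggests); FIELDNAMES, SAMPLE, and write_csv are hypothetical names. quoting=csv.QUOTE_ALL quotes every field like the expected CSV. Writing the file with encoding="utf-8" is assumed here as a replacement for the .encode('utf-8').decode('cp1252', 'replace') round trip.

```python
import csv
import io
import re

# Column order for the output file (assumed; adjust as needed).
FIELDNAMES = ["id", "title", "start_date", "mean", "rank", "popularity",
              "num_episodes", "rating", "studio", "genres", "synopsis"]


def parse_data(data):
    """Convert one API record (assumed shape) into a dict for csv.DictWriter."""
    genres = ", ".join(genre["name"] for genre in data.get("genres", []))
    studios = data.get("studios") or []
    studio = studios[0]["name"] if studios else "unknown"
    synopsis = re.sub(r"\(Source: [^)]+\)", "", data.get("synopsis", ""))
    synopsis = synopsis.replace("[Written by MAL Rewrite]", "").strip()
    return {
        "id": data["id"],
        "title": data["title"],
        "start_date": data.get("start_date", ""),
        "mean": data.get("mean", ""),
        "rank": data.get("rank", ""),
        "popularity": data.get("popularity", ""),
        "num_episodes": data.get("num_episodes", ""),
        "rating": data.get("rating", ""),
        "studio": studio,
        "genres": genres,  # commas here end up inside quotes, not as separators
        "synopsis": synopsis,
    }


def write_csv(f, records):
    """Write records to an already-open text file (or io.StringIO)."""
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for record in records:
        writer.writerow(parse_data(record))


# Hypothetical record shaped like the one in the question.
SAMPLE = {
    "id": 8,
    "title": "Bouken Ou Beet",
    "start_date": "2004-09-30",
    "mean": 6.93,
    "rank": 4426,
    "popularity": 5274,
    "num_episodes": 52,
    "rating": "pg",
    "studios": [{"name": "Toei Animation"}],
    "genres": [{"name": "Adventure"}, {"name": "Fantasy"},
               {"name": "Shounen"}, {"name": "Supernatural"}],
    "synopsis": "It is the dark century and the people are suffering under the "
                "rule of the devil Vandel who is able to manipulate monsters. "
                "(Source: ANN)",
}

buf = io.StringIO()
write_csv(buf, [SAMPLE])
print(buf.getvalue())
```

To write a real file, open it the same way as in the names.csv example, adding an explicit encoding: with open("anime.csv", "w", newline="", encoding="utf-8") as f: write_csv(f, records).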
