大量的CSV,在Python中使用SQLite

h5qlskok  于 5个月前  发布在  Python
关注(0)|答案(1)|浏览(59)

我有超过1000个CSV文件(我们称之为辅助文件),具有相同的结构(压缩大小从400 KB到3531 KB不等)
SECONDARY文件的名称对应于将它们连接到具有另一种信息类型的主CSV文档的键。
我需要在SECONDARY文件的几列上执行一些功能(有些值在整个文件中重复,需要进行分组)。
后来,我将需要得到一些信息,从次要文件考虑到一些信息,从主文件。
我试过了:

import pandas as pd
import sqlite3
import os
import re

# Set the directory where your CSV files are located
csv_directory = r"path"

# Connect to SQLite database (or create a new one if it doesn't exist)
conn = sqlite3.connect("db.db")
cursor = conn.cursor()

# Get a list of all CSV files in the directory
csv_files = [file for file in os.listdir(csv_directory) if file.endswith(".csv")]

# Define the master table name
master_bus = "master_table"
# Replace or remove invalid characters from the master table name
master_bus = re.sub(r'\W+', '', master_bus)

# loop through each CSV file and create a table in the database
for csv_file in csv_files:
    table_name = os.path.splitext(csv_file)[0] 
    csv_path = os.path.join(csv_directory, csv_file)

# Read CSV file into a pandas DataFrame
    df = pd.read_csv(csv_path)

# Add a new column with the filename
    df['filename'] = os.path.basename(csv_file)

# Write the DataFrame to SQLite
    df.to_sql(table_name, conn, index=False)

# Concatenate all tables into the master table
tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
tables = [table[0] for table in tables]

# Create the master table
cursor.execute(f"CREATE TABLE '{master_bus}' AS SELECT * FROM {tables[0]} WHERE 0;")

# Concatenate data from all tables into the master table
for table in tables:
    cursor.execute(f"INSERT INTO {master_bus} SELECT * FROM {table};")

# Commit changes and close connection
conn.commit()
conn.close()

字符串
它有点工作。我可以用另一个代码为每个csv文件创建一个带有表的DB。但是当我试图将文件的名称添加到一个新列中时,并创建一个主表时,会发生这种情况:

OperationalError Traceback (most recent call last)

c:\Users\acuna\Desktop\Computer Science\Project_Bus.ipynb Cell 17 line 4

37 tables = [table[0] for table in tables]

39 # Create the master table

---> 40 cursor.execute(f"CREATE TABLE '{master_bus}' AS SELECT * FROM {tables[0]} WHERE 0;")

42 # Concatenate data from all tables into the master table

43 for table in tables:

OperationalError: near "-": syntax error

mwkjh3gx

mwkjh3gx1#

您的代码中的错误是您试图创建的表名中的无效字符。在SQLite中,表名不能包含特殊字符(如-),因此您需要在创建表之前删除或替换它们。

master_bus = "master_table"

master_bus = re.sub(r'\W+', '', master_bus)

cursor.execute(f"CREATE TABLE {master_bus} AS SELECT * FROM {tables[0]} WHERE `0;")`

字符串

相关问题