如何优化SQLite数据库？

toiithl6 于 8个月前发布在 SQLite

关注(0)|答案(1)|浏览(57)

我有下面的代码来处理我的文件。如果文件之前被添加到数据库中，那么该文件将被跳过（pass）。

import sqlite3 as lite

for i, file in enumerate( allMedia ):
  con = lite.connect(DB_PATH)
  con.text_factory = str
  with con:
    cur = con.cursor()
    cur.execute("SELECT rowid,files_id,path,set_id,md5,tagged FROM files WHERE path = ?", (file,))
    row = cur.fetchone()
    if (row is not None):
       pass

此代码的问题是处理速度慢（在数据库中找到的每个文件需要2-3秒）。数据库大小约为30 Mb。数以千计的文件需要处理。
有没有办法加快这一进程？

sqlite

来源：https://stackoverflow.com/questions/76860082/how-to-optimize-working-with-sqlite-database

1条答案

按热度按时间

xu3bshqb1#

如果您只对文件路径匹配的表的记录感兴趣，则可以使用一个SQL查询来完成此操作--一个使用in操作符的查询。然后，您需要准备?的列表：

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
anchors = (",?" * len(allMedia))[1:]
sql = f"SELECT * FROM files WHERE path in ({anchors})"
result = cur.execute(sql, allMedia)

for row in result.fetchall():
    # do something with row
    pass

如果您的列表（allMedia）比Sqlite支持的in操作符长，那么首先拆分列表into chunks，然后对每个块执行上述查询。

# From the linked Q&A
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
for media in chunks(allMedia, 500):
    anchors = (",?" * len(media))[1:]
    sql = f"SELECT * FROM files WHERE path in ({anchors})"
    result = cur.execute(sql, media)
    for row in result.fetchall():
        # do something with row
        pass

如果你的allMedia列表可能有重复项，并且你不需要处理同一个文件两次，那么首先从正在迭代的集合中消除这些重复项：

for media in chunks(list(set(allMedia)), 500):

赞(0）回复(0）举报 8个月前

我来回答

如何优化SQLite数据库？

1条答案

相关问题

热门标签

最新问答