sqlite 此应用程序将读取邮箱数据(mbox.txt)并计算每个组织的电子邮件数量

cetgtptt  于 8个月前  发布在  SQLite
关注(0)|答案(2)|浏览(61)

此应用程序将读取邮箱数据(mbox.txt)并计算每个组织的电子邮件数量(即电子邮件地址的域名)使用具有以下模式的数据库来维护计数。
CREATE TABLE Counts (org TEXT, count INTEGER)
我写了一段代码,但他们在作业中给出了一个提示,最高的组织计数是536,我得到了390
Link to the data

import sqlite3

conn = sqlite3.connect('data2.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('CREATE TABLE Counts (org TEXT,  count INTEGER)')

for line in open('mbox.txt'):
    
    if not line.startswith('From'):
        continue
    
    line = line.split()
    email = line[1]
    domaine = email.split('@')[1]    
    
    cur.execute('SELECT count FROM Counts WHERE org = ?',(email,))
    row = cur.fetchone()
    
    if not row:
        cur.execute('INSERT INTO Counts `(`org, count`) `VALUES (?,1)',(email,))

    else:
        cur.execute('UPDATE Counts SET count=count+1 WHERE org=?', (email,))

    conn.commit()
d6kp6zgx

d6kp6zgx1#

{'iupui.edu': 536,
 'umich.edu': 491,
 'indiana.edu': 178,
 'caret.cam.ac.uk': 157,
 'vt.edu': 110,
 'uct.ac.za': 96,
 'media.berkeley.edu': 56,
 'ufp.pt': 28,
 'gmail.com': 25,
 'et.gatech.edu': 17,
 'txstate.edu': 17,
 'whitman.edu': 17,
 'lancaster.ac.uk': 14,
 'bu.edu': 14,
 'stanford.edu': 12,
 'unicon.net': 9,
 'loi.nl': 9,
 'rsmart.com': 8,
 'ucdavis.edu': 1,
 'fhda.edu': 1,
 'utoronto.ca': 1}

这是我从这段代码中得到的dict。536是最高的。

from pprint import pp

emails = {}
with open('mbox.txt') as file:
    while (line := file.readline()):
        if not (line.startswith('From ') and '@' in line): continue
        domain = line.split('@')[-1].split()[0]
        emails[domain] = emails.get(domain, 0) + 1

pp(dict(sorted(emails.items(), key=lambda x: -x[1])))

这是我用你的代码得到的dict,所以我认为过滤是可疑的。

{'iupui.edu': 1072,
 'umich.edu': 982,
 'indiana.edu': 356,
 'caret.cam.ac.uk': 314,
 'vt.edu': 220,
 'uct.ac.za': 192,
 'media.berkeley.edu': 112,
 'ufp.pt': 56,
 'gmail.com': 50,
 'et.gatech.edu': 34,
 'txstate.edu': 34,
 'whitman.edu': 34,
 'lancaster.ac.uk': 28,
 'bu.edu': 28,
 'stanford.edu': 24,
 'unicon.net': 18,
 'loi.nl': 18,
 'rsmart.com': 16,
 'ucdavis.edu': 2,
 'fhda.edu': 2,
 'utoronto.ca': 2}
col17t5w

col17t5w2#

导入sqlite3
conn=sqlite3.connect('data2.sqlite')cur=conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('CREATE TABLE Counts(org TEXT,count INTEGER ')
fhandle=open('mbox.txt')for line in fhandle:if not line.startswith('From:'):continue

line=line.split()
email=line[1]
domaine=email.split('@')
domaine=domaine[1]

cur.execute('SELECT count FROM Counts WHERE org= ?',(domaine,))
row=cur.fetchone()
if row is None :
    cur.execute('INSERT INTO Counts (org,count) VALUES (?,1)',(domaine,))
else:
    cur.execute('UPDATE Counts SET count=count+1 WHERE org=?',(domaine,))
    
    
conn.commit()

相关问题