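import pandas as pd

# Read main.csv 1,000,000 rows at a time and write each chunk to chunk0.csv, chunk1.csv, ...
for i, chunk in enumerate(pd.read_csv('C:/your_path_here/main.csv', chunksize=1000000)):
    # index=False stops pandas from writing its row index as an extra first column
    chunk.to_csv('chunk{}.csv'.format(i), index=False)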
Or:
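import os

# The numbered output files are written to the current working directory
print(os.getcwd())

# Note: readlines() loads the whole file into memory
with open('C:/your_path/Book1.csv', 'r') as f:
    csvfile = f.readlines()

filename = 1
for i in range(0, len(csvfile), 1000000):
    # Write lines i .. i+999999 to 1.csv, 2.csv, ...; only 1.csv gets the header row
    with open(str(filename) + '.csv', 'w') as out:
        out.writelines(csvfile[i:i+1000000])
    filename += 1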
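If memory is a concern, a streaming variant avoids readlines() altogether. The following is a minimal sketch using only Python's standard csv module; the helper name split_csv, the output naming 1.csv, 2.csv, ..., and the assumption that the first row is a header are illustrative choices, not part of the original answer. Unlike the readlines() snippet, it copies the header into every output file.

```python
import csv
import os


def split_csv(path, out_dir, rows_per_file=1_000_000):
    """Hypothetical helper: split a CSV into 1.csv, 2.csv, ... in out_dir.

    Streams row by row instead of calling readlines(), so memory use stays
    small, and copies the header row (assumed to be the first row) into
    every output file. Returns the list of output paths.
    """
    outputs = []
    out = None
    try:
        with open(path, newline='') as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:  # empty input: nothing to write
                return outputs
            for n, row in enumerate(reader):
                if n % rows_per_file == 0:
                    # Start a new chunk file and repeat the header in it
                    if out is not None:
                        out.close()
                    name = os.path.join(out_dir, '{}.csv'.format(len(outputs) + 1))
                    out = open(name, 'w', newline='')
                    writer = csv.writer(out)
                    writer.writerow(header)
                    outputs.append(name)
                writer.writerow(row)
    finally:
        # Close the last chunk file even if reading fails part-way
        if out is not None:
            out.close()
    return outputs
```

For the file in the question this would be called as split_csv('C:/your_path/Book1.csv', '.'), writing the chunks to the current directory. Passing newline='' to open() is what the csv module documentation recommends, so quoted fields containing line breaks are kept intact.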
2 answers
Do you have Python installed? If so, either of the scripts above will split the file.
You can do this with PartitionTap (http://docs.cascading.org/cascading/2.5/javadoc/cascading/tap/hadoop/partitiontap.html) and implement a Partition that specifies how each TupleEntry is mapped to a particular subdirectory.
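PartitionTap is a Java/Hadoop API, so the following is not Cascading code. It is a plain-Python sketch of the same idea under stated assumptions: each row is routed to a subdirectory chosen from one of its fields, the way a Partition maps a TupleEntry to a path. The helper name partition_csv, the key_field parameter, and the part.csv file name are hypothetical, and key values are assumed to be safe directory names.

```python
import csv
import os


def partition_csv(path, out_dir, key_field):
    """Hypothetical helper: write each row to out_dir/<row[key_field]>/part.csv.

    Assumes the CSV has a header row and that key values are safe to use
    as directory names. One output file stays open per distinct key.
    Returns a dict mapping key value -> output file path.
    """
    files, writers, paths = {}, {}, {}
    try:
        with open(path, newline='') as src:
            reader = csv.DictReader(src)
            for row in reader:
                key = row[key_field]  # the "partition" this row belongs to
                if key not in writers:
                    # First row for this key: create its subdirectory and file
                    sub = os.path.join(out_dir, key)
                    os.makedirs(sub, exist_ok=True)
                    paths[key] = os.path.join(sub, 'part.csv')
                    files[key] = open(paths[key], 'w', newline='')
                    writers[key] = csv.DictWriter(files[key], fieldnames=reader.fieldnames)
                    writers[key].writeheader()
                writers[key].writerow(row)
    finally:
        for f in files.values():
            f.close()
    return paths
```

Keeping one file open per key is simple but can hit the operating system's open-file limit when there are many distinct keys. On a Hadoop cluster, the PartitionTap approach lets the framework manage the output files instead.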