pandas：如何将Dataframe插入clickhouse

cbeh67ev 于 2021-07-15 发布在 ClickHouse

关注(0)|答案(4)|浏览(3318)

我正在尝试将Pandas数据框插入clickhouse。
这是我的密码

import pandas
import sqlalchemy as sa

uri = 'clickhouse://default:@localhost/default'
ch_db = sa.create_engine(uri)

pdf = pandas.DataFrame.from_records([
    {'year': 1994, 'first_name': 'Vova'},
    {'year': 1995, 'first_name': 'Anja'},
    {'year': 1996, 'first_name': 'Vasja'},
    {'year': 1997, 'first_name': 'Petja'},
])

pdf.to_sql('test_humans', ch_db, if_exists='append', index=False)

这就是我收到的错误。这是否与发动机缺少的额外参数有关？我怎样才能解决这个问题？
异常：代码：62，e.displaytext（）=db:：异常：语法错误：位置65处失败（第7行，第2列）：格式tabseparatedwithnamesandtypes。应为以下之一：引擎、存储定义（版本19.15.2.2（正式构建））
登录中
info:sqlalchemy.engine.base.engine：exists table test\u人类info:sqlalchemy.engine.base.engine:{} info:sqlalchemy.engine.base.engine：create table test\u humans（first\u name text，year bigint）
info:sqlalchemy.engine.base.engine:{} info:sqlalchemy.engine.base.engine：回滚

clickhouse python pandas

来源：https://stackoverflow.com/questions/58422110/pandas-how-to-insert-dataframe-into-clickhouse

4条答案

按热度按时间

ctrmrzij1#

你也可以使用https://github.com/kszucs/pandahouse 插入Dataframe而无需额外转换。
pip安装pandahouse

import pandahouse as ph

pdf = pandas.DataFrame.from_records([
    {'year': 1994, 'first_name': 'Vova'},
    {'year': 1995, 'first_name': 'Anja'},
    {'year': 1996, 'first_name': 'Vasja'},
    {'year': 1997, 'first_name': 'Petja'},
])

connection = dict(database='default',
                  host='localhost',
                  user='default',
                  password='')

ph.to_clickhouse(pdf, 'test_humans', index=False, chunksize=100000, connection=connection)

赞(0）回复(0）举报 2021-07-15

r6l8ljro2#

你可以不用担心 sqlalchemy .
pip安装clickhouse驱动程序

from clickhouse_driver import Client

client = Client('localhost')
df = pandas.DataFrame.from_records([
    {'year': 1994, 'first_name': 'Vova'},
    {'year': 1995, 'first_name': 'Anja'},
    {'year': 1996, 'first_name': 'Vasja'},
    {'year': 1997, 'first_name': 'Petja'},
])

# df processing blablabla...

client.execute("INSERT INTO your_table VALUES", df.to_dict('records'))

希望这有帮助。

赞(0）回复(0）举报 2021-07-15

luaexgnf3#

sqlalchemy clickhouse无法自动创建表（至少在版本0.1.5.post0（含0.1.5.post0）之前），并将创建表的任何sql查询解释为始终以格式tabseparatedwithnamesandtypes子句结尾的select查询。
要解决此问题，需要使用infi.clickhouse\u orm-api手动创建表（此模块随sqlalchemy clickhouse一起提供）：

import pandas as pd
from infi.clickhouse_orm.engines import Memory
from infi.clickhouse_orm.fields import UInt16Field, StringField
from infi.clickhouse_orm.models import Model
from sqlalchemy import create_engine

# define the ClickHouse table schema

class Test_Humans(Model):
    year = UInt16Field()
    first_name = StringField()
    engine = Memory()

engine = create_engine('clickhouse://default:@localhost/test')

# create table manually

with engine.connect() as conn:
    conn.connection.create_table(Test_Humans) # https://github.com/Infinidat/infi.clickhouse_orm/blob/master/src/infi/clickhouse_orm/database.py#L142

pdf = pd.DataFrame.from_records([
    {'year': 1994, 'first_name': 'Vova'},
    {'year': 1995, 'first_name': 'Anja'},
    {'year': 1996, 'first_name': 'Vasja'},
    {'year': 1997, 'first_name': 'Petja'},
    # ! sqlalchemy-clickhouse ignores the last item so add fake one
    {}
])

pdf.to_sql('test_humans', engine, if_exists='append', index=False)

考虑到sqlalchemy clickhouse忽略了最后一项，所以添加一个伪项（参见源代码和相关的第10期）。
让我们在db中检查此表：

SELECT *
FROM test.test_humans

┌─year─┬─first_name─┐
│ 1994 │ Vova       │
│ 1995 │ Anja       │
│ 1996 │ Vasja      │
│ 1997 │ Petja      │
└──────┴────────────┘

4 rows in set. Elapsed: 0.003 sec.

* /

赞(0）回复(0）举报 2021-07-15

3df52oht4#

从版本0.2.0开始，clickhouse\u驱动程序实现方法 insert_dataframe . 请参见：https://clickhouse-driver.readthedocs.io/en/latest/api.html#clickhouse_driver.client.insert_dataframe

赞(0）回复(0）举报 2021-07-15

我来回答

pandas：如何将Dataframe插入clickhouse

4条答案

相关问题

热门标签

最新问答