pandas does not show column names when using druiddb

jdgnovmf  posted 7 months ago in Druid

I am running Apache Druid locally and loading data from a Kafka stream.
In Druid, I can see the column names:

[screenshot]
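Then, using druiddb (https://github.com/betodealmeida/druid-dbapi), I wrote a SQL query, read the data into my Python environment, and put it into a pandas DataFrame. However, some of the column names do not appear: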

from druiddb import connect
# https://github.com/betodealmeida/druid-dbapi
import pandas as pd

druid_host = "localhost"
druid_port = 8888
druid_path = "/druid/v2/sql"
druid_scheme = "http"
druid_query = """SELECT * FROM malaria_cases_full"""
druid_connection = connect(host=druid_host, port=druid_port, path=druid_path, scheme=druid_scheme)
druid_cursor = druid_connection.cursor()
df = pd.DataFrame(druid_cursor.execute(druid_query))
df.head(n=10)



rmbxnbpk1#

I suggest you use Druid's (official?) Python connector, pydruid,
or simply use read_sql with a sqlalchemy engine:

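# pip install sqlalchemy==1.4.4
import pandas as pd
from sqlalchemy import MetaData, Table
from sqlalchemy.engine import create_engine

# The druid:// dialect comes from pydruid; optionally append ?header=True to the URL
engine = create_engine("druid://localhost:8888/druid/v2/sql")
# Reflect the table from Druid (only if you need a Table object)
ta = Table("wikipedia", MetaData(bind=engine), autoload=True)

# Run SELECT * against the table and load the result into a DataFrame
df = pd.read_sql(ta.select(), engine)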

Output:

print(df.columns)

Index(['__time', 'isRobot', 'channel', 'flags', 'isUnpatrolled', 'page',
       'diffUrl', 'added', 'comment', 'commentLength', 'isNew', 'isMinor',
       'delta', 'isAnonymous', 'user', 'deltaBucket', 'deleted', 'namespace',
       'cityName', 'countryName', 'regionIsoCode', 'metroCode',
       'countryIsoCode', 'regionName'],
      dtype='object')
print(df)

                         __time isRobot  ... countryIsoCode       regionName
21243  2016-06-27T18:50:07.084Z    true  ...                                
10272  2016-06-27T10:20:13.238Z   false  ...             AU  New South Wales
...                         ...     ...  ...            ...              ...
21271  2016-06-27T18:51:31.698Z   false  ...                                
5773   2016-06-27T06:16:43.741Z   false  ...                                

[24433 rows x 24 columns]
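
If you would rather stay with a plain DB-API cursor, here is a minimal sketch of naming the columns explicitly. It assumes the cursor fills in `cursor.description` after `execute`, as PEP 249 describes; I have not verified how druiddb behaves here, so `sqlite3` stands in for Druid to keep the example runnable. The helper `cursor_to_frame` and the `cases` table are made up for illustration.

```python
import sqlite3

import pandas as pd


def cursor_to_frame(cursor):
    """Build a DataFrame from an executed DB-API cursor.

    Column names are taken from cursor.description, where each entry is a
    sequence whose first item is the column name (PEP 249).
    """
    columns = [col[0] for col in cursor.description]
    return pd.DataFrame(cursor.fetchall(), columns=columns)


# sqlite3 is only a stand-in for the Druid connection (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cases (country TEXT, year INTEGER, cases INTEGER)")
cur.execute("INSERT INTO cases VALUES ('KE', 2020, 10)")
cur.execute("INSERT INTO cases VALUES ('UG', 2021, 7)")

cur.execute("SELECT * FROM cases")
df = cursor_to_frame(cur)
print(list(df.columns))
```

With a Druid cursor the same helper would be called right after `druid_cursor.execute(druid_query)`, provided the driver populates `description`.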


6yt4nkrj2#

This is a pandas column display feature. If you want to see all the columns of the dataset, use:

pd.set_option('display.max_columns', len(df.columns))
df.head(n=10)

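
To check whether the columns are really missing or only hidden by the display settings, a small sketch like the following may help. The 40-column frame `wide` is made-up data, not the Druid result; the idea is that `df.columns` always holds every name, while the printed form is truncated once the column count exceeds `display.max_columns`.

```python
import pandas as pd

# Hypothetical wide frame with 40 columns named col_0 .. col_39.
wide = pd.DataFrame([list(range(40))], columns=[f"col_{i}" for i in range(40)])

# With a small max_columns, the printed form elides the middle columns.
with pd.option_context("display.max_columns", 10):
    truncated = repr(wide)

# With max_columns=None, every column is printed (wrapped if needed).
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full = repr(wide)

print(len(wide.columns))        # the frame itself still has all 40 columns
print("col_20" in truncated)    # middle column is hidden in the truncated view
print("col_20" in full)         # and visible once the limit is lifted
```

`pd.option_context` restores the previous settings on exit, so it is handy for a one-off check; `pd.set_option` changes them for the rest of the session.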
