pandas: fastest way to swap the index with the values

cedebl8k  posted 5 months ago  in  Other

Consider the pd.Series s:

s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))
s

A    a
B    b
C    c
D    d
E    e
F    f
G    g
H    h
I    i
J    j
dtype: object

What is the fastest way to swap the index and the values to get the following?

a    A
b    B
c    C
d    D
e    E
f    F
g    G
h    H
i    I
j    J
dtype: object

56lgkhnf1#

One possible solution is to swap the keys and values through a dict:

# Series.iteritems() was removed in pandas 2.0; Series.items() is the replacement
s1 = pd.Series(dict((v,k) for k,v in s.items()))
print (s1)
a    A
b    B
c    C
d    D
e    E
f    F
g    G
h    H
i    I
j    J
dtype: object

Another option, and the fastest one:

print (pd.Series(s.index.values, index=s ))
a    A
b    B
c    C
d    D
e    E
f    F
g    G
h    H
i    I
j    J
dtype: object
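The two approaches above agree on this example because every value in s is unique. The sketch below uses made-up data with a repeated value to show where they diverge: a dict keeps only one entry per key, while the constructor form keeps one row per input row.

```python
import pandas as pd

# Illustrative data (not from the original post): the value 'a' appears twice.
dup = pd.Series(['a', 'b', 'a'], index=['A', 'B', 'C'])

# Dict-based swap: a later key overwrites an earlier one, so 'A' is lost.
via_dict = pd.Series({v: k for k, v in dup.items()})

# Constructor-based swap: one output row per input row, duplicates kept.
via_ctor = pd.Series(dup.index.values, index=dup.values)

print(len(via_dict))  # 2
print(len(via_ctor))  # 3
```

If the values may repeat, the constructor form is the one that preserves every pair.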

Timings

In [63]: %timeit pd.Series(dict((v,k) for k,v in s.iteritems()))
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 146 µs per loop

In [71]: %timeit (pd.Series(s.index.values, index=s ))
The slowest run took 7.42 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 102 µs per loop


For a much longer Series (the code below repeats the 10-element Series 1,000,000 times, which gives 10M rows):

s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))
s = pd.concat([s]*1000000).reset_index(drop=True)
print (s)

In [72]: %timeit (pd.Series(s.index, index=s ))
10000 loops, best of 3: 106 µs per loop

In [229]: %timeit pd.Series(dict((v,k) for k,v in s.iteritems()))
1 loop, best of 3: 1.77 s per loop

In [230]: %timeit (pd.Series(s.index, index=s ))
10 loops, best of 3: 130 ms per loop

In [231]: %timeit (pd.Series(s.index.values, index=s ))
10 loops, best of 3: 26.5 ms per loop
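The figures above come from one machine and one pandas version, so they may not carry over. A minimal sketch for re-running the comparison outside IPython with the standard `timeit` module follows; the helper names `swap_from_index` and `swap_from_values` and the 100,000-row size are choices made here, not part of the original answer.

```python
import timeit

import pandas as pd

s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))
big = pd.concat([s] * 10_000)  # 100,000 rows; size picked arbitrarily


def swap_from_index(ser):
    # Pass the Index object itself as the data.
    return pd.Series(ser.index, index=ser)


def swap_from_values(ser):
    # Pass the underlying NumPy array of the index as the data.
    return pd.Series(ser.index.values, index=ser)


for fn in (swap_from_index, swap_from_values):
    # Best of 3 repeats, 10 calls each, reported per call.
    seconds = min(timeit.repeat(lambda: fn(big), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

Both helpers return the same data, so any gap in the printed times reflects construction overhead only.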

irlmq6kh2#

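# my_df is assumed to be an existing pd.Series (despite the name); it is not defined in this answer
a2b = my_df
# the old values become the index and the old index becomes the data
b2a = pd.Series(data = a2b.index, index = a2b.values)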
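Since `my_df` is not defined in the answer, here is a self-contained version of the same idea. The three-item Series is made up for illustration and stands in for `my_df`.

```python
import pandas as pd

# Hypothetical stand-in for the answer's undefined `my_df`.
a2b = pd.Series({'A': 'a', 'B': 'b', 'C': 'c'})

# Old values become the new index; old index labels become the new values.
b2a = pd.Series(data=a2b.index, index=a2b.values)

print(b2a['b'])  # B
```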

fnatzsnv3#

If both the Series and its index have names, and you want to swap those as well:

srs_1 = pd.Series(list('ABC'), list('abc'), name='upper').rename_axis('lower')
# lower
# a    A
# b    B
# c    C
# Name: upper, dtype: object

srs_2 = pd.Series(srs_1.index, index=srs_1)
# upper
# A    a
# B    b
# C    c
# Name: lower, dtype: object
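The names travel with the pandas objects themselves, so this only works when the Index and the Series are passed directly. As a small sketch of the difference, passing the raw `.values` arrays instead produces the same data but drops both names:

```python
import pandas as pd

srs_1 = pd.Series(list('ABC'), list('abc'), name='upper').rename_axis('lower')

# Index and Series objects carry their names into the new Series.
named = pd.Series(srs_1.index, index=srs_1)

# Plain NumPy arrays have no names to carry over.
unnamed = pd.Series(srs_1.index.values, index=srs_1.values)

print(named.name, named.index.name)      # lower upper
print(unnamed.name, unnamed.index.name)  # None None
```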

h22fl7wq4#

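# requires s.name to be set: reset_index() turns the index into a column,
# set_index(s.name) promotes the old values to the index, squeeze() returns a Series
s = s.reset_index().set_index(s.name).squeeze()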
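This one-liner looks up the value column by `s.name`, so it needs a named Series. A minimal sketch for the unnamed Series from the question, assuming it is acceptable to name things first (the labels 'value' and 'key' are hypothetical):

```python
import pandas as pd

s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))

# Give the Series and its index names so that both columns are addressable.
named = s.rename('value').rename_axis('key')

# Index -> column, old values -> index, single-column frame -> Series.
swapped = named.reset_index().set_index(named.name).squeeze()

print(swapped['a'])  # A
```

Note that `squeeze()` collapses every length-1 axis, so on a one-row Series it would return a scalar rather than a Series.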
