How do I get partial-match results in SQL/pandas/Python and fill a column with conditional values?

moiiocjp  posted on 2021-07-26  in Java

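I want to find partially matching values in one column and replace the "subcategory" column with values derived from the "item name" column. The "subcategory" column is currently a duplicate of the "item name" column.
For example, in the image below, I want to replace the current values in the "subcategory" column with the beer category (IPA, Ale, Pale Ale, Lager, Pilsner, etc.), derived from a partial match on the string in "item name".
(Image: SQL database)
I have already created a new column in both the pandas DataFrame and the SQL database to hold this information, but I lack the knowledge to populate it.
The query I am using in SQL is as follows: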

ALTER TABLE public.all_beers_specs
ADD "Sub_Category" text;

UPDATE public.all_beers_specs
SET "Sub_Category" = all_beers_specs."Item Name";

SELECT "Sub_Category" from public.all_beers_specs
WHERE "Item Name" like '%IPA%';

Output:

Sub_Category
11 Below Hipster Sauce IPA 
21st Amendment Brew Free Or Die IPA 
21st Amendment Orange Brew Free IPA 
4th Tap Double Dry Hopped Double Dragon IPA 
4th Tap Kung Fu Robot IPA

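I don't want the full beer name in the column; I want to strip it down to just IPA. I want to repeat this process for many different beer types to get similar results.
Edit #1:
Here is the solution I came up with in Python:
Code: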

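import pandas as pd

dataframe = pd.read_csv('path_to_my_dataframe.csv')

# print(dataframe)

list_of_beers = dataframe['Item Name'].to_list()

# upper-case the names so the substring checks ignore case
upper_case_list = [x.upper() for x in list_of_beers]

# creating a list to append results
subcategory_list = []
for b in upper_case_list:
    if 'IPA' in b:
        subcategory_list.append('IPA')
    elif 'ALE' in b:
        subcategory_list.append('ALE')
    elif 'STOUT' in b:
        subcategory_list.append('STOUT')
    else:
        # no known style found; corresponds to the N/A rows in the output below
        subcategory_list.append('N/A')

# one entry per row, so the list lines up with the DataFrame index
dataframe['Sub_Category'] = subcategory_list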

Output:

Token                                  Item Name  ...   Price Sub_Category
0       NaN                    11 Below 7 Iron Blonde   ...   $8.82          N/A
1       NaN             11 Below Big Mistake Ba Stout   ...   $8.41        STOUT
2       NaN                11 Below Hipster Sauce IPA   ...   $9.78          IPA
3       NaN          11 Below Java Space Coffee Stout   ...  $12.48        STOUT
4       NaN               11 Below Lame Duck Pale Ale   ...   $8.76          ALE
...     ...                                        ...  ...     ...          ...
2899    NaN          Zilker Brewing Coffee Milk Stout   ...   $8.94        STOUT
2900    NaN  Zilker Brewing Friends With Benefits IPA   ...  $12.78          IPA
2901    NaN                  Zilker Brewing Marco IPA   ...   $8.94          IPA
2902    NaN     Zilker Brewing Parks and Rec Pale Ale   ...   $8.94          ALE
2903    NaN                  Zilker Heavenly Daze IPA   ...  $10.92          IPA

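I believe there is a more efficient way to categorize the beers than an endless chain of if-else statements, perhaps using a class or defining a function?
Also, for records 4 and 2902, I would like them to be categorized as Pale Ale rather than just Ale. I tried adding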

if 'ALE' != 'PALE ALE' in B:
   ALE = ALE

Is there a way to match or filter on exactly those characters within the string?

c3frrgcw  #1

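Consider pure SQL with a conditional CASE expression inside UPDATE. The statement below can be run in Postgres as a single statement, or called from Python. Because CASE returns the first branch that matches, the more specific '%Pale Ale%' test is placed before '%Ale%'. Leave pandas for data analysis!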

UPDATE public.all_beers_specs
SET "Sub_Category" = CASE 
                          WHEN all_beers_specs."Item Name" LIKE '%IPA%'      THEN 'IPA'
                          WHEN all_beers_specs."Item Name" LIKE '%Pale Ale%' THEN 'Pale Ale'
                          WHEN all_beers_specs."Item Name" LIKE '%Ale%'      THEN 'Ale'
                          WHEN all_beers_specs."Item Name" LIKE '%Lager%'    THEN 'Lager'
                          WHEN all_beers_specs."Item Name" LIKE '%Pilsner%'  THEN 'Pilsner'
                          WHEN all_beers_specs."Item Name" LIKE '%Stout%'    THEN 'Stout'
                          WHEN all_beers_specs."Item Name" LIKE '%Porter%'   THEN 'Porter'
                          ELSE NULL
                     END;
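
If the classification does have to happen on the Python side, the same first-match-wins ordering can be expressed as an ordered list of patterns instead of nested if-else blocks. The following is only a minimal sketch under stated assumptions: the names STYLES, PATTERNS and classify are hypothetical, the style list simply mirrors the CASE branches above, and matching is done case-insensitively on whole words (so "Pale" is never mistaken for "Ale"), which is slightly stricter than the LIKE patterns.

```python
import re

# Assumed style list, copied from the CASE branches above.
# Order matters: the first style that matches wins, so the more
# specific "Pale Ale" must come before the general "Ale".
STYLES = ["IPA", "Pale Ale", "Ale", "Lager", "Pilsner", "Stout", "Porter"]

# Pre-compile one whole-word, case-insensitive pattern per style.
PATTERNS = [
    (re.compile(r"\b" + re.escape(style) + r"\b", re.IGNORECASE), style)
    for style in STYLES
]


def classify(item_name):
    """Return the first matching style for a beer name, or None."""
    for pattern, style in PATTERNS:
        if pattern.search(item_name):
            return style
    return None  # plays the role of ELSE NULL in the SQL version


if __name__ == "__main__":
    for name in [
        "11 Below Hipster Sauce IPA",
        "11 Below Lame Duck Pale Ale",
        "11 Below 7 Iron Blonde",
    ]:
        print(name, "->", classify(name))
```

With a DataFrame like the one in the question, this could be applied as dataframe['Sub_Category'] = dataframe['Item Name'].map(classify). Adding a new beer style then only means adding an entry to the list at the right position.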
