Linux: transpose the first and third rows of a CSV file into columns in a shell script

f8rj6qna · published 2021-05-27 in Hadoop

I need to transpose the first and third rows of a CSV file, wrapping each name/type pair in some keywords. For example, I have a file containing data like this:

Col1,Col2,Col3,Col4,Col5,Col6,Col7,Test1_AA_Col8,Test1_AA_Col9,Test1_AA_Col10,Test1_BB_col11,Test1_BB_col12,Test1_BB_col13,Test2_AA_col14,Test2_AA_col15,Test2_AA_cl16,Test2_BB_col17,Test2_BB_col18,Test2_BB_col19,Test3_AA_col20,Test3_AA_col21,Test3_AA_col22,Test3_BB_col23,Test3_BB_col24,Test3_BB_col25,Test4_AA_col26,Test4_AA_cl27,Test4_AA_col28,Test4_BB_col29,Test4_BB_col30,Test4_BB_col31

object,object,object,object,object,object,object,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64

String,String,String,String,String,String,String,"Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)"
I tried transposing the first and third rows, and my output came out like this:

CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS String) AS Col3,
CAST(Col4 AS String) AS Col4,
CAST(Col5 AS String) AS Col5,
CAST(Col6 AS String) AS Col6,
CAST(Col7 AS String) AS Col7,
CAST(Test1_AA_Col8 AS Decimal(35,15)) AS Test1_AA_Col8,
CAST(Test1_AA_Col9 AS Decimal(35,15)) AS Test1_AA_Col9,
CAST(Test1_AA_Col10 AS Decimal(35,15)) AS Test1_AA_Col10,
CAST(Test1_BB_col11 AS Decimal(35,15)) AS Test1_BB_col11,
CAST(Test1_BB_col12 AS Decimal(35,15)) AS Test1_BB_col12,
CAST(Test1_BB_col13 AS Decimal(35,15)) AS Test1_BB_col13,
CAST(Test2_AA_col14 AS Decimal(35,15)) AS Test2_AA_col14,
CAST(Test2_AA_col15 AS Decimal(35,15)) AS Test2_AA_col15,
CAST(Test2_AA_cl16 AS Decimal(35,15)) AS Test2_AA_cl16,
CAST(Test2_BB_col17 AS Decimal(35,15)) AS Test2_BB_col17,
CAST(Test2_BB_col18 AS Decimal(35,15)) AS Test2_BB_col18,
CAST(Test2_BB_col19 AS Decimal(35,15)) AS Test2_BB_col19,
CAST(Test3_AA_col20 AS Decimal(35,15)) AS Test3_AA_col20,
CAST(Test3_AA_col21 AS Decimal(35,15)) AS Test3_AA_col21,
CAST(Test3_AA_col22 AS Decimal(35,15)) AS Test3_AA_col22,
CAST(Test3_BB_col23 AS Decimal(35,15)) AS Test3_BB_col23,
CAST(Test3_BB_col24 AS Decimal(35,15)) AS Test3_BB_col24,
CAST(Test3_BB_col25 AS Decimal(35,15)) AS Test3_BB_col25,
CAST(Test4_AA_col26 AS Decimal(35,15)) AS Test4_AA_col26,
CAST(Test4_AA_cl27 AS Decimal(35,15)) AS Test4_AA_cl27,
CAST(Test4_AA_col28 AS Decimal(35,15)) AS Test4_AA_col28,
CAST(Test4_BB_col29 AS Decimal(35,15)) AS Test4_BB_col29,
CAST(Test4_BB_col30 AS Decimal(35,15)) AS Test4_BB_col30,
) AS Test4_BB_col31

This is the code I wrote:

hadoop fs -text $f | sed -n '1p;3p' | awk -F, '{for (i=1; i<=NF; i++) a[i,NR]=$i; max=(max<NF?NF:max)} END {for (i=1; i<=max; i++) {for (j=1; j<=NR; j++) printf "%s%s", a[i,j], (j==NR?RS:FS)}}'
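For reference, the command above is built on the classic awk transpose idiom; a self-contained working sketch of just that idiom (assuming no field contains a comma):

```shell
# Transpose comma-separated rows from stdin: row i, column j -> row j, column i.
printf 'Col1,Col2,Col3\nString,Int,Bigint\n' |
awk -F, '
{ for (i = 1; i <= NF; i++) a[i,NR] = $i; max = (max < NF ? NF : max) }
END {
    for (i = 1; i <= max; i++)
        for (j = 1; j <= NR; j++)
            printf "%s%s", a[i,j], (j == NR ? RS : FS)  # newline after the last original row
}'
```

which prints `Col1,String`, `Col2,Int`, `Col3,Bigint` on three lines.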

This is what I am trying to achieve:

CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS String) AS Col3,
CAST(Col4 AS String) AS Col4,
CAST(Col5 AS String) AS Col5,
CAST(Col6 AS String) AS Col6,
CAST(Col7 AS String) AS Col7,
CAST(Test1_AA_Col8 AS Decimal(35,15)) AS Test1_AA_Col8,
CAST(Test1_AA_Col9 AS Decimal(35,15)) AS Test1_AA_Col9,
CAST(Test1_AA_Col10 AS Decimal(35,15)) AS Test1_AA_Col10,
CAST(Test1_BB_col11 AS Decimal(35,15)) AS Test1_BB_col11,
CAST(Test1_BB_col12 AS Decimal(35,15)) AS Test1_BB_col12,
CAST(Test1_BB_col13 AS Decimal(35,15)) AS Test1_BB_col13,
CAST(Test2_AA_col14 AS Decimal(35,15)) AS Test2_AA_col14,
CAST(Test2_AA_col15 AS Decimal(35,15)) AS Test2_AA_col15,
CAST(Test2_AA_cl16 AS Decimal(35,15)) AS Test2_AA_cl16,
CAST(Test2_BB_col17 AS Decimal(35,15)) AS Test2_BB_col17,
CAST(Test2_BB_col18 AS Decimal(35,15)) AS Test2_BB_col18,
CAST(Test2_BB_col19 AS Decimal(35,15)) AS Test2_BB_col19,
CAST(Test3_AA_col20 AS Decimal(35,15)) AS Test3_AA_col20,
CAST(Test3_AA_col21 AS Decimal(35,15)) AS Test3_AA_col21,
CAST(Test3_AA_col22 AS Decimal(35,15)) AS Test3_AA_col22,
CAST(Test3_BB_col23 AS Decimal(35,15)) AS Test3_BB_col23,
CAST(Test3_BB_col24 AS Decimal(35,15)) AS Test3_BB_col24,
CAST(Test3_BB_col25 AS Decimal(35,15)) AS Test3_BB_col25,
CAST(Test4_AA_col26 AS Decimal(35,15)) AS Test4_AA_col26,
CAST(Test4_AA_cl27 AS Decimal(35,15)) AS Test4_AA_cl27,
CAST(Test4_AA_col28 AS Decimal(35,15)) AS Test4_AA_col28,
CAST(Test4_BB_col29 AS Decimal(35,15)) AS Test4_BB_col29,
CAST(Test4_BB_col30 AS Decimal(35,15)) AS Test4_BB_col30,
CAST(Test4_BB_col31 AS Decimal(35,15)) AS Test4_BB_col31

mrphzbgm1#

Using GNU awk, going from the given sample to the expected output (quoted fields are handled if you use FPAT instead of FS):

$ awk '
BEGIN {
    # FS=","                        # no quoted fields
    FPAT="([^,]*)|(\"[^\"]+\")"
}
NR==1 {
    for(i=1;i<=NF;i++)
        a[i]=$i
}
NR==3 {
    for(i=1;i<=NF;i++) {
        gsub(/^"|"$/,"",$i)
        printf "CAST(%s AS %s) AS %s%s\n",a[i],$i,a[i],(i==NF?"":",")
    }
    exit
}' file
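Note that the output below does not come from the CSV in the question: the answer was apparently run against its own small sample, which is not shown. A hypothetical stand-in that reproduces it (no quoted fields, so the commented-out FS="," would work just as well):

```shell
# A 3-row stand-in file: header, source dtypes, target SQL types (no commas inside fields).
cat > sample.csv <<'EOF'
Col1,Col2,Col3,Col4,Col5
object,object,int64,int64,object
String,String,Bigint,Int,String
EOF

# Pair row 1 (column names) with row 3 (target types) into CAST expressions.
awk -F, '
NR==1 { for (i = 1; i <= NF; i++) a[i] = $i }
NR==3 {
    for (i = 1; i <= NF; i++)
        printf "CAST(%s AS %s) AS %s%s\n", a[i], $i, a[i], (i == NF ? "" : ",")
    exit
}' sample.csv
```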

Output:

CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS Bigint) AS Col3,
CAST(Col4 AS Int) AS Col4,
CAST(Col5 AS String) AS Col5

As a one-liner for use in a pipeline:

$ ... | awk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")"}NR==1{for(i=1;i<=NF;i++)a[i]=$i}NR==3{for(i=1;i<=NF;i++){gsub(/^"|"$/,"",$i);printf "CAST(%s AS %s) AS %s%s\n",a[i],$i,a[i],(i==NF?"":",")}exit}'

Updated output with quoted fields:

CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS Bigint) AS Col3,
CAST(Col4 AS Int) AS Col4,
CAST(Col5 AS String) AS Col5,
CAST(Col6 AS Decimal (35,2)) AS Col6
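As an aside (not part of the original answer): when no field contains a comma, the transposition also works without gawk, by pairing the two rows line by line with paste:

```shell
# Build a comma-free sample, then pair line 1 (names) with line 3 (types).
printf '%s\n' 'Col1,Col2,Col3' 'object,object,int64' 'String,String,Bigint' > cols.csv

sed -n '1p' cols.csv | tr ',' '\n' > names.txt   # one column name per line
sed -n '3p' cols.csv | tr ',' '\n' > types.txt   # one target type per line
paste -d, names.txt types.txt |
awk -F, '{ print "CAST(" $1 " AS " $2 ") AS " $1 "," }' |
sed '$ s/,$//'                                   # drop the comma on the last line
```

This only needs POSIX sed/tr/paste/awk, but it breaks as soon as a type like "Decimal(35,15)" brings a comma into a field.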

An updated version that is a bit less smelly than FPAT. It has an entry-level parser to handle commas inside (single pairs of) double quotes (escaped quotes \" are not handled properly):

awk '
function parse(str,    i,j,n,q) {
    for(i=1;i<=length(str);i++) {
        if(i==length(str)||(substr(str,i+1,1)=="," && q==0)) {
            b[++n]=substr(str,j+1,i-j)
            j=i+1
        }
        if(substr(str,i+1,1)=="\"")
            q=(!q)
    }
    return n
}
BEGIN {
    FS=","
}
NR==1 {
    for(i=1;i<=NF;i++)
        a[i]=$i
}
NR==3 {
    n=parse($0)
    for(i=1;i<=n;i++) {
        gsub(/^"|"$/,"",b[i])
        printf "CAST(%s AS %s) AS %s%s\n",a[i],b[i],a[i],(i==n?"":",")
    }
    exit
}' file
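A quick sanity check of that parser (not part of the original answer), feeding it a quoted, comma-bearing type; this runs under any POSIX awk, not just gawk:

```shell
# Save the quote-aware program, then run it on a type row containing "Decimal(35,15)".
cat > transpose.awk <<'EOF'
function parse(str,    i,j,n,q) {        # split str on commas outside double quotes into b[]
    for (i = 1; i <= length(str); i++) {
        if (i == length(str) || (substr(str, i+1, 1) == "," && q == 0)) {
            b[++n] = substr(str, j+1, i-j)
            j = i + 1
        }
        if (substr(str, i+1, 1) == "\"")
            q = (!q)
    }
    return n
}
NR==1 { split($0, a, ",") }              # header names contain no commas, plain split is enough
NR==3 {
    n = parse($0)
    for (i = 1; i <= n; i++) {
        gsub(/^"|"$/, "", b[i])          # strip the surrounding quotes from the type
        printf "CAST(%s AS %s) AS %s%s\n", a[i], b[i], a[i], (i == n ? "" : ",")
    }
    exit
}
EOF

printf '%s\n' 'Col1,Col2' 'object,float64' 'String,"Decimal(35,15)"' | awk -f transpose.awk
```

which correctly keeps the quoted field together and prints `CAST(Col2 AS Decimal(35,15)) AS Col2` for it.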
