Apache Pig: how to find the average after joining two datasets and grouping them in Pig

monwx1rj posted on 2021-06-25 in Pig

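I have two datasets: an employee-detail dataset with four columns (id, name, gender, location) and a salary-detail dataset with two columns (id, salary). I joined the two datasets and grouped the result by location.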

EmpDetail = load '/Users/bmohanty6/EmployeeDetails/EmpDetail.txt' as (id:int, name:chararray, gender:chararray, location:chararray);
SalaryDetail = load '/Users/bmohanty6/EmployeeDetails/EmpSalary.txt' as (id:int, salary:float);                                     
JoinedEmpDetail = join EmpDetail by id, SalaryDetail by id;                                                                         
GroupedByLocation = group JoinedEmpDetail by location;

dump GroupedByLocation gives me the expected result. But when I try to compute the average with the line below,

AverageSalary = foreach GroupedByLocation generate group, AVG(SalaryDetail.salary);

it throws the following error.

<line 11, column 58> Could not infer the matching function for org.apache.pig.builtin.AVG as multiple or none of them fit. Please use an explicit cast.

I also tried the following approach, but it fails with the same kind of error.

AverageSalary = foreach GroupedByLocation {
  Sum = SUM(SalaryDetail.salary);
  Count = COUNT(SalaryDetail.salary);
  avgSal = Sum/Count;
  generate group as location, avgSal;
  };

This time the error is:

Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

Can anyone suggest the correct way to do this?
Thanks to sivasakthi jayaraman for answering my question.

AverageSalary = foreach GroupedByLocation generate group, AVG(JoinedEmpDetail.SalaryDetail::salary);

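This gives me the average salary for each location. Now I am trying to find the average salary for each gender within each location, so I tried grouping by gender inside the GroupedByLocation relation, but I ran into a problem.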

GroupdByGender = foreach GroupedByLocation { 
genderGrp = group JoinedEmpDetail by JoinedEmpDetail.EmpDetail::gender;
avgSalary = foreach genderGrp generate group, AVG(JoinedEmpDetail.SalaryDetail::salary);
generate group as location, JoinedEmpDetail.EmpDetail::gender, avgSalary;
};

I get this error:

Syntax error, unexpected symbol at or near 'JoinedEmpDetail'

Can anyone help?

jecbmhm3 (answer #1)

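You cannot access the salary column like that. After the group, each row contains only the group key and a bag named JoinedEmpDetail, so you first need to project JoinedEmpDetail and then access the salary column through it.
Can you try the statement below?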

AverageSalary = foreach GroupedByLocation generate group, AVG(JoinedEmpDetail.SalaryDetail::salary);
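
To see why the projection has to go through JoinedEmpDetail, and as one possible way to handle the follow-up question about average salary per gender within each location, here is a minimal sketch. It reuses the paths and schemas from the question. The aliases GroupedByLocGender and AvgByLocGender and the field name avgSalary are made up for illustration. Grouping on a composite key is a substitute for the nested group attempted in the question, since as far as I know GROUP is not among the operators Pig allows inside a nested foreach block.

```pig
-- Load and join exactly as in the question (default tab-delimited PigStorage).
EmpDetail = load '/Users/bmohanty6/EmployeeDetails/EmpDetail.txt' as (id:int, name:chararray, gender:chararray, location:chararray);
SalaryDetail = load '/Users/bmohanty6/EmployeeDetails/EmpSalary.txt' as (id:int, salary:float);
JoinedEmpDetail = join EmpDetail by id, SalaryDetail by id;

-- After the join, each field carries its source alias as a prefix,
-- e.g. EmpDetail::gender and SalaryDetail::salary.
describe JoinedEmpDetail;

GroupedByLocation = group JoinedEmpDetail by location;

-- After the group, the relation has two fields: the key (group) and a bag
-- named after the grouped relation (JoinedEmpDetail), not SalaryDetail.
describe GroupedByLocation;

-- Average salary per location: project the salary out of the bag.
AverageSalary = foreach GroupedByLocation generate
    group as location,
    AVG(JoinedEmpDetail.SalaryDetail::salary) as avgSalary;

-- Average salary per (location, gender): group once on a composite key
-- instead of grouping again inside a nested foreach.
-- GroupedByLocGender and AvgByLocGender are hypothetical alias names.
GroupedByLocGender = group JoinedEmpDetail by (EmpDetail::location, EmpDetail::gender);
AvgByLocGender = foreach GroupedByLocGender generate
    flatten(group) as (location, gender),
    AVG(JoinedEmpDetail.SalaryDetail::salary) as avgSalary;

dump AvgByLocGender;
```

Running the two describe statements first is usually the quickest way to find the exact field names to use inside AVG, SUM or COUNT.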
