Apache Flink: how to use dynamic types in transformation functions (map, reduce, join, etc.)

rxztt3cl · posted 2021-06-25 · in Flink

I wrote a custom CSV reader that produces records of a dynamically generated type via env.readCsvFile(location).pojoType(dynClass, arr), where dynClass is a class built at runtime with ByteBuddy and arr is an array of column names. I then try to map each POJO to a tuple; the mapping function is shown after the sketch below. Note that I and O in its signature are plain base classes in my project, not generic type variables.
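For context, a minimal sketch of how such a dynamic POJO class might be generated and handed to the CSV reader. FlinkPojo is the base class referenced elsewhere in this post (its definition is omitted), and the field names and types here are purely illustrative:

import net.bytebuddy.ByteBuddy;
import net.bytebuddy.description.modifier.Visibility;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DynamicCsvSketch {

    static DataSet<?> readDynamicCsv(ExecutionEnvironment env, String location) {
        // Generate a subclass of FlinkPojo with public fields at runtime.
        // The "id" and "name" columns are invented for this sketch.
        Class<? extends FlinkPojo> dynClass = new ByteBuddy()
                .subclass(FlinkPojo.class)
                .name("com.me.dynamic.GeneratedRecord")
                .defineField("id", Integer.class, Visibility.PUBLIC)
                .defineField("name", String.class, Visibility.PUBLIC)
                .make()
                .load(FlinkPojo.class.getClassLoader())
                .getLoaded();

        // The column-name array must match the generated field names so the
        // CSV reader can populate each field of the POJO.
        String[] arr = {"id", "name"};
        return env.readCsvFile(location).pojoType((Class) dynClass, arr);
    }
}

The mapping function: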

public class PojoToTupleRichMapFunction extends RichMapFunction<I, O> implements ResultTypeQueryable {

    Class tupleClass = null;
    Class pojoClass = null;
    Config.Schema schema = null;
    transient List<Field> fields = null;

    PojoToTupleRichMapFunction(DynDataSet dynSet) {
        this.schema = dynSet.dataDef.schema;
        // Create a map from pojo to tuple
        this.tupleClass = O.getTupleClass(schema.columns.size());
        this.pojoClass = dynSet.recType;

    }

    // Resolve the reflective Field handles once per task instance, before any records arrive.
    @Override
    public void open(Configuration parameters) {
        fields = new ArrayList<>(schema.columns.size());
        for (int i = 0; i < schema.columns.size(); i++) {
            try {
                fields.add(pojoClass.getField(schema.columns.get(i).name));
            } catch (NoSuchFieldException | SecurityException ex) {
                Logger.getLogger(PojoToTupleRichMapFunction.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }

    // Tell Flink the concrete tuple type produced at runtime; it cannot be
    // inferred from the raw class signature.
    @Override
    public TupleTypeInfo getProducedType() {
        // build list of types
        List<BasicTypeInfo<?>> types = new ArrayList<>(schema.columns.size());
        for (int i = 0; i < schema.columns.size(); i++) {
            BasicTypeInfo bt = null;
            String typeName = schema.columns.get(i).type.getName();
            switch (typeName) {
                case "java.lang.Integer":
                    bt = BasicTypeInfo.INT_TYPE_INFO;
                    break;
                case "java.lang.String":
                    bt = BasicTypeInfo.STRING_TYPE_INFO;
                    break;
                case "java.lang.Long":
                    bt = BasicTypeInfo.LONG_TYPE_INFO;
                    break;
                case "java.lang.Short":
                    bt = BasicTypeInfo.SHORT_TYPE_INFO;
                    break;
                default:
                    Logger.getLogger(Config.class.getName()).log(Level.SEVERE, "Unknown type: {0}", typeName);

            }
            types.add(bt);
        }
        return new TupleTypeInfo(tupleClass, types.toArray(new BasicTypeInfo[0]));
    }

    // Copy each POJO field value into the corresponding tuple position.
    @Override
    public O map(I pojo) throws Exception {
        O ret;
        ret = (O) tupleClass.newInstance();
        for (int i = 0; i < schema.columns.size(); i++) {
            ret.setField(fields.get(i).get(pojo), i);
        }
        return ret;
    }
}

The challenge I'm hitting is this runtime error:

org.apache.flink.api.common.functions.InvalidTypesException: Input mismatch: POJO type 'com.me.dynamic.FlinkPojo$ByteBuddy$zQ9VllB1' expected but was 'com.me.dynamic.I'.

The function declaration specifies the base types; the actual input type is a dynamic subclass. The output type is supplied by getProducedType().
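Roughly speaking, the mapper gets applied to the dynamically typed DataSet along these lines (a hypothetical sketch of the call site, reusing the names from above rather than my exact code):

// Hypothetical call site: the DataSet's element type is the ByteBuddy-generated
// subclass of FlinkPojo, while the mapper's declared input type is the base class I,
// so Flink's type extraction reports the input mismatch quoted above.
DataSet pojos = env.readCsvFile(location).pojoType(dynClass, arr);
DataSet<?> tuples = pojos.map(new PojoToTupleRichMapFunction(dynSet));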
How can I write a MapFunction that handles a dynamic input type?


e5nszbig1#

To provide at least one solution (probably not the best one), I changed the class definition to:

public class PojoToTupleRichMapFunction<I extends FlinkPojo, O extends Tuple> extends RichMapFunction<I, O> implements ResultTypeQueryable {
    // body unchanged from the version in the question
}

Then I use ByteBuddy to subclass this compiled class, supplying the concrete generic parameters:

static private DataSet<?> mapPojoToTuple(DataSet ds, DynDataSet dynSet) {
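    // Subclass the mapper so that its generic superclass carries the concrete record
    // type (dynSet.recType); Flink's type extractor then sees the actual input type.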
    Class<?> clazz = new ByteBuddy()
        .subclass(TypeDescription.Generic.Builder.parameterizedType(PojoToTupleRichMapFunction.class, dynSet.recType, Tuple.class).build())
        .make()
        .load(PojoToTupleRichMapFunction.class.getClassLoader())
        .getLoaded();
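    // The generated subclass imitates the superclass's (DynDataSet) constructor,
    // which is what getConstructors()[0] picks up below.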
    Constructor<?> ctr = clazz.getConstructors()[0];
    RichMapFunction fcn = null;
    try {
        fcn = (RichMapFunction) ctr.newInstance(dynSet);
    } catch (InstantiationException | IllegalAccessException | IllegalArgumentException | InvocationTargetException ex) {
        Logger.getLogger(dyn_demo.class.getName()).log(Level.SEVERE, null, ex);
    }
    return ds.map(fcn);
}

This seems to keep Flink happy.
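For completeness, a rough sketch of how this helper might be invoked end to end (columnNames is an assumed name; dynSet.recType is the ByteBuddy-generated POJO class, as in the question):

// Hypothetical usage: read the CSV into the dynamic POJO type, then convert to tuples.
DataSet pojos = env.readCsvFile(location).pojoType(dynSet.recType, columnNames);
DataSet<?> tuples = mapPojoToTuple(pojos, dynSet);
tuples.print();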
