This article collects Java code examples for the cz.seznam.euphoria.core.client.dataset.Dataset.persist() method, showing how Dataset.persist() is used in practice. The examples are taken from selected open-source projects on platforms such as GitHub, Stack Overflow, and Maven, so they serve as useful references. Details of the Dataset.persist() method:
Package: cz.seznam.euphoria.core.client.dataset.Dataset
Class: Dataset
Method: persist
Description: Persist this dataset.
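Before the extracted snippets, here is a minimal end-to-end sketch of the typical pattern they all share: build a Flow, attach a source, transform the dataset, call persist() to bind the result to a DataSink, then submit the flow to an executor. This is an illustrative sketch only; the class names (Flow, ListDataSource, ListDataSink, MapElements, LocalExecutor) appear in the examples below, but the import paths other than Dataset's are assumptions and should be checked against your euphoria version.

```java
// NOTE: import paths below (except Dataset) are assumed; verify against your euphoria version.
import cz.seznam.euphoria.core.client.dataset.Dataset;
import cz.seznam.euphoria.core.client.flow.Flow;
import cz.seznam.euphoria.core.client.io.ListDataSink;
import cz.seznam.euphoria.core.client.io.ListDataSource;
import cz.seznam.euphoria.core.client.operator.MapElements;
import cz.seznam.euphoria.core.executor.local.LocalExecutor;

import java.util.Arrays;

public class PersistExample {
  public static void main(String[] args) throws Exception {
    Flow flow = Flow.create("persist-example");

    // bounded input with a few test elements
    Dataset<Integer> input = flow.createInput(
        ListDataSource.bounded(Arrays.asList(1, 2, 3)));

    // an in-memory sink that collects the output
    ListDataSink<Integer> sink = ListDataSink.get();

    // transform and attach the sink: persist() marks this dataset's
    // output to be written to the given DataSink when the flow runs
    MapElements.of(input)
        .using(e -> e * 2)
        .output()
        .persist(sink);

    // nothing runs until the flow is submitted to an executor
    new LocalExecutor().submit(flow).join();
    System.out.println(sink.getOutputs()); // the doubled elements, e.g. [2, 4, 6]
  }
}
```

Note that persist() is declarative: it only registers the sink on the dataset, and actual writing happens when the executor runs the flow.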
Code example source: seznam/euphoria
public <S extends DataSink<T>> Dataset<T> persist(S dst) {
  this.wrap.persist(dst);
  return this;
}
Code example source: seznam/euphoria
@SuppressWarnings("unchecked")
private static void applySinkTransforms(Flow flow) {
  List<Dataset<?>> outputs = flow.operators().stream()
      .filter(o -> o.output().getOutputSink() != null)
      .map(Operator::output)
      .collect(Collectors.toList());
  outputs.forEach(d -> {
    if (d.getOutputSink().prepareDataset((Dataset) d)) {
      // remove the old output sink
      d.persist(null);
    }
  });
}
Code example source: seznam/euphoria
/**
 * Persist the given dataset into this sink via the given mapper.
 * @param <T> input datatype
 * @param input the input dataset
 * @param mapper map function for transformation of an input value into a {@link Cell}
 */
public <T> void persist(Dataset<T> input, UnaryFunction<T, Cell> mapper) {
  MapElements.of(input)
      .using(mapper)
      .output()
      .persist(this);
}
Code example source: seznam/euphoria
@Override
public boolean prepareDataset(Dataset<OUT> output) {
  Dataset<IN> mapped = MapElements.of(output)
      .using(mapper)
      .output();
  mapped.persist(sink);
  sink.prepareDataset(mapped);
  return true;
}
Code example source: seznam/euphoria
@Override
public boolean prepareDataset(Dataset<T> input) {
  MapElements.of(input)
      .using(i -> Pair.of(EMPTY, i))
      .output()
      .persist(wrapped);
  return true;
}
Code example source: seznam/euphoria
public void execute(Executor executor) {
  final Flow flow = Flow.create("test");
  final ListDataSink<OUT> sink = ListDataSink.get();
  buildFlow(flow).persist(sink);
  executor.submit(flow).join();
  DatasetAssert.unorderedEquals(getOutput(), sink.getOutputs());
}
Code example source: seznam/euphoria
@Override
public ListDataSink<Pair<Integer, Long>> modifySink(
    ListDataSink<Pair<Integer, Long>> sink) {
  return sink.withPrepareDataset(d -> {
    ReduceByKey.of(d)
        .keyBy(p -> p.getFirst() % 2)
        .valueBy(Pair::getSecond)
        .reduceBy((Stream<Long> values, Collector<Long> c) -> values.forEach(c::collect))
        .withSortedValues(Long::compare)
        .output()
        .persist(sink);
  });
}
Code example source: seznam/euphoria
private void run() {
  Flow flow = Flow.create();
  Dataset<Pair<ImmutableBytesWritable, Result>> ds = flow.createInput(
      Utils.getHBaseSource(input, conf.get()));
  FlatMap.of(ds)
      .using((Pair<ImmutableBytesWritable, Result> p, Collector<byte[]> c) -> {
        writeCellsAsBytes(p.getSecond(), c);
      })
      .output()
      .persist(Utils.getSink(output, conf.get()));
  LOG.info("Starting flow reading from {} and persisting to {}", input, output);
  executor.submit(flow).join();
}
Code example source: seznam/euphoria
@Test
public void simpleUnionTest() throws InterruptedException, ExecutionException {
  Dataset<Integer> first = flow.createInput(
      ListDataSource.unbounded(
          Arrays.asList(1),
          Arrays.asList(2, 3, 4, 5, 6)));
  Dataset<Integer> second = flow.createInput(
      ListDataSource.unbounded(
          Arrays.asList(7, 8, 9)));
  // collector of outputs
  ListDataSink<Integer> outputSink = ListDataSink.get();
  Union.of(first, second)
      .output()
      .persist(outputSink);
  executor.submit(flow).get();
  List<Integer> outputs = outputSink.getOutputs();
  DatasetAssert.unorderedEquals(
      outputs,
      1, 2, 3, 4, 5, 6, 7, 8, 9);
}
Code example source: seznam/euphoria
@Test
public void testTestSimpleOutput() throws IOException {
  List<String> data = Arrays.asList("a", "b", "c");
  ListDataSource<String> source = ListDataSource.unbounded(data);
  Dataset<String> input = flow.createInput(source);
  MapElements.of(input)
      .using(HBaseTestCase::put)
      .output()
      .persist(sink);
  new LocalExecutor().submit(flow).join();
  for (String v : data) {
    assertArrayEquals(b(v), get(v));
  }
}
Code example source: seznam/euphoria
@Test
public void simpleFlatMapTest() throws InterruptedException, ExecutionException {
  Dataset<Integer> ints = flow.createInput(
      ListDataSource.unbounded(
          Arrays.asList(0, 1, 2, 3),
          Arrays.asList(4, 5, 6)));
  // repeat each element as many times as its own value
  Dataset<Integer> output = FlatMap.of(ints)
      .using((Integer e, Collector<Integer> c) -> {
        for (int i = 0; i < e; i++) {
          c.collect(e);
        }
      })
      .output();
  // collector of outputs
  ListDataSink<Integer> outputSink = ListDataSink.get();
  output.persist(outputSink);
  executor.submit(flow).get();
  List<Integer> outputs = outputSink.getOutputs();
  DatasetAssert.unorderedEquals(
      outputs,
      1, 2, 2, 3, 3, 3,
      4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6);
}
Code example source: seznam/euphoria
/**
 * Collects Avro records as JSON strings.
 *
 * @param outSink sink receiving the JSON output
 * @param inSource source of Avro records
 * @throws Exception if the flow fails
 */
public static void runFlow(
    DataSink<String> outSink,
    DataSource<Pair<AvroKey<GenericData.Record>, NullWritable>> inSource)
    throws Exception {
  Flow flow = Flow.create("simple read avro");
  Dataset<Pair<AvroKey<GenericData.Record>, NullWritable>> input = flow.createInput(inSource);
  final Dataset<String> output =
      FlatMap.named("avro2csv").of(input).using(AvroSourceTest::apply).output();
  output.persist(outSink);
  Executor executor = new LocalExecutor();
  executor.submit(flow).get();
}
Code example source: seznam/euphoria
@Test
public void testReading() throws IOException {
  Put p = new Put(b("test"));
  KeyValue kv = new KeyValue(
      b("test"), b("t"), b("col"), System.currentTimeMillis(), b("value"));
  p.add(kv);
  client.put(p);
  ListDataSink<KeyValue> sink = ListDataSink.get();
  Dataset<Pair<ImmutableBytesWritable, Result>> input = flow.createInput(source);
  Dataset<Cell> cells = FlatMap.of(input)
      .using(ResultUtil.toCells())
      .output();
  MapElements.of(cells)
      .using(c -> (KeyValue) c)
      .output()
      .persist(sink);
  new LocalExecutor().submit(flow).join();
  assertEquals(Collections.singletonList(kv), sink.getOutputs());
}
Code example source: seznam/euphoria
@Test(expected = IllegalArgumentException.class)
public void testMultipleOutputsToSameSink() throws Exception {
  flow = Flow.create(getClass().getSimpleName());
  input = flow.createInput(new MockStreamDataSource<>());
  Dataset<Object> mapped = MapElements.of(input).using(e -> e).output();
  Dataset<Pair<Object, Long>> reduced = ReduceByKey
      .of(mapped)
      .keyBy(e -> e).reduceBy(values -> 1L)
      .windowBy(Time.of(Duration.ofSeconds(1)))
      .output();
  Dataset<Pair<Object, Long>> output = Join.of(mapped, reduced)
      .by(e -> e, Pair::getFirst)
      .using((Object l, Pair<Object, Long> r, Collector<Long> c) -> {
        c.collect(r.getSecond());
      })
      .windowBy(Time.of(Duration.ofSeconds(1)))
      .output();
  ListDataSink<Pair<Object, Long>> sink = ListDataSink.get();
  output.persist(sink);
  reduced.persist(sink);
  FlowUnfolder.unfold(flow, Executor.getBasicOps());
}
Code example source: seznam/euphoria
@Test
public void testDistinctOnBatchWithoutWindowingLabels() throws Exception {
  Flow flow = Flow.create("Test");
  Dataset<String> lines = flow.createInput(ListDataSource.bounded(
      asList("one two three four", "one two three", "one two", "one")));
  // expand it to words
  Dataset<String> words = FlatMap.of(lines)
      .using(toWords(w -> w))
      .output();
  Dataset<String> output = Distinct.of(words).output();
  ListDataSink<String> out = ListDataSink.get();
  output.persist(out);
  executor.submit(flow).get();
  DatasetAssert.unorderedEquals(
      out.getOutputs(),
      "four", "one", "three", "two");
}
Code example source: seznam/euphoria
@Before
public void before() throws Exception {
  flow = Flow.create(getClass().getSimpleName());
  input = flow.createInput(new MockStreamDataSource<>());
  Dataset<Object> mapped = MapElements.of(input).using(e -> e).output();
  Dataset<Pair<Object, Long>> reduced = ReduceByKey
      .of(mapped)
      .keyBy(e -> e).reduceBy(values -> 1L)
      .windowBy(Time.of(Duration.ofSeconds(1)))
      .output();
  Dataset<Pair<Object, Long>> output = Join.of(mapped, reduced)
      .by(e -> e, Pair::getFirst)
      .using((Object l, Pair<Object, Long> r, Collector<Long> c) -> c.collect(r.getSecond()))
      .windowBy(Time.of(Duration.ofSeconds(1)))
      .output();
  output.persist(new StdoutSink<>());
}