Usage and code examples of the cc.mallet.types.InstanceList class

x33g5p2x, reposted on 2022-01-21 under Other

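This article collects code examples for the Java class cc.mallet.types.InstanceList and shows how the class is used in practice. The examples come mainly from platforms such as GitHub, Stack Overflow and Maven, and were extracted from selected projects, so they should serve as a useful reference. Details of the InstanceList class:
Fully qualified name: cc.mallet.types.InstanceList
Class name: InstanceList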

Introduction to InstanceList

A list of machine learning instances, typically used for training or testing of a machine learning algorithm.

All of the instances in the list will have been passed through the same cc.mallet.pipe.Pipe, and thus must also share the same data and target Alphabets. InstanceList keeps a reference to the pipe and the two alphabets.

The most common way of adding instances to an InstanceList is through the add(PipeInputIterator) method. PipeInputIterators are a way of mapping general data sources into instances suitable for processing through a pipe. As each cc.mallet.types.Instance is pulled from the PipeInputIterator, the InstanceList copies the instance and runs the copy through its pipe (with resultant destructive modifications) before saving the modified instance on its list. This is the usual way in which instances are transformed by pipes.
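
The flow described above (pull each raw item from an iterator, run it through the list's pipe, store the result) can be modelled with plain JDK types. The following is only a sketch under stated assumptions: `PipedListSketch` and `tokenize` are hypothetical names, a `java.util.function.Function` stands in for the pipe, and none of it is MALLET's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Toy stand-in for a list that owns a pipe; hypothetical, not a MALLET class. */
final class PipedListSketch<I, O> {
    private final Function<I, O> pipe;               // plays the role of the pipe
    private final List<O> items = new ArrayList<>(); // holds the piped results

    PipedListSketch(Function<I, O> pipe) {
        this.pipe = pipe;
    }

    /** Pulls every raw item from the source, pipes it, and stores the result. */
    void addThruPipe(Iterator<I> source) {
        while (source.hasNext()) {
            items.add(pipe.apply(source.next()));
        }
    }

    O get(int index) {
        return items.get(index);
    }

    int size() {
        return items.size();
    }

    /** Hypothetical pipe: lowercase a line and split it on whitespace. */
    static List<String> tokenize(String line) {
        return Arrays.asList(line.toLowerCase().trim().split("\\s+"));
    }

    public static void main(String[] args) {
        PipedListSketch<String, List<String>> list =
                new PipedListSketch<>(PipedListSketch::tokenize);
        list.addThruPipe(Arrays.asList("Hello World", "MALLET pipes text").iterator());
        System.out.println(list.size() + " items, first = " + list.get(0));
    }
}
```

Because every item goes through the same function, all stored items share one representation, which mirrors the requirement that all instances in a list pass through the same pipe.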

InstanceList also contains methods for randomly generating lists of feature vectors; splitting lists into non-overlapping subsets (useful for test/train splits), and iterators for cross validation.
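
To make the idea of non-overlapping subsets and cross-validation folds concrete, here is a minimal plain-JDK sketch. It assumes a generic `List<T>` in place of an InstanceList; `SplitSketch`, `split` and `folds` are hypothetical names chosen for illustration and are not MALLET methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Plain-JDK sketch of list splitting; hypothetical, not part of MALLET. */
final class SplitSketch {

    /** Shuffles a copy of items and cuts it into consecutive, non-overlapping parts. */
    static <T> List<List<T>> split(List<T> items, double[] proportions, Random rng) {
        double total = 0.0;
        for (double p : proportions) {
            total += p;
        }
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, rng);

        List<List<T>> parts = new ArrayList<>();
        int start = 0;
        double cumulative = 0.0;
        for (int k = 0; k < proportions.length; k++) {
            cumulative += proportions[k] / total;
            // The last part takes whatever is left, so rounding never drops an item.
            int end = (k == proportions.length - 1)
                    ? shuffled.size()
                    : Math.min(shuffled.size(), (int) Math.round(cumulative * shuffled.size()));
            parts.add(new ArrayList<>(shuffled.subList(start, end)));
            start = end;
        }
        return parts;
    }

    /** Deals a shuffled copy of items into k folds; fold f can be held out in round f. */
    static <T> List<List<T>> folds(List<T> items, int k, Random rng) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, rng);
        List<List<T>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            result.get(i % k).add(shuffled.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        List<List<Integer>> parts = split(data, new double[] {0.7, 0.3}, new Random(42));
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
        System.out.println("folds=" + folds(data, 3, new Random(42)));
    }
}
```

Passing a seeded `Random` keeps a split reproducible between runs, which is usually what one wants when comparing models on the same train/test partition.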

Code examples

Code example source: origin: de.julielab/jcore-mallet-2.0.9

public InstanceList pipeInstances (Iterator<Instance> source)
{
 // I think that pipes should be associated neither with InstanceLists, nor
 //  with Instances. -cas
 InstanceList toked = new InstanceList (tokenizationPipe);
 toked.addThruPipe (source);
 InstanceList piped = new InstanceList (getFeaturePipe ());
 piped.addThruPipe (toked.iterator());
 return piped;
}

Code example source: origin: de.julielab/jcore-mallet-2.0.9

/** Return a list of instances with a particular label. */
public InstanceList getCluster(int label) {
  InstanceList cluster = new InstanceList(instances.getPipe());
  for (int n = 0; n < instances.size(); n++)
    if (labels[n] == label)
      cluster.add(instances.get(n));
  return cluster;
}

Code example source: origin: cc.mallet/mallet

public Alphabet[] getAlphabets () {
  return new Alphabet[] {getDataAlphabet(), getTargetAlphabet() };
}

Code example source: origin: cc.mallet/mallet

public InstanceList subList (int start, int end)
{
  InstanceList other = this.cloneEmpty();
  for (int i = start; i < end; i++) {
    other.add (get (i));
  }
  return other;
}

Code example source: origin: cc.mallet/mallet

public void testFixedNumLabels () throws IOException, ClassNotFoundException
{
 Pipe p = new GenericAcrfData2TokenSequence (2);
 InstanceList training = new InstanceList (p);
 training.addThruPipe (new LineGroupIterator (new StringReader (sampleFixedData), Pattern.compile ("^$"), true));
 assertEquals (1, training.size ());
 Instance inst1 = training.get (0);
 LabelsSequence ls1 = (LabelsSequence) inst1.getTarget ();
 assertEquals (4, ls1.size ());
}

Code example source: origin: com.github.steveash.mallet/mallet

public void testOne ()
{
  Pipe p = createPipe();
  InstanceList ilist = new InstanceList (p);
  ilist.addThruPipe(new StringArrayIterator(data));
  assertTrue (ilist.size() == 3);
}

Code example source: origin: com.github.steveash.jg2p/jg2p-core

private InstanceList makeExamplesFromAligns(Collection<SWord> inputs) {
 Pipe pipe = makePipe();
 int count = 0;
 InstanceList instances = new InstanceList(pipe);
 for (SWord word : inputs) {
  Instance ii = new Instance(word, null, null, null);
  instances.addThruPipe(ii);
  count += 1;
 }
 log.info("Read {} instances of training data for syll phone tag", count);
 return instances;
}

Code example source: origin: uk.gov.dstl.baleen/baleen-mallet

@Override
protected void doProcess(JCas jCas) throws AnalysisEngineProcessException {
 InstanceList instances = new InstanceList(classifierModel.getInstancePipe());
 instances.addThruPipe(new Instance(jCas.getDocumentText(), "", "from jcas", null));
 Classification classify = classifierModel.classify(instances.get(0));
 Metadata md = new Metadata(jCas);
 md.setKey(metadataKey);
 md.setValue(classify.getLabeling().getBestLabel().toString());
 addToJCasIndex(md);
}

Code example source: origin: cc.mallet/mallet

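public InstanceList subList (double proportion)
{
  if (proportion > 1.0)
    throw new IllegalArgumentException ("proportion must be <= 1.0");
  // Shuffle a copy so that the order of this list is left untouched.
  InstanceList shuffled = (InstanceList) clone();
  shuffled.shuffle(new java.util.Random());
  // Start from an empty list: appending to the full clone would return
  // more instances than this list holds.
  InstanceList other = this.cloneEmpty();
  double count = proportion * shuffled.size();
  for (int i = 0; i < count; i++)
    other.add (shuffled.get(i));
  return other;
}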

Code example source: origin: cc.mallet/mallet

public double getInstanceWeight (int index) {
  // Valid indices run from 0 to size() - 1, so index == size() is rejected as well.
  if (index >= this.size()) {
    throw new IllegalArgumentException("Index out of bounds: index="+index+" size="+this.size());
  }
  if (instWeights != null) {
    Double value = instWeights.get(get(index));
    if (value != null) {
      return value;
    }
  }
  return 1.0;
}

Code example source: origin: cc.mallet/mallet

public Sequence pipeInput (Object input)
{
  InstanceList all = new InstanceList (getFeaturePipe ());
  all.add (input, null, null, null);
  return (Sequence) all.get (0).getData();
}

Code example source: origin: cc.mallet/mallet

public InstanceList sampleWithReplacement (java.util.Random r, int numSamples)
{
  InstanceList ret = this.cloneEmpty();
  for (int i = 0; i < numSamples; i++)
    ret.add (this.get(r.nextInt(this.size())));
  return ret;
}
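
The method above draws `numSamples` random indices and allows the same instance to be picked more than once. The same idea on an ordinary `List<T>` might look like the sketch below; `BootstrapSketch` is a hypothetical name, and the empty-list check is an added assumption rather than something the snippet above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Plain-JDK sketch of sampling with replacement; hypothetical, not part of MALLET. */
final class BootstrapSketch {

    /** Returns numSamples items drawn uniformly from source; repeats are allowed. */
    static <T> List<T> sampleWithReplacement(List<T> source, Random rng, int numSamples) {
        if (source.isEmpty()) {
            throw new IllegalArgumentException("cannot sample from an empty list");
        }
        List<T> sample = new ArrayList<>(numSamples);
        for (int i = 0; i < numSamples; i++) {
            sample.add(source.get(rng.nextInt(source.size())));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c");
        // The sample may be larger than the source because items can repeat.
        System.out.println(sampleWithReplacement(docs, new Random(7), 5));
    }
}
```

Drawing a sample of the same size as the source gives the usual bootstrap resample used by bagging-style ensembles.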

Code example source: origin: de.julielab/jcore-mallet-2.0.9

public LabelVector targetLabelDistribution ()
{
  if (this.size() == 0) return null;
  if (!(get(0).getTarget() instanceof Labeling))
    throw new IllegalStateException ("Target is not a labeling.");
  double[] counts = new double[getTargetAlphabet().size()];
  for (int i = 0; i < this.size(); i++) {
    Instance instance =  get(i);
    Labeling l = (Labeling) instance.getTarget();
    l.addTo (counts, getInstanceWeight(i));
  }
  return new LabelVector ((LabelAlphabet)getTargetAlphabet(), counts);
}
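
The loop above accumulates, per target label, the weights of the instances that carry it. A simplified plain-JDK sketch of that accumulation follows. It assumes each item has exactly one hard label given as an integer index, which is narrower than a general labeling; `LabelCountSketch` and `weightedCounts` are hypothetical names.

```java
import java.util.Arrays;

/** Plain-JDK sketch of weighted label counting; hypothetical, not part of MALLET. */
final class LabelCountSketch {

    /**
     * labels[i] is the label index of item i and weights[i] is its weight.
     * Returns one accumulated weight per label index (not normalised).
     */
    static double[] weightedCounts(int[] labels, double[] weights, int numLabels) {
        if (labels.length != weights.length) {
            throw new IllegalArgumentException("labels and weights differ in length");
        }
        double[] counts = new double[numLabels];
        for (int i = 0; i < labels.length; i++) {
            counts[labels[i]] += weights[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 1, 2};
        double[] weights = {1.0, 1.0, 2.0, 0.5};
        System.out.println(Arrays.toString(weightedCounts(labels, weights, 3)));
    }
}
```

With all weights equal to 1.0 the result reduces to plain label frequencies.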

Code example source: origin: de.julielab/jcore-mallet-2.0.9

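/**
 * Builds a two-element list whose pipe is a Noop carrying the alphabets of the first instance.
 * @param i the first instance; its data and target alphabets are used for the list's pipe
 * @param j the second instance
 * @return A new {@link InstanceList} containing the two argument {@link Instance}s.
 */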
public static InstanceList makeList (Instance i, Instance j) {
  InstanceList list = new InstanceList(new Noop(i.getDataAlphabet(), i.getTargetAlphabet()));
  list.add(i);
  list.add(j);
  return list;
}

Code example source: origin: cc.mallet/mallet

public void setPerLabelFeatureSelection (FeatureSelection[] selectedFeatures)
{
  if (selectedFeatures != null) {
    for (int i = 0; i < selectedFeatures.length; i++)
      if (selectedFeatures[i].getAlphabet() != getDataAlphabet())
        throw new IllegalArgumentException ("Vocabularies do not match");
  }
  perLabelFeatureSelection = selectedFeatures;
}

Code example source: origin: cc.mallet/mallet

/** Replaces the <code>Instance</code> at position <code>index</code>
 * with a new one. */
public void setInstance (int index, Instance instance)
{
  assert (this.getDataAlphabet().equals(instance.getDataAlphabet()));
  assert (this.getTargetAlphabet().equals(instance.getTargetAlphabet()));
  this.set(index, instance);
}

Code example source: origin: com.github.steveash.mallet/mallet

public BaggingClassifier train (InstanceList trainingList)
{
  Classifier[] classifiers = new Classifier[numBags];
  java.util.Random r = new java.util.Random ();
  for (int round = 0; round < numBags; round++) {
    InstanceList bag = trainingList.sampleWithReplacement (r, trainingList.size());
    classifiers[round] = underlyingTrainer.newClassifierTrainer().train (bag);
  }
  this.classifier = new BaggingClassifier (trainingList.getPipe(), classifiers);
  return classifier;
}

Code example source: origin: de.julielab/jcore-mallet-2.0.9

/** Adds the input instance to this list, after passing it through the
 * InstanceList's pipe.
 * <p>
 * If several instances are to be added then accumulate them in a List&lt;Instance&gt;
 * and use <tt>addThruPipe(Iterator&lt;Instance&gt;)</tt> instead.
 */
public void addThruPipe(Instance inst)
{
 addThruPipe(new SingleInstanceIterator(inst));
}

Code example source: origin: de.julielab/jcore-mallet-2.0.9

public TokenClassifiers(ClassifierTrainer trainer, InstanceList trainList, int randSeed, int numCV)
{
  super(trainList.getPipe());
  m_trainer = trainer;
  m_randSeed = randSeed;
  m_numCV = numCV;
  m_table = new HashMap();
  doTraining(trainList);
}

Code example source: origin: cc.mallet/mallet

public void testSetGetParameters ()
{
  MaxEntTrainer trainer = new MaxEntTrainer();
  Alphabet fd = dictOfSize (6);
  String[] classNames = new String[] {"class0", "class1", "class2"};
  InstanceList ilist = new InstanceList (new Randoms(1), fd, classNames, 20);
  Optimizable.ByGradientValue maxable = trainer.getOptimizable (ilist);
  TestOptimizable.testGetSetParameters (maxable);
}
