Usage and code examples of the edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation.getTokenIdFromCharacterOffset() method

Reposted by x33g5p2x on 2022-01-30 (category: Other)

This article collects Java code examples for the edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation.getTokenIdFromCharacterOffset() method, showing how TextAnnotation.getTokenIdFromCharacterOffset() is used in practice. The examples come mainly from GitHub, Stack Overflow, and Maven, and were extracted from selected projects, so they should serve as a useful reference. Details of the method:
Fully qualified class name: edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation
Class name: TextAnnotation
Method name: getTokenIdFromCharacterOffset

About TextAnnotation.getTokenIdFromCharacterOffset

Gets the position of the token that corresponds to the character offset passed as a parameter. This function can be useful when dealing with corpora that specify annotations in terms of character offsets. In particular, the CuratorClient uses this function to convert views from the Curator representation.

NOTE: one-past-the-end indexing can make this problematic. Currently, constituents are processed so that only characters within tokens are mapped to token ids (avoiding ambiguity at the cost of introducing complexity for users who think in terms of one-past-the-end indexing). That is, you MUST modify the end offset in the call if you are using one-past-the-end offsets. (For example, Curator data structures use one-past-the-end offsets, as do TextAnnotation Views/Constituents.) This behavior was chosen to handle the case where there is arbitrary whitespace, and to avoid confusion when two tokens are contiguous: the first character of the second token would conflict with the last (one-past-the-end) character of the first.

UPDATED to allow a non-zero first-token character offset (i.e., in case the source text has a markup preamble that you want to ignore). The current implementation maps character offsets that do not belong to any token to the index -1.
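To make the one-past-the-end caveat concrete, the following Java sketch builds the kind of character-to-token map the description above implies: characters inside a token map to that token's index, and all other characters (whitespace, an ignored markup preamble) map to -1. This is a hypothetical illustration, not the cogcomp source; the class `CharToTokenMap`, its `build` method, and the `{start, end}` one-past-the-end offset pairs are assumptions made for the example.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (NOT the cogcomp implementation) of a character-offset
 * to token-id map: characters inside a token map to that token's index,
 * every other character maps to -1. Token offsets are one-past-the-end
 * pairs {start, end}.
 */
class CharToTokenMap {
    static int[] build(int textLength, int[][] tokenOffsets) {
        int[] map = new int[textLength];
        Arrays.fill(map, -1); // characters outside every token
        for (int t = 0; t < tokenOffsets.length; t++) {
            for (int c = tokenOffsets[t][0]; c < tokenOffsets[t][1]; c++) {
                map[c] = t;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "The dog.";
        // Tokens: "The" [0,3), "dog" [4,7), "." [7,8)
        int[][] offsets = {{0, 3}, {4, 7}, {7, 8}};
        int[] map = build(text.length(), offsets);
        System.out.println(Arrays.toString(map)); // [0, 0, 0, -1, 1, 1, 1, 2]
        // The one-past-the-end offset of "dog" is 7, the first character of ".":
        System.out.println(map[7]);     // 2 (the wrong token)
        System.out.println(map[7 - 1]); // 1 (last character of "dog")
    }
}
```

The last two lines show why callers subtract 1 from a one-past-the-end offset: when tokens are contiguous, as with `dog` and `.`, the end offset of the first token is the first character of the second.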

Code examples

Source: edu.illinois.cs.cogcomp/wikipediaAPI

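/**
 * Ignores the bug in pre-computing token offsets
 * @param ta the TextAnnotation whose token offsets are checked
 */
private static void validateTextAnnotationOffset(TextAnnotation ta){
  try{
    // A lookup at offset 0 presumably forces the token offsets to be computed.
    ta.getTokenIdFromCharacterOffset(0);
  }catch(Exception e){
    // Deliberately swallowed: this method exists only to work around the bug above.
  }
}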

Source: edu.illinois.cs.cogcomp/wikipediaAPI-multilingual: the same validateTextAnnotationOffset method as the wikipediaAPI example above, differing only in formatting.

Source: CogComp/cogcomp-nlp

protected static Constituent getNewConstituentForSpan(String label, String viewName,
    TextAnnotation ta, Span span) {
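  int start = ta.getTokenIdFromCharacterOffset(span.getStart());
  // Curator spans are one-past-the-end: look up the span's last character,
  // then add 1 to get back to one-past-the-end token indexing.
  int end = ta.getTokenIdFromCharacterOffset(span.getEnding() - 1) + 1;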
  Constituent constituent = new Constituent(label, viewName, ta, start, end);
  if (span.isSetAttributes()) {
    copyAttributesToConstituent(span, constituent);
  }
  return constituent;
}

Source: edu.illinois.cs.cogcomp/illinois-curator: identical to the CogComp/cogcomp-nlp getNewConstituentForSpan example above.

Source: edu.illinois.cs.cogcomp/illinois-caching-curator: the same getNewConstituentForSpan method as the CogComp/cogcomp-nlp example above, differing only in line wrapping.
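The `span.getEnding() - 1` ... `+ 1` pattern in getNewConstituentForSpan can be reproduced without the library. In this minimal sketch, a hand-built `int[]` stands in for `getTokenIdFromCharacterOffset`, and `SpanConversion.toTokenSpan` is a hypothetical name; it converts a one-past-the-end character span into a one-past-the-end token span.

```java
import java.util.Arrays;

/**
 * Hypothetical helper mirroring getNewConstituentForSpan: convert a
 * one-past-the-end character span [charStart, charEnd) into a
 * one-past-the-end token span [tokStart, tokEnd). The charToToken array
 * plays the role of TextAnnotation.getTokenIdFromCharacterOffset.
 */
class SpanConversion {
    static int[] toTokenSpan(int[] charToToken, int charStart, int charEnd) {
        int tokStart = charToToken[charStart];
        // Look up the LAST character (charEnd - 1), then add 1 to return
        // to the one-past-the-end token indexing used by Constituent.
        int tokEnd = charToToken[charEnd - 1] + 1;
        return new int[] {tokStart, tokEnd};
    }

    public static void main(String[] args) {
        // "New York is big": New [0,3), York [4,8), is [9,11), big [12,15)
        int[] charToToken = {0, 0, 0, -1, 1, 1, 1, 1, -1, 2, 2, -1, 3, 3, 3};
        // The character span of "New York" is [0, 8)
        System.out.println(Arrays.toString(toTokenSpan(charToToken, 0, 8))); // [0, 2]
    }
}
```

Looking up `charEnd` (offset 8) directly would hit the space after `York`, which maps to -1, and the `+ 1` would then yield the empty span [0, 0). The sketch does not guard against -1; getEntityHeadForConstituent below checks for that case.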

Source: CogComp/cogcomp-nlp

@Override
public void addView(TextAnnotation ta) throws AnnotatorException {
  assert (ta.hasView(ViewNames.SENTENCE));
  SpanLabelView quantifierView =
      new SpanLabelView(ViewNames.QUANTITIES, "illinois-quantifier", ta, 1d);
  List<QuantSpan> quantSpans = getSpans(ta.getTokenizedText(), true, ta);
  for (QuantSpan span : quantSpans) {
    int startToken = ta.getTokenIdFromCharacterOffset(span.start);
    int endToken = ta.getTokenIdFromCharacterOffset(span.end);
    quantifierView.addSpanLabel(startToken, endToken, span.object.toString(), 1d);
  }
  ta.addView(ViewNames.QUANTITIES, quantifierView);
}

Source: edu.illinois.cs.cogcomp/illinois-quantifier: identical to the CogComp/cogcomp-nlp addView example above.

Source: CogComp/cogcomp-nlp

/**
 * Gets the token index of a Stanford dependency node relative to the current sentence
 * 
 * @param ta The TextAnnotation containing the sentences
 * @param node The Stanford Dependency node
 * @param sentId The sentence number
 * @return The token index relative to sentence
 */
private int getNodePosition(TextAnnotation ta, IndexedWord node, int sentId) {
  int sentenceStart =
      ta.getView(ViewNames.SENTENCE).getConstituents().get(sentId).getStartSpan();
  int nodeCharacterOffset = node.beginPosition();
  int tokenStartSpan = ta.getTokenIdFromCharacterOffset(nodeCharacterOffset);
  return tokenStartSpan - sentenceStart;
}

Source: edu.illinois.cs.cogcomp/stanford_3.3.1: identical to the CogComp/cogcomp-nlp getNodePosition example above.
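getNodePosition converts a document-level token id into a position within its sentence by subtracting the sentence's start token, and alignForestToDependencyView below relies on `getSentenceId` for the reverse lookup. The sketch shows both steps over a hypothetical array of sentence start indices; `SentencePosition` is an invented name, and the binary search is our own choice for illustration, not necessarily how `TextAnnotation.getSentenceId` is implemented.

```java
/**
 * Hypothetical sketch of the arithmetic in getNodePosition. sentenceStarts
 * (ascending, first entry 0) stands in for the start spans of the
 * SENTENCE view's constituents.
 */
class SentencePosition {
    // Index of the sentence containing tokenId: the last start <= tokenId.
    static int sentenceId(int[] sentenceStarts, int tokenId) {
        int lo = 0, hi = sentenceStarts.length - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sentenceStarts[mid] <= tokenId) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }

    // Token index relative to the start of its sentence.
    static int positionInSentence(int[] sentenceStarts, int tokenId) {
        if (tokenId < 0) {
            // A -1 from an unmapped character offset must be handled by the caller.
            throw new IllegalArgumentException("unmapped token id: " + tokenId);
        }
        return tokenId - sentenceStarts[sentenceId(sentenceStarts, tokenId)];
    }

    public static void main(String[] args) {
        int[] sentenceStarts = {0, 5, 12}; // three sentences
        System.out.println(sentenceId(sentenceStarts, 7));         // 1
        System.out.println(positionInSentence(sentenceStarts, 7)); // 2
    }
}
```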

Source: CogComp/cogcomp-nlp

/**
 * Helper function to create a head constituent from an extent constituent.
 */
public static Constituent getEntityHeadForConstituent(Constituent extentConstituent,
                            TextAnnotation textAnnotation,
                            String viewName) {
  int startCharOffset =
      Integer.parseInt(extentConstituent
          .getAttribute(ACEReader.EntityHeadStartCharOffset));
  int endCharOffset =
      Integer.parseInt(extentConstituent.getAttribute(ACEReader.EntityHeadEndCharOffset)) - 1;
  int startToken = textAnnotation.getTokenIdFromCharacterOffset(startCharOffset);
  int endToken = textAnnotation.getTokenIdFromCharacterOffset(endCharOffset);
  if (startToken >= 0 && endToken >= 0 && !(endToken - startToken < 0)) {
    Constituent cons =
        new Constituent(extentConstituent.getLabel(), 1.0, viewName, textAnnotation,
            startToken, endToken + 1);
    for (String attributeKey : extentConstituent.getAttributeKeys()) {
      cons.addAttribute(attributeKey, extentConstituent.getAttribute(attributeKey));
    }
    return cons;
  }
  return null;
}

Source: edu.illinois.cs.cogcomp/illinois-corpusreaders: identical to the CogComp/cogcomp-nlp getEntityHeadForConstituent example above.
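getEntityHeadForConstituent discards spans whose endpoints map to -1 or are inverted. Here is a sketch of the same guard, assuming an inclusive character range (the caller has already subtracted 1 from the end offset) and again using a hand-built `int[]` in place of the TextAnnotation; `SafeSpan` is a hypothetical name, and it returns `Optional.empty()` where the original returns `null`.

```java
import java.util.Optional;

/**
 * Hypothetical guard modelled on getEntityHeadForConstituent: convert an
 * inclusive character range [startChar, lastChar] into a one-past-the-end
 * token span, rejecting offsets outside the text, offsets that fall
 * outside every token (-1), and inverted ranges.
 */
class SafeSpan {
    static Optional<int[]> toTokenSpan(int[] charToToken, int startChar, int lastChar) {
        if (startChar < 0 || startChar > lastChar || lastChar >= charToToken.length) {
            return Optional.empty(); // out of range or inverted character range
        }
        int startToken = charToToken[startChar];
        int endToken = charToToken[lastChar];
        if (startToken < 0 || endToken < 0 || endToken < startToken) {
            return Optional.empty(); // offset on whitespace/markup, or inverted token span
        }
        return Optional.of(new int[] {startToken, endToken + 1});
    }

    public static void main(String[] args) {
        // "Obama spoke": Obama [0,5), spoke [6,11)
        int[] charToToken = {0, 0, 0, 0, 0, -1, 1, 1, 1, 1, 1};
        System.out.println(toTokenSpan(charToToken, 0, 4).isPresent()); // true
        System.out.println(toTokenSpan(charToToken, 0, 5).isPresent()); // false: offset 5 is a space
    }
}
```

Returning `Optional` rather than `null` is only a stylistic choice here; it makes the rejected case explicit at the call site.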

Source: edu.illinois.cs.cogcomp/illinois-curator

// Excerpt: topNode and childNode are Curator tree Node objects from the enclosing method (not shown).
int topTokenId = ta.getTokenIdFromCharacterOffset(topNode.getSpan().getStart());
int childTokenId = ta.getTokenIdFromCharacterOffset(childNode.getSpan().getStart());

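Source: edu.illinois.cs.cogcomp/md

// Excerpt: startCharOffset is parsed from ACEReader.EntityHeadStartCharOffset earlier in the
// method, as in the getEntityHeadForConstituent example above.
int endCharOffset =
    Integer.parseInt(extentConstituent.getAttribute(ACEReader.EntityHeadEndCharOffset)) - 1;
int startToken = textAnnotation.getTokenIdFromCharacterOffset(startCharOffset);
int endToken = textAnnotation.getTokenIdFromCharacterOffset(endCharOffset);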

Source: edu.illinois.cs.cogcomp/illinois-caching-curator

/**
 * Aligns a {@link Labeling} to a {@link TokenLabelView}.
 *
 * @return A TokenLabelView
 */
public static TokenLabelView alignLabelingToTokenLabelView(String viewName, TextAnnotation ta, Labeling labeling) {
  List<Span> labels = labeling.getLabels();
  double score = labeling.getScore();
  String generator = labeling.getSource();
  TokenLabelView view = new TokenLabelView(viewName, generator, ta, score);
  for (Span span : labels) {
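    int tokenId = ta.getTokenIdFromCharacterOffset(span.getStart());
    // NOTE: this copy passes the one-past-the-end offset span.getEnding() unadjusted;
    // the CogComp/cogcomp-nlp version below passes span.getEnding() - 1, as the Javadoc requires.
    int endTokenId = ta.getTokenIdFromCharacterOffset(span.getEnding());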
    if (tokenId == endTokenId)
      endTokenId++;
    for (int i = tokenId; i < endTokenId; i++) {
      view.addTokenLabel(i, span.getLabel(), span.getScore());
      if (span.isSetAttributes() && span.getAttributes().size() > 0) {
        Constituent c = view.getConstituentAtToken(i);
        copyAttributesToConstituent(span, c);
      }
    }
  }
  return view;
}

Source: edu.illinois.cs.cogcomp/illinois-caching-curator

public static TreeView alignForestToDependencyView(String viewName, TextAnnotation ta, Forest dep) {
  TreeView view = new TreeView(viewName, dep.getSource(), ta, 0.0d);
  for (edu.illinois.cs.cogcomp.thrift.base.Tree tree : dep.getTrees()) {
    int topId = tree.getTop();
    List<Node> nodes = tree.getNodes();
    int topTokenStart = nodes.get(topId).getSpan().getStart();
    int topTokenId = ta.getTokenIdFromCharacterOffset(topTokenStart);
    int sentenceId = ta.getSentenceId(topTokenId);
    Tree<Pair<String, Integer>> dependencyTree = makeDependencyTree(ta, tree);
    double score = tree.getScore();
    view.setDependencyTree(sentenceId, dependencyTree, score);
  }
  return view;
}

Source: CogComp/cogcomp-nlp

/**
 * Aligns a {@link edu.illinois.cs.cogcomp.thrift.base.Labeling} to a
 * {@link edu.illinois.cs.cogcomp.core.datastructures.textannotation.TokenLabelView}.
 *
 * <b>NOTE:</b> must correct for one-past-the-end labeling when calling
 * {@link TextAnnotation#getTokenIdFromCharacterOffset(int)}.
 * 
 * @return A TokenLabelView
 */
public static TokenLabelView alignLabelingToTokenLabelView(String viewName, TextAnnotation ta,
    Labeling labeling) {
  List<Span> labels = labeling.getLabels();
  double score = labeling.getScore();
  String generator = labeling.getSource();
  TokenLabelView view = new TokenLabelView(viewName, generator, ta, score);
  for (Span span : labels) {
    int tokenId = ta.getTokenIdFromCharacterOffset(span.getStart());
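    int endTokenId = ta.getTokenIdFromCharacterOffset(span.getEnding() - 1);
    // endTokenId is the inclusive last token here; the increment below makes it
    // exclusive only for single-token spans, so a multi-token span loses its last token.
    if (tokenId == endTokenId)
      endTokenId++;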
    for (int i = tokenId; i < endTokenId; i++) {
      view.addTokenLabel(i, span.getLabel(), span.getScore());
      if (span.isSetAttributes() && span.getAttributes().size() > 0) {
        Constituent c = view.getConstituentAtToken(i);
        copyAttributesToConstituent(span, c);
      }
    }
  }
  return view;
}

Source: edu.illinois.cs.cogcomp/illinois-curator: identical to the CogComp/cogcomp-nlp alignLabelingToTokenLabelView example above.
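A variant of the labeling loop in alignLabelingToTokenLabelView that keeps the token end exclusive for spans of any length, by always converting the end as `charToToken[end - 1] + 1`. `TokenLabeling.labelSpan` and the hand-built `int[]` map are hypothetical stand-ins for TokenLabelView and `getTokenIdFromCharacterOffset`, and the sketch assumes both span ends fall inside tokens.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: expand a one-past-the-end character span into
 * per-token labels. Converting the end as charToToken[charEnd - 1] + 1
 * keeps the token end exclusive for single- and multi-token spans alike.
 * Assumes charStart and charEnd - 1 both lie inside tokens (no -1 values).
 */
class TokenLabeling {
    static void labelSpan(String[] tokenLabels, int[] charToToken,
                          int charStart, int charEnd, String label) {
        int tokenId = charToToken[charStart];
        int endTokenId = charToToken[charEnd - 1] + 1; // exclusive
        for (int i = tokenId; i < endTokenId; i++) {
            tokenLabels[i] = label;
        }
    }

    public static void main(String[] args) {
        // "New York is big": New [0,3), York [4,8), is [9,11), big [12,15)
        int[] charToToken = {0, 0, 0, -1, 1, 1, 1, 1, -1, 2, 2, -1, 3, 3, 3};
        String[] labels = new String[4];
        labelSpan(labels, charToToken, 0, 8, "LOC");   // two-token span
        labelSpan(labels, charToToken, 12, 15, "ADJ"); // one-token span
        System.out.println(Arrays.toString(labels)); // [LOC, LOC, null, ADJ]
    }
}
```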

Source: CogComp/cogcomp-nlp: the same alignForestToDependencyView method as the illinois-caching-curator example above, differing only in line wrapping.

Source: edu.illinois.cs.cogcomp/illinois-curator: identical to the alignForestToDependencyView examples above.

Source: CogComp/cogcomp-nlp

// Map the entity's character offsets from the original (XML) text to the cleaned text.
int cleanTextCharStart = xta.getXmlSt().computeModifiedOffsetFromOriginal(charOffsets.getFirst());
int cleanTextCharEnd = xta.getXmlSt().computeModifiedOffsetFromOriginal(charOffsets.getSecond());
int cleanTextNeTokStart = ta.getTokenIdFromCharacterOffset(cleanTextCharStart);
// StringTransformation returns a one-past-the-end index, while getTokenIdFromCharacterOffset
// expects the offset of the last character, hence the - 1.
int cleanTextNeTokEnd = ta.getTokenIdFromCharacterOffset(cleanTextCharEnd - 1);
// Constituent token indexing is one-past-the-end, hence the + 1.
Constituent neCon = new Constituent(neLabel, nerView.getViewName(), ta, cleanTextNeTokStart, cleanTextNeTokEnd + 1);
nerView.addConstituent(neCon);
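The snippet above first moves character offsets from the original XML text to the cleaned text and only then looks up token ids. The bookkeeping behind such an offset mapping can be sketched as follows. This is a naive, hypothetical tag stripper (the class `MarkupOffsets` is invented for the example; it ignores entities, comments, and `>` inside attribute values), not cogcomp's StringTransformation.

```java
/**
 * Hypothetical sketch of original-to-clean offset bookkeeping: strip
 * XML-style tags and remember, for every original offset, the matching
 * offset in the cleaned text. NOT cogcomp's StringTransformation.
 */
class MarkupOffsets {
    final String clean;
    final int[] origToClean; // length = original.length() + 1

    MarkupOffsets(String original) {
        StringBuilder sb = new StringBuilder();
        origToClean = new int[original.length() + 1];
        boolean inTag = false;
        for (int i = 0; i < original.length(); i++) {
            origToClean[i] = sb.length(); // tag characters map to the next clean character
            char c = original.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                sb.append(c);
            }
        }
        origToClean[original.length()] = sb.length(); // one-past-the-end of the text
        clean = sb.toString();
    }

    public static void main(String[] args) {
        String original = "<doc>Hi <b>Bob</b></doc>";
        MarkupOffsets m = new MarkupOffsets(original);
        int start = original.indexOf("Bob"); // 11
        int end = start + 3;                 // 14, one-past-the-end
        System.out.println(m.clean);         // Hi Bob
        System.out.println(m.clean.substring(m.origToClean[start], m.origToClean[end])); // Bob
    }
}
```

As in the snippet above, the cleaned one-past-the-end offset would still need `- 1` before being passed to `getTokenIdFromCharacterOffset`.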
