Usage and code examples of the com.google.common.base.Utf8 class

x33g5p2x, reposted on 2022-02-01 under Other

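This article collects code examples of the Java class com.google.common.base.Utf8 to show how the class is used. The examples come mainly from GitHub, Stack Overflow, Maven and similar platforms and were extracted from selected projects, so they are a useful reference and may help you to some extent. Details of the Utf8 class:
Package: com.google.common.base
Class name: Utf8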

Utf8 overview

Low-level, high-performance utility methods related to the Charsets#UTF_8 character encoding. UTF-8 is defined in section D92 of The Unicode Standard Core Specification, Chapter 3.

The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences (overlong encodings such as C0 80 for U+0000, which is properly encoded as the single byte 00), even though the JDK decoder may accept them.
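The restricted definition can be illustrated with a small standalone checker. This is a minimal sketch that follows the byte ranges of Table 3-7 ("Well-Formed UTF-8 Byte Sequences") in Chapter 3 of the Unicode Standard; it is not Guava's implementation, and the class name `StrictUtf8Sketch` is hypothetical. It rejects overlong sequences such as `C0 80`, encoded surrogates such as `ED A0 80`, and anything above U+10FFFF.

```java
final class StrictUtf8Sketch {
  private StrictUtf8Sketch() {}

  /** True if b (as an unsigned value) is a continuation byte 0x80..0xBF. */
  private static boolean isContinuation(int b) {
    return b >= 0x80 && b <= 0xBF;
  }

  /** Hypothetical checker: validates bytes against the ranges of Unicode Table 3-7. */
  static boolean isWellFormed(byte[] bytes) {
    int i = 0;
    int n = bytes.length;
    while (i < n) {
      int b0 = bytes[i] & 0xFF;
      if (b0 <= 0x7F) { // 1 byte: ASCII
        i++;
        continue;
      }
      int len;
      int lo = 0x80; // allowed range of the second byte
      int hi = 0xBF;
      if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
      } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) {
          lo = 0xA0; // excludes overlong 3-byte forms
        } else if (b0 == 0xED) {
          hi = 0x9F; // excludes surrogates U+D800..U+DFFF
        }
      } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) {
          lo = 0x90; // excludes overlong 4-byte forms
        } else if (b0 == 0xF4) {
          hi = 0x8F; // excludes code points above U+10FFFF
        }
      } else {
        // 0x80..0xBF (stray continuation), 0xC0/0xC1 (overlong 2-byte lead), 0xF5..0xFF
        return false;
      }
      if (i + len > n) {
        return false; // truncated sequence
      }
      int b1 = bytes[i + 1] & 0xFF;
      if (b1 < lo || b1 > hi) {
        return false;
      }
      for (int k = 2; k < len; k++) {
        if (!isContinuation(bytes[i + k] & 0xFF)) {
          return false;
        }
      }
      i += len;
    }
    return true;
  }
}
```

For example, `StrictUtf8Sketch.isWellFormed(new byte[] {(byte) 0xC0, (byte) 0x80})` returns `false`, because a `C0` lead byte can only begin an overlong encoding.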

Code examples

Code example origin: google/guava

/**
 * Returns {@code true} if {@code bytes} is a <i>well-formed</i> UTF-8 byte sequence according to
 * Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be
 * decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte
 * sequences, but encoding never reproduces these. Such byte sequences are <i>not</i> considered
 * well-formed.
 *
 * <p>This method returns {@code true} if and only if {@code Arrays.equals(bytes, new
 * String(bytes, UTF_8).getBytes(UTF_8))} does, but is more efficient in both time and space.
 */
public static boolean isWellFormed(byte[] bytes) {
 return isWellFormed(bytes, 0, bytes.length);
}
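The Javadoc above defines well-formedness through a round trip via String. That reference criterion can be written with the JDK alone. The sketch below (hypothetical class `RoundTripCheck`) is for comparison only: it allocates a `String` and a second array, which is the time and space cost the Javadoc says `isWellFormed` avoids.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripCheck {
  private RoundTripCheck() {}

  /**
   * Reference criterion from the Javadoc: decode, re-encode, compare. The String(byte[], Charset)
   * constructor replaces malformed input with U+FFFD, so such input cannot survive unchanged.
   */
  static boolean roundTrips(byte[] bytes) {
    String decoded = new String(bytes, StandardCharsets.UTF_8);
    return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
  }
}
```

A lone `C3` byte, for instance, decodes to U+FFFD and re-encodes as `EF BF BD`, so the comparison fails.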

Code example origin: google/error-prone

private boolean isValidTag(String tag) {
 return Utf8.encodedLength(tag) <= 23;
}
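The check above limits the UTF-8 byte length of the tag, not its String.length(), to 23. A JDK-only approximation (hypothetical class `TagLengthSketch`) shows why the two differ for non-ASCII tags. Note that `String.getBytes` allocates an array and replaces unpaired surrogates with `?`, whereas `Utf8.encodedLength` allocates nothing and throws `IllegalArgumentException` for them.

```java
import java.nio.charset.StandardCharsets;

final class TagLengthSketch {
  private TagLengthSketch() {}

  /** Maximum tag size in UTF-8 bytes, the same limit as in the example above. */
  static final int MAX_TAG_BYTES = 23;

  /** Approximates the check above with the JDK encoder instead of Utf8.encodedLength. */
  static boolean isValidTag(String tag) {
    return tag.getBytes(StandardCharsets.UTF_8).length <= MAX_TAG_BYTES;
  }
}
```

A tag of 8 CJK characters has length() 8 but needs 24 UTF-8 bytes (3 per character), so it fails; 7 such characters (21 bytes) pass.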

Code example origin: google/guava

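// Excerpt from the loop in Utf8.encodedLength(CharSequence); c is sequence.charAt(i)
if (c < 0x800) {
 utf8Length += ((0x7f - c) >>> 31); // branch free! adds 1 when c > 0x7f
} else {
 utf8Length += encodedLengthGeneral(sequence, i);
 break;
}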
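The "branch free" line relies on integer arithmetic: c is promoted to int, so 0x7f - c is negative exactly when c > 0x7f, and >>> 31 moves the sign bit into bit 0, yielding 1 or 0. Because the running total in that method starts at the number of UTF-16 chars, this adds the second byte needed by U+0080..U+07FF. The sketch below (hypothetical class `BranchFreeDemo`) checks the trick against a plain comparison for every char value.

```java
final class BranchFreeDemo {
  private BranchFreeDemo() {}

  /** 1 if c needs more than one UTF-8 byte (c > 0x7f), else 0, computed without a branch. */
  static int extraByte(char c) {
    return (0x7f - c) >>> 31;
  }

  /** Compares the branch-free form with an ordinary comparison over all 65,536 char values. */
  static boolean agreesForAllChars() {
    for (int v = Character.MIN_VALUE; v <= Character.MAX_VALUE; v++) {
      char c = (char) v;
      int expected = c > 0x7f ? 1 : 0;
      if (extraByte(c) != expected) {
        return false;
      }
    }
    return true;
  }
}
```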

Code example origin: google/guava

/**
 * Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by
 * {@link #isWellFormed(byte[])}. Note that this can be false even when {@code
 * isWellFormed(bytes)} is true.
 *
 * @param bytes the input buffer
 * @param off the offset in the buffer of the first byte to read
 * @param len the number of bytes to read from the buffer
 */
public static boolean isWellFormed(byte[] bytes, int off, int len) {
 int end = off + len;
 checkPositionIndexes(off, end, bytes.length);
 // Look for the first non-ASCII character.
 for (int i = off; i < end; i++) {
  if (bytes[i] < 0) {
   return isWellFormedSlowPath(bytes, i, end);
  }
 }
 return true;
}
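The slice variant first skips the ASCII prefix (a byte that is negative as a Java byte is in 0x80..0xFF) and only then takes the slow path. The sketch below (hypothetical class `SliceCheckSketch`) mirrors that structure, but as a simplifying assumption it uses a decode/re-encode round trip on the non-ASCII remainder in place of Guava's private isWellFormedSlowPath. It also shows how a slice can be ill-formed while the whole array is well-formed: the slice cuts through a multi-byte character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SliceCheckSketch {
  private SliceCheckSketch() {}

  static boolean isWellFormed(byte[] bytes, int off, int len) {
    int end = off + len;
    // Roughly the bounds check done by checkPositionIndexes(off, end, bytes.length).
    if (off < 0 || end < off || end > bytes.length) {
      throw new IndexOutOfBoundsException("off=" + off + ", len=" + len + ", size=" + bytes.length);
    }
    // Fast path: skip the ASCII prefix.
    for (int i = off; i < end; i++) {
      if (bytes[i] < 0) {
        // Stand-in for the slow path: round-trip only the non-ASCII remainder.
        byte[] rest = Arrays.copyOfRange(bytes, i, end);
        byte[] reencoded = new String(rest, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(rest, reencoded);
      }
    }
    return true;
  }
}
```

With the bytes of "aé" (61 C3 A9), the full range is well-formed, but the slice (0, 2) ends in the middle of é and is not.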

Code example origin: google/guava

private static int encodedLengthGeneral(CharSequence sequence, int start) {
 int utf16Length = sequence.length();
 int utf8Length = 0;
 for (int i = start; i < utf16Length; i++) {
  char c = sequence.charAt(i);
  if (c < 0x800) {
   utf8Length += (0x7f - c) >>> 31; // branch free!
  } else {
   utf8Length += 2;
   // jdk7+: if (Character.isSurrogate(c)) {
   if (MIN_SURROGATE <= c && c <= MAX_SURROGATE) {
    // Check that we have a well-formed surrogate pair.
    if (Character.codePointAt(sequence, i) == c) {
     throw new IllegalArgumentException(unpairedSurrogateMsg(i));
    }
    i++;
   }
  }
 }
 return utf8Length;
}
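The general path adds 2 bytes per BMP char at or above U+0800 (3 bytes in total, counting the 1 already included in the starting total) and 2 bytes per surrogate pair, on top of the 2 UTF-16 units already counted (4 bytes in total). A code-point-at-a-time version of the same count is easier to read, though slower; the following is a standalone sketch with the hypothetical name `Utf8LengthSketch`, returning long so that very long inputs cannot overflow.

```java
final class Utf8LengthSketch {
  private Utf8LengthSketch() {}

  /** Number of bytes needed to encode sequence as UTF-8; rejects unpaired surrogates. */
  static long encodedLength(CharSequence sequence) {
    long total = 0;
    int i = 0;
    while (i < sequence.length()) {
      int cp = Character.codePointAt(sequence, i);
      // codePointAt returns the surrogate itself when it is not part of a valid pair.
      if (Character.isSurrogate(sequence.charAt(i)) && !Character.isSupplementaryCodePoint(cp)) {
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
      }
      if (cp < 0x80) {
        total += 1;
      } else if (cp < 0x800) {
        total += 2;
      } else if (cp < 0x10000) {
        total += 3;
      } else {
        total += 4;
      }
      i += Character.charCount(cp);
    }
    return total;
  }
}
```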

Code example origin: springside/springside4

/**
 * Computes the number of bytes a string occupies after UTF-8 encoding, via Guava.
 *
 * @see Utf8#encodedLength(CharSequence)
 */
public static int utf8EncodedLength(@Nullable CharSequence sequence) {
 if (StringUtils.isEmpty(sequence)) {
  return 0;
 }
 return Utf8.encodedLength(sequence);
}

Code example origin: google/guava

public void testEncodedLength_validStrings() {
 assertEquals(0, Utf8.encodedLength(""));
 assertEquals(11, Utf8.encodedLength("Hello world"));
 assertEquals(8, Utf8.encodedLength("Résumé"));
 assertEquals(
   461,
   Utf8.encodedLength(
     "威廉·莎士比亞(William Shakespeare,"
       + "1564年4月26號—1616年4月23號[1])係隻英國嗰演員、劇作家同詩人,"
       + "有時間佢簡稱莎翁;中國清末民初哈拕翻譯做舌克斯毕、沙斯皮耳、筛斯比耳、"
       + "莎基斯庇尔、索士比尔、夏克思芘尔、希哀苦皮阿、叶斯壁、沙克皮尔、"
       + "狹斯丕爾。[2]莎士比亞編寫過好多作品,佢嗰劇作響西洋文學好有影響,"
       + "哈都拕人翻譯做好多話。"));
 // A surrogate pair
 assertEquals(4, Utf8.encodedLength(newString(MIN_HIGH_SURROGATE, MIN_LOW_SURROGATE)));
}
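The last assertion builds the smallest surrogate pair: MIN_HIGH_SURROGATE (U+D800) followed by MIN_LOW_SURROGATE (U+DC00) denotes U+10000, the first supplementary code point, whose UTF-8 form is F0 90 80 80. A JDK-only check of that arithmetic, using the hypothetical class `SurrogatePairDemo`:

```java
import java.nio.charset.StandardCharsets;

final class SurrogatePairDemo {
  private SurrogatePairDemo() {}

  /** The two-char string formed by the smallest high and low surrogates. */
  static String smallestPair() {
    return new String(new char[] {Character.MIN_HIGH_SURROGATE, Character.MIN_LOW_SURROGATE});
  }

  /** 0x10000 + ((0xD800 - 0xD800) << 10) + (0xDC00 - 0xDC00) = 0x10000. */
  static int codePoint() {
    return Character.toCodePoint(Character.MIN_HIGH_SURROGATE, Character.MIN_LOW_SURROGATE);
  }

  /** UTF-8 bytes of the pair, produced by the JDK encoder. */
  static byte[] utf8() {
    return smallestPair().getBytes(StandardCharsets.UTF_8);
  }
}
```

So the string has length() 2 but encodes to 4 bytes, the value the test expects.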

Code example origin: google/guava

private static void assertWellFormed(int... bytes) {
 assertTrue(Utf8.isWellFormed(toByteArray(bytes)));
}
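The toByteArray(int...) helper is not shown in the excerpt. A plausible version, written here as a hypothetical helper in the hypothetical class `TestBytes` (the real helper in Guava's test may differ), narrows each int to a byte so that tests can write bytes as unsigned literals such as 0xC3 instead of (byte) 0xC3.

```java
final class TestBytes {
  private TestBytes() {}

  /** Hypothetical helper: narrows each int (expected to be 0x00..0xFF) to a byte. */
  static byte[] toByteArray(int... bytes) {
    byte[] result = new byte[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
      result[i] = (byte) bytes[i];
    }
    return result;
  }
}
```

With it, assertWellFormed(0xC3, 0xA9) would check the two-byte encoding of é.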

Code example origin: google/guava

public void testEncodedLength_validStrings2() {
 HashMap<Integer, Integer> utf8Lengths = new HashMap<>();
 utf8Lengths.put(0x00, 1);
 utf8Lengths.put(0x7f, 1);
 utf8Lengths.put(0x80, 2);
 utf8Lengths.put(0x7ff, 2);
 utf8Lengths.put(0x800, 3);
 utf8Lengths.put(MIN_SUPPLEMENTARY_CODE_POINT - 1, 3);
 utf8Lengths.put(MIN_SUPPLEMENTARY_CODE_POINT, 4);
 utf8Lengths.put(MAX_CODE_POINT, 4);
 Integer[] codePoints = utf8Lengths.keySet().toArray(new Integer[] {});
 StringBuilder sb = new StringBuilder();
 Random rnd = new Random();
 for (int trial = 0; trial < 100; trial++) {
  sb.setLength(0);
  int utf8Length = 0;
  for (int i = 0; i < 6; i++) {
   Integer randomCodePoint = codePoints[rnd.nextInt(codePoints.length)];
   sb.appendCodePoint(randomCodePoint);
   utf8Length += utf8Lengths.get(randomCodePoint);
   if (utf8Length != Utf8.encodedLength(sb)) {
    StringBuilder repro = new StringBuilder();
    for (int j = 0; j < sb.length(); j++) {
     repro.append(" ").append((int) sb.charAt(j)); // GWT compatible
    }
    assertEquals(repro.toString(), utf8Length, Utf8.encodedLength(sb));
   }
  }
 }
}
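The randomized test mixes code points taken from both edges of each UTF-8 width class. The same idea can be run against the JDK encoder as an independent oracle. This sketch (hypothetical class `BoundaryLengthSketch`) uses a caller-supplied seed so that a failure is reproducible; that is a choice of the sketch, not of the original test, which uses new Random().

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class BoundaryLengthSketch {
  private BoundaryLengthSketch() {}

  /** Expected UTF-8 length of the first and last code point of each width class. */
  static final Map<Integer, Integer> EXPECTED = new LinkedHashMap<>();

  static {
    EXPECTED.put(0x00, 1);
    EXPECTED.put(0x7f, 1);
    EXPECTED.put(0x80, 2);
    EXPECTED.put(0x7ff, 2);
    EXPECTED.put(0x800, 3);
    EXPECTED.put(Character.MIN_SUPPLEMENTARY_CODE_POINT - 1, 3);
    EXPECTED.put(Character.MIN_SUPPLEMENTARY_CODE_POINT, 4);
    EXPECTED.put(Character.MAX_CODE_POINT, 4);
  }

  /**
   * Builds strings of 6 random boundary code points and, after each append, compares the
   * expected running total with the byte count produced by the JDK encoder.
   */
  static boolean randomTrialsAgree(long seed, int trials) {
    Integer[] codePoints = EXPECTED.keySet().toArray(new Integer[0]);
    Random rnd = new Random(seed);
    StringBuilder sb = new StringBuilder();
    for (int trial = 0; trial < trials; trial++) {
      sb.setLength(0);
      int expected = 0;
      for (int i = 0; i < 6; i++) {
        int cp = codePoints[rnd.nextInt(codePoints.length)];
        sb.appendCodePoint(cp);
        expected += EXPECTED.get(cp);
        if (sb.toString().getBytes(StandardCharsets.UTF_8).length != expected) {
          return false;
        }
      }
    }
    return true;
  }
}
```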
