Multiple points with the same label in ggrepel: avoiding redundant labels

Posted by uqjltbpv 5 months ago in Other

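I want to annotate a scatter plot in which several points share the same label. I want every point to be labelled (not just a subset of them), but so many redundant labels make the plot cluttered. Is there a way to have ggrepel::geom_text_repel draw a single label that points to all the points sharing it?
Here is the simplest possible case: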

library(ggplot2)
library(ggrepel)

df <- data.frame(
  group = c("A", "A", "B", "B"),
  x = c(1, 2, 3, 4),
  y = c(2, 3, 4, 5)
)
# Every point gets its own label, so each group label is drawn twice
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_text_repel(aes(label = group), box.padding = 0.5, force = 4)

Current result: [image]
Desired result: [image]

PS: @user2862862 posted the same question in 2019, but "One label for multiple points" has no correct answer.

Answer #1 (hc8w905p)

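Here is one way to get what you want. However, as @AllanCameron mentioned in the comments, how appropriate this approach is depends heavily on the data. I have included additional examples to illustrate the potential problems.
The method computes the mean x and y for each group and then builds two more data frames: one for the lines (df1) and one for the labels (df2):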

library(dplyr)
library(tidyr)
library(ggplot2)

# Your example data (Example1): IMPORTANT: note modified x and y column names,
# you will first need to change your x and y columns to x_1 and y_1
df <- data.frame(
  group = c("A", "A", "B", "B"),
  x_1 = c(1, 2, 3, 4),
  y_1 = c(2, 3, 4, 5)
)

# Create df for plotting lines from original points to group's mean point
df1 <- df %>%
  group_by(group) %>%
  mutate(x_2 = mean(x_1),
         y_2 = mean(y_1)) %>%
  pivot_longer(-group,
               names_to = c(".value", "var"),
               names_sep = "_")

# Create df for single group label
df2 <- df %>%
  group_by(group) %>%
  mutate(x_2 = mean(x_1),
         y_2 = mean(y_1)) %>%
  select(-ends_with("1")) %>%
  distinct()

# Plot
ggplot() +
  geom_path(data = df1,
            aes(x, y, group = group),
            colour = "grey") +
  geom_point(data = df,
             aes(x_1, y_1),
             size = 2) +
  geom_text(data = df2,
            aes(x_2, y_2, label = group),
            size = 5)
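
Since the question asks about ggrepel specifically, the same idea can be sketched with geom_text_repel() drawing the group label. This is only a minimal variant under assumptions, not part of the method above: the names centroids and segments are hypothetical, geom_segment() replaces geom_path(), and the label is repelled from the group's mean point and from other labels, not from the points drawn by geom_point().

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

df <- data.frame(
  group = c("A", "A", "B", "B"),
  x_1 = c(1, 2, 3, 4),
  y_1 = c(2, 3, 4, 5)
)

# One row per group holding the mean x and mean y (hypothetical name)
centroids <- df %>%
  group_by(group) %>%
  summarise(x_2 = mean(x_1), y_2 = mean(y_1), .groups = "drop")

# Attach each point's group mean so a segment can be drawn to it
segments <- left_join(df, centroids, by = "group")

p <- ggplot() +
  geom_segment(data = segments,
               aes(x = x_1, y = y_1, xend = x_2, yend = y_2),
               colour = "grey") +
  geom_point(data = df,
             aes(x_1, y_1),
             size = 2) +
  geom_text_repel(data = centroids,
                  aes(x_2, y_2, label = group),
                  size = 5)
p
```

If ggrepel moves a label far enough from the mean point, it draws its own connector back to that point, so the label still reads as belonging to the whole group.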

Now consider these two other example data frames:

# Example2
df <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x_1 = c(1, 1.5, 2, 3, 3.5, 4),
  y_1 = c(2, 4, 3, 4, 2.5, 5)
)

# Example3
set.seed(1)
df <- data.frame(group = rep(c("A", "B", "C"), each = 10),
                 x_1 = runif(n = 30, min = 1, max = 4),
                 y_1 = runif(n = 30, min = 2, max = 5))
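
To redraw the plot for each example without repeating the steps, they can be wrapped in a small function. The sketch below is an assumption on my part rather than code from the original answer: plot_group_labels() is a hypothetical helper, and it expects a data frame with exactly the columns group, x_1 and y_1.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: expects exactly the columns group, x_1 and y_1
plot_group_labels <- function(data) {
  # Add each group's mean x and y to every row
  with_means <- data %>%
    group_by(group) %>%
    mutate(x_2 = mean(x_1),
           y_2 = mean(y_1)) %>%
    ungroup()

  # Long format: each point is followed by its group mean, for geom_path()
  lines_df <- with_means %>%
    pivot_longer(-group,
                 names_to = c(".value", "var"),
                 names_sep = "_")

  # One label position per group
  labels_df <- with_means %>%
    distinct(group, x_2, y_2)

  ggplot() +
    geom_path(data = lines_df,
              aes(x, y, group = group),
              colour = "grey") +
    geom_point(data = data,
               aes(x_1, y_1),
               size = 2) +
    geom_text(data = labels_df,
              aes(x_2, y_2, label = group),
              size = 5)
}

# Usage with the Example3 data
set.seed(1)
example3 <- data.frame(group = rep(c("A", "B", "C"), each = 10),
                       x_1 = runif(n = 30, min = 1, max = 4),
                       y_1 = runif(n = 30, min = 2, max = 5))
p3 <- plot_group_labels(example3)
p3
```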


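Example 1 and Example 2 look fine, but the approach you propose does not scale well to data like Example 3. The lines of the different groups cross, which makes the plot hard to interpret. If your full data is more complex and contains many points, as in Example 3, using colour (or shape) is far more effective at conveying what is going on in the data: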

ggplot() +
  geom_point(data = df,
             aes(x_1, y_1, colour = group),
             size = 2) +
  labs(x = "x", y = "y")
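
The shape option mentioned above can be combined with colour, which may help when the plot is printed in greyscale. A minimal sketch, assuming the Example3 data; the name p_shape is only illustrative:

```r
library(ggplot2)

# Example3 data, recreated so the snippet runs on its own
set.seed(1)
df <- data.frame(group = rep(c("A", "B", "C"), each = 10),
                 x_1 = runif(n = 30, min = 1, max = 4),
                 y_1 = runif(n = 30, min = 2, max = 5))

# Map group to both colour and shape; ggplot2 merges them into one legend
p_shape <- ggplot(df, aes(x_1, y_1, colour = group, shape = group)) +
  geom_point(size = 2) +
  labs(x = "x", y = "y")
p_shape
```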

