How to convert all non-numeric cells in a data frame to NA in R

Posted by rpppsulh 8 months ago in Other


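I am trying to convert all cells that contain non-numeric values to missing data (NA). I tried an approach along the lines of the one I use to convert specific values to missing data, such as: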

recode_missing <- function (g, misval)
{
  a <- g == misval
  temp = g
  temp [a] <- NA
  return (temp)
}

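This works fine; it is an elegant R solution.
I tried variations such as a <- g == is.numeric () (syntax error), a <- is.numeric (g) (Error: (list) object cannot be coerced to type 'double'), or even a [,] <- is.numeric (g[,]) (same error). I know the solution that removes columns: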

remove_nn <- function (data)
{
  # removes all non-numeric columns
  numeric_columns <- sapply (data, is.numeric)
  return (data [, numeric_columns])
} ### remove_nn ###

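But this removes columns and converts the data frame into some kind of matrix.
Can anyone suggest how to convert individual non-numeric cells to NA while leaving the data structure intact?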

Edit

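As the comments correctly pointed out, there are no individual string values in a sea of numeric values. There are just vectors that are either numeric or something else. I now wonder what caused the non-numeric error in medians <- apply (data, 2, median). I have many vectors, so inspecting them by eye is of no use. I ran num <- sapply (data, is.numeric) and then data [,!num]. That gave me the columns that were non-numeric. In one case the cause was a single cell value containing a superfluous ". The file is preprocessed by a spreadsheet, and if just one cell is non-numeric the whole vector is treated as non-numeric.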
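Since inspecting many vectors by eye is impractical, the offending cells can be located programmatically. The following is a minimal base R sketch, not part of the original question: the helper name find_bad_cells and the example data are hypothetical. For every non-numeric column it reports the row indices whose value cannot be parsed as a number.

```r
# Hypothetical example data: column b contains one value with a stray quote.
data <- data.frame(
  a = c(1.5, 2, 3),
  b = c("4", "5\"", "6"),
  stringsAsFactors = FALSE
)

# For each non-numeric column, return the row indices whose value
# is present but cannot be parsed as a number.
find_bad_cells <- function(df) {
  non_numeric <- names(df)[!vapply(df, is.numeric, logical(1))]
  bad <- lapply(df[non_numeric], function(x) {
    x_chr <- as.character(x)
    parsed <- suppressWarnings(as.numeric(x_chr))
    which(is.na(parsed) & !is.na(x_chr))
  })
  # Keep only the columns that actually contain unparseable cells.
  bad[lengths(bad) > 0]
}

bad_cells <- find_bad_cells(data)
print(bad_cells)
```

The result is a named list, one entry per affected column, so data[bad_cells$b, "b"] shows the raw values that need fixing in the source file.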

Source: https://stackoverflow.com/questions/43086050/how-to-convert-all-non-numeric-cells-in-data-frame-to-na

3 Answers


w46czmvw1#

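Based on your edit, you have vectors that should be numeric, but because some bad data was introduced while reading the file in, they were converted to another type (probably character or factor).
Here is an example. mydf1 <- mydf2 <- mydf3 <- data.frame(...) simply creates three data.frames with the same data.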

# I'm going to show three approaches
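mydf1 <- mydf2 <- mydf3 <- data.frame(
  A = c(1, 2, "x", 4),
  B = c("y", 3, 4, "-"),
  stringsAsFactors = TRUE  # the default before R 4.0.0; set explicitly so the str() output below is reproduced
)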

str(mydf1)
# 'data.frame': 4 obs. of  2 variables:
#  $ A: Factor w/ 4 levels "1","2","4","x": 1 2 4 3
#  $ B: Factor w/ 4 levels "-","3","4","y": 4 2 3 1

One approach is to let R coerce any value that cannot be converted to a number into NA:

## You WILL get warnings
mydf1[] <- lapply(mydf1, function(x) as.numeric(as.character(x)))
# Warning messages:
# 1: In FUN(X[[i]], ...) : NAs introduced by coercion
# 2: In FUN(X[[i]], ...) : NAs introduced by coercion

str(mydf1)
# 'data.frame': 4 obs. of  2 variables:
#  $ A: num  1 2 NA 4
#  $ B: num  NA 3 4 NA

Another option is to use makemeNA from my SOfun package:

library(SOfun)
makemeNA(mydf2, "[^0-9]", FALSE)
#    A  B
# 1  1 NA
# 2  2  3
# 3 NA  4
# 4  4 NA

str(.Last.value)
# 'data.frame': 4 obs. of  2 variables:
#  $ A: int  1 2 NA 4
#  $ B: int  NA 3 4 NA

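This function is a little different in that it uses type.convert to do the conversion, and it can handle more specific rules for what becomes NA (much like the vector of na.strings you can supply when reading data into R).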
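To illustrate the na.strings idea mentioned above using base R only, here is a minimal sketch. The CSV text and the list of bad markers ("x", "y", "-") are assumptions made for the example; the approach only helps when the bad values are known in advance.

```r
# Hypothetical CSV content; "x", "y" and "-" stand in for bad cells.
csv_text <- "A,B\n1,y\n2,3\nx,4\n4,-"

# Treat the known bad markers as missing while reading the file.
df_read <- read.csv(text = csv_text, na.strings = c("x", "y", "-"))
str(df_read)

# The same rule applied to character columns that are already in memory.
raw <- data.frame(
  A = c("1", "2", "x", "4"),
  B = c("y", "3", "4", "-"),
  stringsAsFactors = FALSE
)
converted <- raw
converted[] <- lapply(raw, type.convert,
                      na.strings = c("x", "y", "-"), as.is = TRUE)
str(converted)
```

Because the bad markers are declared as missing up front, no coercion warnings are raised, and the columns come out as integer rather than character.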
As for your error, I believe you must have called as.numeric on your data.frame to get the message you showed.
Example:

# Your error...
as.numeric(mydf3)
# Error: (list) object cannot be coerced to type 'double'

You won't get that error on a matrix (but you will still get a warning).

# You'll get a warning
as.numeric(as.matrix(mydf3))
# [1]  1  2 NA  4 NA  3  4 NA
# Warning message:
# NAs introduced by coercion

Why don't we need to use as.character explicitly here? Because as.matrix already does that for you:

str(as.matrix(mydf3))
#  chr [1:4, 1:2] "1" "2" "x" "4" "y" "3" "4" "-"
#  - attr(*, "dimnames")=List of 2
#   ..$ : NULL
#   ..$ : chr [1:2] "A" "B"

How can you use that information?

mydf3[] <- as.numeric(as.matrix(mydf3))
# Warning message:
# NAs introduced by coercion 

str(mydf3)
# 'data.frame': 4 obs. of  2 variables:
#  $ A: num  1 2 NA 4
#  $ B: num  NA 3 4 NA


j91ykkif2#

Simple is best. Select the columns; I chose columns 4 through 31.

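# as.character() on a whole data frame deparses each column, so convert column by column
df[, 4:31] <- lapply(df[, 4:31], function(x) as.numeric(as.character(x)))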
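To illustrate converting only a range of columns, here is a minimal sketch on a hypothetical four-column data frame, where columns 2 to 3 stand in for columns 4 to 31. Note that as.character() applied to a whole data frame does not convert cell by cell, so this sketch goes through lapply() and handles one column at a time. The name df_small and its contents are assumptions made for the example.

```r
# Hypothetical data: only columns 2 and 3 are meant to be numeric.
df_small <- data.frame(
  id = c("a", "b", "c"),
  x  = c("1", "oops", "3"),
  y  = c("2.5", "4", "n/a"),
  z  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

cols <- 2:3

# Convert each selected column separately; as.character() first so that
# factor columns yield their labels rather than their integer codes.
df_small[cols] <- lapply(df_small[cols], function(col) {
  suppressWarnings(as.numeric(as.character(col)))
})

str(df_small)
```

The columns outside the selected range (id and z here) keep their original types.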


nnvyjq4y3#

A tidyverse solution.

df <- data.frame(
  A = c(1, 2, "x", 4),
  B = c("y", 3, 4, "-"),
  C = 4:7
)

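Using mutate and across from dplyr, we can say "make any non-numeric column numeric". This will *implicitly* convert any non-numeric values to NA, and it will give you a warning.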

library(dplyr)

df |>
  mutate(across(!where(is.numeric), as.numeric))

To do this explicitly, we can use a regular expression.
That is, for any non-numeric column, if a value contains any character that is not 0-9, set it to NA.

library(dplyr)
library(stringr)

# This would leave the columns as character vectors.
df |>
  mutate(across(
    !where(is.numeric),
    ~ if_else(str_detect(.x, "[^0-9]"), NA, .x)
  ))

# To make them numeric vectors (as in the other examples):
df |>
  mutate(across(
    !where(is.numeric),
    ~ if_else(str_detect(.x, "[^0-9]"), NA, .x) |>
      as.numeric()
  ))

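Note that these solutions are slightly stricter than the other answers given: by using !where(is.numeric) we only operate on the non-numeric columns. This makes no difference to the output, but it will be faster if you have many columns.
For completeness, here is the base R call from the first answer, followed by its tidyverse equivalent: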

lapply(mydf1, function(x) as.numeric(as.character(x)))

library(dplyr)

df |>
  mutate(across(everything(), ~ as.numeric(as.character(.x))))
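One caveat about the "[^0-9]" pattern used earlier in this answer: it also turns decimals and negative numbers such as "-3.5" into NA, because "." and "-" are not digits. If such values should survive, a more permissive pattern is needed. The following is a minimal base R sketch under that assumption; the helper name to_numeric_or_na and the exact pattern are illustrative choices (scientific notation such as "1e3" is deliberately not accepted).

```r
# Convert a vector to numeric, setting anything that does not look like
# a plain signed decimal number to NA, without relying on coercion warnings.
to_numeric_or_na <- function(x) {
  x_chr <- trimws(as.character(x))
  # Optional sign, then digits with an optional fractional part,
  # or a leading decimal point followed by digits.
  is_number <- grepl("^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)$", x_chr)
  out <- rep(NA_real_, length(x_chr))
  out[is_number] <- as.numeric(x_chr[is_number])
  out
}

values <- c("12", "-3.5", ".5", "4x", "-", "", NA)
result <- to_numeric_or_na(values)
print(result)
```

Applied to a whole data frame, df[] <- lapply(df, to_numeric_or_na) keeps the data frame structure while converting every column.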

