R中使用多个条件的子集变量名

sc4hvdpw 于 5个月前发布在其他

关注(0)|答案(2)|浏览(66)

我有以下代码：

df <- data.frame(
  check.names = FALSE,
  `Att-Bissen P [mm]` = c(57937.8),
  `Att-Bissen PET [mm]` = c(39472.9),
  `Att-Bissen Q [mm]` = c(26501.2),
  `Rau. Merl P [mm]` = c(53443.6),
  `Rau. Merl PET [mm]` = c(40535.45),
  `Rau. Merl Q [mm]` = c(15489.5),
  `Syre Felsmuhle/Mertert P [mm]` = c(46020.3),
  `Syre Felsmuhle/Mertert PET [mm]` = c(42196.4),
  `Syre Felsmuhle/Mertert Q [mm]` = c(16210.69079),
  `Wiltz-Winseler P [mm]` = c(63389.7),
  `Wiltz-Winseler PET [mm]` = c(42703.3),
  `Wiltz-Winseler Q [mm]` = c(33576.8),
  `Our-Gemund/Vianden P [mm]` = c(63389.7),
  `Our-Gemund/Vianden PET [mm]` = c(42834.5),
  `Our-Gemund/Vianden Q [mm]` = c(12588.9))

# Define the formula as a function
calc_formula <- function(P, PET, Q, n) {
  1 - (1 + (P / PET)^n) ^ -((n + 1) / (n + Q))
}

# Define the n value
n <- 2.5

# Extract the site names from the column names
site_names <- sub(" .*", "", names(df)[-1])

# Loop through each site and calculate the formula
results <- list()
for (site in site_names) {
  site_data <- df[, grepl(site, names(df))]
  results[[site]] <- calc_formula(site_data[[paste0(site, " P [mm]")]], 
                                  site_data[[paste0(site, " PET [mm]")]], 
                                  site_data[[paste0(site, " Q [mm]")]], n)
}

# Combine the results into a data frame
results_df <- data.frame(Site = names(results), Result = unlist(results))

字符串
以下错误：

Error in data.frame(Site = names(results), Result = unlist(results)) : 
  arguments imply differing number of rows: 5, 3

型
我认为这是因为我不能在“site_names”中指定它也应该考虑“.”和““（空格）。但是我不能确定R如何识别空格，例如，在某些名称处，而不是在P，PET或Q之前。因此，它在“Rau. Merl”和“Syre Felsmuhle/Mertert”的站点中返回NA。
我可以很容易地更改.csv文件中的名称，但当我有一个强大的数据集是相当繁琐的。
我如何修复这部分代码？
任何帮助将不胜感激。谢谢！！

来源：https://stackoverflow.com/questions/77621835/subset-variable-names-in-r-using-more-than-one-condition

2条答案

按热度按时间

r6vfmomb1#

你有两个问题。首先，让我们看看你的网站名称：

site_names
#  [1] "Att-Bissen"         "Att-Bissen"         "Rau."               "Rau."              
#  [5] "Rau."               "Syre"               "Syre"               "Syre"              
#  [9] "Wiltz-Winseler"     "Wiltz-Winseler"     "Wiltz-Winseler"     "Our-Gemund/Vianden"
# [13] "Our-Gemund/Vianden" "Our-Gemund/Vianden"
# Loop through each site and calculate the formula

字符串
这里有两个问题：1）重复是不好的，因为你只想每个网站一个结果。我们可以在最后放一个... |> unique()来删除重复的内容。2）你已经通过删除第一个空格后的所有内容来提取网站名称。正如你所说，你的一些网站名称中有空格，比如"Rau. Merl"。
让我们通过删除字符串中你真正想要删除的部分来解决这个问题。我们将使用正则表达式或：|和合适的方括号转义来完成这个操作。

site_names <- sub(" P \\[mm\\]| PET \\[mm\\]| Q \\[mm\\]", "", names(df)[-1]) |>
  unique()
site_names
# [1] "Att-Bissen"             "Rau. Merl"              "Syre Felsmuhle/Mertert" "Wiltz-Winseler"        
# [5] "Our-Gemund/Vianden"

型
运行剩下的代码，它现在可以工作了：

# ...
results_df
#                                          Site       Result
# Att-Bissen                         Att-Bissen 0.0001695120
# Rau. Merl                           Rau. Merl 0.0002478667
# Syre Felsmuhle/Mertert Syre Felsmuhle/Mertert 0.0001742918
# Wiltz-Winseler                 Wiltz-Winseler 0.0001359271
# Our-Gemund/Vianden         Our-Gemund/Vianden 0.0003609042

型

赞(0）回复(0）举报 5个月前

1cklez4t2#

我的感觉是，以长格式保存数据可能是有利的。
请注意，reshape()或来自{tidyr}的函数（如pivot_longer()）当然能够以更健壮的方式做到这一点。

重塑

new = data.frame(
  unique(sub(" P \\[mm\\]| PET \\[mm\\]| Q \\[mm\\]", "", names(df))), 
  matrix(as.numeric(df[1L, ]), ncol = 3L, byrow = TRUE)) |> 
  setNames(object = _, nm = c("stations", "P", "PET", "Q"))

字符串
sub()-方法是从@GregorThomas的answer中偷来的。然后我们逐行应用calc_formula()的修改版本。

申请

# slightly modified
calc_formula <- function(row, n = 2.5) {
  1L - (1L + (row[["P"]] / row[["PET"]])^n) ^ -((n + 1L) / (n + row[["Q"]]))
}
new$result = apply(X = new[-1L], MARGIN = 1L, FUN = calc_formula)

型

结果

> new
                stations       P      PET        Q       result
1             Att-Bissen 57937.8 39472.90 26501.20 0.0001695120
2              Rau. Merl 53443.6 40535.45 15489.50 0.0002478667
3 Syre Felsmuhle/Mertert 46020.3 42196.40 16210.69 0.0001742918
4         Wiltz-Winseler 63389.7 42703.30 33576.80 0.0001359271
5     Our-Gemund/Vianden 63389.7 42834.50 12588.90 0.0003609042

型

数据

df <- data.frame(
  check.names = FALSE,
  `Att-Bissen P [mm]` = c(57937.8),
  `Att-Bissen PET [mm]` = c(39472.9),
  `Att-Bissen Q [mm]` = c(26501.2),
  `Rau. Merl P [mm]` = c(53443.6),
  `Rau. Merl PET [mm]` = c(40535.45),
  `Rau. Merl Q [mm]` = c(15489.5),
  `Syre Felsmuhle/Mertert P [mm]` = c(46020.3),
  `Syre Felsmuhle/Mertert PET [mm]` = c(42196.4),
  `Syre Felsmuhle/Mertert Q [mm]` = c(16210.69079),
  `Wiltz-Winseler P [mm]` = c(63389.7),
  `Wiltz-Winseler PET [mm]` = c(42703.3),
  `Wiltz-Winseler Q [mm]` = c(33576.8),
  `Our-Gemund/Vianden P [mm]` = c(63389.7),
  `Our-Gemund/Vianden PET [mm]` = c(42834.5),
  `Our-Gemund/Vianden Q [mm]` = c(12588.9))

型

赞(0）回复(0）举报 5个月前

我来回答

R中使用多个条件的子集变量名

2条答案

相关问题

热门标签

最新问答