R中使用多个条件的子集变量名

sc4hvdpw  于 5个月前  发布在  其他
关注(0)|答案(2)|浏览(66)

我有以下代码:

df <- data.frame(
  check.names = FALSE,
  `Att-Bissen P [mm]` = c(57937.8),
  `Att-Bissen PET [mm]` = c(39472.9),
  `Att-Bissen Q [mm]` = c(26501.2),
  `Rau. Merl P [mm]` = c(53443.6),
  `Rau. Merl PET [mm]` = c(40535.45),
  `Rau. Merl Q [mm]` = c(15489.5),
  `Syre Felsmuhle/Mertert P [mm]` = c(46020.3),
  `Syre Felsmuhle/Mertert PET [mm]` = c(42196.4),
  `Syre Felsmuhle/Mertert Q [mm]` = c(16210.69079),
  `Wiltz-Winseler P [mm]` = c(63389.7),
  `Wiltz-Winseler PET [mm]` = c(42703.3),
  `Wiltz-Winseler Q [mm]` = c(33576.8),
  `Our-Gemund/Vianden P [mm]` = c(63389.7),
  `Our-Gemund/Vianden PET [mm]` = c(42834.5),
  `Our-Gemund/Vianden Q [mm]` = c(12588.9))

# Define the formula as a function
calc_formula <- function(P, PET, Q, n) {
  1 - (1 + (P / PET)^n) ^ -((n + 1) / (n + Q))
}

# Define the n value
n <- 2.5

# Extract the site names from the column names
site_names <- sub(" .*", "", names(df)[-1])

# Loop through each site and calculate the formula
results <- list()
for (site in site_names) {
  site_data <- df[, grepl(site, names(df))]
  results[[site]] <- calc_formula(site_data[[paste0(site, " P [mm]")]], 
                                  site_data[[paste0(site, " PET [mm]")]], 
                                  site_data[[paste0(site, " Q [mm]")]], n)
}

# Combine the results into a data frame
results_df <- data.frame(Site = names(results), Result = unlist(results))

字符串
以下错误:

Error in data.frame(Site = names(results), Result = unlist(results)) : 
  arguments imply differing number of rows: 5, 3


我认为这是因为我不能在“site_names”中指定它也应该考虑“.”和““(空格)。但是我不能确定R如何识别空格,例如,在某些名称处,而不是在P,PET或Q之前。因此,它在“Rau. Merl”和“Syre Felsmuhle/Mertert”的站点中返回NA。
我可以很容易地更改.csv文件中的名称,但当我有一个强大的数据集是相当繁琐的。
我如何修复这部分代码?
任何帮助将不胜感激。谢谢!!

r6vfmomb

r6vfmomb1#

你有两个问题。首先,让我们看看你的网站名称:

site_names
#  [1] "Att-Bissen"         "Att-Bissen"         "Rau."               "Rau."              
#  [5] "Rau."               "Syre"               "Syre"               "Syre"              
#  [9] "Wiltz-Winseler"     "Wiltz-Winseler"     "Wiltz-Winseler"     "Our-Gemund/Vianden"
# [13] "Our-Gemund/Vianden" "Our-Gemund/Vianden"
# Loop through each site and calculate the formula

字符串
这里有两个问题:1)重复是不好的,因为你只想每个网站一个结果。我们可以在最后放一个... |> unique()来删除重复的内容。2)你已经通过删除第一个空格后的所有内容来提取网站名称。正如你所说,你的一些网站名称中有空格,比如"Rau. Merl"
让我们通过删除字符串中你真正想要删除的部分来解决这个问题。我们将使用正则表达式或:|和合适的方括号转义来完成这个操作。

site_names <- sub(" P \\[mm\\]| PET \\[mm\\]| Q \\[mm\\]", "", names(df)[-1]) |>
  unique()
site_names
# [1] "Att-Bissen"             "Rau. Merl"              "Syre Felsmuhle/Mertert" "Wiltz-Winseler"        
# [5] "Our-Gemund/Vianden"


运行剩下的代码,它现在可以工作了:

# ...
results_df
#                                          Site       Result
# Att-Bissen                         Att-Bissen 0.0001695120
# Rau. Merl                           Rau. Merl 0.0002478667
# Syre Felsmuhle/Mertert Syre Felsmuhle/Mertert 0.0001742918
# Wiltz-Winseler                 Wiltz-Winseler 0.0001359271
# Our-Gemund/Vianden         Our-Gemund/Vianden 0.0003609042

1cklez4t

1cklez4t2#

我的感觉是,以长格式保存数据可能是有利的。
请注意,reshape()或来自{tidyr}的函数(如pivot_longer())当然能够以更健壮的方式做到这一点。

重塑

new = data.frame(
  unique(sub(" P \\[mm\\]| PET \\[mm\\]| Q \\[mm\\]", "", names(df))), 
  matrix(as.numeric(df[1L, ]), ncol = 3L, byrow = TRUE)) |> 
  setNames(object = _, nm = c("stations", "P", "PET", "Q"))

字符串
sub()-方法是从@GregorThomas的answer中偷来的。然后我们逐行应用calc_formula()的修改版本。

申请

# slightly modified
calc_formula <- function(row, n = 2.5) {
  1L - (1L + (row[["P"]] / row[["PET"]])^n) ^ -((n + 1L) / (n + row[["Q"]]))
}
new$result = apply(X = new[-1L], MARGIN = 1L, FUN = calc_formula)

结果

> new
                stations       P      PET        Q       result
1             Att-Bissen 57937.8 39472.90 26501.20 0.0001695120
2              Rau. Merl 53443.6 40535.45 15489.50 0.0002478667
3 Syre Felsmuhle/Mertert 46020.3 42196.40 16210.69 0.0001742918
4         Wiltz-Winseler 63389.7 42703.30 33576.80 0.0001359271
5     Our-Gemund/Vianden 63389.7 42834.50 12588.90 0.0003609042

数据

df <- data.frame(
  check.names = FALSE,
  `Att-Bissen P [mm]` = c(57937.8),
  `Att-Bissen PET [mm]` = c(39472.9),
  `Att-Bissen Q [mm]` = c(26501.2),
  `Rau. Merl P [mm]` = c(53443.6),
  `Rau. Merl PET [mm]` = c(40535.45),
  `Rau. Merl Q [mm]` = c(15489.5),
  `Syre Felsmuhle/Mertert P [mm]` = c(46020.3),
  `Syre Felsmuhle/Mertert PET [mm]` = c(42196.4),
  `Syre Felsmuhle/Mertert Q [mm]` = c(16210.69079),
  `Wiltz-Winseler P [mm]` = c(63389.7),
  `Wiltz-Winseler PET [mm]` = c(42703.3),
  `Wiltz-Winseler Q [mm]` = c(33576.8),
  `Our-Gemund/Vianden P [mm]` = c(63389.7),
  `Our-Gemund/Vianden PET [mm]` = c(42834.5),
  `Our-Gemund/Vianden Q [mm]` = c(12588.9))

相关问题