Marascuilo procedure in R

zzoitvuj posted 8 months ago in R

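I am running the Marascuilo procedure to compare the differences between several proportions. I am using the following code (copied and adapted from this tutorial):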

## Set the proportions of interest.
p = c(0.3481, 0.1730, 0.4788)
N = length(p)
value = critical.range = c()

## Compute critical values.
for (i in 1:(N-1))
{ for (j in (i+1):N)
{
  value = c(value,(abs(p[i]-p[j])))
  critical.range = c(critical.range,
                     sqrt(qchisq(.95,3))*sqrt(p[i]*(1-p[i])/12000 + p[j]*(1-p[j])/12000))
}
}
round(cbind(value,critical.range),3)

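I would like the output to also print the category labels (i.e. which categories are being compared in each row).
So if the categories are listed in a separate vector, e.g. categories <- c("cat1", "cat2", "cat3"), the comparisons would be cat1-cat2, cat1-cat3, cat2-cat3.
How can I attach these labels to the output?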

Current output:

     value critical.range
[1,] 0.175          0.016
[2,] 0.131          0.018
[3,] 0.306          0.016

kfgdxczn1#

Try this:

## Set the proportions of interest.
p = c(0.3481, 0.1730, 0.4788)
N = length(p)
value = critical.range = tag = c()
categories <- c("cat1", "cat2", "cat3")

## Compute critical values.
for (i in 1:(N-1)){ 
    for (j in (i+1):N){

    value <- c(value,(abs(p[i]-p[j])))
    critical.range = c(critical.range,
                       sqrt(qchisq(.95,N-1))*sqrt(p[i]*(1-p[i])/12000 + p[j]*(1-p[j])/12000))
    tag = c(tag, paste(categories[i], categories[j], sep = "-"))

    }
}
## Build the data frame directly so the numeric columns stay numeric
## (cbind() with a character vector would coerce everything to character).
df <- data.frame(value = round(value, 3),
                 critical.range = round(critical.range, 3),
                 tag = tag,
                 stringsAsFactors = FALSE)
df

Output:

  value critical.range       tag
1 0.175          0.016 cat1-cat2
2 0.131          0.018 cat1-cat3
3 0.306          0.016 cat2-cat3

Note that the critical ranges shown here match qchisq(.95, 3) from the question. With N-1 = 2 degrees of freedom, as in this code, they come out slightly smaller (0.014, 0.015, 0.014).

b0zn9rqh2#

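Be careful with the denominator (12000) when computing the critical range. It is the sample size of each category, so if your categories do not each have 12000 observations it needs to be adjusted. If you have far fewer than 12000 observations, your critical ranges are probably much larger than what this code gives you (and you should therefore get fewer significant differences).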
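
To make the effect of the denominator concrete, here is a minimal Python sketch that recomputes the critical ranges for the question's three proportions under two sample sizes. The proportions come from the question; the smaller sample size (120 per category) and the helper name `critical_ranges` are assumptions made up for illustration. The chi-square quantile uses a closed form that is only valid for 2 degrees of freedom (i.e. exactly 3 categories); for any other number of categories you would need a statistics library.

```python
import math

# Proportions from the question; the sample sizes used below are hypothetical.
p = [0.3481, 0.1730, 0.4788]
categories = ["cat1", "cat2", "cat3"]

# With exactly 3 categories there are k - 1 = 2 degrees of freedom, and the
# chi-square quantile has the closed form -2 * ln(1 - 0.95).
# This shortcut does NOT hold for other degrees of freedom.
chi2_crit = -2.0 * math.log(1.0 - 0.95)


def critical_ranges(p, n, chi2_crit):
    """Return {(i, j): critical range} for every pair i < j."""
    out = {}
    for i in range(len(p) - 1):
        for j in range(i + 1, len(p)):
            # Sum of the two per-category variance terms p(1 - p) / n.
            se2 = p[i] * (1 - p[i]) / n[i] + p[j] * (1 - p[j]) / n[j]
            out[(i, j)] = math.sqrt(chi2_crit) * math.sqrt(se2)
    return out


large = critical_ranges(p, [12000, 12000, 12000], chi2_crit)
small = critical_ranges(p, [120, 120, 120], chi2_crit)  # hypothetical sizes

for (i, j), r in large.items():
    tag = f"{categories[i]}-{categories[j]}"
    print(f"{tag}: n=12000 -> {r:.3f}, n=120 -> {small[(i, j)]:.3f}")
```

Because the sample size sits under a square root, cutting every n by a factor of 100 makes each critical range 10 times wider, so fewer differences would be flagged as significant.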


cczfrluj3#

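Here is a Python translation of the R code, with 4 categories and a different sample size (n) for each category. The code was translated from R to Python by the Bing AI assistant and lightly corrected by me.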

import math
import pandas as pd
from scipy.stats import chi2

p = [0.681818182, 0.816513761, 0.65625, 0.518518519]
n = [22, 109, 32, 27]
N = len(p)
value = []
critical_range = []
tag = []
categories = ["cat1", "cat2", "cat3", "cat4"]
# The Marascuilo procedure uses k - 1 degrees of freedom (k = number of categories).
critical_value = chi2.ppf(0.95, N - 1)

for i in range(N-1):
    for j in range(i+1, N):
        value.append(abs(p[i] - p[j]))
        critical_range.append(math.sqrt(critical_value) * math.sqrt((p[i] * 
                (1 - p[i]) / n[i]) + (p[j] * (1 - p[j]) / n[j])))
        tag.append(categories[i] + "-" + categories[j])

df = pd.DataFrame({"value": value, "critical.range": critical_range, "tag": tag})
df["value"] = df["value"].round(3)
df["critical.range"] = df["critical.range"].round(3)
df["significance"] = df.apply(lambda row: "yes" if row["value"] > 
row["critical.range"] else "no", axis=1)
print(df)
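
If you would rather not depend on pandas, the same comparison can be sketched as a small function using only the standard library. This is an illustration under stated assumptions, not part of the original answer: the function name `marascuilo` and the constant `CHI2_95_DF3` are made up here, and the chi-square quantile is passed in by the caller (in practice something like `chi2.ppf(0.95, len(p) - 1)`, assuming k - 1 degrees of freedom for k categories).

```python
import math
from itertools import combinations


def marascuilo(p, n, labels, chi2_crit):
    """Compare every pair of proportions against its critical range.

    chi2_crit is the chi-square quantile with len(p) - 1 degrees of freedom,
    supplied by the caller.
    Returns a list of (tag, difference, critical range, significant) tuples.
    """
    rows = []
    for i, j in combinations(range(len(p)), 2):
        diff = abs(p[i] - p[j])
        crit = math.sqrt(chi2_crit) * math.sqrt(
            p[i] * (1 - p[i]) / n[i] + p[j] * (1 - p[j]) / n[j]
        )
        # The significance flag is decided on the unrounded values.
        rows.append(
            (f"{labels[i]}-{labels[j]}", round(diff, 3), round(crit, 3), diff > crit)
        )
    return rows


# Data from the answer above.
p = [0.681818182, 0.816513761, 0.65625, 0.518518519]
n = [22, 109, 32, 27]
categories = ["cat1", "cat2", "cat3", "cat4"]

# 0.95 quantile of the chi-square distribution with 3 degrees of freedom,
# hard-coded so that this sketch needs no third-party packages.
CHI2_95_DF3 = 7.814728

for tag, diff, crit, significant in marascuilo(p, n, categories, CHI2_95_DF3):
    print(tag, diff, crit, "yes" if significant else "no")
```

One design difference from the pandas version: the flag here is computed before rounding, so a pair whose difference and critical range round to the same three decimals is still classified by its exact values.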
