Different results from scipy.stats.ttest_ind on a whole array versus on two vectors

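I am running multiple t-tests to make voxel-wise comparisons between two groups (male and female), using scipy.stats.ttest_ind. I have 541443 voxels as dependent variables, and I want to run an independent t-test on each of them. Everything seemed fine, but when I checked the results by running a single t-test on a randomly chosen voxel, I got a different result.
Here is my code: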

statistics,p_values = ttest_ind(male_data,female_data)

This produces an array of t-values: ([-5.23764997e-16, -1.59544316e-15, 7.88216339e-16, ..., 1.11783465e-15, -2.22874323e-16, -9.35188323e-16])

single_stat,single_p_value = ttest_ind(female_data[:,0],male_data[:,0])

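This produces -4.3805762173832176e-16 as the t-value.
I expected the first t-value from the whole-array test to equal the result of the single test (so either ~ -5.237 or ~ -4.380 in both cases). Does anyone know what might be going wrong here?
I also tried a third approach, running the t-tests in a for-loop. There seems to be a consistent pattern: the output of the for-loop approach is always identical to the output of the single t-test approach (which makes sense, since it essentially runs many single ttest_ind calls and appends each result to a list). However, both of these results differ from the output of ttest_ind on the whole array. As suggested in the comments, I also sliced the data to different numbers of rows, and I found a paradoxical effect: the results of the three methods (or two, since the for-loop and the single t-test always seem to agree) become more and more similar, but the returned t-values and p-values become implausibly small or large. In the last case (n=5), Spyder crashes when I click on the output of ttest_ind for the whole array.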

n_rows = 150
Output from for-loop method (first t-value in the list):
t-value: -0.050583527798906465
p-value: 0.9596912767683707

Output from the t-test performed on only the first column:
t-value: -0.050583527798906465
p-value: 0.9596912767683707

Output from the t-test performed on the whole array (first t-value):
t-value: -0.050583527798907256
p-value: 0.9596912767683701

---------------------------------------
n_rows = 75
Output from for-loop method (first t-value in the list):
t-value: 0.9760289069224989
p-value: 0.33064277748038773

Output from the t-test performed on only the first column:
t-value: 0.9760289069224989
p-value: 0.33064277748038773

Output from the t-test performed on the whole array (first t-value):
t-value: 0.9760289069224984
p-value: 0.33064277748038795

---------------------------------------
n_rows = 5
Output from for-loop method (first t-value in the list):
t-value: 6111430044112607.0
p-value: 5.755396703077005e-124

Output from the t-test performed on only the first column:
t-value: 6111430044112607.0
p-value: 5.755396703077005e-124

Output from the t-test performed on the whole array (first t-value):
t-value: 6111430044112607.0
p-value: 5.755396703077005e-124

---------------------------------------
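
For reference, the three approaches compared above can be reproduced on synthetic data roughly as follows. This is only a sketch: the array shapes, the seed, and the random numbers standing in for `male_data` and `female_data` are assumptions for illustration, not the real voxel data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the real data (assumed shape: subjects x voxels)
rng = np.random.default_rng(0)
male_data = rng.normal(size=(150, 1000))
female_data = rng.normal(size=(150, 1000))

# Method 1: one call on the whole arrays (one test per column, along axis 0)
t_all, p_all = ttest_ind(male_data, female_data)

# Method 2: a single test on the first column only
t_single, p_single = ttest_ind(male_data[:, 0], female_data[:, 0])

# Method 3: a for-loop over columns, collecting each t-value in a list
t_loop = [ttest_ind(male_data[:, i], female_data[:, i])[0]
          for i in range(male_data.shape[1])]

# Compare with a tolerance instead of expecting bit-for-bit equality
print(np.isclose(t_all[0], t_single))
print(np.allclose(t_all, t_loop))
```

Both groups are passed in the same order in every call here; swapping the two arguments flips the sign of the t-value. Differences in the last few digits, like those in the n_rows = 150 and n_rows = 75 output, fall well inside the default tolerances of np.isclose and np.allclose.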


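The result of floating-point arithmetic can depend on the order in which the operations are performed, and that order can change depending on how the array is laid out in memory. The size of the effect depends on the sample size.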

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
rng = np.random.default_rng(7435282483469345)

func = np.mean

# For several different sample sizes
ns = np.logspace(1, 7, 30).astype(int)

err = []
for n in ns:
  # Generate n x 2 samples
  x = rng.random(size=(n, 2))

  # Compute the mean along axis 0
  y = func(x, axis=0)

  # Compute the mean of each column separately
  z = [func(x[:, i]) for i in range(x.shape[-1])]

  # Record the difference in the 0th result
  err.append(abs(y[0] - z[0]))

plt.loglog(ns, err)
plt.xlabel('Sample size')
plt.ylabel('Absolute difference')
plt.title(f'The effect of sample size on np.{func.__name__} discrepancy')

If you keep the default row-major order and reduce along axis=-1, or convert the array to column-major order (np.asfortranarray) and keep axis=0, the effect disappears.
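
As a rough illustration of that last point, the sketch below compares the axis-wise mean with the mean of a single slice for the three layouts mentioned. The array size and seed are arbitrary choices, and the exact differences printed will vary with the machine and the NumPy version, so no particular output is claimed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.random(size=(n, 2))      # row-major (C order), the NumPy default

# Case 1: C order, reduce along axis 0 (the setup used in the example above)
d_c_axis0 = abs(np.mean(x, axis=0)[0] - np.mean(x[:, 0]))

# Case 2: C order, but with the samples laid out along the last axis
xt = np.ascontiguousarray(x.T)   # shape (2, n); each sample is contiguous
d_c_axis_last = abs(np.mean(xt, axis=-1)[0] - np.mean(xt[0]))

# Case 3: column-major (Fortran order), reduce along axis 0
xf = np.asfortranarray(x)
d_f_axis0 = abs(np.mean(xf, axis=0)[0] - np.mean(xf[:, 0]))

print(d_c_axis0, d_c_axis_last, d_f_axis0)
```

In cases 2 and 3 the values being averaged sit next to each other in memory, both in the full array and in the extracted slice, so the two reductions can proceed in the same order. For the t-test itself, passing the data in one of these layouts (with the matching axis argument to ttest_ind) should likewise bring the whole-array and per-column results into closer agreement.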
