How to compute a confusion matrix derived from multiple columns?

Posted by voase2hg on 2021-07-13 in Java

I have two DataFrames from which I want to compute a confusion matrix.
Below is an example of the structure of "df_responses":

Question 1      |  Red  |  Blue  | Yellow | None of the Above |   
Participant ID  |       |        |        |                   |
1               |   1   |    1   |    1   |       0           |
2               |   0   |    0   |    0   |       1           |
3               |   1   |    0   |    1   |       0           |

Below is an example of the structure of "df_actual":

Question 1      |  Red  |  Blue  | Yellow | None of the Above |   
                |       |        |        |                   |
1               |   1   |    0   |    1   |       0           |
2               |   1   |    0   |    1   |       0           |
3               |   1   |    0   |    1   |       0           |

Ideally, I would also like to create a new DataFrame containing the true positive and false negative scores for each participant, like this:

Question 1      | True Positive | False Negative | 
Participant ID  |               |                | 
1               |     2         |       0        |
2               |     0         |       2        |
3               |     2         |       0        |

I tried (@john mommers):

for x in range(len(df_responses)):
    tn, fp, fn, tp = confusion_matrix(df_responses, df_actual).ravel()
    print (f'Nr:{i}  true neg:{tn}  false pos:{fp}   false neg:{fn}   true pos:{tp}')

However, I get a

ValueError: multilabel-indicator is not supported.

Is there another way to compute TP and FN?
Added (data as text):

df_responses

{'Red': {1: 1, 2: 0, 3: 1},
'Blue': {1: 1, 2: 0, 3: 0},
'Yellow': {1: 1, 2: 0, 3: 1},
'None of the above': {1: 0, 2: 1, 3: 0}}

df_actual

{'Red': {1: 1, 2: 1, 3: 1},
'Blue': {1: 0, 2: 0, 3: 0},
'Yellow': {1: 1, 2: 1, 3: 1},
'None of the above': {1: 0, 2: 0, 3: 0}}

python scikit-learn confusion-matrix

Source: https://stackoverflow.com/questions/67289326/how-to-compute-a-confusion-matrix-derived-from-multiple-columns

2 Answers

yquaqz18 1#

You can create the desired df as follows:

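import numpy as np
import pandas as pd

df = pd.DataFrame()
df["tp"] = np.sum((df_actual == 1) & (df_responses == 1), axis=1)  # true positives per participant
df["fp"] = np.sum((df_actual == 0) & (df_responses == 1), axis=1)  # false positives per participant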

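Note that this is not a true confusion matrix. In a confusion matrix the rows are the predicted values and the columns are the labeled values (or vice versa), and the entries are counts. That may not be well defined for multi-valued labels/responses, which is why sklearn raises the error.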
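Since the question asks for false negatives as well, here is a minimal sketch of how the same boolean-mask idea might be extended to all four counts per participant. The data is copied from the question; the name `scores` and the column labels are only illustrative, not something the answer above defines.

```python
import pandas as pd

# Data copied from the question
df_responses = pd.DataFrame({
    'Red': {1: 1, 2: 0, 3: 1},
    'Blue': {1: 1, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 0, 3: 1},
    'None of the above': {1: 0, 2: 1, 3: 0},
})
df_actual = pd.DataFrame({
    'Red': {1: 1, 2: 1, 3: 1},
    'Blue': {1: 0, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 1, 3: 1},
    'None of the above': {1: 0, 2: 0, 3: 0},
})

# Boolean masks: where the correct answer is 1, and where the participant answered 1
actual_pos = df_actual == 1
response_pos = df_responses == 1

# Count each outcome across the answer columns of every participant (row).
# `scores` is an illustrative name.
scores = pd.DataFrame({
    'True Positive': (actual_pos & response_pos).sum(axis=1),
    'False Negative': (actual_pos & ~response_pos).sum(axis=1),
    'False Positive': (~actual_pos & response_pos).sum(axis=1),
    'True Negative': (~actual_pos & ~response_pos).sum(axis=1),
})
scores.index.name = 'Participant ID'
print(scores)
```

This assumes both DataFrames share the same index and column labels, so the element-wise comparisons line up.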

2021-07-13

xkftehaa 2#

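You cannot use the sklearn function confusion_matrix this way because it only accepts one-dimensional label arrays, whereas each of your DataFrames has four label columns. That is why you get the error "multilabel-indicator is not supported".
So you have to pass each row of the DataFrame to this function separately.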

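from sklearn.metrics import confusion_matrix

for x in range(len(df_responses)):
    y_responses = df_responses.iloc[x].to_numpy()
    y_actual = df_actual.iloc[x].to_numpy()
    # confusion_matrix expects (y_true, y_pred); labels=[0, 1] forces a 2x2 result
    tn, fp, fn, tp = confusion_matrix(y_actual, y_responses, labels=[0, 1]).ravel()
    print(f'Nr:{x} true neg:{tn} false pos:{fp} false neg:{fn} true pos:{tp}')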
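As an alternative to looping over rows, scikit-learn also provides multilabel_confusion_matrix, which accepts multilabel indicator input directly. The following is a sketch under the assumption that "df_actual" holds the true labels and "df_responses" the predictions; the data is copied from the question, and `mcm` and `per_participant` are illustrative names.

```python
import pandas as pd
from sklearn.metrics import multilabel_confusion_matrix

# Data copied from the question
df_responses = pd.DataFrame({
    'Red': {1: 1, 2: 0, 3: 1},
    'Blue': {1: 1, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 0, 3: 1},
    'None of the above': {1: 0, 2: 1, 3: 0},
})
df_actual = pd.DataFrame({
    'Red': {1: 1, 2: 1, 3: 1},
    'Blue': {1: 0, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 1, 3: 1},
    'None of the above': {1: 0, 2: 0, 3: 0},
})

# samplewise=True returns one 2x2 matrix per row (participant),
# each laid out as [[tn, fp], [fn, tp]]
mcm = multilabel_confusion_matrix(
    df_actual.to_numpy(), df_responses.to_numpy(), samplewise=True
)

# Pull the true-positive and false-negative cells out of every matrix
per_participant = pd.DataFrame(
    {'True Positive': mcm[:, 1, 1], 'False Negative': mcm[:, 1, 0]},
    index=df_actual.index,
)
per_participant.index.name = 'Participant ID'
print(per_participant)
```

With the default samplewise=False the function would instead return one matrix per answer column (Red, Blue, and so on), which is a different breakdown from the per-participant table the question asks for.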

2021-07-13
