scipy: How to use scikit-learn's Poisson regression to predict count data

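I'm looking for a way to couple Linear Regression with the Poisson distribution. A simple linear regression gives me a single number, which I'd like to use in a Poisson distribution because I need a *probability*: the *probability of a discrete number of events occurring continuously and independently in a given time interval*.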

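**My case:** I have an independent variable Team_A__goal_scored = [1, 2, 2, 3, 1] and a dependent variable Team_B__goal_conceded = [2, 1, 3, 0, 1], and I want to compute the probability that Team A scores exactly 2 goals against Team B (thus linking Team A's attack to Team B's defense). Running the regression and evaluating it at 2 goals with test = myfunc(2), I get 1.28: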

import numpy
from scipy import stats

Team_B__goal_conceded = [1, 2, 2, 3, 1]  # independent variable (x)
Team_A__goal_scored = [2, 1, 3, 0, 1]  # dependent variable (y)

slope, intercept, r, p, std_err = stats.linregress(Team_B__goal_conceded, Team_A__goal_scored)

def myfunc(Team_B__goal_conceded):
   return slope * Team_B__goal_conceded + intercept

regression_scored_2 = myfunc(2) #result: 1.28


**Problem:** I need a probability, but a simple linear regression gives me something like 1.28, which is not a probability.

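To summarize: I know my starting point (Team_B__goal_conceded and Team_A__goal_scored) and I know what I want to achieve (the probability that Team A scores exactly 2 goals against Team B), but I don't know how to get there. I think that "maybe" Poisson Regression (also called log-linear regression) could work for me, but I'm having trouble using it. I'm not sure it is the solution I'm looking for, so I'm looking for someone who knows how to use it, who can tell me whether it fits what I want to achieve, and who can show me how to use it. If it doesn't fit, there must be some other way to use the result of a linear regression in a Poisson distribution.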

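**Expected output:** The final output I want, after running the regression, is the individual probabilities of a football team scoring 0 goals, 1 goal, 2 goals, 3 goals, and so on. To be precise, I only need the case of 2 goals, so I want this final output: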

The probability that Team A scores exactly 2 goals against Team B is: x %


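**My code:** I tried using sklearn.linear_model.PoissonRegressor, but it doesn't work properly. As I said above, I don't even know whether this is the best solution for my case or whether there is a better, more suitable approach. If it isn't, there must be some other way to use the result of a linear regression in a Poisson distribution.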

from sklearn import linear_model
clf = linear_model.PoissonRegressor()
Team_A__goal_scored = [1, 2, 2, 3, 1]  # independent variable (x)
Team_B__goal_conceded = [2, 1, 3, 0, 1]  # dependent variable (y)

a=clf.fit(Team_A__goal_scored, Team_B__goal_conceded)
print("Fit a Generalized Linear Model: ", a, "\n")

b = clf.score(Team_A__goal_scored, Team_B__goal_conceded)
print("Compute D^2, the percentage of deviance explained: ", b, "\n")

c = clf.coef_
print("Estimated coefficients for the linear predictor: ", c, "\n")

d= clf.intercept_
print("Intercept (a.k.a. bias) added to linear predictor: ", d, "\n")

e=clf.predict([[1, 1], [3, 4]])
print("Predict using GLM with feature matrix X: ", e, "\n")

I get this error:

ValueError: Expected 2D array, got 1D array instead:
array=[1. 2. 2. 3. 1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

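The lists are supposed to look like (as shown in the scikit-learn link), for example, X = [[1, 2], [2, 3], [3, 4], [4, 3]] and y = [12, 17, 22, 21], but I don't understand how to put my lists in that form.
I also don't know how to specify that the result must be exactly 2 goals (the probability that Team A scores 2 goals against Team B).
Thank you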

Using PoissonRegressor is a suitable approach for your problem, but it requires a 2D array for the features (X). You can use numpy.reshape to reshape your 1D array:

import numpy as np
from sklearn.linear_model import PoissonRegressor

Team_A__goal_scored = np.array([1, 2, 2, 3, 1]).reshape(-1, 1)
Team_B__goal_conceded = np.array([2, 1, 3, 0, 1])

Then you can fit the model:

clf = PoissonRegressor()
clf.fit(Team_A__goal_scored, Team_B__goal_conceded)

Then you can make a prediction (for example, for Team_A__goal_scored = 2):

lambda_pred = clf.predict(np.array([[2]]))[0]

Then you can use the Poisson probability mass function to find the probability of scoring exactly 2 goals:
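It may help to see what the predicted value actually is. PoissonRegressor uses a log link, so the prediction is the expected count exp(intercept + coef * x), which is always positive and can serve as the Poisson rate. The sketch below refits the same toy data and recomputes the prediction by hand; the names X, y, x_new and lambda_manual are illustrative and not from the original post, and the default regularization (alpha=1.0) is left unchanged.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Same toy data as above (X, y are illustrative names).
X = np.array([1, 2, 2, 3, 1]).reshape(-1, 1)  # one feature, five samples
y = np.array([2, 1, 3, 0, 1])

clf = PoissonRegressor()  # default alpha=1.0 applies L2 regularization
clf.fit(X, y)

x_new = np.array([[2]])
lambda_pred = clf.predict(x_new)[0]

# Recompute the same value through the log link:
# expected count = exp(intercept + coef * x)
lambda_manual = np.exp(clf.intercept_ + x_new @ clf.coef_)[0]

print("predict():      ", lambda_pred)
print("manual log link:", lambda_manual)
```

The two printed values should agree up to floating-point error, which is why the prediction can be passed directly to a Poisson distribution as its mean.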

from scipy.stats import poisson

probability_two_goals = poisson.pmf(2, lambda_pred)

This gives the probability that Team A scores exactly 2 goals against Team B.
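Putting the steps together, here is a minimal end-to-end sketch that also prints the probability for each goal count from 0 to 5, since the question asks for the individual probabilities. The variable names goals and probs and the cutoff at 5 goals are assumptions made for illustration. With only five observations the fitted rate is very rough, so treat the numbers as a demonstration of the mechanics rather than a usable forecast.

```python
import numpy as np
from scipy.stats import poisson
from sklearn.linear_model import PoissonRegressor

# Toy data from the question; X must be 2D (n_samples, n_features).
X = np.array([1, 2, 2, 3, 1]).reshape(-1, 1)
y = np.array([2, 1, 3, 0, 1])

clf = PoissonRegressor().fit(X, y)

# Predicted Poisson rate (expected count) for an input of 2.
lambda_pred = clf.predict(np.array([[2]]))[0]

# Probability of each exact count 0..5 under Poisson(lambda_pred).
goals = np.arange(0, 6)
probs = poisson.pmf(goals, lambda_pred)

for k, p in zip(goals, probs):
    print(f"P(exactly {k} goals) = {p:.1%}")

print(
    "The probability that Team A scores exactly 2 goals against Team B is: "
    f"{probs[2]:.1%}"
)
```

poisson.pmf(k, lambda_pred) evaluates lambda_pred**k * exp(-lambda_pred) / k!, so the listed probabilities sum to slightly less than 1 because counts above 5 are left out.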
