Simple Estimation of Hierarchical Events with Petersburg
Published: 2019-05-11

So far the petersburg project has been primarily focused on static modeling of complex decisions under uncertainty. The longer-term goal is for it to be a fully featured toolbox for supporting such decisions. In that context, I've posted previously about:

And more. In this post, we will begin to explore how we can use the petersburg framework in a forward-looking sense, for estimation.

Petersburg represents complex decisions as directed acyclic graphs, which lets the user impose a known structure on the problem. In this post, we will use a medical diagnosis as the example. Take the petersburg DAG (pDAG) below:

On entering our hypothetical ER, a patient can be either sick or injured. We know of two sicknesses and two injuries, so our hierarchy has just two layers (1 and 2). We could encode this numerically as:

    {S|I}    {A|B|C|D}
    1        1
    1        2
    2        3
    2        4
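As a small illustration of that encoding, the sketch below maps each diagnosis to its (layer 1, layer 2) pair. The dictionaries and the `encode` helper are hypothetical names chosen for this example and are not part of petersburg.

```python
# Hypothetical label maps for the two layers of the example hierarchy.
LAYER_1 = {"S": 1, "I": 2}                    # sick, injured
LAYER_2 = {"A": 1, "B": 2, "C": 3, "D": 4}    # two sicknesses, two injuries
PARENT = {"A": "S", "B": "S", "C": "I", "D": "I"}


def encode(diagnosis):
    """Return the (layer 1, layer 2) codes for a leaf diagnosis."""
    return (LAYER_1[PARENT[diagnosis]], LAYER_2[diagnosis])


rows = [encode(d) for d in "ABCD"]
print(rows)
```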

In the absolute simplest case, we could have a person sit at the door of our ER and record what each patient was diagnosed with on their way out. Each diagnosis would be added as a new row, and gradually we would build a frequentist dataset from which we could draw some conclusions.

In practice, that means that we want to stream data into an adjacency matrix of the graph. We can give each node an index, and represent an observed traversal by incrementing the value in entry (from_node, to_node).  So the above data would look like:

    to \ from    0    S    I    A    B    C    D
    0            0    0    0    0    0    0    0
    S            2    0    0    0    0    0    0
    I            2    0    0    0    0    0    0
    A            0    1    0    0    0    0    0
    B            0    1    0    0    0    0    0
    C            0    0    1    0    0    0    0
    D            0    0    1    0    0    0    0
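A minimal sketch of this bookkeeping with NumPy follows. The node order and the `record_path` helper are assumptions made for illustration, not petersburg's own code. It stores each count with the destination node as the row and the source node as the column, which is how the table above is laid out.

```python
import numpy as np

# Node indices follow the table's order: 0 is the entry node, then S, I, A, B, C, D.
NODES = ["0", "S", "I", "A", "B", "C", "D"]
INDEX = {name: i for i, name in enumerate(NODES)}


def record_path(counts, path):
    """Increment one cell per edge along an observed path through the graph."""
    for src, dst in zip(path[:-1], path[1:]):
        # Row = destination node, column = source node.
        counts[INDEX[dst], INDEX[src]] += 1
    return counts


counts = np.zeros((len(NODES), len(NODES)), dtype=int)
for path in (["0", "S", "A"], ["0", "S", "B"], ["0", "I", "C"], ["0", "I", "D"]):
    record_path(counts, path)

print(counts)
```

Each new patient is then a single call to `record_path`, so the matrix can be kept up to date as observations stream in.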

As people come through the ER, we can just update this adjacency matrix, and use petersburg’s new from_adj_matrix() to generate a petersburg graph object whenever we want to run simulations.

Let’s say that as we continue on, another 100 people walk in with sickness A. Our updated matrix looks like:

    to \ from    0      S      I    A    B    C    D
    0            0      0      0    0    0    0    0
    S            102    0      0    0    0    0    0
    I            2      0      0    0    0    0    0
    A            0      101    0    0    0    0    0
    B            0      1      0    0    0    0    0
    C            0      0      1    0    0    0    0
    D            0      0      1    0    0    0    0

And a simulation of the pDAG will show that A is the most likely outcome.
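To make that concrete without relying on petersburg's own simulation API, here is a hedged sketch of a plain random walk over the updated counts. The `simulate` function is a hypothetical stand-in: at each node it picks the next node in proportion to the observed counts, and it stops at a node with no outgoing counts.

```python
import numpy as np

NODES = ["0", "S", "I", "A", "B", "C", "D"]

# Updated counts from the post: rows are destinations, columns are sources.
counts = np.zeros((7, 7), dtype=int)
counts[1, 0], counts[2, 0] = 102, 2    # 0 -> S, 0 -> I
counts[3, 1], counts[4, 1] = 101, 1    # S -> A, S -> B
counts[5, 2], counts[6, 2] = 1, 1      # I -> C, I -> D


def simulate(counts, rng, start=0):
    """Walk from the start node to a leaf, choosing each edge by its observed frequency."""
    node = start
    while counts[:, node].sum() > 0:   # a leaf has no outgoing counts
        weights = counts[:, node] / counts[:, node].sum()
        node = rng.choice(len(weights), p=weights)
    return node


rng = np.random.default_rng(0)
outcomes = [NODES[simulate(counts, rng)] for _ in range(10_000)]
tally = {name: outcomes.count(name) for name in "ABCD"}
print(tally)
```

With these counts the walk ends at A with probability 101/104, so A dominates the tally.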

This is, of course, a very simplistic case. The interesting part is that the estimation problem is implicitly broken down into many smaller conditional probability problems. We may not be able to predict P(A|S) well, but if we have a good model for P(S), we can use the frequency estimates to augment that prediction of P(S) with actionable estimates at the lower level.
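The decomposition can be written as P(A) = P(S) · P(A|S). The sketch below combines a value for P(S) with a frequency estimate of P(A|S) taken from the counts. The `leaf_probability` helper and the 0.6 figure are hypothetical, chosen only to illustrate how an external model for the upper layer could be swapped in.

```python
def leaf_probability(p_parent, child_count, sibling_counts):
    """P(leaf) = P(parent) * P(leaf | parent), with the conditional taken from counts."""
    total = sum(sibling_counts)
    if total == 0:
        raise ValueError("no observations under this parent")
    return p_parent * child_count / total


# Pure frequency estimate from the updated matrix: P(S) = 102/104, P(A|S) = 101/102.
p_a_freq = leaf_probability(102 / 104, 101, [101, 1])

# Hypothetical: an external model puts P(S) at 0.6 for this patient,
# while P(A|S) still comes from the observed frequencies.
p_a_hybrid = leaf_probability(0.6, 101, [101, 1])

print(round(p_a_freq, 4), round(p_a_hybrid, 4))
```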

This first step has been implemented in the Python library for petersburg as a scikit-learn style estimator, which supports partial fitting so that data can be streamed in as incremental updates. Here is an example using randomly generated data:

    import numpy as np
    import random
    import json
    from petersburg import FrequencyEstimator

    zero_count = 11
    one_count = 32
    two_count = 55
    three_count = 50
    four_count = 21
    total = zero_count + one_count + two_count + three_count + four_count

    # generate some data
    y = np.array([[
        random.choice([0, 1, 2]),
        random.choice([0, 1, 1, 1, 2]),
        random.choice([0, 1, 2, 2, 2, 2]),
        random.choice(
            [0 for _ in range(zero_count)] +
            [1 for _ in range(one_count)] +
            [2 for _ in range(two_count)] +
            [3 for _ in range(three_count)] +
            [4 for _ in range(four_count)]
        )
    ] for _ in range(10000)])

    # train a frequency estimator
    clf = FrequencyEstimator(verbose=True)
    clf.fit(None, y)
    freq_matrix = clf._frequency_matrix
    y_hat = clf.predict(np.zeros((10000, 10)))

    # print out what we've learned from it
    print('\nCategory Labels')
    labels = clf._cateogry_labels
    print(labels)

    print('\nUnique Predicted Outcomes')
    outcomes = sorted([str(labels[int(x)]) for x in set(y_hat.reshape(-1, ).tolist())])
    print(outcomes)

    print('\nHistogram')
    histogram = dict(zip(outcomes, [float(x) for x in np.histogram(y_hat, bins=[8.5, 9.5, 10.5, 11.5, 12.5, 13.5])[0]]))
    print(json.dumps(histogram, sort_keys=True, indent=4))

Which will output:

    Category Labels
    {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (1, 0), 4: (1, 1), 5: (1, 2), 6: (2, 0), 7: (2, 1), 8: (2, 2), 9: (3, 0), 10: (3, 1), 11: (3, 2), 12: (3, 3), 13: (3, 4)}

    Unique Predicted Outcomes
    ['(3, 0)', '(3, 1)', '(3, 2)', '(3, 3)', '(3, 4)']

    Histogram
    {
        "(3, 0)": 706.0,
        "(3, 1)": 1898.0,
        "(3, 2)": 3211.0,
        "(3, 3)": 3019.0,
        "(3, 4)": 1166.0
    }
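As a rough sanity check on that histogram, the predicted outcomes all fall in the last layer, whose values were drawn with weights 11:32:55:50:21. The sketch below compares the counts we would expect from those weights over 10,000 predictions with the counts reported above. It uses only the numbers from the example and does not call petersburg.

```python
zero_count, one_count, two_count, three_count, four_count = 11, 32, 55, 50, 21
weights = [zero_count, one_count, two_count, three_count, four_count]
total = sum(weights)
n_samples = 10000

# Expected count for each last-layer label if outcomes follow the sampling weights.
expected = {f"(3, {k})": n_samples * w / total for k, w in enumerate(weights)}

# Counts copied from the histogram printed in the post.
observed = {"(3, 0)": 706.0, "(3, 1)": 1898.0, "(3, 2)": 3211.0,
            "(3, 3)": 3019.0, "(3, 4)": 1166.0}

for label in expected:
    print(f"{label}: expected {expected[label]:7.1f}, observed {observed[label]:7.1f}")
```

The observed counts track the expected ones closely, which is what a frequency-based estimator should produce on data sampled this way.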

As development on the library continues we are targeting:

  1. Petersburg graph object as an accessible attribute of the estimator
  2. Selective inclusion of weak classifiers for estimation of conditional probabilities (where CV scores support their use, otherwise fall back to frequency)
  3. Different outputs (currently output is a single simulated outcome)

All of this is available in the petersburg library, and development is slated to continue, so try out the library or contact me if you’d like to help develop it further.
