What is probabilistic programming?

Probabilistic languages can free developers from the complexities of high-performance probabilistic inference.

By Beau Cronin
April 16, 2013

Probabilistic programming languages are in the spotlight, thanks to the announcement of a new DARPA program to support fundamental research on them. But what is probabilistic programming? What can we expect from this research? Will this effort pay off? How long will it take?

A probabilistic programming language is a high-level language that makes it easy for a developer to define probability models and then “solve” these models automatically. These languages incorporate random events as primitives, and their runtime environment handles inference. Modeling thus becomes a matter of programming, with a clean separation between modeling and inference. This can vastly reduce the time and effort associated with implementing new models and understanding data. Just as high-level programming languages transformed developer productivity by abstracting away the details of the processor and memory architecture, probabilistic languages promise to free the developer from the complexities of high-performance probabilistic inference.
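A minimal sketch of that separation in Python, using only the standard library. The model is an ordinary function built from a random primitive (`flip`) and an observation primitive (`condition`). Inference is a separate, generic routine that knows nothing about the model's internals. The names `flip`, `condition`, `Rejected`, `wet_grass_model`, and `rejection_sample` are hypothetical and are not the API of any real probabilistic language. Plain rejection sampling stands in here for the far more efficient engines such languages actually need.

```python
import random


class Rejected(Exception):
    """Signals that a sampled run contradicted an observation."""


def flip(p=0.5):
    # Random primitive: returns True with probability p.
    return random.random() < p


def condition(observed_ok):
    # Observation primitive: abandon runs that are inconsistent with the data.
    if not observed_ok:
        raise Rejected()


def wet_grass_model():
    # Modeling: an ordinary program over random choices (hypothetical example).
    rain = flip(0.2)
    sprinkler = flip(0.1)
    condition(rain or sprinkler)  # we observed that the grass is wet
    return rain                   # query: did it rain?


def rejection_sample(model, n):
    # Inference: generic, works for any model written with flip/condition.
    samples = []
    while len(samples) < n:
        try:
            samples.append(model())
        except Rejected:
            pass  # discard runs that violate an observation
    return samples


random.seed(0)
samples = rejection_sample(wet_grass_model, 20000)
# Exact answer: P(rain | wet) = 0.2 / (1 - 0.8 * 0.9) = 0.2 / 0.28, about 0.714
print(sum(samples) / len(samples))
```

Replacing `rejection_sample` with a smarter engine would not require touching `wet_grass_model`, which is the point of separating modeling from inference.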

What does it mean to perform inference automatically? Let’s compare a probabilistic program to a classical simulation such as a climate model. A simulation is a computer program that takes as input some initial conditions, such as historical temperatures and estimates of energy input from the sun. It then applies the programmer’s assumptions about how these variables interact, captured in equations and code, to produce forecasts about the future climate. Simulations are characterized by the fact that they run in only one direction: forward, from causes to hypothesized effects.
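For contrast, here is a deliberately toy forward simulation. The single assumed interaction (each year's warming equals `sensitivity` times that year's forcing) and all numbers are illustrative assumptions, not a real climate model. The function maps causes to effects and offers no way to ask the reverse question of which `sensitivity` produced a given temperature record.

```python
def simulate(initial_temp, forcings, sensitivity):
    """Run a toy model forward: causes (forcings, sensitivity) -> effects (temperatures)."""
    temps = [initial_temp]
    for forcing in forcings:
        # Assumed interaction: warming proportional to this year's forcing.
        temps.append(temps[-1] + sensitivity * forcing)
    return temps


print(simulate(14.0, [1.0, 1.0, 2.0], sensitivity=0.5))  # [14.0, 14.5, 15.0, 16.0]
```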

A probabilistic program turns this around. Given a universe of possible interactions between different elements of the climate system and a collection of observed data, we can automatically learn which interactions best explain the observations, even if these interactions are quite complex. How does this work? In a nutshell, the probabilistic language’s runtime environment runs the program both forward, from causes to effects (data), and backward, from the data to the causes. Clever implementations will trade off between these directions to home in efficiently on the most likely explanations for the observations.
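A sketch of the backward direction, built on the same kind of toy forward model. Here "backward" is done by likelihood weighting rather than by literally executing code in reverse: draw candidate values of the unknown interaction strength from a prior, run each one forward, and weight it by how well its predictions match the observed data. The uniform prior on [0, 2], the Gaussian noise level, and the helper names `forward`, `log_likelihood`, and `infer_sensitivity` are assumptions for illustration. Practical runtimes rely on more efficient methods, such as Markov chain Monte Carlo.

```python
import math
import random


def forward(initial_temp, forcings, sensitivity):
    # Forward direction: causes -> predicted temperatures (toy model).
    temps = [initial_temp]
    for forcing in forcings:
        temps.append(temps[-1] + sensitivity * forcing)
    return temps


def log_likelihood(observed, predicted, noise_sd):
    # Assumed Gaussian measurement noise; constant terms are dropped.
    return sum(-0.5 * ((o - p) / noise_sd) ** 2 for o, p in zip(observed, predicted))


def infer_sensitivity(observed, initial_temp, forcings, n=20000, noise_sd=0.2, seed=0):
    """Backward direction: posterior mean of sensitivity given the observed data."""
    rng = random.Random(seed)
    draws, log_weights = [], []
    for _ in range(n):
        s = rng.uniform(0.0, 2.0)  # assumed prior over the unknown interaction strength
        draws.append(s)
        predicted = forward(initial_temp, forcings, s)
        log_weights.append(log_likelihood(observed, predicted, noise_sd))
    top = max(log_weights)
    # Subtract the max log-weight before exponentiating for numerical stability.
    weights = [math.exp(lw - top) for lw in log_weights]
    return sum(w * s for w, s in zip(weights, draws)) / sum(weights)


observed = [14.0, 14.5, 15.0, 16.0]
print(infer_sensitivity(observed, 14.0, [1.0, 1.0, 2.0]))  # close to 0.5
```

Apart from the call to `forward` and the choice of prior, nothing in `infer_sensitivity` is specific to climate; a probabilistic language aims to supply that inference machinery once, for any model.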

Better climate models are but one potential application of probabilistic programming. Others include: shorter and more humane clinical trials with fewer unneeded side effects and more accurate outcomes; machine perception that transcends the capabilities of the now-ubiquitous quadcopters and even Google’s self-driving cars; and “nervous systems” that fuse data from massively distributed and noisy sensor networks to better understand both the natural world and artificial environments.

Of course, any technology this general carries a lot of uncertainty around its development path and eventual impact. So much depends on complex interactions with other technology threads and, ultimately, social factors and regulation. With all possible humility, here is one sample from the predictive distribution, conditioned on what we know so far:

  • Phase I — Probabilistic programming will transform the practice of data science by unifying anecdotal reasoning with more reliable statistical approaches. If data science is first and foremost about telling stories, then probabilistic programming is in many ways the perfect tool. Practitioners will be able to leverage the persuasive power of narrative, while staying on firm quantitative ground.

  • Phase II — Practitioners will really start to push the boundaries of modeling in fundamental ways in order to address the many applications that don’t fit well into the current machine learning, text mining, or graph analysis paradigms. Many real-world datasets are a mixture of tabular, relational, textual, geospatial, audiovisual, and other data types. Probabilistic programs can weave all of these pieces together in natural ways. Current solutions that claim to integrate heterogeneous data typically do so by beating it all into a similar form, losing much of the underlying structure along the way.

  • Phase III — Probabilistic programming will push well into territory that is universally recognized as artificial intelligence. As we’re often reminded, intelligent systems are very application-specific. Good chess algorithms are unlike Google’s self-driving car, which is totally different from IBM’s Watson. But probabilistic programs can be layered and modularized, with subsystems that specialize in particular problem domains, all embedded in a shared fabric that recognizes the current context and brings the appropriate modeling subsystems to bear.

What will it take to make all this real? The conceptual underpinnings of probabilistic programming languages are well in hand, thanks to trailblazing work by research groups at MIT, UMass Amherst, Microsoft Research, Harvard, and elsewhere. The core challenge at this point is developing performant inference engines that can efficiently solve the very wide range of models that these languages can express. We’ll also need new debugging, optimization, and visualization tools to help developers get the most from these systems.

This story will take years to play out in full, but I expect we’ll see real progress over the next three to four years. I’m excited.

Want to learn more? BUGS is a probabilistic programming language originally developed by statisticians more than 20 years ago. While it has a number of limitations around expressivity and dataset size, it’s a great way to get your feet wet. Also check out Rob Zinkov’s tutorial post, which includes examples of several models. Church is the most ambitious probabilistic programming language. Don’t miss its tutorials, though Church may not be the most accessible or practical option until its inference engine and toolset mature. For that reason, factorie might be a better bet in the short term, especially if you like Scala, or Microsoft Research’s Infer.NET with C# and F# bindings. The proceedings from a recent academic workshop provide a great snapshot of the field as of late 2012. Finally, this video from a long-defunct startup that I co-founded contains one stab at explaining many of the concepts underlying probabilistic programming, referred to there under the more general term probabilistic computing.
