## Monday, December 15, 2014

### NIPS 2014

With a new venue and a deep attitude, NIPS was a blast this year, kudos to the organizers.

It is fitting that the conference was in Montreal, underscoring that the giants of deep learning have transitioned from exiles to rockstars. As I learned the hard way, you have to show up to the previous talk if you want to get into the room when one of these guys is scheduled at a workshop. Here's an actionable observation: placing all the deep learning posters next to each other in the poster session is a bad idea, as it creates a ridiculous traffic jam. Next year they should be placed at the corners of the poster session, just like staples in a grocery store, to facilitate the exposure of other material.

### Other Trends

I was glad to see Facebook Labs tackling ambitious problems: text understanding, image analysis, and knowledge base construction. They are thinking big ... extreme income inequality may be bad for the long-term stability of Western democracies, but it is ushering in a golden age of AI research.

## Sunday, November 16, 2014

### Large-Scale CCA

\[
\begin{aligned}
\mathop{\mathrm{maximize}}_{\mathbf{X}_a, \mathbf{X}_b}\; & \mathop{\mathrm{Tr}}\left(\mathbf{X}_a^\top \mathbf{A}^\top \mathbf{B} \mathbf{X}_b\right), \\
\mathrm{subject\ to}\;& \mathbf{X}_a^\top \mathbf{A}^\top \mathbf{A} \mathbf{X}_a = n \mathbf{I}, \\
\;& \mathbf{X}_b^\top \mathbf{B}^\top \mathbf{B} \mathbf{X}_b = n \mathbf{I}.
\end{aligned}
\]

\[
\mathbf{A}^\top \mathbf{B} \left(\mathbf{B}^\top \mathbf{B}\right)^{-1} \mathbf{B}^\top \mathbf{A} \mathbf{X}_a = \mathbf{A}^\top \mathbf{A} \mathbf{X}_a \Lambda_a,
\]
and a similar problem yields $\mathbf{X}_b$. We have a randomized square-root-free algorithm for generalized eigenvalue problems, so problem solved, right? Yes, but with some important caveats. First, the spectrum is unfavorable, so the randomized range finder needs multiple passes or a large amount of oversampling. Second, range finding involves computing the action of $(\mathbf{B}^\top \mathbf{B})^{-1}$ on $\mathbf{B}^\top \mathbf{A} \Omega$, and vice versa, which is a least-squares problem (that in practice need not be computed very precisely). Third, the two generalized eigenvalue problems share significant state, so it is beneficial to interleave the operations. With these observations we end up with something very similar to a classical algorithm for computing CCA, known as Horst iteration, but with a Halko-style "oversample, stop early, then polish with an exact solution in the smaller subspace." We've had good luck with this approach, which is available on github as 阿尔卡.
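
At small scale the generalized eigenproblem above is easy to sanity-check with a dense solver, with no randomization and no Horst iteration. Here is a minimal NumPy/SciPy sketch; the function name and the tiny regularization constant are mine, chosen for illustration, and the cubic-cost dense solve is exactly what the randomized algorithm exists to avoid:

```python
import numpy as np
from scipy.linalg import eigh

def cca_via_geneig(A, B, k, reg=1e-8):
    """Top-k CCA directions for X_a via the generalized eigenproblem
    A^T B (B^T B)^{-1} B^T A X_a = A^T A X_a Lambda_a.

    A: (n, da) and B: (n, db) are centered data matrices.
    Dense and cubic-cost -- illustrative only, not the randomized method.
    """
    n = A.shape[0]
    Caa = A.T @ A + reg * np.eye(A.shape[1])   # A^T A, slightly regularized
    Cbb = B.T @ B + reg * np.eye(B.shape[1])   # B^T B, slightly regularized
    Cab = A.T @ B                              # A^T B
    M = Cab @ np.linalg.solve(Cbb, Cab.T)      # A^T B (B^T B)^{-1} B^T A
    evals, evecs = eigh(M, Caa)                # symmetric generalized eigensolve
    order = np.argsort(evals)[::-1][:k]        # eigh returns ascending order
    Xa = evecs[:, order] * np.sqrt(n)          # enforce X_a^T A^T A X_a = n I
    return Xa, evals[order]
```

The eigenvalues returned are the squared canonical correlations, and the scaling by $\sqrt{n}$ matches the trace-maximization constraint above.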

CCA has many uses: one application is creating word embeddings, similar in spirit to word2vec. As a demonstration, we took the American English Google n-grams corpus and created an embedding using CCA. It takes about an hour in Matlab on a commodity desktop to produce the embedding, which is faster than the many hours it takes to download the data. The code to reproduce this can be found on github (warning: you'll need about 40 GB of memory, 50 GB of disk space, and the bandwidth to download the n-grams). You can verify that the embedding satisfies the "ultimate test" of word embeddings: king - queen $\approx$ man - woman.
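
The "ultimate test" can be checked mechanically: form king - queen + woman and look for the nearest neighbor by cosine similarity, excluding the query words. A toy sketch below uses made-up two-dimensional vectors standing in for real CCA embeddings (one axis loosely "gender," one loosely "royalty"):

```python
import numpy as np

def unit(v):
    """Normalize a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def analogy(emb, a, b, c, topn=1):
    """Words w (excluding the query words) with the highest cosine
    similarity to emb[a] - emb[b] + emb[c]; emb maps word -> unit vector."""
    target = unit(emb[a] - emb[b] + emb[c])
    scored = sorted(((float(v @ target), w) for w, v in emb.items()
                     if w not in (a, b, c)), reverse=True)
    return [w for _, w in scored[:topn]]

# Hypothetical embedding; axis 0 ~ gender, axis 1 ~ royalty.
emb = {
    "king":  unit([ 1.0, 1.0]),
    "queen": unit([-1.0, 1.0]),
    "man":   unit([ 1.0, 0.0]),
    "woman": unit([-1.0, 0.0]),
    "apple": unit([ 0.0, -1.0]),
}
```

With real embeddings the same routine runs over the full vocabulary; excluding the query words is the standard convention for this test.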

## Thursday, October 16, 2014

### Costs and Benefits

tl;dr: If you love research, and you are a professional researcher, you have a moral obligation to ensure that your benefactors both derive some benefit from the research and are aware of that benefit.

Recent events have me thinking again about the viability of privately funded basic research. In my opinion, the history of Xerox PARC is deeply troubling. What?! At its peak the output of Xerox PARC was breathtaking, and many advances in computation that became widespread during my youth can be traced back to Xerox PARC. Unfortunately, Xerox failed to benefit from some of the most world-changing innovations of its own research division. Now a generation of MBAs is taught the Cisco model: rather than maintain your own research division, wait for other companies to innovate and then buy them.
> ... it continues to acquire small innovative companies rather than developing new technologies from scratch ...

Quite simply, it is irrational to expect any institution to fund an activity unless that organization can realize sufficient benefit to cover the costs. That calculation is ultimately made by people, and if those people only hear stories about how basic research generates benefits to other firms (or even, competitors!), appetite will diminish. In other words, benefits must not only be real, they must be recognizable to decision makers. This is, of course, a deep challenge, because the benefits of research are often not recognizable to the researchers who perform it. Researchers are compelled to research by their nature, like those who feel the need to scale Mount Everest. It so happens that a byproduct of their research obsession is the advancement of humanity.

## Wednesday, September 24, 2014

### Sub-Linear Debugging

I have a post on sub-linear debugging on Microsoft's machine learning blog.

## Tuesday, August 26, 2014

### More Deep Learning Musings

Yoshua Bengio, one of the luminaries of the deep learning community, gave multiple talks about deep learning at ICML 2014 this year. I like Bengio's focus on the statistical aspects of deep learning. Here are some thoughts I had in response to his presentations.

#### Regularization via Depth

One of Bengio's themes was that depth is an effective regularizer. The argument goes like this: by composing multiple layers of (limited-capacity) nonlinearities, the overall architecture is able to explore an interesting subset of highly flexible models, relative to shallow models of similar leading-order flexibility. "Interesting" here means the models are flexible enough to capture the target concept, yet constrained enough to be learnable with only modest data requirements. This is really a statement about the target concepts we are trying to model (e.g., in AI tasks). Another way of saying this is (paraphrasing): "look for regularizers that are more constraining than smoothness assumptions, yet still broadly applicable to the tasks of interest."

As a purely mathematical statement it is definitely true that composing nonlinearities through bottlenecks leads to a subset of a larger model space. For example, composing order-$d$ polynomial units in a deep architecture with $m$ levels results in something whose leading-order terms are monomials of order $m d$; but many of the terms in a full order-$m d$ polynomial expansion (aka "shallow architecture") are missing. Thus: leading-order flexibility with a limited model space. But does it matter?
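
The counting is easy to make concrete. With $n$ inputs, a full polynomial of total degree at most $m d$ has $\binom{n + md}{md}$ monomials, while a depth-$m$, width-$w$ stack of degree-$d$ units has far fewer parameters. A sketch with sizes made up purely for illustration:

```python
from math import comb

def n_monomials(n_vars, degree):
    """Number of monomials of total degree <= degree in n_vars variables."""
    return comb(n_vars + degree, degree)

n, d, m, w = 100, 2, 3, 20  # hypothetical: inputs, unit degree, depth, width

# Shallow: a full polynomial matching the deep net's leading order m*d.
shallow_params = n_monomials(n, m * d)

# Deep: first layer has w degree-d units over the n inputs; each later
# layer has w degree-d units over the previous layer's w outputs.
deep_params = w * n_monomials(n, d) + (m - 1) * w * n_monomials(w, d)
```

With these sizes the full degree-6 expansion has over a billion coefficients while the composed architecture has roughly a hundred thousand parameters: same leading order, drastically smaller model space.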

The counterargument is that, to date, the major performance gains in deep learning happen when composition by depth is combined with a decomposition of the feature space (e.g., spatial or temporal). In speech, the Gaussian kernel (in the highly scalable form of random Fourier features) is able to approach the performance of deep learning on TIMIT if the deep net cannot exploit temporal structure; i.e., RFF is competitive with non-convolutional DNNs on this task, but is surpassed by convolutional DNNs. (Of course, from a computational standpoint, a deep network starts to look downright parsimonious compared to hundreds of thousands of random Fourier features, but we're talking statistics here.)
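
For concreteness, here is a minimal random Fourier features sketch, Rahimi-Recht style, approximating the Gaussian kernel $k(x, y) = \exp(-\gamma \|x - y\|^2)$. The function name and defaults are mine for illustration; this is the general-purpose primitive, not the TIMIT pipeline:

```python
import numpy as np

def rff_features(X, n_features, gamma, seed=0):
    """Map X (n, d) to random Fourier features Z (n, n_features) so that
    Z @ Z.T approximates the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # The spectral density of this kernel is N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

The approximation error decays like $O(1/\sqrt{D})$ in the number of features $D$, which is why competitive accuracy can require hundreds of thousands of features.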

#### The Dangers of Long-Distance Relationships

So for general problems it's not clear that "regularization via depth" is obviously better than general smoothness regularizers (although I suspect it is). However, for problems in computer vision it is intuitive that deep composition of representations is beneficial. This is because the spatial domain comes with a natural concept of neighborhoods, which can be used to beneficially limit model complexity.
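
How much neighborhoods limit model complexity is stark even as a back-of-envelope parameter count; the layer sizes below are made up for illustration:

```python
# Dense layer vs. 3x3 convolution on a 32x32, 16-channel feature map.
H, W, C_in, C_out, k = 32, 32, 16, 16, 3

# Fully connected: every output unit sees every input unit.
dense_params = (H * W * C_in) * (H * W * C_out)

# Convolution: local neighborhoods plus weight sharing across positions.
conv_params = k * k * C_in * C_out

ratio = dense_params // conv_params  # five orders of magnitude fewer weights
```

The spatial prior buys roughly a 100,000x reduction in parameters at this layer size, which is the "beneficially limit model complexity" claim in numbers.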

These HPC islands do not need to stage all the data they are working on before they start doing useful work; e.g., SGD algorithms can start as soon as they receive their first mini-batch. Caffe and a single K20 can train on ImageNet at 7 ms per image amortized, which works out to roughly 40 megabytes per second of image data that needs to be streamed to the training node. That's not difficult to arrange if the HPC island is collocated with the HDFS cluster, and difficult otherwise, so the prediction is that HPC islands will live near the HDFS cluster. Of course the HPC island should have a smart caching policy so that not everything has to be pulled from HDFS storage all the time. A smart caching policy would be task-aware, e.g., leveraging active learning to maximize the information transferred between HDFS and the HPC island.
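
The 40 MB/s figure follows from the 7 ms/image throughput by simple arithmetic; checking the per-image size it implies (numbers from the text, nothing else assumed):

```python
# Back-of-envelope check of the streaming-bandwidth claim above.
ms_per_image = 7.0
images_per_sec = 1000.0 / ms_per_image             # ~143 images/s
stream_mb_per_sec = 40.0                           # bandwidth figure quoted above

# Implied average image size the stream must deliver.
kb_per_image = stream_mb_per_sec * 1000.0 / images_per_sec  # exactly 280 KB
```

An implied ~280 KB per image is in the right ballpark for ImageNet-scale inputs, so the two numbers in the text are mutually consistent.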