Why does this Java method leak, and why does inlining it fix the leak?

Question

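I wrote a not-quite-lazy (int) sequence class, GarbageTest.java, as an experiment to see whether I could process very long lazy sequences in Java the way one can in Clojure.

Given a naturals() method that returns the lazy, infinite sequence of natural numbers; a drop(n, sequence) method that drops the first n elements of sequence and returns the rest; and an nth(n, sequence) method that simply returns drop(n, lazySeq).head(), I wrote two tests: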

static int N = (int)1e6;

// succeeds @ N = (int)1e8 with java -Xmx10m
@Test
public void dropTest() {
    assertThat( drop(N, naturals()).head(), is(N+1));
}

// fails with OutOfMemoryError @ N = (int)1e6 with java -Xmx10m
@Test
public void nthTest() {
    assertThat( nth(N, naturals()), is(N+1));
}

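Note that the body of dropTest() was generated by copying the body of nthTest() and then invoking IntelliJ's "inline" refactoring on the nth(N, naturals()) call. So it seems to me that dropTest() should behave exactly like nthTest().

But it does not! dropTest() runs to completion with N as large as 1e8, whereas nthTest() fails with an OutOfMemoryError for N as small as 1e6.

I have avoided inner classes. I have tried a variant of my code, ClearingArgsGarbageTest.java, that nulls out method parameters before calling other methods. I have applied the YourKit profiler. I have looked at the bytecode. I just cannot find the leak that causes nthTest() to fail.

Where is the "leak"? And why does nthTest() have the leak while dropTest() does not?

Here is the rest of the code from GarbageTest.java, in case you don't want to click through to the GitHub project: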

/**
 * a not-perfectly-lazy lazy sequence of ints. see LazierGarbageTest for a lazier one
 */
static class LazyishSeq {
    final int head;

    volatile Supplier<LazyishSeq> tailThunk;
    LazyishSeq tailValue;

    LazyishSeq(final int head, final Supplier<LazyishSeq> tailThunk) {
        this.head = head;
        this.tailThunk = tailThunk;
        tailValue = null;
    }

    int head() {
        return head;
    }

    LazyishSeq tail() {
        if (null != tailThunk)
            synchronized(this) {
                if (null != tailThunk) {
                    tailValue = tailThunk.get();
                    tailThunk = null;
                }
            }
        return tailValue;
    }
}

static class Incrementing implements Supplier<LazyishSeq> {
    final int seed;
    private Incrementing(final int seed) { this.seed = seed;}

    public static LazyishSeq createSequence(final int n) {
        return new LazyishSeq( n, new Incrementing(n+1));
    }

    @Override
    public LazyishSeq get() {
        return createSequence(seed);
    }
}

static LazyishSeq naturals() {
    return Incrementing.createSequence(1);
}

static LazyishSeq drop(
        final int n,
        final LazyishSeq lazySeqArg) {
    LazyishSeq lazySeq = lazySeqArg;
    for( int i = n; i > 0 && null != lazySeq; i -= 1) {
        lazySeq = lazySeq.tail();
    }
    return lazySeq;
}

static int nth(final int n, final LazyishSeq lazySeq) {
    return drop(n, lazySeq).head();
}

Answer 1

In your method

static int nth(final int n, final LazyishSeq lazySeq) {
    return drop(n, lazySeq).head();
}

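the parameter variable lazySeq holds a reference to the first element of the sequence for the entire duration of the drop operation. This prevents the whole sequence from being garbage collected.

In contrast, with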

public void dropTest() {
    assertThat( drop(N, naturals()).head(), is(N+1));
}

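the first element of the sequence is returned by naturals() and passed directly to the invocation of drop, so it is removed from the operand stack and no reference to it exists in the caller during the execution of drop.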
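To make the operand-stack point concrete, here is a minimal sketch of an nth that never stores the sequence in one of its own variables. It is not from the question's code: the class name SupplierNthSketch, the cut-down Seq type (a single-threaded stand-in for LazyishSeq), the from helper, and the Supplier-taking signature of nth are all assumptions made for illustration. Whether a run then fits into a given heap still depends on how the JVM treats drop.

```java
import java.util.function.Supplier;

// Hypothetical demo class; Seq is a cut-down, single-threaded stand-in for LazyishSeq.
class SupplierNthSketch {

    static final class Seq {
        final int head;
        private Supplier<Seq> tailThunk;
        private Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        // Computes the tail on first use and caches it.
        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get();
                tailThunk = null;
            }
            return tailValue;
        }
    }

    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    static Seq naturals() {
        return from(1);
    }

    static Seq drop(int n, Seq seq) {
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail();
        }
        return seq;
    }

    // nth receives a recipe for the sequence rather than the sequence itself,
    // so its frame has no local variable that refers to the first cell.
    // The result of source.get() lives only on the operand stack until it is
    // consumed by the call to drop.
    static int nth(int n, Supplier<Seq> source) {
        return drop(n, source.get()).head;
    }
}
```

A call would look like SupplierNthSketch.nth(N, SupplierNthSketch::naturals), which mirrors what the inlined dropTest() does by hand.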

Your attempt to set the parameter variable to null, i.e.

static int nth(final int n, /*final*/ LazyishSeq lazySeqArg) {
    final LazyishSeq lazySeqLocal = lazySeqArg;
    lazySeqArg = null;
    return drop(n,lazySeqLocal).head();
}

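doesn't help, because now the lazySeqArg variable is null, but lazySeqLocal holds a reference to the first element.

In general, a local variable does not prevent garbage collection: the specification permits collecting objects that are otherwise unused, even if a variable still refers to them. That does not imply that a particular implementation is capable of doing so.

In the case of the HotSpot JVM, only optimized code gets rid of such unused references. But here, nth is not a hot spot, because the heavy work happens inside the drop method.

This is also why the same issue doesn't appear in the drop method, even though it too holds a reference to the first element in its parameter variable. The drop method contains the loop that does the actual work, so it is very likely to be optimized by the JVM, which may eliminate the unused variable and allow the already processed part of the sequence to be collected.

Many factors can affect the JVM's optimizations. Besides the different shape of the code, it seems that rapid memory allocation during the unoptimized phase may also reduce what the optimizer can achieve. Indeed, when I run with -Xcomp, which prohibits interpreted execution altogether, both variants run successfully, and even int N = (int)1e9 is no longer a problem. Of course, forcing compilation increases startup time.

I have to admit that I don't understand why mixed mode performs so much worse, and I'll investigate further. But in general, you have to be aware that the efficiency of the garbage collector is implementation dependent, so objects that are collected in one environment may stay in memory in another.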

Answer 2

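Clojure implements a strategy for dealing with this sort of scenario, known as "locals clearing". The compiler supports it, so it kicks in automatically where required in pure Clojure code (unless it is disabled at compile time, which is sometimes useful for debugging). Clojure also clears locals in various places in its Java runtime, however, and the way it does so could be used in Java libraries and even in application code, although it would undoubtedly be somewhat cumbersome.

Before I get into what Clojure does, here is a short summary of what is going on in this example: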

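nth(int, LazyishSeq) is implemented in terms of drop(int, LazyishSeq) and LazyishSeq.head().
nth passes both of its arguments to drop and has no further use for them.
drop can easily be implemented so as to avoid holding on to the head of the passed-in sequence.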
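The last point of the summary can be sketched as follows. This is an illustrative variant, not the question's code: the class name DropSketch, the cut-down Seq type (a single-threaded stand-in for LazyishSeq) and the from helper are assumptions. The only idea being shown is that drop advances its own parameter instead of copying it into a second local, so no variable in its frame keeps pointing at the original head.

```java
import java.util.function.Supplier;

// Hypothetical demo class; Seq is a cut-down, single-threaded stand-in for LazyishSeq.
class DropSketch {

    static final class Seq {
        final int head;
        private Supplier<Seq> tailThunk;
        private Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        // Computes the tail on first use and caches it.
        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get();
                tailThunk = null;
            }
            return tailValue;
        }
    }

    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    static Seq naturals() {
        return from(1);
    }

    // The parameter is deliberately not final: it is overwritten on each step,
    // so once the loop has moved on, this frame no longer refers to earlier cells.
    static Seq drop(int n, Seq seq) {
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail();
        }
        return seq;
    }
}
```

With this shape, whether the skipped cells can be collected depends only on what the callers of drop still hold.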

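Here, nth still holds on to the head of its sequence argument. The runtime might discard that reference, but there is no guarantee that it will.

Clojure deals with this by explicitly clearing the reference to the sequence before control is handed off to drop. This is done with a rather elegant trick (the snippet below is shown as it appears on GitHub as of Clojure 1.9.0):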

//  clojure/src/jvm/clojure/lang/Util.java

/**
 *   Copyright (c) Rich Hickey. All rights reserved.
 *   The use and distribution terms for this software are covered by the
 *   Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
 *   which can be found in the file epl-v10.html at the root of this distribution.
 *   By using this software in any fashion, you are agreeing to be bound by
 *   the terms of this license.
 *   You must not remove this notice, or any other, from this software.
 **/

// … beginning of the file omitted …

// the next line is the 190th in the file as of Clojure 1.9.0
static public Object ret1(Object ret, Object nil){
        return ret;
}

static public ISeq ret1(ISeq ret, Object nil){
        return ret;
}

// …

Given the above, the call to drop inside nth can be changed to

drop(n, ret1(lazySeq, lazySeq = null))

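lazySeq = null is evaluated as an expression before control is transferred to ret1; its value is null, and it also has the side effect of setting the lazySeq reference to null. The first argument to ret1 has already been evaluated by that point, however, so ret1 receives the reference to the sequence in its first argument and returns it as expected, and that value is then passed to drop.

Thus drop receives the original value held by the lazySeq local, but the local itself has been cleared before control is transferred to drop.

Consequently, nth no longer holds on to the head of the sequence.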
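Put together, the trick applied to this question might look like the sketch below. It is an illustration under assumptions, not Clojure's code or the asker's: the class name LocalsClearingDemo, the cut-down Seq type (a single-threaded stand-in for LazyishSeq), the from helper, and the generic signature of ret1 are all made up for the example. Only the idea of ret1 (return the first argument, ignore the second) comes from clojure.lang.Util.

```java
import java.util.function.Supplier;

// Hypothetical demo class; Seq is a cut-down, single-threaded stand-in for LazyishSeq.
class LocalsClearingDemo {

    static final class Seq {
        final int head;
        private Supplier<Seq> tailThunk;
        private Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        int head() {
            return head;
        }

        // Computes the tail on first use and caches it.
        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get();
                tailThunk = null;
            }
            return tailValue;
        }
    }

    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    static Seq naturals() {
        return from(1);
    }

    // Same idea as clojure.lang.Util.ret1: return the first argument, ignore the second.
    // The generic signature is an assumption made for this sketch.
    static <T> T ret1(T ret, Object ignored) {
        return ret;
    }

    static Seq drop(int n, Seq seq) {
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail();
        }
        return seq;
    }

    // The parameter must not be final, because it is assigned null below.
    // Arguments are evaluated left to right: the first argument reads the
    // current value of seq, then the second argument sets the local to null.
    static int nth(int n, Seq seq) {
        return drop(n, ret1(seq, seq = null)).head();
    }
}
```

The trick relies on Java evaluating method arguments from left to right, so the value is read before the local is cleared. It is used as nth(N, naturals()), exactly like the original method.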
