An Introduction to @IntrinsicCandidate, the HotSpot Intrinsic Annotation

sofia, 2023-11-27, Java series


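Sometimes a compiler treats certain function implementations specially. Simply put, it replaces the default implementation with another, possibly optimized, one. In compiler theory, such functions are called intrinsic functions, or intrinsics.

In this article, we'll walk through a few examples to see how intrinsics work in the HotSpot JVM:

The Math.log() method in Java computes the natural logarithm of a given number. Here is how this method is implemented in OpenJDK: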

@IntrinsicCandidate
public static double log(double a) {
    return StrictMath.log(a); // default impl. delegates to StrictMath
}

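As shown above, Math.log() itself calls another method, StrictMath.log(), behind the scenes. Despite this delegation, we usually prefer Math.log() over the stricter and more direct StrictMath.log()!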
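
A quick way to see how the two methods relate numerically: the Javadoc for Math.log() only requires the result to be within 1 ulp of the exact value, while StrictMath.log() must reproduce the results of the fdlibm algorithms, so the two may legitimately differ in the last bit. The sketch below measures that distance; the class name LogAgreement and the sample inputs are illustrative assumptions, not part of the original benchmark.

```java
public class LogAgreement {

    /** Distance between Math.log(x) and StrictMath.log(x), in ulps of the StrictMath result. */
    static double ulpDistance(double x) {
        double relaxed = Math.log(x);       // may be intrinsified by the JVM
        double strict = StrictMath.log(x);  // fdlibm-compatible result
        return Math.abs(relaxed - strict) / Math.ulp(strict);
    }

    public static void main(String[] args) {
        // Illustrative inputs; the last one is the value used by the benchmark in this article.
        double[] samples = {0.5, 2.0, 10.0, 12346545756.54634};
        for (double x : samples) {
            System.out.println("x=" + x
                    + "  Math.log=" + Math.log(x)
                    + "  StrictMath.log=" + StrictMath.log(x)
                    + "  ulps apart=" + ulpDistance(x));
        }
    }
}
```

Since both methods are specified to stay within 1 ulp of the exact result, the printed distance should never be more than a couple of ulps, and is often zero.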

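Well, we all know that once Math.log() gets hot enough (that is, it is called frequently), the HotSpot JVM will inline this delegation. So it's natural to expect both method calls to show similar performance characteristics, at least when performance matters.

To test this hypothesis, let's run a simple benchmark comparing the two implementations: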

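import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)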
@BenchmarkMode(Mode.Throughput)
public class IntrinsicsBenchmark {

    @Param("12346545756.54634")
    double value;

    @Benchmark
    public double indirect() {
        return Math.log(value); // Calls the StrictMath.log(value) under the hood.
    }

    @Benchmark
    public double direct() {
        return StrictMath.log(value);
    }

    // typical stuff
}

The result should be predictable, right?

If we package the benchmark and run the following command:

>> java -jar intrinsics.jar -f 2 -t 8

After a while, JMH prints the benchmark results, like so:

Benchmark                               (value)   Mode  Cnt          Score          Error  Units
IntrinsicsBenchmark.direct    12346545756.54634  thrpt   20  151571897.277 ±  7878104.343  ops/s
IntrinsicsBenchmark.indirect  12346545756.54634  thrpt   20  309745064.598 ± 12678366.349  ops/s

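We didn't expect that, did we?! In terms of throughput, the indirect Math.log() implementation outperforms the direct, supposedly faster one by roughly 104%, which is about twice the throughput!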
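
The percentage follows directly from the two scores in the table. As a small worked example (the class and method names here are made up for illustration), the relative gain is the ratio of the two throughputs minus one:

```java
public class Speedup {

    /** Relative throughput gain of `fast` over `slow`, in percent. */
    static double percentGain(double fast, double slow) {
        return (fast / slow - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double indirect = 309745064.598; // Math.log score from the JMH table (ops/s)
        double direct = 151571897.277;   // StrictMath.log score from the JMH table (ops/s)
        System.out.println("gain in percent: " + percentGain(indirect, direct));
    }
}
```

Note that the error columns in the table are not negligible (several million ops/s each), so the figure is best read as "about double" rather than as a precise number.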

Let's take another, closer look at the Math.log() implementation to make sure we haven't missed anything:

@IntrinsicCandidate
public static double log(double a) {
    return StrictMath.log(a); // default impl. delegates to StrictMath
}

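The delegation is indeed there. Interestingly, the method also carries an @IntrinsicCandidate annotation. Before going any further, it's worth mentioning that before Java 16 the same method looked like this: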

@HotSpotIntrinsicCandidate
public static double log(double a) {
    return StrictMath.log(a); // default impl. delegates to StrictMath
}

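So, basically, as of Java 16, jdk.internal.HotSpotIntrinsicCandidate was repackaged and renamed to jdk.internal.vm.annotation.IntrinsicCandidate.

Anyway, @IntrinsicCandidate may reveal the actual reason behind this surprising benchmark result. Let's take a look at the annotation's Javadoc: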

/**
 * The {@code @IntrinsicCandidate} annotation is specific to the
 * HotSpot Virtual Machine. It indicates that an annotated method
 * may be (but is not guaranteed to be) intrinsified by the HotSpot VM. A method
 * is intrinsified if the HotSpot VM replaces the annotated method with hand-written
 * assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve
 * performance. The {@code @IntrinsicCandidate} annotation is internal to the
 * Java libraries and is therefore not supposed to have any relevance for application
 * code.
 *
 * @since 16
 */
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@Retention(RetentionPolicy.RUNTIME)
public @interface IntrinsicCandidate {
}
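
Because the annotation is declared with RetentionPolicy.RUNTIME, it should in principle be visible through reflection. The following sketch checks for it by name suffix, so that it matches both the old HotSpotIntrinsicCandidate and the new IntrinsicCandidate; the class IntrinsicCandidateCheck and its helper methods are hypothetical, and whether the annotation actually shows up may depend on the JDK version in use.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;

public class IntrinsicCandidateCheck {

    /** True if any declared annotation's simple name ends with "IntrinsicCandidate". */
    static boolean isIntrinsicCandidate(Method method) {
        for (Annotation annotation : method.getDeclaredAnnotations()) {
            String name = annotation.annotationType().getSimpleName();
            if (name.endsWith("IntrinsicCandidate")) {
                return true;
            }
        }
        return false;
    }

    /** Convenience lookup that turns the checked exception into an unchecked one. */
    static boolean check(Class<?> owner, String methodName, Class<?>... parameterTypes) {
        try {
            return isIntrinsicCandidate(owner.getDeclaredMethod(methodName, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Unannotated control method, used only for comparison. */
    static double plain(double a) {
        return a;
    }

    public static void main(String[] args) {
        System.out.println("Math.log candidate? " + check(Math.class, "log", double.class));
        System.out.println("plain() candidate?  " + check(IntrinsicCandidateCheck.class, "plain", double.class));
    }
}
```

The annotation only marks a candidate: as the Javadoc says, the VM may or may not intrinsify the method, so this check says nothing about what the JIT compiler actually did.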

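Based on this, the HotSpot JVM may replace the Java implementation of Math.log() with a possibly more efficient compiler intrinsic to improve performance.

It turns out that Math.log() does indeed have an intrinsic!

The HotSpot JVM defines all of its intrinsics in the vmIntrinsics.hpp file. There are two types of intrinsics in HotSpot:

Library intrinsics: these are typical compiler intrinsics, in that they replace the method implementation.
Bytecode intrinsics: these methods are not replaced, but receive special treatment instead.

The HotSpot JVM source code documents both types as follows: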

// There are two types of intrinsic methods: (1) Library intrinsics and (2) bytecode intrinsics.
//
// (1) A library intrinsic method may be replaced with hand-crafted assembly code,
// with hand-crafted compiler IR, or with a combination of the two. The semantics
// of the replacement code may differ from the semantics of the replaced code.
//
// (2) Bytecode intrinsic methods are not replaced by special code, but they are
// treated in some other special way by the compiler. For example, the compiler
// may delay inlining for some String-related intrinsic methods (e.g., some methods
// defined in the StringBuilder and StringBuffer classes, see
// Compile::should_delay_string_inlining() for more details).

Right after that, the file lists all possible VM intrinsics one by one. For example:

// Here are all the intrinsics known to the runtime and the CI.
// omitted
/* Math & StrictMath intrinsics are defined in terms of just a few signatures: */           \
do_class(java_lang_Math,                "java/lang/Math")                                   \
/* here are the math names, all together: */                                                \
do_name(abs_name,"abs")       do_name(sin_name,"sin")         do_name(cos_name,"cos")       \
do_name(tan_name,"tan")       do_name(atan2_name,"atan2")     do_name(sqrt_name,"sqrt")     \
do_name(log_name,"log")       do_name(log10_name,"log10")     do_name(pow_name,"pow")       \
do_name(exp_name,"exp")       do_name(min_name,"min")         do_name(max_name,"max")       \
do_name(floor_name, "floor")  do_name(ceil_name, "ceil")      do_name(rint_name, "rint")    \
do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S)

As the last line shows, Math.log() does have an intrinsic replacement. For example, on the x86-64 architecture, Math.log() is intrinsified as follows:

if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dlog)) {
    StubRoutines::_dlog = generate_libmLog();
}

// the generator
address generate_libmLog() {
    StubCodeMark mark(this, "StubRoutines", "libmLog");

    address start = __ pc();

    const XMMRegister x0 = xmm0;
    const XMMRegister x1 = xmm1;
    const XMMRegister x2 = xmm2;
    const XMMRegister x3 = xmm3;

    const XMMRegister x4 = xmm4;
    const XMMRegister x5 = xmm5;
    const XMMRegister x6 = xmm6;
    const XMMRegister x7 = xmm7;

    const Register tmp1 = r11;
    const Register tmp2 = r8;

    BLOCK_COMMENT("Entry:");
    __ enter(); // required for proper stackwalking of RuntimeStub frame

    __ fast_log(x0, x1, x2, x3, x4, x5, x6, x7, rax, rcx, rdx, tmp1, tmp2);

    __ leave(); // required for proper stackwalking of RuntimeStub frame
    __ ret(0);

    return start;

}

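vmIntrinsics.hpp only declares the fact that certain methods may have an intrinsic implementation. The actual intrinsic routines are provided elsewhere, usually depending on the underlying architecture. In the example above, src/hotspot/cpu/x86/stubGenerator_x86_64.cpp is responsible for providing the actual intrinsic for the 64-bit x86 architecture.

Besides being architecture-specific, intrinsics can also be disabled. Therefore, the JVM compilers (C1 or C2) check the following conditions before applying an intrinsic: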

virtual bool is_intrinsic_available(const methodHandle& method, DirectiveSet* directive) {
    return is_intrinsic_supported(method) &&
           !directive->is_intrinsic_disabled(method) &&
           !vmIntrinsics::is_disabled_by_flags(method);
}

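Basically, an intrinsic is available if:

The intrinsic is enabled, which is usually controlled through tunable flags (or compiler directives).
The underlying platform supports the intrinsic.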
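
To make the logic of that check concrete, here is a toy model of it in Java. It is only a sketch: the class IntrinsicAvailability, its fields, and the string ids other than _dlog are hypothetical stand-ins, not HotSpot APIs. It mirrors the shape of the C++ condition above: supported on the platform, not disabled by a compiler directive, and not disabled by flags.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class IntrinsicAvailability {

    // Hypothetical stand-ins for HotSpot's internal state.
    private final Set<String> supportedOnPlatform;
    private final Set<String> disabledByDirective;
    private final Set<String> disabledByFlags;

    IntrinsicAvailability(Collection<String> supported,
                          Collection<String> byDirective,
                          Collection<String> byFlags) {
        this.supportedOnPlatform = new HashSet<>(supported);
        this.disabledByDirective = new HashSet<>(byDirective);
        this.disabledByFlags = new HashSet<>(byFlags);
    }

    /** Same shape as is_intrinsic_available: supported AND not disabled either way. */
    boolean isAvailable(String intrinsicId) {
        return supportedOnPlatform.contains(intrinsicId)
                && !disabledByDirective.contains(intrinsicId)
                && !disabledByFlags.contains(intrinsicId);
    }

    public static void main(String[] args) {
        Set<String> supported = new HashSet<>();
        supported.add("_dlog");
        Set<String> none = new HashSet<>();

        IntrinsicAvailability defaults = new IntrinsicAvailability(supported, none, none);
        IntrinsicAvailability flagged = new IntrinsicAvailability(supported, none, supported);

        System.out.println("default settings:  " + defaults.isAvailable("_dlog"));
        System.out.println("disabled by flags: " + flagged.isAvailable("_dlog"));
    }
}
```

The point of the model is that all conditions must hold at once: platform support alone is not enough if a flag or a directive switches the intrinsic off.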

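Let's learn more about these tunables:

As with many other aspects of the JVM, we can control intrinsics to some extent using tunable flags.

For starters, the combination of -XX:+UnlockDiagnosticVMOptions and -XX:+PrintIntrinsics makes HotSpot print every intrinsic as it applies it. For example, if we run the same benchmark with these flags, we'll see a lot of logs related to Math.log():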

>> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics -jar intrinsics.jar -f 2 -t 8
// truncated logs
@ 4   java.lang.Math::log (5 bytes)   (intrinsic)
@ 4   java.lang.Math::log (5 bytes)   (intrinsic)
@ 55  java.lang.Math::min (11 bytes)   (intrinsic)
@ 58  java.lang.System::arraycopy (0 bytes)   (intrinsic)
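
To observe these log lines without JMH, a small program with a hot loop around Math.log() should be enough. The class below is a hypothetical example (HotLog and the iteration counts are assumptions chosen so the loop is likely to be JIT-compiled); after compiling it, run it with java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics HotLog. The exact log output varies with the JDK version and platform.

```java
public class HotLog {

    /** Sums Math.log(i) for i = 1..n; called repeatedly so the JIT compiles it. */
    static double sumOfLogs(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double total = 0.0;
        // Enough rounds to make sumOfLogs hot; the counts are arbitrary.
        for (int round = 0; round < 200; round++) {
            total += sumOfLogs(100_000);
        }
        // Print the result so the computation cannot be discarded as dead code.
        System.out.println(total);
    }
}
```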

Moreover, we can disable all math-related intrinsics with the -XX:-InlineMathNatives tunable:

>> java -XX:+UnlockDiagnosticVMOptions -XX:-InlineMathNatives -jar intrinsics.jar -f 1 -t 8
Benchmark                               (value)   Mode  Cnt          Score          Error  Units
IntrinsicsBenchmark.direct    12346545756.54634  thrpt   20  171611762.349 ±  4203913.645  ops/s
IntrinsicsBenchmark.indirect  12346545756.54634  thrpt   20  169765587.934 ±  9555128.466  ops/s

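As shown above, since the JVM no longer applies the intrinsic for Math.log(), the throughput of the two methods is almost the same!

As usual, with a simple grep we can see all the tunables related to a particular topic: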

>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Intrinsic
bool CheckIntrinsics                          = true                                   
ccstrlist DisableIntrinsic                    =                              
bool PrintIntrinsics                          = false                               
bool UseAESCTRIntrinsics                      = true                                   
bool UseAESIntrinsics                         = true                                 
bool UseAdler32Intrinsics                     = false                               
bool UseBASE64Intrinsics                      = false                                    
bool UseCRC32CIntrinsics                      = true                                   
bool UseCRC32Intrinsics                       = true                                   
bool UseCharacterCompareIntrinsics            = false                               
bool UseGHASHIntrinsics                       = true                                   
bool UseLibmIntrinsic                         = true                            
bool UseMathExactIntrinsics                   = true                               
bool UseMontgomeryMultiplyIntrinsic           = true                               
bool UseMontgomerySquareIntrinsic             = true                                
bool UseMulAddIntrinsic                       = true                               
bool UseMultiplyToLenIntrinsic                = true                                
bool UseSHA1Intrinsics                        = false                                  
bool UseSHA256Intrinsics                      = true                                   
bool UseSHA512Intrinsics                      = true                                 
bool UseSSE42Intrinsics                       = true                                 
bool UseSquareToLenIntrinsic                  = true                               
bool UseVectorizedMismatchIntrinsic           = true

One more thing:

>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Native
bool CriticalJNINatives                       = true                                     
bool InlineClassNatives                       = true                                
bool InlineMathNatives                        = true                                  
bool InlineNatives                            = true                                   
bool InlineThreadNatives                      = true
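
When these listings need to be compared across JVM versions or machines, it can help to turn the grep output into a map. The sketch below is one way to do that; the class FlagLines is hypothetical, and it assumes each flag line has the shape "type name = value" as in the listings above, ignoring anything else (such as the command line itself or extra trailing columns).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlagLines {

    /** Parses lines like "bool UseAESIntrinsics = true" into name -> value; other lines are skipped. */
    static Map<String, String> parse(String output) {
        Map<String, String> flags = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            // Expected shape: <type> <name> = [value] [extra columns...]
            if (tokens.length >= 3 && tokens[2].equals("=")) {
                // An empty value (e.g. DisableIntrinsic) or a trailing "{...}" column means no value.
                boolean hasValue = tokens.length > 3 && !tokens[3].startsWith("{");
                flags.put(tokens[1], hasValue ? tokens[3] : "");
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Sample lines copied from the listings in this article.
        String sample = String.join("\n",
                ">> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Native",
                "bool InlineMathNatives                        = true",
                "ccstrlist DisableIntrinsic                    =",
                "bool PrintIntrinsics                          = false");
        parse(sample).forEach((name, value) -> System.out.println(name + " -> '" + value + "'"));
    }
}
```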

Original article: https://alidg.me/blog/2020/12/10/hotspot-intrinsics

Source code: https://github.com/alimate/intrinsics