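Sometimes a compiler gives certain function implementations special treatment. Simply put, it replaces the default implementation with another, possibly optimized one. In compiler theory, such functions are known as intrinsic functions, or simply intrinsics.
In this article, we'll walk through a few examples to see how intrinsics work in the HotSpot JVM.
The Math.log() method in Java computes the natural logarithm of a given number. Here is how this method is implemented in OpenJDK: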
@IntrinsicCandidate
public static double log(double a) {
    return StrictMath.log(a); // default impl. delegates to StrictMath
}
As shown above, the Math.log() method itself calls another method named StrictMath.log() under the hood. Despite this delegation, we usually prefer Math.log() over the strict, more direct one!
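To make the relationship between the two methods concrete, here is a minimal sketch that calls both on an arbitrary sample input and reports how far apart the results are, measured in ulps (units in the last place). The class name LogComparison and the helper ulpDistance are illustrative and not part of the original article. The Javadoc of Math.log() only promises a result within 1 ulp of the exact value, so the two results may agree exactly or differ in the last bits depending on the JVM and platform.

```java
final class LogComparison {

    // Distance between two doubles, expressed in ulps of the larger magnitude.
    static double ulpDistance(double x, double y) {
        double ulp = Math.ulp(Math.max(Math.abs(x), Math.abs(y)));
        return Math.abs(x - y) / ulp;
    }

    public static void main(String[] args) {
        double value = 12346545756.54634; // arbitrary sample input
        double viaMath = Math.log(value);
        double viaStrictMath = StrictMath.log(value);

        System.out.println("Math.log       = " + viaMath);
        System.out.println("StrictMath.log = " + viaStrictMath);
        System.out.println("ulp distance   = " + ulpDistance(viaMath, viaStrictMath));
    }
}
```

Whatever the distance turns out to be on a given machine, both calls compute the same mathematical function, which is why comparing their performance is meaningful.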
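Well, we all know that once the Math.log() method gets hot enough (that is, once it is called frequently enough), the HotSpot JVM will inline this delegation. So it is natural to expect both method calls to show similar performance characteristics, at least when performance matters!
To test this hypothesis, let's run a simple benchmark that compares the two implementations: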
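import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
public class IntrinsicsBenchmark {

    @Param("12346545756.54634")
    double value;

    @Benchmark
    public double indirect() {
        return Math.log(value); // Calls the StrictMath.log(value) under the hood.
    }

    @Benchmark
    public double direct() {
        return StrictMath.log(value);
    }

    // typical stuff
}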
The result should be predictable, right?
If we package the benchmark and run the following command:
>> java -jar intrinsics.jar -f 2 -t 8
After a while, JMH prints the benchmark results, as follows:
Benchmark (value) Mode Cnt Score Error Units
IntrinsicsBenchmark.direct 12346545756.54634 thrpt 20 151571897.277 ± 7878104.343 ops/s
IntrinsicsBenchmark.indirect 12346545756.54634 thrpt 20 309745064.598 ± 12678366.349 ops/s
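We didn't expect that, did we?! In terms of throughput, the indirect Math.log() implementation outperforms the direct, supposedly more performant one by nearly 105%, which is roughly twice the throughput!
Let's take another close look at the Math.log() implementation to make sure we didn't miss anything: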
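As a quick sanity check on that percentage, the relative gain can be computed directly from the two scores in the table. This is only an arithmetic sketch; the class and method names are made up for illustration.

```java
final class ThroughputGain {

    // Relative gain of `faster` over `slower`, in percent.
    static double percentGain(double faster, double slower) {
        return (faster / slower - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double direct = 151571897.277;   // StrictMath.log score, ops/s
        double indirect = 309745064.598; // Math.log score, ops/s

        System.out.printf("indirect vs. direct: +%.1f%%%n", percentGain(indirect, direct));
    }
}
```

The result lands a little above 104%. Keep in mind that the error margins reported by JMH mean the last digits of such a figure should not be over-interpreted.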
@IntrinsicCandidate
public static double log(double a) {
    return StrictMath.log(a); // default impl. delegates to StrictMath
}
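The delegation is indeed there. Interestingly, the method also carries an @IntrinsicCandidate annotation. Before going any further, it's worth mentioning that before Java 16, the same method looked like this: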
@HotSpotIntrinsicCandidate
public static double log(double a) {
    return StrictMath.log(a); // default impl. delegates to StrictMath
}
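So, basically, as of Java 16, jdk.internal.HotSpotIntrinsicCandidate was repackaged and renamed to jdk.internal.vm.annotation.IntrinsicCandidate.
Anyway, @IntrinsicCandidate may reveal the actual reason behind this surprising benchmark result. Let's take a look at the annotation's Javadoc: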
/**
 * The {@code @IntrinsicCandidate} annotation is specific to the
 * HotSpot Virtual Machine. It indicates that an annotated method
 * may be (but is not guaranteed to be) intrinsified by the HotSpot VM. A method
 * is intrinsified if the HotSpot VM replaces the annotated method with hand-written
 * assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve
 * performance. The {@code @IntrinsicCandidate} annotation is internal to the
 * Java libraries and is therefore not supposed to have any relevance for application
 * code.
 *
 * @since 16
 */
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@Retention(RetentionPolicy.RUNTIME)
public @interface IntrinsicCandidate {
}
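Based on this, the HotSpot JVM may replace the Java implementation of Math.log() with a possibly more efficient compiler intrinsic to improve performance.
It turns out that the Math.log() method does indeed have an intrinsic!
The HotSpot JVM defines all of its intrinsics in the vmIntrinsics.hpp file. In HotSpot, there are two kinds of intrinsics:
- Library intrinsics: these are typical compiler intrinsics, in that they replace the method implementation.
- Bytecode intrinsics: these methods are not replaced, but they receive special treatment instead.
The HotSpot JVM source code documents the two kinds as follows: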
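Since the annotation is declared with runtime retention, reflection should in principle be able to see it. The following is a small sketch under that assumption; the class name IntrinsicCandidateCheck and its helper are hypothetical, and the check matches the annotation by simple name so that it covers both the Java 16+ name and the older HotSpotIntrinsicCandidate. The output depends on the JDK in use, so none is shown here.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;

final class IntrinsicCandidateCheck {

    // True if the method carries an annotation whose simple name is
    // IntrinsicCandidate (Java 16+) or HotSpotIntrinsicCandidate (older JDKs).
    static boolean isIntrinsicCandidate(Method method) {
        for (Annotation annotation : method.getDeclaredAnnotations()) {
            String name = annotation.annotationType().getSimpleName();
            if (name.equals("IntrinsicCandidate") || name.equals("HotSpotIntrinsicCandidate")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method mathLog = Math.class.getMethod("log", double.class);
        Method strictMathLog = StrictMath.class.getMethod("log", double.class);

        System.out.println("Math.log candidate?       " + isIntrinsicCandidate(mathLog));
        System.out.println("StrictMath.log candidate? " + isIntrinsicCandidate(strictMathLog));
    }
}
```

Being a candidate only means the JVM may intrinsify the method. As the Javadoc stresses, there is no guarantee, and application code is not supposed to rely on the annotation.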
// There are two types of intrinsic methods: (1) Library intrinsics and (2) bytecode intrinsics.
//
// (1) A library intrinsic method may be replaced with hand-crafted assembly code,
// with hand-crafted compiler IR, or with a combination of the two. The semantics
// of the replacement code may differ from the semantics of the replaced code.
//
// (2) Bytecode intrinsic methods are not replaced by special code, but they are
// treated in some other special way by the compiler. For example, the compiler
// may delay inlining for some String-related intrinsic methods (e.g., some methods
// defined in the StringBuilder and StringBuffer classes, see
// Compile::should_delay_string_inlining() for more details).
Right after that, the file lists all possible VM intrinsics one by one. For example:
// Here are all the intrinsics known to the runtime and the CI.
// omitted
/* Math & StrictMath intrinsics are defined in terms of just a few signatures: */ \
do_class(java_lang_Math, "java/lang/Math")
/* here are the math names, all together: */ \
do_name(abs_name,"abs") do_name(sin_name,"sin") do_name(cos_name,"cos") \
do_name(tan_name,"tan") do_name(atan2_name,"atan2") do_name(sqrt_name,"sqrt") \
do_name(log_name,"log") do_name(log10_name,"log10") do_name(pow_name,"pow") \
do_name(exp_name,"exp") do_name(min_name,"min") do_name(max_name,"max") \
do_name(floor_name, "floor") do_name(ceil_name, "ceil") do_name(rint_name, "rint")
do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S)
As the last line shows, Math.log() does indeed have an intrinsic replacement. For example, on the x86-64 architecture, Math.log() is intrinsified as follows:
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dlog)) {
  StubRoutines::_dlog = generate_libmLog();
}
// the generator
address generate_libmLog() {
  StubCodeMark mark(this, "StubRoutines", "libmLog");
  address start = __ pc();
  const XMMRegister x0 = xmm0;
  const XMMRegister x1 = xmm1;
  const XMMRegister x2 = xmm2;
  const XMMRegister x3 = xmm3;
  const XMMRegister x4 = xmm4;
  const XMMRegister x5 = xmm5;
  const XMMRegister x6 = xmm6;
  const XMMRegister x7 = xmm7;
  const Register tmp1 = r11;
  const Register tmp2 = r8;
  BLOCK_COMMENT("Entry:");
  __ enter(); // required for proper stackwalking of RuntimeStub frame
  __ fast_log(x0, x1, x2, x3, x4, x5, x6, x7, rax, rcx, rdx, tmp1, tmp2);
  __ leave(); // required for proper stackwalking of RuntimeStub frame
  __ ret(0);
  return start;
}
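vmIntrinsics.hpp only declares the fact that certain methods may have an intrinsic implementation. The actual intrinsic routines are provided elsewhere and usually depend on the underlying architecture. In the example above, src/hotspot/cpu/x86/stubGenerator_x86_64.cpp is responsible for providing the actual intrinsic for the 64-bit x86 architecture.
Besides being architecture-specific, intrinsics can also be disabled. Therefore, the JVM compiler (C1 or C2) checks the following conditions before applying an intrinsic: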
virtual bool is_intrinsic_available(const methodHandle& method, DirectiveSet* directive) {
  return is_intrinsic_supported(method) &&
         !directive->is_intrinsic_disabled(method) &&
         !vmIntrinsics::is_disabled_by_flags(method);
}
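Basically, an intrinsic is available if:
- The intrinsic is enabled, which is usually controlled through tunable flags (or compiler directives).
- The underlying platform supports the intrinsic.
Let's learn more about these tunables.
As with many other aspects of the JVM, we can control intrinsics to some extent using tunable flags.
For starters, the combination of -XX:+UnlockDiagnosticVMOptions and -XX:+PrintIntrinsics makes HotSpot print every intrinsic as it introduces it. For example, if we run the same benchmark with these flags, we'll see plenty of log lines related to Math.log():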
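The availability check can be modelled in a few lines of plain Java. This is only a toy model of the logic in the HotSpot snippet: the class, the method, and the idea of representing each condition as a set of intrinsic names are all hypothetical, and the real implementation works on method handles and compiler directives rather than strings.

```java
import java.util.Collections;
import java.util.Set;

final class IntrinsicAvailability {

    // Mirrors the shape of is_intrinsic_available: supported by the platform,
    // and disabled neither by a compiler directive nor by a VM flag.
    static boolean isAvailable(String intrinsic,
                               Set<String> supportedOnPlatform,
                               Set<String> disabledByDirective,
                               Set<String> disabledByFlags) {
        return supportedOnPlatform.contains(intrinsic)
                && !disabledByDirective.contains(intrinsic)
                && !disabledByFlags.contains(intrinsic);
    }

    public static void main(String[] args) {
        Set<String> supported = Collections.singleton("_dlog");
        Set<String> none = Collections.emptySet();

        // Supported and not disabled anywhere.
        System.out.println(isAvailable("_dlog", supported, none, none));
        // Supported, but switched off by a flag.
        System.out.println(isAvailable("_dlog", supported, none, supported));
    }
}
```

The point of the model is that a single failing condition is enough to fall back to the regular Java implementation.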
>> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics -jar intrinsics.jar -f 2 -t 8
// truncated logs
@ 4 java.lang.Math::log (5 bytes) (intrinsic)
@ 4 java.lang.Math::log (5 bytes) (intrinsic)
@ 55 java.lang.Math::min (11 bytes) (intrinsic)
@ 58 java.lang.System::arraycopy (0 bytes) (intrinsic)
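When the log gets long, it can help to summarize it. Below is a rough sketch that counts how often each method is reported as an intrinsic. The regular expression is derived only from the log lines shown above and is an assumption about the format, which may differ between JDK versions; the class name IntrinsicLogCounter is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IntrinsicLogCounter {

    // Matches lines such as: "@ 4 java.lang.Math::log (5 bytes) (intrinsic)"
    private static final Pattern INTRINSIC_LINE =
            Pattern.compile("@\\s+\\d+\\s+(\\S+::\\S+)\\s+\\(\\d+ bytes\\)\\s+\\(intrinsic\\)");

    // Counts the intrinsic log lines per "Class::method" name, sorted by name.
    static Map<String, Integer> countIntrinsics(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = INTRINSIC_LINE.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "@ 4 java.lang.Math::log (5 bytes) (intrinsic)",
                "@ 4 java.lang.Math::log (5 bytes) (intrinsic)",
                "@ 55 java.lang.Math::min (11 bytes) (intrinsic)",
                "@ 58 java.lang.System::arraycopy (0 bytes) (intrinsic)");
        System.out.println(countIntrinsics(sample));
    }
}
```

For the four sample lines, java.lang.Math::log is counted twice and the other two methods once each.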
Moreover, we can disable all math-related intrinsics using the -XX:-InlineMathNatives tunable:
>> java -XX:+UnlockDiagnosticVMOptions -XX:-InlineMathNatives -jar intrinsics.jar -f 1 -t 8
Benchmark (value) Mode Cnt Score Error Units
IntrinsicsBenchmark.direct 12346545756.54634 thrpt 20 171611762.349 ± 4203913.645 ops/s
IntrinsicsBenchmark.indirect 12346545756.54634 thrpt 20 169765587.934 ± 9555128.466 ops/s
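As shown above, since the JVM no longer applies the intrinsic for Math.log(), the throughput of the two methods is almost the same!
As usual, with a simple grep, we can see all the tunables related to a particular topic: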
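When comparing runs like these, it is easy to lose track of which flags a given JVM was started with. As a hedged sketch, a program can inspect its own startup arguments through the standard RuntimeMXBean. The class and helper names are illustrative, and the helper assumes that a later occurrence of the flag overrides an earlier one. It only looks at explicitly passed arguments and says nothing about the flag's default value.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class MathIntrinsicsFlagCheck {

    // True if the arguments explicitly turn InlineMathNatives off.
    // Assumption: when the flag is repeated, the last occurrence wins.
    static boolean mathNativesDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-InlineMathNatives")) {
                disabled = true;
            } else if (arg.equals("-XX:+InlineMathNatives")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, e.g. -XX:+UnlockDiagnosticVMOptions.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Math intrinsics explicitly disabled: " + mathNativesDisabled(jvmArgs));
    }
}
```

Printing this at the start of a benchmark run makes it easier to match results to the configuration that produced them.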
>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Intrinsic
bool CheckIntrinsics = true
ccstrlist DisableIntrinsic =
bool PrintIntrinsics = false
bool UseAESCTRIntrinsics = true
bool UseAESIntrinsics = true
bool UseAdler32Intrinsics = false
bool UseBASE64Intrinsics = false
bool UseCRC32CIntrinsics = true
bool UseCRC32Intrinsics = true
bool UseCharacterCompareIntrinsics = false
bool UseGHASHIntrinsics = true
bool UseLibmIntrinsic = true
bool UseMathExactIntrinsics = true
bool UseMontgomeryMultiplyIntrinsic = true
bool UseMontgomerySquareIntrinsic = true
bool UseMulAddIntrinsic = true
bool UseMultiplyToLenIntrinsic = true
bool UseSHA1Intrinsics = false
bool UseSHA256Intrinsics = true
bool UseSHA512Intrinsics = true
bool UseSSE42Intrinsics = true
bool UseSquareToLenIntrinsic = true
bool UseVectorizedMismatchIntrinsic = true
One more thing:
>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Native
bool CriticalJNINatives = true
bool InlineClassNatives = true
bool InlineMathNatives = true
bool InlineNatives = true
bool InlineThreadNatives = true
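Original article: https://alidg.me/blog/2020/12/10/hotspot-intrinsics
Source code: https://github.com/alimate/intrinsics
Unless otherwise noted, all articles on this site are original to 老K的Java博客. When reposting, please credit the source: https://javakk.com/2930.html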