_mm256_macc_ps
Visual Studio 2010 SP1 is required.
Microsoft Specific
Generates the FMA4 YMM instruction vfmaddps to perform a floating-point multiply-add of its sources with a single rounding.
__m256 _mm256_macc_ps (
   __m256 src1,
   __m256 src2,
   __m256 src3
);
Parameters
[in] src1
A 256-bit parameter that contains eight 32-bit floating-point values.
[in] src2
A 256-bit parameter that contains eight 32-bit floating-point values.
[in] src3
A 256-bit parameter that contains eight 32-bit floating-point values.
Return value
A 256-bit result r that contains eight 32-bit floating-point values.
r[i] := src1[i] * src2[i] + src3[i];
Requirements
Intrinsic | Architecture |
---|---|
_mm256_macc_ps | FMA4 |
Header file <intrin.h>
Remarks
Each of the eight single-precision floating-point values in src1 is multiplied by the corresponding value in src2 and added to the corresponding value in src3, and the result is stored as the corresponding value in the destination. Each multiply-add pair is performed with a single round at the end, as if intermediate results were computed to infinite precision.
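The effect of rounding once rather than twice can be seen on a single element. The sketch below is illustrative only and does not call the intrinsic: the helper names mul_then_add and single_round_emulated are hypothetical, and the single-rounded value is emulated in double arithmetic. That emulation is exact for the inputs suggested in the note that follows, but it is an assumption made for illustration, not a general substitute for a hardware fused multiply-add.

```c
#include <stdio.h>

/* Hypothetical helper: multiply, round the product to float, then add.
   This rounds twice, which is what separate multiply and add instructions do. */
static float mul_then_add(float a, float b, float c)
{
    /* volatile forces the product to be rounded to float and
       keeps the compiler from fusing the two operations. */
    volatile float product = a * b;
    return product + c;
}

/* Hypothetical helper: emulate one rounding at the end.
   The product of two floats (24-bit significands) is exact in a double
   (53-bit significand). The double addition is exact for the inputs used
   here, but may itself round for other inputs. */
static float single_round_emulated(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}

/* Prints both results for one pair of inputs chosen so that they differ. */
static void print_rounding_demo(void)
{
    float a = 1.0f + 1.0f / 4096.0f;      /* 1 + 2^-12 */
    float c = -(1.0f + 1.0f / 2048.0f);   /* -(1 + 2^-11) */

    printf("two roundings: %g\n", mul_then_add(a, a, c));
    printf("one rounding:  %g\n", single_round_emulated(a, a, c));
}
```

With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product a * a is 1 + 2^-11 + 2^-24. Rounding that product to single precision drops the 2^-24 term, so the two-step version returns 0, while the single-rounded version keeps the term and returns 2^-24.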
The vfmaddps instruction is part of the FMA4 family of instructions. Before you use this intrinsic, you must ensure that the processor supports this instruction. To determine hardware support for this instruction, call the __cpuid intrinsic with InfoType = 0x80000001 and check bit 16 of CPUInfo[2] (ECX). This bit is 1 when the instruction is supported, and 0 otherwise.
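The support check described above might be wrapped as follows. This is a minimal sketch, not part of the original reference: the helper names fma4_bit_set and cpu_supports_fma4 are hypothetical, the query of leaf 0x80000000 to confirm that leaf 0x80000001 exists is an added precaution, and the GCC/Clang branch using __get_cpuid from <cpuid.h> is an assumption included only so the sketch compiles outside Visual Studio.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   /* __cpuid */
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>    /* __get_cpuid (assumption: non-Microsoft fallback) */
#endif

/* Hypothetical helper: returns 1 when bit 16 of the given ECX value is set. */
static int fma4_bit_set(unsigned int ecx)
{
    return (int)((ecx >> 16) & 1u);
}

/* Hypothetical helper: returns 1 if CPUID reports FMA4 support, 0 otherwise. */
static int cpu_supports_fma4(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int info[4];

    /* EAX of leaf 0x80000000 holds the highest supported extended leaf. */
    __cpuid(info, 0x80000000);
    if ((unsigned int)info[0] < 0x80000001u)
        return 0;

    /* CPUInfo[2] is ECX; bit 16 reports FMA4. */
    __cpuid(info, 0x80000001);
    return fma4_bit_set((unsigned int)info[2]);
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return fma4_bit_set(ecx);
#else
    return 0;   /* not an x86 target, so no FMA4 */
#endif
}
```

A caller would test cpu_supports_fma4() once at startup and use _mm256_macc_ps only when it returns 1, falling back to separate multiply and add operations otherwise.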
Example
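#include <stdio.h>
#include <intrin.h>

// Assumes the processor supports FMA4; check with __cpuid before calling the intrinsic.
int main()
{
    __m256 a, b, c, d;
    int i;

    // Fill the sources: a = {0, 1, ..., 7}, every element of b = 2, every element of c = 3.
    for (i = 0; i < 8; i++) {
        a.m256_f32[i] = (float)i;
        b.m256_f32[i] = 2.0f;
        c.m256_f32[i] = 3.0f;
    }

    // d[i] = a[i] * b[i] + c[i]
    d = _mm256_macc_ps(a, b, c);

    for (i = 0; i < 8; i++) printf_s(" %.3f", d.m256_f32[i]);
    printf_s("\n");
    return 0;
}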
3.000 5.000 7.000 9.000 11.000 13.000 15.000 17.000