From 96d6da4e252b06dcfdc041e7df23e86161c33007 Mon Sep 17 00:00:00 2001
From: rihab kouki
Date: Tue, 28 Jul 2020 11:24:49 +0100
Subject: Official ARM version: v5.6.0

---
 docs/DSP/html/group__MatrixMult.html | 98 ++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 48 deletions(-)

(limited to 'docs/DSP/html/group__MatrixMult.html')

diff --git a/docs/DSP/html/group__MatrixMult.html b/docs/DSP/html/group__MatrixMult.html
index 49e3695..d8655e2 100644
--- a/docs/DSP/html/group__MatrixMult.html
+++ b/docs/DSP/html/group__MatrixMult.html
@@ -32,7 +32,7 @@ Logo
CMSIS-DSP
-  Version 1.5.2
+  Version 1.7.0
CMSIS DSP Software Library
@@ -120,10 +120,10 @@ Functions  Floating-point matrix multiplication. More...
  arm_status arm_mat_mult_fast_q15 (const arm_matrix_instance_q15 *pSrcA, const arm_matrix_instance_q15 *pSrcB, arm_matrix_instance_q15 *pDst, q15_t *pState)
- Q15 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4. More...
+ Q15 matrix multiplication (fast variant). More...
  arm_status arm_mat_mult_fast_q31 (const arm_matrix_instance_q31 *pSrcA, const arm_matrix_instance_q31 *pSrcB, arm_matrix_instance_q31 *pDst)
- Q31 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4. More...
+ Q31 matrix multiplication (fast variant). More...
  arm_status arm_mat_mult_q15 (const arm_matrix_instance_q15 *pSrcA, const arm_matrix_instance_q15 *pSrcB, arm_matrix_instance_q15 *pDst, q15_t *pState)  Q15 matrix multiplication. More...
@@ -179,10 +179,6 @@ Multiplication of two 3 x 3 matrices
Returns
The function returns either ARM_MATH_SIZE_MISMATCH or ARM_MATH_SUCCESS based on the outcome of size checking.
-

References ARM_MATH_SIZE_MISMATCH, ARM_MATH_SUCCESS, arm_matrix_instance_f32::numCols, arm_matrix_instance_f32::numRows, arm_matrix_instance_f32::pData, and status.

- -

Referenced by main().

- @@ -220,22 +216,24 @@ Multiplication of two 3 x 3 matrices
+

Q15 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4.

Parameters
- [in]   *pSrcA   points to the first input matrix structure
- [in]   *pSrcB   points to the second input matrix structure
- [out]  *pDst    points to output matrix structure
- [in]   *pState  points to the array for storing intermediate results
+ [in]   pSrcA    points to the first input matrix structure
+ [in]   pSrcB    points to the second input matrix structure
+ [out]  pDst     points to output matrix structure
+ [in]   pState   points to the array for storing intermediate results
-
Returns
The function returns either ARM_MATH_SIZE_MISMATCH or ARM_MATH_SUCCESS based on the outcome of size checking.
-

Scaling and Overflow Behavior:

-
The difference between the function arm_mat_mult_q15() and this fast variant is that the fast variant use a 32-bit rather than a 64-bit accumulator. The result of each 1.15 x 1.15 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.15 result.
-
The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 16 bits of each multiplication result. In order to avoid overflows completely the input signals must be scaled down. Scale down one of the input matrices by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are computed internally for each output element.
-
See arm_mat_mult_q15() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.
- -

References __SIMD32, __SMLAD(), ARM_MATH_SIZE_MISMATCH, ARM_MATH_SUCCESS, arm_matrix_instance_q15::numCols, arm_matrix_instance_q15::numRows, arm_matrix_instance_q15::pData, and status.

+
Returns
execution status +
+
Scaling and Overflow Behavior
The difference between the function arm_mat_mult_q15() and this fast variant is that the fast variant use a 32-bit rather than a 64-bit accumulator. The result of each 1.15 x 1.15 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.15 result.
+
The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 16 bits of each multiplication result. In order to avoid overflows completely the input signals must be scaled down. Scale down one of the input matrices by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are computed internally for each output element.
+
Remarks
Refer to arm_mat_mult_q15() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.
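
Reviewer note: the "scale down one of the input matrices by log2(numColsA) bits" advice above can be sketched in plain C as follows. This is a minimal illustration, not CMSIS-DSP code: `prescale_bits` and `prescale_q15` are hypothetical helper names, plain `int16_t` stands in for `q15_t`, and the shift count is rounded up to the next whole bit when numColsA is not a power of two (an assumption; the documentation only says log2(numColsA)).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: smallest s with (1 << s) >= num_cols,
 * i.e. ceil(log2(num_cols)). Returns 0 for num_cols <= 1. */
static unsigned prescale_bits(uint16_t num_cols)
{
    unsigned s = 0;
    while ((1u << s) < num_cols) {
        s++;
    }
    return s;
}

/* Hypothetical helper: scale a Q15 buffer down in place by `bits`.
 * Assumes >> on a negative value is an arithmetic shift, which is
 * implementation-defined in C but holds on common compilers. */
static void prescale_q15(int16_t *data, size_t count, unsigned bits)
{
    for (size_t i = 0; i < count; i++) {
        data[i] = (int16_t)(data[i] >> bits);
    }
}
```

In use, one would compute `prescale_bits(numColsA)` once and apply `prescale_q15` to the data buffer of one input matrix before calling the multiplication, then account for the lost bits when interpreting the result.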
@@ -268,21 +266,23 @@ Multiplication of two 3 x 3 matrices
+

Q31 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4.

Parameters
- [in]   *pSrcA  points to the first input matrix structure
- [in]   *pSrcB  points to the second input matrix structure
- [out]  *pDst   points to output matrix structure
+ [in]   pSrcA   points to the first input matrix structure
+ [in]   pSrcB   points to the second input matrix structure
+ [out]  pDst    points to output matrix structure
-
Returns
The function returns either ARM_MATH_SIZE_MISMATCH or ARM_MATH_SUCCESS based on the outcome of size checking.
-

Scaling and Overflow Behavior:

-
The difference between the function arm_mat_mult_q31() and this fast variant is that the fast variant use a 32-bit rather than a 64-bit accumulator. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.31 result.
-
The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result. In order to avoid overflows completely the input signals must be scaled down. Scale down one of the input matrices by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are computed internally for each output element.
-
See arm_mat_mult_q31() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.
- -

References __SMMLA(), ARM_MATH_SIZE_MISMATCH, ARM_MATH_SUCCESS, arm_matrix_instance_q31::numCols, arm_matrix_instance_q31::numRows, arm_matrix_instance_q31::pData, and status.

+
Returns
execution status +
+
Scaling and Overflow Behavior
The difference between the function arm_mat_mult_q31() and this fast variant is that the fast variant use a 32-bit rather than a 64-bit accumulator. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.31 result.
+
The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result. In order to avoid overflows completely the input signals must be scaled down. Scale down one of the input matrices by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are computed internally for each output element.
+
Remarks
Refer to arm_mat_mult_q31() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.
@@ -323,19 +323,20 @@ Multiplication of two 3 x 3 matrices
Parameters
- [in]   *pSrcA   points to the first input matrix structure
- [in]   *pSrcB   points to the second input matrix structure
- [out]  *pDst    points to output matrix structure
- [in]   *pState  points to the array for storing intermediate results (Unused)
+ [in]   pSrcA    points to the first input matrix structure
+ [in]   pSrcB    points to the second input matrix structure
+ [out]  pDst     points to output matrix structure
+ [in]   pState   points to the array for storing intermediate results (Unused)
-
Returns
The function returns either ARM_MATH_SIZE_MISMATCH or ARM_MATH_SUCCESS based on the outcome of size checking.
-

Scaling and Overflow Behavior:

-
The function is implemented using a 64-bit internal accumulator. The inputs to the multiplications are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.
-
Refer to arm_mat_mult_fast_q15() for a faster but less precise version of this function for Cortex-M3 and Cortex-M4.
- -

References __SIMD32, __SMLALD(), ARM_MATH_SIZE_MISMATCH, ARM_MATH_SUCCESS, arm_matrix_instance_q15::numCols, arm_matrix_instance_q15::numRows, arm_matrix_instance_q15::pData, and status.

+
Returns
execution status +
+
Scaling and Overflow Behavior
The function is implemented using an internal 64-bit accumulator. The inputs to the multiplications are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.
+
Refer to arm_mat_mult_fast_q15() for a faster but less precise version of this function.
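
Reviewer note: the 64-bit accumulation described above (1.15 inputs, 2.30 products, 34.30 accumulator, low 15 bits discarded, saturation to 1.15) can be sketched for a single output element as below. This is an illustrative scalar version under stated assumptions, not the library's implementation: `sat_q15` and `dot_q15_acc64` are hypothetical names, and the row of A and the column of B are assumed to be already gathered into contiguous arrays.

```c
#include <stdint.h>
#include <stddef.h>

/* Saturate a wide value to the Q15 range [-32768, 32767]. */
static int16_t sat_q15(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical helper: one output element of a Q15 matrix product.
 * Each 1.15 x 1.15 product is a 2.30 value; the products are summed
 * in a 64-bit accumulator, so realistic row lengths cannot overflow. */
static int16_t dot_q15_acc64(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    /* Discard the low 15 bits (assumes arithmetic shift for negative
     * values), then saturate to 1.15. */
    return sat_q15(acc >> 15);
}
```

With both inputs equal to {16384, 16384} (0.5 in Q15), the sketch yields 16384, i.e. 0.25 + 0.25 = 0.5; with inputs at -1.0 the sum exceeds the Q15 range and the result saturates rather than wrapping.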
@@ -370,18 +371,19 @@ Multiplication of two 3 x 3 matrices
Parameters
- [in]   *pSrcA  points to the first input matrix structure
- [in]   *pSrcB  points to the second input matrix structure
- [out]  *pDst   points to output matrix structure
+ [in]   pSrcA   points to the first input matrix structure
+ [in]   pSrcB   points to the second input matrix structure
+ [out]  pDst    points to output matrix structure
-
Returns
The function returns either ARM_MATH_SIZE_MISMATCH or ARM_MATH_SUCCESS based on the outcome of size checking.
-

Scaling and Overflow Behavior:

-
The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. The input is thus scaled down by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are performed internally. The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.
-
See arm_mat_mult_fast_q31() for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.
- -

References ARM_MATH_SIZE_MISMATCH, ARM_MATH_SUCCESS, clip_q63_to_q31(), arm_matrix_instance_q31::numCols, arm_matrix_instance_q31::numRows, arm_matrix_instance_q31::pData, and status.

+
Returns
execution status +
+
Scaling and Overflow Behavior
The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. The input is thus scaled down by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are performed internally. The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.
+
Remarks
Refer to arm_mat_mult_fast_q31() for a faster but less precise implementation of this function.
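
Reviewer note: the Q31 behavior described above (2.62 accumulator with a single guard bit, wrap-around on intermediate overflow, right shift by 31 and saturation to 1.31) can be illustrated with the scalar sketch below. It is not the library's code: `sat_q31` and `dot_q31_acc64` are hypothetical names, the operands are assumed to be gathered into contiguous arrays, and the accumulation is done in `uint64_t` so that the wrap-around is well defined in C.

```c
#include <stdint.h>
#include <stddef.h>

/* Saturate a wide value to the Q31 range. */
static int32_t sat_q31(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical helper: one output element of a Q31 matrix product.
 * Each 1.31 x 1.31 product is a 2.62 value that nearly fills the
 * 64-bit accumulator, so there is only one guard bit. */
static int32_t dot_q31_acc64(const int32_t *a, const int32_t *b, size_t n)
{
    uint64_t acc = 0; /* unsigned so that overflow wraps without UB */
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)((int64_t)a[i] * (int64_t)b[i]);
    }
    /* Reinterpret as signed (implementation-defined, two's complement
     * on common compilers), shift 2.62 down by 31 bits (assumes an
     * arithmetic shift), then saturate to 1.31. */
    return sat_q31((int64_t)acc >> 31);
}
```

Multiplying two length-2 vectors of -1.0 (INT32_MIN) overflows the accumulator and the sketch returns INT32_MIN instead of a positive saturated value, which is the distortion the text warns about. Scaling one operand down by 1 bit (log2(2)) beforehand avoids the wrap and gives INT32_MAX.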
@@ -390,7 +392,7 @@ Multiplication of two 3 x 3 matrices