Subversion Repositories canSerial

Rev

Blame | Last modification | View Log | Download | RSS feed

  1. /* ----------------------------------------------------------------------
  2.  * Project:      CMSIS DSP Library
  3.  * Title:        arm_lms_f32.c
  4.  * Description:  Processing function for the floating-point LMS filter
  5.  *
  6.  * $Date:        27. January 2017
  7.  * $Revision:    V.1.5.1
  8.  *
  9.  * Target Processor: Cortex-M cores
  10.  * -------------------------------------------------------------------- */
  11. /*
  12.  * Copyright (C) 2010-2017 ARM Limited or its affiliates. All rights reserved.
  13.  *
  14.  * SPDX-License-Identifier: Apache-2.0
  15.  *
  16.  * Licensed under the Apache License, Version 2.0 (the License); you may
  17.  * not use this file except in compliance with the License.
  18.  * You may obtain a copy of the License at
  19.  *
  20.  * www.apache.org/licenses/LICENSE-2.0
  21.  *
  22.  * Unless required by applicable law or agreed to in writing, software
  23.  * distributed under the License is distributed on an AS IS BASIS, WITHOUT
  24.  * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  25.  * See the License for the specific language governing permissions and
  26.  * limitations under the License.
  27.  */
  28.  
  29. #include "arm_math.h"
  30.  
  31. /**
  32.  * @ingroup groupFilters
  33.  */
  34.  
  35. /**
  36.  * @defgroup LMS Least Mean Square (LMS) Filters
  37.  *
  38.  * LMS filters are a class of adaptive filters that are able to "learn" an unknown transfer functions.
  39.  * LMS filters use a gradient descent method in which the filter coefficients are updated based on the instantaneous error signal.
  40.  * Adaptive filters are often used in communication systems, equalizers, and noise removal.
  41.  * The CMSIS DSP Library contains LMS filter functions that operate on Q15, Q31, and floating-point data types.
  42.  * The library also contains normalized LMS filters in which the filter coefficient adaptation is indepedent of the level of the input signal.
  43.  *
  44.  * An LMS filter consists of two components as shown below.
  45.  * The first component is a standard transversal or FIR filter.
  46.  * The second component is a coefficient update mechanism.
  47.  * The LMS filter has two input signals.
  48.  * The "input" feeds the FIR filter while the "reference input" corresponds to the desired output of the FIR filter.
  49.  * That is, the FIR filter coefficients are updated so that the output of the FIR filter matches the reference input.
  50.  * The filter coefficient update mechanism is based on the difference between the FIR filter output and the reference input.
  51.  * This "error signal" tends towards zero as the filter adapts.
  52.  * The LMS processing functions accept the input and reference input signals and generate the filter output and error signal.
  53.  * \image html LMS.gif "Internal structure of the Least Mean Square filter"
  54.  *
  55.  * The functions operate on blocks of data and each call to the function processes
  56.  * <code>blockSize</code> samples through the filter.
  57.  * <code>pSrc</code> points to input signal, <code>pRef</code> points to reference signal,
  58.  * <code>pOut</code> points to output signal and <code>pErr</code> points to error signal.
  59.  * All arrays contain <code>blockSize</code> values.
  60.  *
  61.  * The functions operate on a block-by-block basis.
  62.  * Internally, the filter coefficients <code>b[n]</code> are updated on a sample-by-sample basis.
  63.  * The convergence of the LMS filter is slower compared to the normalized LMS algorithm.
  64.  *
  65.  * \par Algorithm:
  66.  * The output signal <code>y[n]</code> is computed by a standard FIR filter:
  67.  * <pre>
  68.  *     y[n] = b[0] * x[n] + b[1] * x[n-1] + b[2] * x[n-2] + ...+ b[numTaps-1] * x[n-numTaps+1]
  69.  * </pre>
  70.  *
  71.  * \par
  72.  * The error signal equals the difference between the reference signal <code>d[n]</code> and the filter output:
  73.  * <pre>
  74.  *     e[n] = d[n] - y[n].
  75.  * </pre>
  76.  *
  77.  * \par
  78.  * After each sample of the error signal is computed, the filter coefficients <code>b[k]</code> are updated on a sample-by-sample basis:
  79.  * <pre>
  80.  *     b[k] = b[k] + e[n] * mu * x[n-k],  for k=0, 1, ..., numTaps-1
  81.  * </pre>
  82.  * where <code>mu</code> is the step size and controls the rate of coefficient convergence.
  83.  *\par
  84.  * In the APIs, <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>.
  85.  * Coefficients are stored in time reversed order.
  86.  * \par
  87.  * <pre>
  88.  *    {b[numTaps-1], b[numTaps-2], b[N-2], ..., b[1], b[0]}
  89.  * </pre>
  90.  * \par
  91.  * <code>pState</code> points to a state array of size <code>numTaps + blockSize - 1</code>.
  92.  * Samples in the state buffer are stored in the order:
  93.  * \par
  94.  * <pre>
  95.  *    {x[n-numTaps+1], x[n-numTaps], x[n-numTaps-1], x[n-numTaps-2]....x[0], x[1], ..., x[blockSize-1]}
  96.  * </pre>
  97.  * \par
  98.  * Note that the length of the state buffer exceeds the length of the coefficient array by <code>blockSize-1</code> samples.
  99.  * The increased state buffer length allows circular addressing, which is traditionally used in FIR filters,
  100.  * to be avoided and yields a significant speed improvement.
  101.  * The state variables are updated after each block of data is processed.
  102.  * \par Instance Structure
  103.  * The coefficients and state variables for a filter are stored together in an instance data structure.
  104.  * A separate instance structure must be defined for each filter and
  105.  * coefficient and state arrays cannot be shared among instances.
  106.  * There are separate instance structure declarations for each of the 3 supported data types.
  107.  *
  108.  * \par Initialization Functions
  109.  * There is also an associated initialization function for each data type.
  110.  * The initialization function performs the following operations:
  111.  * - Sets the values of the internal structure fields.
  112.  * - Zeros out the values in the state buffer.
  113.  * To do this manually without calling the init function, assign the follow subfields of the instance structure:
  114.  * numTaps, pCoeffs, mu, postShift (not for f32), pState. Also set all of the values in pState to zero.
  115.  *
  116.  * \par
  117.  * Use of the initialization function is optional.
  118.  * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.
  119.  * To place an instance structure into a const data section, the instance structure must be manually initialized.
  120.  * Set the values in the state buffer to zeros before static initialization.
  121.  * The code below statically initializes each of the 3 different data type filter instance structures
  122.  * <pre>
  123.  *    arm_lms_instance_f32 S = {numTaps, pState, pCoeffs, mu};
  124.  *    arm_lms_instance_q31 S = {numTaps, pState, pCoeffs, mu, postShift};
  125.  *    arm_lms_instance_q15 S = {numTaps, pState, pCoeffs, mu, postShift};
  126.  * </pre>
  127.  * where <code>numTaps</code> is the number of filter coefficients in the filter; <code>pState</code> is the address of the state buffer;
  128.  * <code>pCoeffs</code> is the address of the coefficient buffer; <code>mu</code> is the step size parameter; and <code>postShift</code> is the shift applied to coefficients.
  129.  *
  130.  * \par Fixed-Point Behavior:
  131.  * Care must be taken when using the Q15 and Q31 versions of the LMS filter.
  132.  * The following issues must be considered:
  133.  * - Scaling of coefficients
  134.  * - Overflow and saturation
  135.  *
  136.  * \par Scaling of Coefficients:
  137.  * Filter coefficients are represented as fractional values and
  138.  * coefficients are restricted to lie in the range <code>[-1 +1)</code>.
  139.  * The fixed-point functions have an additional scaling parameter <code>postShift</code>.
  140.  * At the output of the filter's accumulator is a shift register which shifts the result by <code>postShift</code> bits.
  141.  * This essentially scales the filter coefficients by <code>2^postShift</code> and
  142.  * allows the filter coefficients to exceed the range <code>[+1 -1)</code>.
  143.  * The value of <code>postShift</code> is set by the user based on the expected gain through the system being modeled.
  144.  *
  145.  * \par Overflow and Saturation:
  146.  * Overflow and saturation behavior of the fixed-point Q15 and Q31 versions are
  147.  * described separately as part of the function specific documentation below.
  148.  */
  149.  
  150. /**
  151.  * @addtogroup LMS
  152.  * @{
  153.  */
  154.  
  155. /**
  156.  * @details
  157.  * This function operates on floating-point data types.
  158.  *
  159.  * @brief Processing function for floating-point LMS filter.
  160.  * @param[in]  *S points to an instance of the floating-point LMS filter structure.
  161.  * @param[in]  *pSrc points to the block of input data.
  162.  * @param[in]  *pRef points to the block of reference data.
  163.  * @param[out] *pOut points to the block of output data.
  164.  * @param[out] *pErr points to the block of error data.
  165.  * @param[in]  blockSize number of samples to process.
  166.  * @return     none.
  167.  */
  168.  
  169. void arm_lms_f32(
  170.   const arm_lms_instance_f32 * S,
  171.   float32_t * pSrc,
  172.   float32_t * pRef,
  173.   float32_t * pOut,
  174.   float32_t * pErr,
  175.   uint32_t blockSize)
  176. {
  177.   float32_t *pState = S->pState;                 /* State pointer */
  178.   float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
  179.   float32_t *pStateCurnt;                        /* Points to the current sample of the state */
  180.   float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
  181.   float32_t mu = S->mu;                          /* Adaptive factor */
  182.   uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
  183.   uint32_t tapCnt, blkCnt;                       /* Loop counters */
  184.   float32_t sum, e, d;                           /* accumulator, error, reference data sample */
  185.   float32_t w = 0.0f;                            /* weight factor */
  186.  
  187.   e = 0.0f;
  188.   d = 0.0f;
  189.  
  190.   /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
  191.   /* pStateCurnt points to the location where the new input data should be written */
  192.   pStateCurnt = &(S->pState[(numTaps - 1U)]);
  193.  
  194.   blkCnt = blockSize;
  195.  
  196.  
  197. #if defined (ARM_MATH_DSP)
  198.  
  199.   /* Run the below code for Cortex-M4 and Cortex-M3 */
  200.  
  201.   while (blkCnt > 0U)
  202.   {
  203.     /* Copy the new input sample into the state buffer */
  204.     *pStateCurnt++ = *pSrc++;
  205.  
  206.     /* Initialize pState pointer */
  207.     px = pState;
  208.  
  209.     /* Initialize coeff pointer */
  210.     pb = (pCoeffs);
  211.  
  212.     /* Set the accumulator to zero */
  213.     sum = 0.0f;
  214.  
  215.     /* Loop unrolling.  Process 4 taps at a time. */
  216.     tapCnt = numTaps >> 2;
  217.  
  218.     while (tapCnt > 0U)
  219.     {
  220.       /* Perform the multiply-accumulate */
  221.       sum += (*px++) * (*pb++);
  222.       sum += (*px++) * (*pb++);
  223.       sum += (*px++) * (*pb++);
  224.       sum += (*px++) * (*pb++);
  225.  
  226.       /* Decrement the loop counter */
  227.       tapCnt--;
  228.     }
  229.  
  230.     /* If the filter length is not a multiple of 4, compute the remaining filter taps */
  231.     tapCnt = numTaps % 0x4U;
  232.  
  233.     while (tapCnt > 0U)
  234.     {
  235.       /* Perform the multiply-accumulate */
  236.       sum += (*px++) * (*pb++);
  237.  
  238.       /* Decrement the loop counter */
  239.       tapCnt--;
  240.     }
  241.  
  242.     /* The result in the accumulator, store in the destination buffer. */
  243.     *pOut++ = sum;
  244.  
  245.     /* Compute and store error */
  246.     d = (float32_t) (*pRef++);
  247.     e = d - sum;
  248.     *pErr++ = e;
  249.  
  250.     /* Calculation of Weighting factor for the updating filter coefficients */
  251.     w = e * mu;
  252.  
  253.     /* Initialize pState pointer */
  254.     px = pState;
  255.  
  256.     /* Initialize coeff pointer */
  257.     pb = (pCoeffs);
  258.  
  259.     /* Loop unrolling.  Process 4 taps at a time. */
  260.     tapCnt = numTaps >> 2;
  261.  
  262.     /* Update filter coefficients */
  263.     while (tapCnt > 0U)
  264.     {
  265.       /* Perform the multiply-accumulate */
  266.       *pb = *pb + (w * (*px++));
  267.       pb++;
  268.  
  269.       *pb = *pb + (w * (*px++));
  270.       pb++;
  271.  
  272.       *pb = *pb + (w * (*px++));
  273.       pb++;
  274.  
  275.       *pb = *pb + (w * (*px++));
  276.       pb++;
  277.  
  278.       /* Decrement the loop counter */
  279.       tapCnt--;
  280.     }
  281.  
  282.     /* If the filter length is not a multiple of 4, compute the remaining filter taps */
  283.     tapCnt = numTaps % 0x4U;
  284.  
  285.     while (tapCnt > 0U)
  286.     {
  287.       /* Perform the multiply-accumulate */
  288.       *pb = *pb + (w * (*px++));
  289.       pb++;
  290.  
  291.       /* Decrement the loop counter */
  292.       tapCnt--;
  293.     }
  294.  
  295.     /* Advance state pointer by 1 for the next sample */
  296.     pState = pState + 1;
  297.  
  298.     /* Decrement the loop counter */
  299.     blkCnt--;
  300.   }
  301.  
  302.  
  303.   /* Processing is complete. Now copy the last numTaps - 1 samples to the
  304.      satrt of the state buffer. This prepares the state buffer for the
  305.      next function call. */
  306.  
  307.   /* Points to the start of the pState buffer */
  308.   pStateCurnt = S->pState;
  309.  
  310.   /* Loop unrolling for (numTaps - 1U) samples copy */
  311.   tapCnt = (numTaps - 1U) >> 2U;
  312.  
  313.   /* copy data */
  314.   while (tapCnt > 0U)
  315.   {
  316.     *pStateCurnt++ = *pState++;
  317.     *pStateCurnt++ = *pState++;
  318.     *pStateCurnt++ = *pState++;
  319.     *pStateCurnt++ = *pState++;
  320.  
  321.     /* Decrement the loop counter */
  322.     tapCnt--;
  323.   }
  324.  
  325.   /* Calculate remaining number of copies */
  326.   tapCnt = (numTaps - 1U) % 0x4U;
  327.  
  328.   /* Copy the remaining q31_t data */
  329.   while (tapCnt > 0U)
  330.   {
  331.     *pStateCurnt++ = *pState++;
  332.  
  333.     /* Decrement the loop counter */
  334.     tapCnt--;
  335.   }
  336.  
  337. #else
  338.  
  339.   /* Run the below code for Cortex-M0 */
  340.  
  341.   while (blkCnt > 0U)
  342.   {
  343.     /* Copy the new input sample into the state buffer */
  344.     *pStateCurnt++ = *pSrc++;
  345.  
  346.     /* Initialize pState pointer */
  347.     px = pState;
  348.  
  349.     /* Initialize pCoeffs pointer */
  350.     pb = pCoeffs;
  351.  
  352.     /* Set the accumulator to zero */
  353.     sum = 0.0f;
  354.  
  355.     /* Loop over numTaps number of values */
  356.     tapCnt = numTaps;
  357.  
  358.     while (tapCnt > 0U)
  359.     {
  360.       /* Perform the multiply-accumulate */
  361.       sum += (*px++) * (*pb++);
  362.  
  363.       /* Decrement the loop counter */
  364.       tapCnt--;
  365.     }
  366.  
  367.     /* The result is stored in the destination buffer. */
  368.     *pOut++ = sum;
  369.  
  370.     /* Compute and store error */
  371.     d = (float32_t) (*pRef++);
  372.     e = d - sum;
  373.     *pErr++ = e;
  374.  
  375.     /* Weighting factor for the LMS version */
  376.     w = e * mu;
  377.  
  378.     /* Initialize pState pointer */
  379.     px = pState;
  380.  
  381.     /* Initialize pCoeffs pointer */
  382.     pb = pCoeffs;
  383.  
  384.     /* Loop over numTaps number of values */
  385.     tapCnt = numTaps;
  386.  
  387.     while (tapCnt > 0U)
  388.     {
  389.       /* Perform the multiply-accumulate */
  390.       *pb = *pb + (w * (*px++));
  391.       pb++;
  392.  
  393.       /* Decrement the loop counter */
  394.       tapCnt--;
  395.     }
  396.  
  397.     /* Advance state pointer by 1 for the next sample */
  398.     pState = pState + 1;
  399.  
  400.     /* Decrement the loop counter */
  401.     blkCnt--;
  402.   }
  403.  
  404.  
  405.   /* Processing is complete. Now copy the last numTaps - 1 samples to the
  406.    * start of the state buffer. This prepares the state buffer for the
  407.    * next function call. */
  408.  
  409.   /* Points to the start of the pState buffer */
  410.   pStateCurnt = S->pState;
  411.  
  412.   /*  Copy (numTaps - 1U) samples  */
  413.   tapCnt = (numTaps - 1U);
  414.  
  415.   /* Copy the data */
  416.   while (tapCnt > 0U)
  417.   {
  418.     *pStateCurnt++ = *pState++;
  419.  
  420.     /* Decrement the loop counter */
  421.     tapCnt--;
  422.   }
  423.  
  424. #endif /*   #if defined (ARM_MATH_DSP) */
  425.  
  426. }
  427.  
  428. /**
  429.    * @} end of LMS group
  430.    */
  431.