
MicroZed Chronicles: Using a DSP48E2 as a Multiplexer

I am a regular reader of many FPGA notice boards. A few days ago, I saw a question about how the DSP48E2 could be used as a multiplexer. The question arose because the developer was running low on logic resources while the DSP elements were unused.

I came across the Hoplite network on chip a few years ago, and one version of it, Hoplite-DSP, used the DSP48 elements as multiplexers to return logic resources to the FPGA designer.

The DSP48E2 is a very versatile element. In our programmable logic designs, we mainly use it to implement mathematical algorithms such as filters, FFTs, and so on.

Looking at the architecture of the DSP48, however, there are several multiplexors that can be used to switch the data that is fed into the ALU.

We can multiplex a signal by controlling the settings of the X and Y multiplexors and by setting the correct mode for the ALU.

We do this by configuring the ALU to perform an addition, selecting the input we require on the X or Y mux while setting the other muxes to a constant zero. In effect, we add 0 to the desired signal to perform the multiplexing.
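The add-zero idea can be sketched in plain behavioral VHDL. This is only an illustration of the principle, not the DSP48E2 itself: the entity name `addzero_mux` and its ports are hypothetical, and the pipeline registers of the real element are ignored.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical illustration: a mux built from an adder whose
-- unselected operand is forced to zero.
entity addzero_mux is
    port (
        ab  : in  std_logic_vector(47 downto 0);  -- stands in for A:B
        c   : in  std_logic_vector(47 downto 0);  -- stands in for C
        sel : in  std_logic;
        p   : out std_logic_vector(47 downto 0));
end entity;

architecture behav of addzero_mux is
    constant ZERO : unsigned(47 downto 0) := (others => '0');
    signal x, y   : unsigned(47 downto 0);
begin
    -- Each "mux" passes either its data input or all zeros.
    x <= unsigned(ab) when sel = '0' else ZERO;
    y <= ZERO when sel = '0' else unsigned(c);

    -- The adder then returns whichever operand is non-zeroed.
    p <= std_logic_vector(x + y);
end architecture;
```

With `sel = '0'` the sum is `ab + 0`, otherwise it is `0 + c`, which is the behavior we want from the DSP48E2.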

We can multiplex between signals on A:B and C within the DSP48. This enables multiplexing of 48 bits of data. Of course, input A is 30 bits and input B is 18 bits; these are concatenated into the 48-bit signal A:B after the dual A and B registers.

Signal A:B is fed into mux X, while signal C is fed into muxes W, Y, and Z. All multiplexors W, X, Y, and Z have a selectable input which is all zeros.

To perform the multiplexing, we can configure the following equations using the INMODE, OPMODE, and ALUMODE settings.

P = A:B + 0

P = C + 0

To demonstrate this, I created a simple example in Vivado using the DSP48 template from the language templates. I configured this DSP template so that I could control the opmode to switch between inputs A:B and C.

The code can be seen below. At the top level, the DSP mux offers the user two 48-bit ports, A and C, and a select signal. Internally, port A is split across DSP ports A and B, while port C is connected to DSP port C.

Library ieee;
use ieee.std_logic_1164.all;

Library UNISIM;
use UNISIM.vcomponents.all;

entity dspmux is port(
    clk : in std_logic;
    rst : in std_logic;
    a   : in std_logic_vector(47 downto 0);
    c   : in std_logic_vector(47 downto 0);
    sel : in std_logic;
    op  : out std_logic_vector(47 downto 0));
end entity;

architecture rtl of dspmux is

signal ain : std_logic_vector(29 downto 0); 
signal bin : std_logic_vector(17 downto 0); 
signal cin : std_logic_vector(47 downto 0); 
signal ALUMODE : std_logic_vector(3 downto 0);
signal INMODE : std_logic_vector (4 downto 0);
signal OPMODE : std_logic_vector(8 downto 0);

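begin

INMODE <= (others =>'0');
ALUMODE <= (others =>'0');  -- "0000" selects addition in the ALU
ain <= a(47 downto 18);     -- upper 30 bits of the top-level A port
bin <= a(17 downto 0);      -- lower 18 bits of the top-level A port
cin <= c;

-- Steer the X and Y multiplexors according to the select signal.
process(sel)
begin
    if sel = '0' then
       OPMODE <= "000000011";  -- X = A:B, all other muxes zero: P = A:B + 0
    else
       OPMODE <= "000001100";  -- Y = C, all other muxes zero: P = C + 0
    end if;
end process;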
   DSP48E2_inst : DSP48E2
   generic map (
      -- Feature Control Attributes: Data Path Selection
      AMULTSEL => "A",            -- Selects A input to multiplier (A, AD)
      A_INPUT => "DIRECT",        -- Selects A input source, 
      BMULTSEL => "B",            -- Selects B input to multiplier (AD, B)
      B_INPUT => "DIRECT",        -- Selects B input source, 
      PREADDINSEL => "A",         -- Selects input to pre-adder (A, B)
      RND => X"000000000000",     -- Rounding Constant
      USE_MULT => "NONE",         -- Select multiplier usage 
      USE_SIMD => "ONE48",        -- SIMD selection (FOUR12, ONE48, TWO24)
      USE_WIDEXOR => "FALSE",     -- Use the Wide XOR function
      XORSIMD => "XOR24_48_96",   -- Mode of operation for the Wide XOR 
      -- Pattern Detector Attributes: Pattern Detection Configuration
      AUTORESET_PRIORITY => "RESET",   -- Priority of AUTORESET vs. CEP 
      MASK => X"3fffffffffff",         -- 48-bit mask value for pattern 
      PATTERN => X"000000000000",      -- 48-bit pattern match for 
      SEL_MASK => "MASK",              -- C, MASK, 
      SEL_PATTERN => "PATTERN",        -- Select pattern value 
      USE_PATTERN_DETECT => "NO_PATDET", -- Enable pattern detect 
      -- Programmable Inversion Attributes: Specifies built-in
      -- programmable inversion on specific pins
      IS_ALUMODE_INVERTED => "0000",     -- Optional inversion for ALUMODE
      IS_CARRYIN_INVERTED => '0',        -- Optional inversion for CARRYIN
      IS_CLK_INVERTED => '0',            -- Optional inversion for CLK
      IS_INMODE_INVERTED => "00000",     -- Optional inversion for INMODE
      IS_OPMODE_INVERTED => "000000000", -- Optional inversion for OPMODE
      IS_RSTALUMODE_INVERTED => '0',     -- Optional inversion RSTALU
      IS_RSTA_INVERTED => '0',           -- Optional inversion for RSTA
      IS_RSTB_INVERTED => '0',           -- Optional inversion for RSTB
      IS_RSTCTRL_INVERTED => '0',        -- Optional inversion for RSTCTRL
      IS_RSTC_INVERTED => '0',           -- Optional inversion for RSTC
      IS_RSTD_INVERTED => '0',           -- Optional inversion for RSTD
      IS_RSTINMODE_INVERTED => '0',      -- Optional inversion for 
      IS_RSTM_INVERTED => '0',           -- Optional inversion for RSTM
      IS_RSTP_INVERTED => '0',           -- Optional inversion for RSTP
      -- Register Control Attributes: Pipeline Register Configuration
      ACASCREG => 1,                     -- Number of pipeline stages(0-2)
      ADREG => 1,                        -- Pipeline stages for pre-adder 
      ALUMODEREG => 1,                   -- Pipeline stages for ALUMODE 
      AREG => 1,                         -- Pipeline stages for A (0-2)
      BCASCREG => 1,                     -- Number of pipeline stages(0-2)
      BREG => 1,                         -- Pipeline stages for B (0-2)
      CARRYINREG => 1,                   -- Pipeline stages for CARRYIN 
      CARRYINSELREG => 1,                -- Pipeline stages for CARRYINSEL 
      CREG => 1,                         -- Pipeline stages for C (0-1)
      DREG => 1,                         -- Pipeline stages for D (0-1)
      INMODEREG => 1,                    -- Pipeline stages for INMODE
      MREG => 1,                         -- Multiplier pipeline stages
      OPMODEREG => 1,                    -- Pipeline stages for OPMODE 
      PREG => 1                          -- Number of pipeline stages P 
   )
   port map (
      -- Cascade outputs: Cascade Ports
      ACOUT => open,           -- 30-bit output: A port cascade
      BCOUT => open,           -- 18-bit output: B cascade
      CARRYCASCOUT => open,    -- 1-bit output: Cascade carry
      MULTSIGNOUT => open,     -- 1-bit output: Multiplier sign cascade
      PCOUT => open,           -- 48-bit output: Cascade output
      -- Control outputs: Control Inputs/Status Bits
      OVERFLOW => open,        -- 1-bit output: Overflow in add/acc
      PATTERNBDETECT => open,  -- 1-bit output: Pattern bar detect
      PATTERNDETECT => open,   -- 1-bit output: Pattern detect
      UNDERFLOW => open,       -- 1-bit output: Underflow in add/acc
      -- Data outputs: Data Ports
      CARRYOUT => open,         -- 4-bit output: Carry
      P => op,                  -- 48-bit output: Primary data
      XOROUT => open,           -- 8-bit output: XOR data
      -- Cascade inputs: Cascade Ports
      ACIN => (others =>'0'),   -- 30-bit input: A cascade data
      BCIN => (others =>'0'),   -- 18-bit input: B cascade
      CARRYCASCIN => '0',       -- 1-bit input: Cascade carry
      MULTSIGNIN => '0',        -- 1-bit input: Multiplier sign cascade
      PCIN => (others =>'0'),   -- 48-bit input: P cascade
      -- Control inputs: Control Inputs/Status Bits
      ALUMODE => ALUMODE,           -- 4-bit input: ALU control
      CARRYINSEL => (others =>'0'), -- 3-bit input: Carry select
      CLK => CLK,                   -- 1-bit input: Clock
      INMODE => INMODE,             -- 5-bit input: INMODE control
      OPMODE => OPMODE,             -- 9-bit input: Operation mode
      -- Data inputs: Data Ports
      A => AIN,                     -- 30-bit input: A data
      B => BIN,                     -- 18-bit input: B data
      C => CIN,                     -- 48-bit input: C data
      CARRYIN => '0',               -- 1-bit input: Carry-in
      D => (others =>'0'),          -- 27-bit input: D data
       -- Reset/Clock Enable inputs: Reset/Clock Enable Inputs
      CEA1 => '1',        -- 1-bit input: Clock enable for 1st stage AREG
      CEA2 => '1',        -- 1-bit input: Clock enable for 2nd stage AREG
      CEAD => '1',        -- 1-bit input: Clock enable for ADREG
      CEALUMODE => '1',   -- 1-bit input: Clock enable for ALUMODE
      CEB1 => '1',        -- 1-bit input: Clock enable for 1st stage BREG
      CEB2 => '1',         -- 1-bit input: Clock enable for 2nd stage BREG
      CEC => '1',          -- 1-bit input: Clock enable for CREG
      CECARRYIN => '1',    -- 1-bit input: Clock enable for CARRYINREG
      CECTRL => '1',       -- 1-bit input: Clock enable for OPMODEREG 
      CED => '1',          -- 1-bit input: Clock enable for DREG
      CEINMODE => '1',     -- 1-bit input: Clock enable for INMODEREG
      CEM => '1',          -- 1-bit input: Clock enable for MREG
      CEP => '1',          -- 1-bit input: Clock enable for PREG
      RSTA => rst,         -- 1-bit input: Reset for AREG
      RSTALLCARRYIN => rst,-- 1-bit input: Reset for CARRYINREG
      RSTALUMODE => rst,   -- 1-bit input: Reset for ALUMODEREG
      RSTB => rst,         -- 1-bit input: Reset for BREG
      RSTC => rst,         -- 1-bit input: Reset for CREG
      RSTCTRL => rst,      -- 1-bit input: Reset for OPMODEREG
      RSTD => rst,         -- 1-bit input: Reset for DREG and ADREG
      RSTINMODE => rst,    -- 1-bit input: Reset for INMODEREG
      RSTM => rst,         -- 1-bit input: Reset for MREG
      RSTP => rst          -- 1-bit input: Reset for PREG
   );

end architecture;

Depending upon the state of the select signal, the OPMODE is changed to select the correct channels on the X and Y multiplexors.

To output A:B, which is connected to the X mux, we need to set OPMODE[1:0] to 11 and set all other multiplexors to output zero.

The same approach is taken for C, which is connected to the Y mux: OPMODE[3:2] is set to 11 and all other multiplexors are set to output zero.
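To make the bit fields easier to read, the two OPMODE values could be built from named constants. The sketch below assumes the DSP48E2 OPMODE layout of W in bits [8:7], Z in [6:4], Y in [3:2], and X in [1:0]; the package and constant names are hypothetical and not part of the Vivado template.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package: names chosen for illustration only.
package dspmux_opmode_pkg is
    -- Per-mux select codes (assumed layout: W[8:7], Z[6:4], Y[3:2], X[1:0]).
    constant W_ZERO : std_logic_vector(1 downto 0) := "00";
    constant Z_ZERO : std_logic_vector(2 downto 0) := "000";
    constant Y_ZERO : std_logic_vector(1 downto 0) := "00";
    constant Y_C    : std_logic_vector(1 downto 0) := "11";  -- Y mux passes C
    constant X_ZERO : std_logic_vector(1 downto 0) := "00";
    constant X_AB   : std_logic_vector(1 downto 0) := "11";  -- X mux passes A:B

    -- P = A:B + 0, equal to "000000011"
    constant OPMODE_SEL_AB : std_logic_vector(8 downto 0) :=
        W_ZERO & Z_ZERO & Y_ZERO & X_AB;

    -- P = C + 0, equal to "000001100"
    constant OPMODE_SEL_C : std_logic_vector(8 downto 0) :=
        W_ZERO & Z_ZERO & Y_C & X_ZERO;
end package;
```

Using such constants, the select process would assign `OPMODE_SEL_AB` or `OPMODE_SEL_C` rather than a literal bit string, which keeps the intent visible if further mux settings are added later.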

Running this in a simple simulation provides the results below where you can clearly see the output switching between the A and C inputs to the module.
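A minimal testbench along these lines might look like the sketch below. It is not the testbench used for the results described here: the entity name `dspmux_tb`, the 10 ns clock period, and the test patterns are assumptions. With the A, B, C, OPMODE, and P registers each set to one stage, the expected latency is two clock cycles, so the sketch waits four cycles before checking to stay on the safe side.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench for the dspmux entity shown above.
entity dspmux_tb is
end entity;

architecture sim of dspmux_tb is
    constant CLK_PERIOD : time := 10 ns;  -- assumed clock period

    signal clk  : std_logic := '0';
    signal rst  : std_logic := '1';
    signal a    : std_logic_vector(47 downto 0) := x"AAAAAAAAAAAA";
    signal c    : std_logic_vector(47 downto 0) := x"555555555555";
    signal sel  : std_logic := '0';
    signal op   : std_logic_vector(47 downto 0);
    signal done : boolean := false;
begin
    -- Free-running clock that stops once the stimulus has finished.
    clk <= not clk after CLK_PERIOD / 2 when not done else '0';

    dut : entity work.dspmux
        port map (clk => clk, rst => rst, a => a, c => c,
                  sel => sel, op => op);

    stimulus : process
        -- Wait n rising edges, then move to the falling edge so the
        -- output is sampled away from the active clock edge.
        procedure wait_cycles(n : positive) is
        begin
            for i in 1 to n loop
                wait until rising_edge(clk);
            end loop;
            wait until falling_edge(clk);
        end procedure;
    begin
        wait_cycles(20);  -- hold reset well past the start of simulation
        rst <= '0';

        sel <= '0';       -- expect P = A:B
        wait_cycles(4);
        assert op = a report "sel = 0: op does not match a" severity error;

        sel <= '1';       -- expect P = C
        wait_cycles(4);
        assert op = c report "sel = 1: op does not match c" severity error;

        done <= true;
        wait;
    end process;
end architecture;
```

The alternating bit patterns on `a` and `c` make it easy to see in the waveform which input is currently reaching the output.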

Of course, implementing multiplexing in this way is not something we would do every day and would be done only in specific cases. It is a viable tool in the FPGA developer toolbox though, so I thought it would make for an interesting blog.

When considering implementations which use this approach, we also need to consider the width of the vector being multiplexed and routing penalties that apply to entering and leaving the DSP48E2 element. We can, however, always use techniques such as hand placement etc. to extract the best possible performance.


