(l-onnx-doc-CausalConvWithState)=

# CausalConvWithState


(l-onnx-op-causalconvwithstate-27)=

## CausalConvWithState - 27

### Version

- **name**: [CausalConvWithState (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#CausalConvWithState)
- **domain**: `main`
- **since_version**: `27`
- **function**: `True`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 27**.

### Summary

Stateful causal 1D depthwise convolution.

Used by Gated DeltaNet (Qwen3.5) and Mamba (Jamba, FalconMamba) as a preprocessing step.
Replaces the 3-op pattern (Concat + Conv + Slice) with a single fused operation.

The convolution is causal (looks only at current and past positions) and depthwise
(each channel is convolved independently with its own kernel).

The input, weight, past_state, output, and present_state tensors are rank-3 with
shape (batch_size, channels, length). The optional bias input is rank-1 with
shape (channels). For higher-dimensional data, use Reshape nodes before and
after this operator to pack extra dimensions into the batch or channel axis.

Weight layout: (channels, 1, k) for depthwise convolution.
The carry state stores the last (k-1) positions for incremental decode.

The optional activation attribute supports fused SiLU/Swish activation.

### Attributes

* **activation - STRING** (default is `none`):

  Fused activation function. One of: &#39;silu&#39;, &#39;swish&#39;, &#39;none&#39;. Default is &#39;none&#39;.

### Inputs

Between 2 and 4 inputs.

- **input** (heterogeneous) - **T**:

  Input tensor with shape (batch_size, channels, length). Channels-first layout.
- **weight** (heterogeneous) - **T**:

  Depthwise convolution kernel with shape (channels, 1, k) where k is the kernel size. The middle dim of size 1 follows the ONNX `Conv` weight layout `(M, C/group, k1, ..., kn)`: since this op is always depthwise, `group = channels`, so `C/group = 1`. Keeping this layout makes the weight tensor a drop-in for a depthwise `Conv(group=channels)` weight, so `Conv` &lt;-&gt; `CausalConvWithState` rewrites require no reshape.
- **bias** (optional, heterogeneous) - **T**:

  Optional per-channel bias with shape (channels).
- **past_state** (optional, heterogeneous) - **T**:

  Carry state from previous step with shape (batch_size, channels, k - 1). If not provided, padding is zero.

### Outputs

- **output** (heterogeneous) - **T**:

  Convolution output with same shape as input.
- **present_state** (heterogeneous) - **T**:

  Updated carry state with shape (batch_size, channels, k - 1). Contains the last (k - 1) values of the effective padded/concatenated sequence along the causal axis, including any values from past_state or zero-padding when the current input is shorter than k - 1.

### Type Constraints

* **T** in ( `tensor(bfloat16)`, `tensor(float)`, `tensor(float16)` ):

  Constrain input and output types to float tensors.