From: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
Layer type | Activations | Learnable parameters | Stride/channels | Total learnables |
---|---|---|---|---|
Image input | 96 × 64 × 1 | – | – | 0 |
Convolution 2D (Conv) | 48 × 32 × 32 | Weights: 3 × 3 × 1 × 32 Bias: 1 × 1 × 32 | 32 3 × 3 × 1 convolutions Stride: [2 2] Padding: same | 320 |
Instance normalization | 48 × 32 × 32 | Offset: 1 × 1 × 32 Scale: 1 × 1 × 32 | 32 Channels | 64 |
ReLU | 48 × 32 × 32 | – | – | 0 |
Grouped convolution depthwise (GConv DW) | 48 × 32 × 32 | Weights: 3 × 3 × 1 × 1 × 32 Bias: 1 × 1 × 1 × 32 | 32 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 320 |
Instance normalization | 48 × 32 × 32 | Offset: 1 × 1 × 32 Scale: 1 × 1 × 32 | 32 Channels | 64 |
ReLU | 48 × 32 × 32 | – | – | 0 |
Conv | 48 × 32 × 64 | Weights: 1 × 1 × 32 × 64 Bias: 1 × 1 × 64 | 64 1 × 1 × 32 convolutions Stride: [1 1] Padding: same | 2112 128 0 |
GConv DW | 24 × 16 × 64 | Weights: 3 × 3 × 1 × 1 × 64 Bias: 1 × 1 × 1 × 64 | 64 groups of 1 3 × 3 × 1 convolutions Stride: [2 2] Padding: same | 640 128 0 |
Conv | 24 × 16 × 128 | Weights: 1 × 1 × 64 × 128 Bias: 1 × 1 × 128 | 128 1 × 1 × 64 convolutions Stride: [1 1] Padding: same | 8320 256 0 |
GConv DW | 24 × 16 × 128 | Weights: 3 × 3 × 1 × 1 × 128 Bias: 1 × 1 × 1 × 128 | 128 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 1280 256 0 |
Conv | 24 × 16 × 128 | Weights: 1 × 1 × 128 × 128 Bias: 1 × 1 × 128 | 128 1 × 1 × 128 convolutions Stride: [1 1] Padding: same | 16,512 256 0 |
GConv DW | 12 × 8 × 128 | Weights: 3 × 3 × 1 × 1 × 128 Bias: 1 × 1 × 1 × 128 | 128 groups of 1 3 × 3 × 1 convolutions Stride: [2 2] Padding: same | 1280 256 0 |
Conv | 12 × 8 × 256 | Weights: 1 × 1 × 128 × 256 Bias: 1 × 1 × 256 | 256 1 × 1 × 128 convolutions Stride: [1 1] Padding: same | 33,024 512 0 |
GConv DW | 12 × 8 × 256 | Weights: 3 × 3 × 1 × 1 × 256 Bias: 1 × 1 × 1 × 256 | 256 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 2560 512 0 |
Conv | 12 × 8 × 256 | Weights: 1 × 1 × 256 × 256 Bias: 1 × 1 × 256 | 256 1 × 1 × 256 convolutions Stride: [1 1] Padding: same | 65,792 512 0 |
GConv DW | 6 × 4 × 256 | Weights: 3 × 3 × 1 × 1 × 256 Bias: 1 × 1 × 1 × 256 | 256 groups of 1 3 × 3 × 1 convolutions Stride: [2 2] Padding: same | 2560 512 0 |
Conv | 6 × 4 × 512 | Weights: 1 × 1 × 256 × 512 Bias: 1 × 1 × 512 | 512 1 × 1 × 256 convolutions Stride: [1 1] Padding: same | 131,584 1024 0 |
GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512 Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 5120 1024 0 |
Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512 Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions Stride: [1 1] Padding: same | 262,656 1024 0 |
GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512 Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 5120 1024 0 |
Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512 Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions Stride: [1 1] Padding: same | 262,656 1024 0 |
GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512 Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 5120 1024 0 |
Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512 Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions Stride: [1 1] Padding: same | 262,656 1024 0 |
GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512 Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 5120 1024 0 |
Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512 Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions Stride: [1 1] Padding: same | 262,656 1024 0 |
GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512 Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 5120 1024 0 |
Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512 Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions Stride: [1 1] Padding: same | 262,656 1024 0 |
GConv DW | 3 × 2 × 512 | Weights: 3 × 3 × 1 × 1 × 512 Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions Stride: [2 2] Padding: same | 5120 1024 0 |
Conv | 3 × 2 × 1024 | Weights: 1 × 1 × 512 × 1024 Bias: 1 × 1 × 1024 | 1024 1 × 1 × 512 convolutions Stride: [1 1] Padding: same | 525,312 2048 0 |
GConv DW | 3 × 2 × 1024 | Weights: 3 × 3 × 1 × 1 × 1024 Bias: 1 × 1 × 1 × 1024 | 1024 groups of 1 3 × 3 × 1 convolutions Stride: [1 1] Padding: same | 10,240 2048 0 |
Conv | 3 × 2 × 1024 | Weights: 1 × 1 × 1024 × 1024 Bias: 1 × 1 × 1024 | 1024 1 × 1 × 1024 convolutions Stride: [1 1] Padding: same | 1,049,600 2048 0 |
Conv | 3 × 2 × 1024 | Weights: 1 × 1 × 1024 × 1024 Bias: 1 × 1 × 1024 | 1024 1 × 1 × 1024 convolutions Stride: [1 1] Padding: same | 1,049,600 2048 0 |
Avg. Pooling | 1 × 1 × 1024 | – | – | 0 |
FC Layer | 1 × 1 × 2 | Weights: 2 × 1024 Bias: 2 × 1 | – | 2050 |
Softmax | 1 × 1 × 2 | – | Binary classifier | 0 |
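
The counts in the last column follow directly from the listed weight and bias shapes, and the activation sizes follow from the stride under "same" padding. The Python sketch below reproduces that arithmetic. It is not taken from the paper: the function names are hypothetical and chosen for illustration only. It also assumes that, from the first 1 × 1 Conv row onward, the three numbers in the last column belong to the convolution, the instance normalization that follows it, and the ReLU, in that order. The table does not state this explicitly, but the values are consistent with it.

```python
import math


def conv2d_params(kh, kw, c_in, c_out):
    """Standard convolution: kh*kw*c_in*c_out weights plus one bias per filter."""
    return kh * kw * c_in * c_out + c_out


def depthwise_params(kh, kw, channels):
    """Depthwise grouped convolution: one kh x kw x 1 filter and one bias per channel."""
    return kh * kw * channels + channels


def instance_norm_params(channels):
    """Instance normalization: one offset and one scale per channel."""
    return 2 * channels


def fc_params(n_in, n_out):
    """Fully connected layer: n_in*n_out weights plus n_out biases."""
    return n_in * n_out + n_out


def same_padding_output(size, stride):
    """Spatial size after a convolution with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


if __name__ == "__main__":
    # First Conv row: 3 x 3 x 1 x 32 weights + 32 biases
    print(conv2d_params(3, 3, 1, 32))        # 320
    # First GConv DW row: 32 groups of one 3 x 3 x 1 filter + 32 biases
    print(depthwise_params(3, 3, 32))        # 320
    # Instance normalization over 32 channels
    print(instance_norm_params(32))          # 64
    # First 1 x 1 Conv row (32 -> 64 channels) and its normalization
    print(conv2d_params(1, 1, 32, 64), instance_norm_params(64))  # 2112 128
    # 96 x 64 input after the first stride-2 convolution
    print(same_padding_output(96, 2), same_padding_output(64, 2))  # 48 32
```

Applying `same_padding_output` once per stride-2 layer takes the 96 × 64 input through 48 × 32, 24 × 16, 12 × 8, 6 × 4 and 3 × 2, which matches the sequence in the Activations column.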