Convolution

What is Convolution?

Convolution is a mathematical operation that is used to express the relation between input and output of an LTI system. It is a mathematical way of combining two signals to form a third signal.

It is the single most important technique in Digital Signal Processing, used to filter images, perform edge detection, perform blurring, and much more.

Convolution is a mathematical operation on two signals, \(f\) and \(h\), to produce a third signal, \(g\). It is defined as the integral of the product of the two functions after one is reversed and shifted. The integral is evaluated for all values of shift, producing the convolution function \(g\).

\[g(x) = (f \ast h)(x) = \int_{-\infty}^{\infty} f(\tau) h(x - \tau) d\tau = \int_{-\infty}^{\infty} f(x - \tau) h(\tau) d\tau\]

Convolution between an image and a kernel is performed by sliding the kernel over the image, multiplying the kernel’s values with the overlapping image pixel values, and summing them up.

\[(w \star f)(x, y) = \sum_{i=-a}^{a} \sum_{j=-b}^{b} w(i, j) f(x-i, y-j)\]

Properties of Convolution

Commutative

Replace \(x-\tau\) with \(t\), then \(\tau = x - t\), and \(d\tau = -dt\).

\[\begin{aligned} g(x) &= (f \ast h)(x) \\ &= \int_{-\infty}^{\infty} f(\tau) h(x - \tau) d\tau \\ &= \int_{\infty}^{-\infty} f(x - t) h(t) (-dt) \\ &= \int_{-\infty}^{\infty} f(x - t) h(t) dt \\ &= (h \ast f)(x) \\ \end{aligned}\]

Distributive over Addition

\[\begin{aligned} g(x) &= (f \ast (h_1 + h_2))(x) \\ &= \int_{-\infty}^{\infty} f(\tau) (h_1 + h_2)(x - \tau) d\tau \\ &= \int_{-\infty}^{\infty} f(\tau) (h_1(x - \tau) + h_2(x - \tau)) d\tau \\ &= \int_{-\infty}^{\infty} f(\tau) h_1(x - \tau) d\tau + \int_{-\infty}^{\infty} f(\tau) h_2(x - \tau) d\tau \\ &= (f \ast h_1)(x) + (f \ast h_2)(x) \\ \end{aligned}\]

Bilinear

If \(\alpha \in \mathbb{C}\), then

\[\begin{aligned} \alpha (f \ast h)(x) &= \alpha \int_{-\infty}^{\infty} f(\tau) h(x - \tau) d\tau \\ &= \int_{-\infty}^{\infty} \alpha f(\tau) h(x - \tau) d\tau \\ &= (\alpha f \ast h)(x) \\ &= \int_{-\infty}^{\infty} f(\tau) (\alpha h)(x - \tau) d\tau \\ &= (f \ast (\alpha h))(x) \end{aligned}\]

Linear

\[\begin{aligned} (f\ast (\alpha h_1 + \beta h_2))(x) &= (f \ast (\alpha h_1))(x) + (f \ast (\beta h_2))(x) \\ &= \alpha (f \ast h_1)(x) + \beta (f \ast h_2)(x) \\ \end{aligned}\]

Associative

Let \(\theta+\tau=t\), then \(\theta=t-\tau\), and \(d\theta=dt\).

\[\begin{aligned} (f\ast (h_1 \ast h_2))(x) &= \int_{-\infty}^{\infty} f(\tau) (h_1 \ast h_2)(x - \tau) d\tau \\ &= \int_{-\infty}^{\infty} f(\tau) \int_{-\infty}^{\infty} h_1(\theta) h_2(x - \tau - \theta) d\theta d\tau \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau) h_1(\theta) h_2(x - \tau - \theta) d\theta d\tau \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau) h_1(t-\tau) h_2(x - t) dt d\tau \\ &= \int_{-\infty}^{\infty} h_2(x - t) \int_{-\infty}^{\infty} f(\tau) h_1(t-\tau) d\tau dt \\ &= \int_{-\infty}^{\infty} h_2(x - t) (f \ast h_1)(t) dt \\ &= ((f \ast h_1) \ast h_2)(x) \\ \end{aligned}\]

Shift Invariant

\[\begin{aligned} g(x)&=S\left\{f(x)\right\} \\ S\left\{f(x-x_0)\right\}&=f(x-x_0)\ast h(x)\\ &=g(x-x_0)\\ \end{aligned}\]

Impulse Response

\[\begin{aligned} f(x) &= (f \ast \delta)(x) \\ &= \int_{-\infty}^{\infty} f(\tau) \delta(x - \tau) d\tau \\ g(x)&= S\left\{f(x)\right\} \\ &= S\left\{\int_{-\infty}^{\infty} f(\tau) \delta(x - \tau) d\tau\right\} \\ &= \int_{-\infty}^{\infty} f(\tau) S\left\{\delta(x - \tau)\right\} d\tau \\ &=(f \ast S\left\{\delta\right\})(x) \\ \end{aligned}\]

Thus, given the impulse response \(h\) of a system, the output \(g\) of the system is the convolution of the input \(f\) and the impulse response \(h\).

Separable Convolution

A separable convolution is a convolution that can be factored into one-dimensional convolutions, kernel \(h\) can be written as the convolution of two 1D kernels \(h_1\) and \(h_2\).

\[\underbrace{ \begin{bmatrix} \alpha a&\alpha b&\alpha c\\ \beta a&\beta b&\beta c\\ \gamma a&\gamma b&\gamma c\\ \end{bmatrix} }_{h}= \underbrace{ \begin{bmatrix} \alpha\\ \beta\\ \gamma\\ \end{bmatrix} }_{h_1}\ast \underbrace{ \begin{bmatrix} a&b&c\\ \end{bmatrix} }_{h_2}\]

Thus, the convolution between \(f\) and \(h\) can be calculated by first convolving \(f\) with \(h_1\), and then convolving the result with \(h_2\).

\[f \ast h = f \ast (h_1 \ast h_2) = (f \ast h_1) \ast h_2\]

Convolution in Image Processing

In image processing, convolution is performed by sliding a kernel over the image, multiplying the kernel’s values with the overlapping image pixel values, and summing them up.

\[(w \star f)(x, y) = \sum_{i=-a}^{a} \sum_{j=-b}^{b} w(i, j) f(x-i, y-j)\]

where \(w\) is the kernel, \(f\) is the image, and \(a\) and \(b\) are the kernel’s half-width and half-height, respectively.

Kernel

A kernel is a small matrix used in image processing to perform operations such as blurring, sharpening, edge detection, and more. The kernel is usually a square matrix with odd dimensions. The kernel is also known as a filter.

Padding

The output of the convolution operation is a matrix that is smaller than the input image. This is because the kernel cannot be centered on pixels at the edges of the image. To solve this problem, the image is padded with some pixels around the edges. The common padding methods are:

  • Zero Padding: The image is padded with zeros around the edges.
  • Replicate Padding: The pixels at the edges of the image are replicated to pad the image.
  • Symmetric Padding: The pixels at the edges of the image are mirrored to pad the image.

Stride

The stride is the number of pixels by which the kernel is shifted when it is moved over the image. The stride is usually set to 1, but it can be set to any value. A larger stride will result in a smaller output image.

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
import numpy as np

# gaussian kernel
def gaussian_kernel(size, sigma=1):
    """
    Generates a Gaussian kernel of size size x size and standard deviation sigma.
    """
    size = int(size) // 2
    x, y = np.mgrid[-size:size+1, -size:size+1]
    normal = 1 / (2.0 * np.pi * sigma**2)
    g =  np.exp(-((x**2 + y**2) / (2.0*sigma**2))) * normal
    return g

# convolution
def convolution(image, kernel, stride=1, padding=None):
  """
  Performs convolution between an image and a kernel.
  : param image: input image, a 2D numpy array
  : param kernel: kernel, a 2D numpy array
  : param stride: stride, default 1
  : param padding: padding method, default None, options: "zero", "replicate", "symmetric", "circular"
  : return: convolved image, a 2D numpy array
  """
  m, n = kernel.shape # kernel dimensions
  M, N = image.shape # image dimensions
  image_copy = image.copy() # copy of image
  assert m % 2 == 1 and n % 2 == 1, "Kernel dimensions must be odd"
  assert padding in ["zero", "replicate", "symmetric", "Circular", None], "Invalid padding"


  a = (m - 1) // 2 # half-width
  b = (n - 1) // 2 # half-height

  kernel = kernel[::-1, ::-1] # flip kernel

  if padding is None: # no padding
    res = image.copy()
    for i in range(a, M-a, stride): # loop over rows
      for j in range(b, N-b, stride): # loop over columns
        image_copy[i, j] = np.sum(res[i-a:i+a+1, j-b:j+b+1] * kernel) # convolution
    return image_copy[a:-a, b:-b] # return image without padding
  else:
    res = np.zeros((M + 2*a, N + 2*b)) # padded image
    res[a:-a, b:-b] = image # copy image to padded image
    if padding == "zero":
      pass
    elif padding == "replicate": # replicate padding
      res[:a, b:-b] = image[0]
      res[-a:, b:-b] = image[-1]
      res[:, :b] = res[:, b:2*b][:, ::-1]
      res[:, -b:] = res[:, -2*b:-b][:, ::-1]
    elif padding == "symmetric": # symmetric padding
      res[:a, b:-b] = image[a:0:-1]
      res[-a:, b:-b] = image[-2:-a-2:-1]
      res[:, :b] = res[:, b:2*b][:, ::-1]
      res[:, -b:] = res[:, -2*b:-b][:, ::-1]
    elif padding == "circular": # circular padding
      res[:a, b:-b] = image[-a:]
      res[-a:, b:-b] = image[:a]
      res[:, :b] = image[:, -b:]
      res[:, -b:] = image[:, :b]
    else:
      raise ValueError("Invalid padding")

  for i in range(a, M+a, stride): # loop over rows
    for j in range(b, N+b, stride): # loop over columns
      image_copy[i-a, j-b] = np.sum(res[i-a:i+a+1, j-b:j+b+1] * kernel) # convolution

  return image_copy