The image filtering described here is the kind used mainly in convolutional neural networks (CNNs).
Filtering is a "product-sum operation": each image value is multiplied by the corresponding filter value, and the products are added together.
The filter is then slid across the image, repeating the product-sum operation at each position until the whole image is covered. This is called a convolution operation.
Filters expressed as two-dimensional matrices in this way are also called kernels.
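The sliding product-sum can be sketched with two plain loops. This is a minimal illustration, not an efficient implementation; the function name filter2d and the toy 4x4 image are hypothetical. As is usual in CNNs, the kernel is applied without flipping it.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide the kernel over the image and take the product-sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]   # region under the filter
            out[i, j] = np.sum(window * kernel)  # multiply, then add
    return out

image = np.arange(16).reshape(4, 4)  # toy 4x4 "image" with values 0..15
kernel = np.ones((3, 3))             # toy 3x3 filter of all ones
out = filter2d(image, kernel)
print(out)  # 2x2 result: [[45, 54], [81, 90]]
```

Note that the 4x4 input shrinks to 2x2 because the 3x3 filter only fits at four positions.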
<Vertical and horizontal edge filters>
Below are filters that enhance vertical and horizontal edges.
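As a rough illustration, the two kernels below use the coefficients from the implementation later in this article. The test image (dark left half, bright right half) and the helper name apply_filter are assumptions made for this sketch; it also assumes NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Kernel coefficients as used in the implementation section of this article
vertical = np.array([[-2, 1, 1], [-2, 1, 1], [-2, 1, 1]])
horizontal = np.array([[1, 1, 1], [1, 1, 1], [-2, -2, -2]])

def apply_filter(image, kernel):
    """Product-sum of the kernel with every 3x3 window (no padding)."""
    windows = sliding_window_view(image, kernel.shape)  # (H-2, W-2, 3, 3)
    return np.einsum('ijkl,kl->ij', windows, kernel)

# Hypothetical test image: a single vertical edge between columns 2 and 3
img = np.zeros((5, 6))
img[:, 3:] = 1.0

v_out = apply_filter(img, vertical)
h_out = apply_filter(img, horizontal)
print(v_out)  # non-zero only near the vertical edge
print(h_out)  # all zeros: the image has no horizontal edge
```

The vertical filter responds only where brightness changes from left to right, while the horizontal filter gives zero everywhere on this image.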
■Padding
Applying the filter above makes the output smaller than the input image.
To prevent this, extra values are placed around the border of the image data. This is called padding.
In particular, placing zeros around the image data is called zero padding.
Other padding techniques repeat the values at the edges of the image data instead of using zeros.
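Both kinds of padding can be tried with np.pad. The 2x2 array here is a made-up example; mode='constant' pads with zeros by default, and mode='edge' repeats the border values.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])  # made-up 2x2 "image"

# Zero padding: one ring of zeros around the data
zero_padded = np.pad(a, 1, mode='constant')

# Edge padding: repeat the values at the border instead of zeros
edge_padded = np.pad(a, 1, mode='edge')

print(zero_padded)
print(edge_padded)
```

Both results are 4x4, one pixel larger on every side than the input.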
■How to calculate image size after filtering and padding
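With input size I, filter size F, padding width P, and stride S (the number of pixels the filter moves at each step), the output size O along each axis is O = (I + 2P - F) / S + 1.
For example, a 28x28 image with a 3x3 filter, P = 1, and S = 1 gives (28 + 2 - 3) / 1 + 1 = 28, so the size is unchanged.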
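The size calculation can be checked with a small helper. This is a minimal sketch: the function name conv_output_size and its parameter names are hypothetical, the same padding is assumed on both sides, and integer division is used when the stride does not divide evenly.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output length along one axis after padding and filtering."""
    return (input_size + 2 * padding - filter_size) // stride + 1

# 28-pixel axis, 3x3 filter, with and without one pixel of padding
with_pad = conv_output_size(28, 3, padding=1, stride=1)
without_pad = conv_output_size(28, 3, padding=0, stride=1)
print(with_pad, without_pad)  # 28 26
```

With one pixel of padding the 28x28 image keeps its size; without padding it shrinks to 26x26.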
■Implementation example using Python
Below, we apply filtering to a 28x28 pixel image.
In a CNN, the filter coefficients start as random values and are updated toward optimal values through learning.
Here, the coefficients are fixed, and vertical and horizontal edge filters are used.
import numpy as np
import matplotlib.pyplot as plt
# Loading and preprocessing image data
train = np.loadtxt('mnist_fig.txt')
train = train/255 # Normalization
input_data = train.reshape(28,28) # Reshape to 28x28
img = np.pad(input_data, [(1, 1), (1, 1)], 'constant') # Zero padding
# Filtering processing
fil = np.array([[-2, 1, 1], [-2, 1, 1], [-2, 1, 1]]).reshape(9, 1)  # Vertical edge
# In case of horizontal edge: [[1, 1, 1], [1, 1, 1], [-2, -2, -2]]
# For each of the 3x3 filter positions, store the 28x28 region it sees
col = np.zeros((3, 3, 28, 28))
for y in range(3):
    y_max = y + 28
    for x in range(3):
        x_max = x + 28
        col[y, x, :, :] = img[y:y_max, x:x_max]
<Source code explanation>
The first part of the program reads the image data and pads it.
The part marked "Filtering processing" is the important one, and what the "for" loops in particular are doing may be hard to follow.
This step is needed to make the calculation more efficient; if efficiency is not a concern, the code can be written in a form that is easier to understand.
Neural networks process a lot of data, so the calculations as a whole take a very long time.
For that reason, an efficient calculation method is needed, even if it is a little harder to understand.
It is easier to understand by looking at the results.
"col1" is as follows: each row lists the 9 values that the filter is applied to, so that the calculation can be done row by row, and the program processes all rows at once.
There are a total of 28x28=784 rows.
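One way to obtain such a 784x9 matrix from the col array and apply the filter in a single matrix product is sketched below. The source does not show this step, so the transpose and reshape used to build col1 and the result name filtered are assumptions; a random array also stands in for the image file so the sketch runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
input_data = rng.random((28, 28))  # stand-in for the 28x28 image
img = np.pad(input_data, [(1, 1), (1, 1)], 'constant')  # zero padding

fil = np.array([[-2, 1, 1], [-2, 1, 1], [-2, 1, 1]]).reshape(9, 1)

# Same loop as in the listing: col[y, x] is the image shifted by (y, x)
col = np.zeros((3, 3, 28, 28))
for y in range(3):
    for x in range(3):
        col[y, x, :, :] = img[y:y + 28, x:x + 28]

# Assumed rearrangement: (3, 3, 28, 28) -> (28, 28, 3, 3) -> (784, 9),
# so that each row holds the 9 values under the filter for one output pixel
col1 = col.transpose(2, 3, 0, 1).reshape(784, 9)

# One matrix product performs all 784 product-sum operations at once
filtered = np.dot(col1, fil).reshape(28, 28)
print(col1.shape, filtered.shape)  # (784, 9) (28, 28)
```

Each entry of filtered equals the product-sum of the 3x3 filter with the matching 3x3 window of the padded image, which is the result the loop-based version would produce.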
"fil_data" looks like this, and this is the part that does the filtering.