In a typical deep learning framework, the API operates on tensors, and usually a whole batch of data is processed at once.
Suppose there are 100 samples, each a 10×10 color picture (so each sample consists of 3 planes: R, G, and B). The input tensor then has size 100 × 10 × 10 × 3, read as batch × height × width × channel. This interpretation matters because, in addition to the image's height and width, two new terms appear: batch and channel.
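The batch layout above can be sketched in NumPy (a zero-filled placeholder batch, just to show the shape convention):

```python
import numpy as np

# 100 samples, each a 10x10 RGB image, stored in
# batch x height x width x channel (NHWC) order.
batch = np.zeros((100, 10, 10, 3))
print(batch.shape)  # (100, 10, 10, 3)
```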
The term channel is especially important. You can think of it as a dimension, but it is not exactly the same thing: the number of channels after a convolution is determined by how many kernel maps you set, and this is exactly the property that the 1×1 convolution exploits.
Without further ado, let's look at a picture first. The figure below shows an ordinary convolution; it is meant to remind the reader how convolution and channels interact.
This example assumes an RGB image as input, i.e. a 4×4 image with 3 channels. The 1st convolution layer (Conv. 1) has 2 kernel maps, so Conv. 1 outputs 2 feature maps; the 2nd convolution layer (Conv. 2) has 3 kernel maps, so Conv. 2 outputs 3 feature maps.
So the number of channels after a convolution is determined by the number of kernel maps you set.
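A naive sketch of this rule, assuming 3×3 kernels (the figure does not state the kernel size) and a 4×4 RGB input: with 2 kernel maps, the output has 2 channels.

```python
import numpy as np

def conv2d(image, kernels):
    """Naive valid convolution. image: H x W x C_in,
    kernels: K x kh x kw x C_in; returns H' x W' x K."""
    H, W, C = image.shape
    K, kh, kw, _ = kernels.shape
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                # Each output channel is one kernel slid over all input channels.
                out[i, j, k] = np.sum(image[i:i + kh, j:j + kw, :] * kernels[k])
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 4, 3))         # 4x4 RGB input
kernels = rng.standard_normal((2, 3, 3, 3))  # 2 kernel maps
out = conv2d(img, kernels)
print(out.shape)  # (2, 2, 2): 2 output channels because there are 2 kernels
```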
Now for the main topic: what a 1×1 convolution actually does.
If you look at the 1×1 convolution in isolation, it merely scales the values in the input map up or down, which seems completely useless. Spatially, it really is; but the true point of a 1×1 convolution is not the convolution itself.
The biggest benefit of the 1×1 convolution is that it reduces or increases the dimensionality.
The "dimension" here is the channel mentioned above. (The 1×1 convolution was proposed in Network in Network; we won't discuss what that paper uses it for in this article.)
Introductions to 1×1 convolutions usually use this kind of diagram. Frankly, such stacked-cube pictures are unfriendly to anyone not already familiar with the idea: you see cubes being pushed around but don't understand what is happening.
What a 1×1 convolution actually does (which I think is easier to understand) is the following:
So a 1×1 convolution is really a weighted combination across channels.
Then, according to the number of 1×1 kernel maps you set (that is, the number of output channels), it produces an output of the same spatial size but with a different number of channels.
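Because a 1×1 convolution only mixes channels at each pixel, it can be written as a plain matrix multiply over the channel axis. A minimal sketch (shapes chosen to match the example below; `conv1x1` is a hypothetical helper name):

```python
import numpy as np

def conv1x1(image, weights):
    """1x1 convolution as per-pixel channel mixing.
    image: H x W x C_in, weights: C_in x C_out -> H x W x C_out."""
    return image @ weights  # matmul over the channel axis

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 4, 3))  # 4x4 image, 3 channels
w = rng.standard_normal((3, 6))       # six 1x1 kernels
out = conv1x1(img, w)
print(out.shape)  # (4, 4, 6): spatial size unchanged, channels 3 -> 6
```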
Let's walk through the example above.
The first 1×1 convolution increases the dimensionality / increases the number of channels (from 3 → 6):
Original picture (4×4×3) → Conv 1 (6 1×1 kernels) → Conv 1 output (4×4×6)
The second 1×1 convolution reduces the dimensionality / reduces the number of channels (from 6 → 2):
Conv 1 output (4×4×6) → Conv 2 (2 1×1 kernels) → Conv 2 output (4×4×2)
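The two steps above can be chained directly, since each 1×1 layer is just a matrix multiply over the channel axis (random weights here, purely for the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 4, 3))  # original 4x4x3 picture
w1 = rng.standard_normal((3, 6))      # Conv 1: six 1x1 kernels
w2 = rng.standard_normal((6, 2))      # Conv 2: two 1x1 kernels

step1 = img @ w1    # 4x4x6: channels increased 3 -> 6
step2 = step1 @ w2  # 4x4x2: channels reduced 6 -> 2
print(step1.shape, step2.shape)  # (4, 4, 6) (4, 4, 2)
```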
This toy example may not feel compelling, but in practice a feature map might have 128 channels. If you don't want the model to keep growing, you can use 32 1×1 convolutions to reduce it to 32 channels, and then perform the actual feature extraction with a 3×3 convolution.
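The saving is easy to quantify by counting weights (bias terms omitted; the 32-output-channel figure for the final 3×3 layer is an assumption made for the comparison):

```python
# 3x3 conv applied directly on a 128-channel feature map, 32 outputs:
direct = 3 * 3 * 128 * 32

# 1x1 conv reducing 128 -> 32 channels, then a 3x3 conv on 32 channels:
bottleneck = 1 * 1 * 128 * 32 + 3 * 3 * 32 * 32

print(direct, bottleneck)  # 36864 13312
```

The bottleneck version uses roughly a third of the parameters while still ending at 32 channels, which is why this pattern appears in architectures such as Inception and ResNet.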