Use a Convolutional Neural Network to Classify Images
Abstract: This tutorial demonstrates how to use a TensorLayerX convolutional neural network for image classification. It is a relatively simple example: a network with three convolutional layers classifies the images of the CIFAR-10 dataset.
1. Environment Configuration
This tutorial is based on TensorLayerX 0.5.6. If your environment has a different version, install this one first by following the installation instructions on the official website.
TensorLayerX currently supports TensorFlow, PyTorch, PaddlePaddle, and MindSpore as computing backends. To select one, set the TL_BACKEND environment variable before importing tensorlayerx.
import os
os.environ['TL_BACKEND'] = 'paddle'
# os.environ['TL_BACKEND'] = 'tensorflow'
# os.environ['TL_BACKEND'] = 'mindspore'
# os.environ['TL_BACKEND'] = 'torch'
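Since the variable is set before the import in the snippet above, a small guard can catch a misspelled backend name early. The following is a minimal sketch of such a guard; `select_backend` and `SUPPORTED_BACKENDS` are hypothetical helper names introduced here for illustration and are not part of TensorLayerX.

```python
import os

# Backend names taken from the list above; the helper itself is hypothetical.
SUPPORTED_BACKENDS = ("tensorflow", "torch", "paddle", "mindspore")

def select_backend(name):
    """Validate a backend name and export it as TL_BACKEND."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend {name!r}; choose from {SUPPORTED_BACKENDS}")
    os.environ["TL_BACKEND"] = name
    return name

# Call this before `import tensorlayerx`.
select_backend("paddle")
```

A typo such as `select_backend("padle")` then fails immediately with a `ValueError` instead of surfacing later during the import.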
Import the required modules
import tensorlayerx as tlx
from tensorlayerx.nn import Module
from tensorlayerx.nn import (Conv2d, Linear, Flatten, MaxPool2d, BatchNorm2d)
from tensorlayerx.dataflow import Dataset, DataLoader
from tensorlayerx.vision.transforms import (
    Compose, Resize, RandomFlipHorizontal, RandomContrast, RandomBrightness, StandardizePerImage, RandomCrop
)
2. Load Dataset
This example uses the API provided by TensorLayerX to download the dataset and prepare the data iterators for training. The CIFAR-10 dataset consists of 60000 color images of size 32 × 32: 50000 images form the training set and the remaining 10000 form the test set. The images are divided into 10 categories, and the model is trained to assign each image to the correct one.
# prepare cifar10 data
X_train, y_train, X_test, y_test = tlx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3), plotable=False)
class CIFAR10Dataset(Dataset):
    def __init__(self, data, label, transforms):
        self.data = data
        self.label = label
        self.transforms = transforms

    def __getitem__(self, idx):
        x = self.data[idx].astype('uint8')
        y = self.label[idx].astype('int64')
        x = self.transforms(x)
        return x, y

    def __len__(self):
        return len(self.label)
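The two methods `__getitem__` and `__len__` are what a loader relies on to index and batch the data. A minimal pure-Python sketch of that contract follows; `ToyDataset` and `iter_batches` are illustrative stand-ins rather than TensorLayerX classes, and the real DataLoader may differ in details such as how samples are collated into arrays.

```python
import random

class ToyDataset:
    """Map-style dataset: integer index in, (sample, label) pair out."""
    def __init__(self, data, label):
        self.data = data
        self.label = label

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]

    def __len__(self):
        return len(self.label)

def iter_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of (sample, label) pairs; the last batch may be smaller."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

toy = ToyDataset(data=list(range(10)), label=[i % 2 for i in range(10)])
batches = list(iter_batches(toy, batch_size=4, shuffle=True))
print([len(b) for b in batches])  # [4, 4, 2]
```

Every sample is visited exactly once per pass, and shuffling only changes the order in which indices are drawn.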
# Set data processing function
train_transforms = Compose(
    [
        RandomCrop(size=[24, 24]),
        RandomFlipHorizontal(),
        RandomBrightness(brightness_factor=(0.5, 1.5)),
        RandomContrast(contrast_factor=(0.5, 1.5)),
        StandardizePerImage()
    ]
)
test_transforms = Compose([Resize(size=(24, 24)), StandardizePerImage()])
# Build the datasets (the loaders are created in the training section)
train_dataset = CIFAR10Dataset(data=X_train, label=y_train, transforms=train_transforms)
test_dataset = CIFAR10Dataset(data=X_test, label=y_test, transforms=test_transforms)
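Both pipelines end with per-image standardization, which rescales each image to zero mean and unit variance. The sketch below shows that computation on a flat list of pixel values. It assumes the common convention of flooring the standard deviation at 1/sqrt(N), as TensorFlow's per_image_standardization does; whether StandardizePerImage uses exactly this floor is an assumption, and `standardize_per_image` is an illustrative name.

```python
import math

def standardize_per_image(pixels):
    """Shift a flat list of pixel values to zero mean and scale to unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    # Floor the deviation so a constant image does not divide by zero
    # (assumed convention, borrowed from TensorFlow's per_image_standardization).
    std = max(math.sqrt(var), 1.0 / math.sqrt(n))
    return [(p - mean) / std for p in pixels]

out = standardize_per_image([0, 64, 128, 255])
mean_out = sum(out) / len(out)
var_out = sum((v - mean_out) ** 2 for v in out) / len(out)
```

Because the statistics are computed per image, the result does not depend on dataset-wide means, which is why the same transform can be used unchanged for training and test data.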
3. Build Network
Next, use TensorLayerX to define a classification network composed of three two-dimensional convolution layers (Conv2d, with a ReLU activation on the second and third), two two-dimensional max-pooling layers (MaxPool2d), and two linear layers. After the transforms above, each image has shape (24, 24, 3); the network maps it to 10 outputs, one per category.
class CNN(Module):
    def __init__(self):
        super(CNN, self).__init__()
        # weights init
        W_init = tlx.nn.initializers.truncated_normal(stddev=5e-2)
        W_init2 = tlx.nn.initializers.truncated_normal(stddev=0.04)
        b_init1 = tlx.nn.initializers.constant(value=0.1)
        self.conv1 = Conv2d(32, (3, 3), (1, 1), padding='SAME', W_init=W_init, name='conv1', in_channels=3)
        self.maxpool1 = MaxPool2d((2, 2), (2, 2), padding='SAME', name='pool1')
        self.conv2 = Conv2d(
            64, (3, 3), (1, 1), padding='SAME', act=tlx.nn.ReLU, W_init=W_init, b_init=b_init1, name='conv2', in_channels=32
        )
        self.conv3 = Conv2d(
            64, (3, 3), (1, 1), padding='SAME', act=tlx.nn.ReLU, W_init=W_init, b_init=b_init1, name='conv3', in_channels=64
        )
        self.maxpool2 = MaxPool2d((2, 2), (2, 2), padding='SAME', name='pool2')
        self.flatten = Flatten(name='flatten')
        self.linear1 = Linear(1024, act=tlx.nn.ReLU, W_init=W_init2, b_init=b_init1, name='linear1relu', in_features=2304)
        self.linear2 = Linear(10, act=None, W_init=W_init2, name='output', in_features=1024)

    def forward(self, x):
        z = self.conv1(x)
        z = self.maxpool1(z)
        z = self.conv2(z)
        z = self.conv3(z)  # third convolution; it is defined above and must be applied here
        z = self.maxpool2(z)
        z = self.flatten(z)
        z = self.linear1(z)
        z = self.linear2(z)
        return z

# get the network
net = CNN()
Creating the network logs the model structure:
[TL] Conv2d conv1: out_channels : 32 kernel_size: (3, 3) stride: (1, 1) pad: SAME act: No Activation
[TL] MaxPool2d pool1: kernel_size: (2, 2) stride: (2, 2) padding: SAME
[TL] Conv2d conv2: out_channels : 64 kernel_size: (3, 3) stride: (1, 1) pad: SAME act: ReLU
[TL] Conv2d conv3: out_channels : 64 kernel_size: (3, 3) stride: (1, 1) pad: SAME act: ReLU
[TL] MaxPool2d pool2: kernel_size: (2, 2) stride: (2, 2) padding: SAME
[TL] Flatten flatten:
[TL] Linear linear1relu: 1024 ReLU
[TL] Linear output: 10 No Activation
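The value in_features=2304 of the first linear layer follows from the layer shapes. The sketch below traces them, assuming the 24 × 24 input produced by the transforms and the usual rule that a 'SAME'-padded layer outputs ceil(size / stride) along each spatial axis; `same_out` is an illustrative helper name.

```python
import math

def same_out(size, stride):
    """Spatial output size of a 'SAME'-padded conv or pool layer."""
    return math.ceil(size / stride)

size, channels = 24, 3                    # input after the 24x24 crop/resize
size, channels = same_out(size, 1), 32    # conv1, stride 1
size = same_out(size, 2)                  # pool1 -> 12
size, channels = same_out(size, 1), 64    # conv2 (conv3 keeps the same shape)
size = same_out(size, 2)                  # pool2 -> 6
flat_features = size * size * channels
print(flat_features)  # 2304
```

Feeding uncropped 32 × 32 images would give 8 × 8 × 64 = 4096 features instead, so in_features would have to change with the input size.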
4. Model Training & Prediction
Next, use the high-level Model interface to train the model. The training setup will:
Use the tlx.optimizers.Adam optimizer for optimization.
Use tlx.losses.softmax_cross_entropy_with_logits to calculate the loss value.
Use tensorlayerx.dataflow.DataLoader to load data and build batches.
Use the tlx.model.Model high-level model interface to build the model for training.
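To make the loss in this list concrete, here is a minimal pure-Python sketch of softmax cross-entropy for a single sample. It illustrates the formula only; `softmax_cross_entropy` is an illustrative name, and how the TensorLayerX function reduces the loss over a batch is not shown here.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Ten equal logits: the model is undecided between the 10 classes.
uniform = softmax_cross_entropy([0.0] * 10, label=3)
# A large logit on the correct class gives a much smaller loss.
confident = softmax_cross_entropy([0.0] * 3 + [5.0] + [0.0] * 6, label=3)
print(round(uniform, 4))  # 2.3026
```

The undecided case gives ln 10 ≈ 2.30, which is the loss to expect from an untrained 10-class classifier and is in the same range as the first values in the training log (about 2.40 and 2.38).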
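# training settings: n_epoch matches the log below; the other values are assumptions
batch_size = 128
n_epoch = 500
print_freq = 5
optimizer = tlx.optimizers.Adam(0.0001)  # learning rate (assumed)
loss_fn = tlx.losses.softmax_cross_entropy_with_logits
metrics = tlx.metrics.Accuracy()
# build batch data loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
# build trainable model with high-level API
model = tlx.model.Model(network=net, loss_fn=loss_fn, optimizer=optimizer, metrics=metrics)
# train
model.train(n_epoch=n_epoch, train_dataset=train_loader, test_dataset=test_loader, print_freq=print_freq, print_train_batch=True)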
Epoch 1 of 500 took 2.037001371383667
train loss: [2.4006615]
train acc: 0.0546875
Epoch 1 of 500 took 2.0650012493133545
train loss: [2.3827682]
train acc: 0.05859375
......
Epoch 30 of 500 took 11.347046375274658
train loss: [0.8383277]
train acc: 0.7114410166240409
val loss: Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
[0.87068987])
val acc: 0.7016416139240507
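The train acc and val acc values in the log are classification accuracies. As a minimal sketch of how such a number is obtained from the 10 network outputs, the following computes the fraction of samples whose largest logit is at the labelled class; `batch_accuracy` is an illustrative helper, not the tlx.metrics implementation.

```python
def batch_accuracy(logits, labels):
    """Fraction of rows whose largest logit sits at the labelled class."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Four samples, three classes; predictions are 0, 2, 1, 0.
logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [0.0, 1.5, 0.4], [1.0, 0.2, 0.1]]
labels = [0, 2, 0, 1]
print(batch_accuracy(logits, labels))  # 0.5
```

With 10 balanced classes, random guessing gives an accuracy near 0.1, which is a useful baseline when reading the first epochs of the log.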
The End
As the example above shows, a simple convolutional neural network built with TensorLayerX reaches a validation accuracy of about 70% on the CIFAR-10 dataset after 30 epochs. Adjusting the network structure and hyperparameters may improve the results further.