
Use Convolutional Neural Network to Classify Images

Abstract: This tutorial demonstrates how to build a convolutional neural network with TensorLayerX for image classification. It is a relatively simple example: a network with three convolutional layers classifies the images of the CIFAR-10 dataset.

1. Environment Configuration

This tutorial is based on TensorLayerX 0.5.6. If you have a different version installed, please follow the installation instructions on the official website first.
TensorLayerX currently supports TensorFlow, PyTorch, PaddlePaddle, and MindSpore as computing backends. To select one, set the TL_BACKEND environment variable before importing tensorlayerx.

import os

os.environ['TL_BACKEND'] = 'paddle'
# os.environ['TL_BACKEND'] = 'tensorflow'
# os.environ['TL_BACKEND'] = 'mindspore'
# os.environ['TL_BACKEND'] = 'torch'
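
Because the backend is chosen through a plain string, a typo in the name is easy to miss. The following is a minimal sketch of a guard that validates the name before exporting it. The helper `select_backend` and the constant `SUPPORTED_BACKENDS` are hypothetical names introduced here for illustration and are not part of TensorLayerX; the four backend strings are the ones used in the snippet above.

```python
import os

# Backend names taken from the tutorial's snippet; the helper itself is
# hypothetical and not part of TensorLayerX.
SUPPORTED_BACKENDS = ("tensorflow", "torch", "paddle", "mindspore")


def select_backend(name):
    """Validate a backend name and export it through TL_BACKEND."""
    name = name.strip().lower()
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    os.environ["TL_BACKEND"] = name
    return name


# Call this before `import tensorlayerx`, as the tutorial does.
select_backend("paddle")
```

Calling the helper with an unsupported name raises a ValueError instead of silently exporting a value the library may not recognize.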

Import the required modules

import tensorlayerx as tlx
from tensorlayerx.nn import Module
from tensorlayerx.nn import (Conv2d, Linear, Flatten, MaxPool2d, BatchNorm2d)

from tensorlayerx.dataflow import Dataset, DataLoader
from tensorlayerx.vision.transforms import (
    Compose, Resize, RandomFlipHorizontal, RandomContrast, RandomBrightness, StandardizePerImage, RandomCrop
)

2. Load Dataset

This example uses the API provided by TensorLayerX to download the dataset and prepare the data iterators for the training task that follows. The CIFAR-10 dataset consists of 60000 color images of size 32 × 32, of which 50000 form the training set and the remaining 10000 form the test set. The images are divided into 10 categories, and the model is trained to assign each image to its category.

# prepare cifar10 data
X_train, y_train, X_test, y_test = tlx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3), plotable=False)

class CIFAR10Dataset(Dataset):

    def __init__(self, data, label, transforms):
        self.data = data
        self.label = label
        self.transforms = transforms

    def __getitem__(self, idx):
        x = self.data[idx].astype('uint8')
        y = self.label[idx].astype('int64')
        x = self.transforms(x)

        return x, y

    def __len__(self):

        return len(self.label)

# Set data processing function
train_transforms = Compose(
    [
        RandomCrop(size=[24, 24]),
        RandomFlipHorizontal(),
        RandomBrightness(brightness_factor=(0.5, 1.5)),
        RandomContrast(contrast_factor=(0.5, 1.5)),
        StandardizePerImage()
    ]
)

test_transforms = Compose([Resize(size=(24, 24)), StandardizePerImage()])

# Build dataset and loader
train_dataset = CIFAR10Dataset(data=X_train, label=y_train, transforms=train_transforms)
test_dataset = CIFAR10Dataset(data=X_test, label=y_test, transforms=test_transforms)
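
To illustrate what the first and last training transforms do to a single image, here is a small pure-Python sketch of a random 24 × 24 crop followed by per-image standardization. It is not TensorLayerX's implementation: `random_crop` and `standardize` are hypothetical helpers, the image is synthetic, and the lower bound on the divisor is an assumption borrowed from the usual per-image standardization formula.

```python
import math
import random


def random_crop(image, size, rng):
    """Cut a size x size window at a random position from an H x W x C nested list."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def standardize(image):
    """Shift to zero mean and scale to unit variance over all pixels and channels."""
    values = [v for row in image for pixel in row for v in pixel]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # Lower bound on the divisor avoids division by zero for flat images
    # (assumed here; the library may use a different guard).
    std = max(math.sqrt(variance), 1.0 / math.sqrt(len(values)))
    return [[[(v - mean) / std for v in pixel] for pixel in row] for row in image]


rng = random.Random(0)
# Synthetic 32 x 32 x 3 image with values in 0..255, standing in for a CIFAR-10 sample.
image = [[[rng.randint(0, 255) for _ in range(3)] for _ in range(32)] for _ in range(32)]

patch = standardize(random_crop(image, 24, rng))
print(len(patch), len(patch[0]), len(patch[0][0]))  # prints: 24 24 3
```

The point to take away is that every sample reaching the network is 24 × 24 × 3 rather than 32 × 32 × 3, which matters for the layer sizes chosen in the next section.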

 

3. Build Network

Next, use TensorLayerX to define a classification network composed of three two-dimensional convolution layers (Conv2d), two two-dimensional max-pooling layers (MaxPool2d), and two linear layers. In this code the second and third convolutions and the first linear layer use the ReLU activation function, while the first convolution has no activation. Because the transforms crop or resize every image to 24 × 24, the network maps an input of shape (24, 24, 3) to 10 outputs, one per category.

class CNN(Module):

    def __init__(self):
        super(CNN, self).__init__()
        # weights init
        W_init = tlx.nn.initializers.truncated_normal(stddev=5e-2)
        W_init2 = tlx.nn.initializers.truncated_normal(stddev=0.04)
        b_init1 = tlx.nn.initializers.constant(value=0.1)

        self.conv1 = Conv2d(32, (3, 3), (1, 1), padding='SAME', W_init=W_init, name='conv1', in_channels=3)
        self.maxpool1 = MaxPool2d((2, 2), (2, 2), padding='SAME', name='pool1')

        self.conv2 = Conv2d(
            64, (3, 3), (1, 1), padding='SAME', act=tlx.nn.ReLU, W_init=W_init, b_init=b_init1, name='conv2', in_channels=32
        )

        self.conv3 = Conv2d(
            64, (3, 3), (1, 1), padding='SAME', act=tlx.nn.ReLU, W_init=W_init, b_init=b_init1, name='conv3', in_channels=64
        )
        self.maxpool2 = MaxPool2d((2, 2), (2, 2), padding='SAME', name='pool2')

        self.flatten = Flatten(name='flatten')
        self.linear1 = Linear(1024, act=tlx.nn.ReLU, W_init=W_init2, b_init=b_init1, name='linear1relu', in_features=2304)
        self.linear2 = Linear(10, act=None, W_init=W_init2, name='output', in_features=1024)

    def forward(self, x):
        z = self.conv1(x)
        z = self.maxpool1(z)
        z = self.conv2(z)
        z = self.conv3(z)  # third convolution, defined in __init__ but missing from the original forward pass
        z = self.maxpool2(z)
        z = self.flatten(z)
        z = self.linear1(z)
        z = self.linear2(z)
        return z


# get the network
net = CNN()

Model structure as printed in the log:

[TL] Conv2d conv1: out_channels : 32 kernel_size: (3, 3) stride: (1, 1) pad: SAME act: No Activation
[TL] MaxPool2d pool1: kernel_size: (2, 2) stride: (2, 2) padding: SAME
[TL] Conv2d conv2: out_channels : 64 kernel_size: (3, 3) stride: (1, 1) pad: SAME act: ReLU
[TL] Conv2d conv3: out_channels : 64 kernel_size: (3, 3) stride: (1, 1) pad: SAME act: ReLU
[TL] MaxPool2d pool2: kernel_size: (2, 2) stride: (2, 2) padding: SAME
[TL] Flatten flatten:
[TL] Linear  linear1relu: 1024 ReLU
[TL] Linear  output: 10 No Activation
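
The value in_features=2304 of the first linear layer can be checked by hand from the layer list above. The sketch below walks a 24 × 24 input through the layers, assuming the common convention that a 'SAME'-padded layer produces ceil(input / stride) outputs per spatial dimension. The helper `same_padding_output` is a hypothetical name used only for this check.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv or pool layer with 'SAME' padding."""
    return math.ceil(size / stride)


# (layer name, stride, output channels); pooling keeps the channel count.
layers = [
    ("conv1", 1, 32),
    ("pool1", 2, 32),
    ("conv2", 1, 64),
    ("conv3", 1, 64),
    ("pool2", 2, 64),
]

size, channels = 24, 3  # input shape after the 24 x 24 crop or resize
for name, stride, channels in layers:
    size = same_padding_output(size, stride)
    print(f"{name}: {size} x {size} x {channels}")

flatten_features = size * size * channels
print(flatten_features)  # prints: 2304
```

With an uncropped 32 × 32 input the same arithmetic gives 8 × 8 × 64 = 4096 features, so the linear layer size in this tutorial only fits the 24 × 24 images produced by the transforms.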

4. Model Training & Prediction

Next, use the high-level Model interface to train the model. The training setup will:

  • Use the tlx.optimizers.Adam optimizer for optimization.
  • Use tlx.losses.softmax_cross_entropy_with_logits to calculate the loss value.
  • Use tensorlayerx.dataflow.DataLoader to load data and build batches.
  • Use the tlx.model.Model high-level model interface to build the model for training.
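
To make the batching step in the list above concrete, the following sketch shows roughly what a data loader does with a dataset object that exposes `__getitem__` and `__len__`: it optionally shuffles the indices and groups the samples into batches. `ToyDataset` and `iterate_batches` are hypothetical names for illustration, not TensorLayerX APIs, and the real DataLoader offers options that are not modeled here.

```python
import random


class ToyDataset:
    """Minimal object with the two methods the tutorial's Dataset subclass defines."""

    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

    def __len__(self):
        return len(self.labels)


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield (inputs, labels) lists holding at most batch_size samples each."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        samples = [dataset[i] for i in indices[start:start + batch_size]]
        inputs, labels = zip(*samples)
        yield list(inputs), list(labels)


toy = ToyDataset(data=list(range(10)), labels=[i % 2 for i in range(10)])
batch_sizes = [len(labels) for _, labels in iterate_batches(toy, batch_size=4, shuffle=True)]
print(batch_sizes)  # prints: [4, 4, 2]
```

In this sketch the last batch is simply smaller when the dataset size is not a multiple of the batch size.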

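# Assumed settings: the original listing does not show these values
# (n_epoch matches the "of 500" in the training log)
batch_size = 128
n_epoch = 500
learning_rate = 0.0001
print_freq = 5

# optimizer, loss and metric used for training
optimizer = tlx.optimizers.Adam(learning_rate)
loss_fn = tlx.losses.softmax_cross_entropy_with_logits
metrics = tlx.metrics.Accuracy()

# build batch data loader
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

# build trainable model with high-level API
model = tlx.model.Model(network=net, loss_fn=loss_fn, optimizer=optimizer, metrics=metrics)

# train
model.train(n_epoch=n_epoch, train_dataset=train_loader, test_dataset=test_loader, print_freq=print_freq, print_train_batch=True)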

Epoch 1 of 500 took 2.037001371383667
   train loss: [2.4006615]
   train acc:  0.0546875
Epoch 1 of 500 took 2.0650012493133545
   train loss: [2.3827682]
   train acc:  0.05859375
      ......

Epoch 30 of 500 took 11.347046375274658
   train loss: [0.8383277]
   train acc:  0.7114410166240409
   val loss: Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
       [0.87068987])
   val acc:  0.7016416139240507
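
The first logged training loss above, about 2.40, can be put in context with a short calculation. The sketch below computes softmax cross-entropy for a single sample in plain Python and shows that a model with no preference among 10 classes yields ln(10) ≈ 2.30, so a loss in that range is what an untrained classifier is expected to produce. The function `softmax_cross_entropy` is a hypothetical per-sample helper written for this illustration and is not the library function.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a single integer class label."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


uniform_logits = [0.0] * 10   # no preference among the 10 classes
chance_loss = softmax_cross_entropy(uniform_logits, label=3)
print(round(chance_loss, 4))  # prints: 2.3026, i.e. ln(10)

confident = [0.0] * 10
confident[3] = 5.0            # strong, correct preference for class 3
print(softmax_cross_entropy(confident, label=3) < chance_loss)  # prints: True
```

As training progresses the logits for the correct class grow relative to the others, which is why the logged loss falls from about 2.4 to below 0.9 by epoch 30.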

The End

As the example above shows, a simple convolutional neural network built with TensorLayerX reaches a validation accuracy of just over 70% on the CIFAR-10 dataset after 30 epochs. Better results may be possible by adjusting the network structure and hyperparameters.