Machine learning (ML) methods, and deep learning (DL) methods in particular, have become the state of the art in many vision, language, and signal processing tasks because they can extract patterns from complex, high-dimensional input data. Classical ML methods, such as random forests and support vector machines, have been used in many Earth observation (EO) applications to analyse time series of remote sensing images. Convolutional neural networks (CNNs), by contrast, have been employed to analyse the spatial correlations between neighbouring observations, although mainly in single-scene applications.
This tool implements a deep learning architecture capable of analysing the spatial and temporal relationships in satellite image time series simultaneously. It is demonstrated on land cover classification of the Republic of Slovenia, using Copernicus Sentinel-2 satellite images acquired throughout 2017.
The Temporal Fully-Convolutional Network (TFCN) extends the FCN architecture (e.g. U-Net), which is currently the state of the art in single-scene semantic segmentation. The architecture exploits spatio-temporal correlations to maximise the classification score, and its encoding-decoding U-Net structure has the additional benefit of representing spatial relationships at different scales. The network applies 3D convolutions across both the spatial and temporal dimensions. By default, max-pooling is performed in the spatial domain only. As the target land cover labels are not time dependent (i.e. one label per pixel is available for the entire time series), 1D convolutions along the temporal dimension are performed in the decoding path of the architecture to linearly combine and reduce the temporal features. The schematic of the model architecture is shown in the figure. The network outputs a 2D label map, which is then compared with the ground-truth labels.
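The two operations that distinguish TFCN from a single-scene FCN, spatial-only pooling and temporal reduction, can be illustrated on a toy single-channel series. This is a minimal sketch in plain Python rather than the TensorFlow implementation; the function names are hypothetical, and the temporal reduction assumes a 1D kernel that spans all time steps, which the description above does not specify.

```python
def spatial_max_pool(series, size=2):
    """Max-pool every frame of a (T, H, W) nested list; the time axis is untouched."""
    pooled = []
    for frame in series:
        height, width = len(frame), len(frame[0])
        pooled.append([
            [
                max(frame[r + dr][c + dc] for dr in range(size) for dc in range(size))
                for c in range(0, width - size + 1, size)
            ]
            for r in range(0, height - size + 1, size)
        ])
    return pooled


def temporal_reduce(series, weights):
    """Collapse the time axis with a per-pixel weighted sum.

    Assumption: this stands in for a 1D temporal convolution whose kernel
    covers all T frames, so the output has a single time step.
    """
    assert len(series) == len(weights)
    height, width = len(series[0]), len(series[0][0])
    return [
        [sum(w * frame[r][c] for w, frame in zip(weights, series)) for c in range(width)]
        for r in range(height)
    ]


# Toy input: T=3 frames of 4x4 pixels.
series = [[[t * 100 + r * 4 + c for c in range(4)] for r in range(4)] for t in range(3)]

pooled = spatial_max_pool(series)               # shape (3, 2, 2): time preserved
reduced = temporal_reduce(pooled, [1, 1, 1])    # shape (2, 2): time removed
print(len(pooled), len(pooled[0]), len(pooled[0][0]))
print(len(reduced), len(reduced[0]))
```

In the real network the temporal weights are learned and there are many feature channels, but the shape behaviour is the same: pooling halves height and width only, and the temporal step produces one 2D map per feature.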
The architecture used three encoding and three decoding steps (i.e. three max-pooling and three deconvolution layers), with a bank of two convolution layers at each encoding and decoding scale. The convolution kernels had a width of three, and the number of convolutional features was set to 16 at the original scale and doubled at each deeper level. The model was implemented in TensorFlow and trained with the Adam optimiser (learning rate 0.001).
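To make these dimensions concrete, the following sketch traces the tensor shape at each encoder level. The 12 time steps and the 256 x 256 patch size are illustrative values not taken from the text. The sketch also assumes "same" padding, 2 x 2 pooling windows, and that the deepest level (after the third pooling) continues the doubling to 128 features.

```python
def tfcn_encoder_shapes(time_steps, height, width, base_features=16, depth=3, factor=2):
    """Return (T, H, W, features) for each encoder level.

    Assumptions: 'same' padding (convolutions keep T, H, W), 2x2 max-pooling
    in the spatial domain only, and the feature count multiplied by `factor`
    at every deeper level, including the deepest one.
    """
    shapes = []
    for level in range(depth + 1):
        scale = 2 ** level
        features = base_features * factor ** level
        shapes.append((time_steps, height // scale, width // scale, features))
    return shapes


# Hypothetical input patch: 12 acquisitions of 256 x 256 pixels.
shapes = tfcn_encoder_shapes(time_steps=12, height=256, width=256)
for level, shape in enumerate(shapes):
    print(f"level {level}: {shape}")
```

The temporal length stays at 12 throughout the encoder because pooling is spatial only, while the spatial extent shrinks by a factor of eight over the three pooling steps.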
The framework was used to generate a land cover map of the Republic of Slovenia for the year 2017. The inputs to the framework are a shapefile defining the geometry of the area of interest (AOI), the Sentinel-2 L1C images for the entire year, and a set of training labels. By avoiding the download and processing of entire tile products (e.g. Sentinel-2 granules), EO-learn provides flexibility and facilitates automation of the processing pipelines. A pipeline is defined as a connected acyclic graph of well-specified tasks to be performed on the data. EO-learn supports parallelisation of operations, so that the same workflow (e.g. data preparation for land cover classification) can be run in parallel on the smaller patches that make up the AOI. Logging and reporting allow the execution of the processing pipeline to be monitored and debugged.
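The idea of a pipeline as an acyclic graph of tasks, executed independently for each patch, can be sketched with the Python standard library alone. This is a generic illustration and does not use or reproduce the EO-learn API; the task names, the dependency graph, and the patch names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


# Hypothetical tasks: each receives and returns a dict describing one patch.
def load_images(patch):
    return {**patch, "images": f"Sentinel-2 L1C for {patch['name']}"}


def preprocess(patch):
    return {**patch, "preprocessed": True}


def add_labels(patch):
    return {**patch, "labels": "training labels"}


def prepare_samples(patch):
    return {**patch, "samples_ready": patch["preprocessed"] and "labels" in patch}


TASKS = {
    "load": load_images,
    "preprocess": preprocess,
    "labels": add_labels,
    "samples": prepare_samples,
}

# Each task maps to the set of tasks it depends on; the graph must be acyclic.
DEPENDENCIES = {
    "load": set(),
    "preprocess": {"load"},
    "labels": {"load"},
    "samples": {"preprocess", "labels"},
}


def run_pipeline(patch_name):
    """Run every task once for a single patch, in dependency order."""
    patch = {"name": patch_name}
    for task_name in TopologicalSorter(DEPENDENCIES).static_order():
        patch = TASKS[task_name](patch)
    return patch


def run_all(patch_names, workers=4):
    """Apply the same workflow to many patches concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, patch_names))


results = run_all(["patch_000", "patch_001", "patch_002"])
print([r["name"] for r in results], all(r["samples_ready"] for r in results))
```

Because each patch is processed independently, the workflow scales by adding workers; a thread pool is used here only to keep the sketch short, and a process pool or cluster scheduler would be the more likely choice for compute-heavy tasks.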
The trained model was used to predict the labels of the test sample, and the results were then validated against the ground truth. An overall accuracy of 84.4% and a weighted F1 score of 85.4% were achieved. In general, poor predictions were obtained for under-represented classes such as wetlands and shrubland. These results represent preliminary work on a prototype architecture that was not optimised for the task at hand. Despite this, the results are in line with previously reported work. Optimising the architecture (e.g. number of features, depth of the network and number of convolutions) and the hyper-parameters (e.g. learning rate, number of epochs and class weighting) could further improve the TFCN results.
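For reference, the two reported metrics can be computed as in the sketch below. It assumes the common definition of the weighted F1 score, in which per-class F1 scores are averaged with weights proportional to the number of ground-truth samples in each class; the text does not state which implementation was used, and the toy labels are invented purely for illustration.

```python
from collections import Counter


def overall_accuracy(truth, predicted):
    """Fraction of samples whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def weighted_f1(truth, predicted):
    """Average of per-class F1 scores, weighted by ground-truth class support."""
    support = Counter(truth)
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(truth, predicted))
        fp = sum(t != label and p == label for t, p in zip(truth, predicted))
        fn = count - tp
        # The denominator is never zero because every class here has support >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += count * f1
    return total / len(truth)


# Invented toy labels: one under-represented class that is never predicted correctly.
truth = ["forest"] * 6 + ["cropland"] * 3 + ["wetlands"] * 1
predicted = ["forest"] * 5 + ["cropland"] + ["cropland"] * 3 + ["forest"]

print(f"overall accuracy: {overall_accuracy(truth, predicted):.3f}")
print(f"weighted F1:      {weighted_f1(truth, predicted):.3f}")
```

In this toy case the single wetlands sample contributes an F1 of zero but carries little weight, which shows how a rare class can be predicted poorly while the aggregate scores remain high.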