The land transformation model-cluster framework: Applying k-means and the Spark computing environment for large scale land change analytics

Omrani H. has a new article published in the EMS journal. The paper explores the challenges of simulating land change across large regions and long spans of time. The new model, called LTM-Cluster, is a scalable modeling framework intended to benefit research involving large datasets.

The paper uses the Spark environment to reduce the long computation times that come with handling very large amounts of data. The implementation of LTM-Cluster is available free of charge. Our results consistently showed significant performance improvements of LTM-Cluster compared to the traditionally parameterized LTM.
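The paper reports these improvements using percent correct match, the area under the operating characteristics curve, and the error rate. The sketch below shows one way such measures could be computed for a binary urban-gain map. It is an illustration under stated assumptions, not the authors' code: the function names and toy data are hypothetical, percent correct match is assumed to mean the share of observed-change cells that the model also predicts as change, and the curve is assumed to be the usual ROC curve.

```python
def error_rate(observed, predicted):
    """Fraction of cells whose predicted class differs from the observed class."""
    wrong = sum(o != p for o, p in zip(observed, predicted))
    return wrong / len(observed)


def percent_correct_match(observed, predicted):
    """Assumed definition: correctly predicted change cells / observed change cells, in %."""
    changed = sum(observed)
    hits = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    return 100.0 * hits / changed


def auc(observed, scores):
    """Area under the ROC curve via pairwise comparison of change vs. no-change scores."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    # A change cell "wins" when it is scored above a no-change cell; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = urban gain, 0 = non-urban persistence.
observed = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.6, 0.1, 0.8]            # model suitability per cell
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(error_rate(observed, predicted))
print(percent_correct_match(observed, predicted))
print(auc(observed, scores))
```

The pairwise AUC is quadratic in the number of cells, so it only suits small samples; for full rasters a rank-based computation would be the more practical choice.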

Fig. 2
Main components of the LTM-cluster model.

We used land use datasets from three case studies: 1) Muskegon County, Michigan; 2) Lynnfield, Massachusetts-Boston; and 3) South-eastern Wisconsin (see map), hereafter Muskegon, Boston, and SEWI.

Fig. 1
Urban-gain and non-urban persistence for (A) Muskegon County, (B) Boston, and (C) SEWI during 1978–1998, 1971–1999, and 1990–2000, respectively.


This study introduces a novel framework for land change simulation that combines the traditional Land Transformation Model (LTM) with data clustering tools in order to simulate land change over large areas (e.g., continental scale) and multiple time steps. This framework, called “LTM-cluster”, subsets massive land use datasets, and the subsets are then presented to the artificial neural network-based LTM. LTM-cluster uses the k-means clustering algorithm implemented within the Spark high-performance compute environment. To illustrate the framework, we use three case studies in the United States which vary in simulation extent, cell size, time interval, number of inputs, and quantity of urban change. Findings indicate consistent and substantial improvements in accuracy for all three case studies compared to the traditional LTM implemented without input clustering. Specifically, the percent correct match, the area under the operating characteristics curve, and the error rate improved on average by 9%, 11%, and 4%, respectively. These results confirm that LTM-cluster has high reliability when handling large datasets. Future studies should expand on the framework by exploring other clustering methods and algorithms.
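The subsetting idea can be pictured with a small example. The sketch below is a plain-Python stand-in for the k-means step, not the Spark implementation used in the paper: it groups raster cells by their driver values so that each group could be handed to its own neural network. The cell records, the `drivers` field, and the function names are hypothetical, and the farthest-first seeding is chosen here only to keep the example deterministic.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = []
        for i in range(k):
            members = groups.get(i)
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # assignments have stopped changing
        centroids = updated
    return centroids


def subset_by_cluster(cells, k):
    """Split cell records into k subsets according to their driver values."""
    centroids = kmeans([c["drivers"] for c in cells], k)
    subsets = defaultdict(list)
    for cell in cells:
        subsets[nearest(cell["drivers"], centroids)].append(cell)
    return dict(subsets)


# Hypothetical cells: two normalized driver variables and an urban-gain label.
cells = [
    {"id": 0, "drivers": (0.10, 0.20), "urban_gain": 1},
    {"id": 1, "drivers": (0.20, 0.10), "urban_gain": 1},
    {"id": 2, "drivers": (0.15, 0.15), "urban_gain": 0},
    {"id": 3, "drivers": (0.90, 0.80), "urban_gain": 0},
    {"id": 4, "drivers": (0.80, 0.90), "urban_gain": 0},
    {"id": 5, "drivers": (0.85, 0.95), "urban_gain": 1},
]

subsets = subset_by_cluster(cells, k=2)
for label, members in sorted(subsets.items()):
    print(label, [c["id"] for c in members])
```

In a setting like the one the paper describes, each subset would then train its own model, and the distance computations, which dominate the cost on large rasters, are the part a distributed environment such as Spark would parallelize.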

You can download the paper using this link
