Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy over 85% and the CASTC model achieved an overall accuracy of 76%. We found that the cashew area in Benin almost doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 55%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

to be used as a biofuel (Sanjeeva et al., 2014), and cashew apples can be used to produce a range of beverages and animal feed (Gomes et al., 2018). Since cashew tree crops are typically cultivated by smallholder farmers, value additions in the cashew industry and poverty reduction are tightly connected.
Benin is among the top ten cashew growers in the world and the third largest cashew nut producer in West Africa (Duguma et al., 2021). Cashew nuts from Benin are renowned for their superior quality and bright white hue. The government of Benin recognizes the importance of cashew tree plantations in the fight against poverty for smallholder farmers, as evidenced by their strategy to double cashew production from 2016 to 2021 in the government action plan -Benin Revealed: Government Action Program (MAEP-Benin, 2017) -and the subsequent 2022-2026 plan (PNIASAN-Benin, 2022). There are two main ways to increase cashew nut production. The first strategy is to expand the cashew plantation area under cultivation by converting other land use types. The second is to improve the use of good agricultural practices (GAP) on existing cashew plantations to increase yield. The two strategies require an understanding of cashew plantation distribution, i.e., detailed knowledge regarding the location of cashew growing areas around the country and GAP implementation including planting density (tree spacing). Knowledge of the spatial distribution of cashew plantations and planting density is therefore essential to inform government policies regarding land use conversion and to efficiently manage field extension services. Periodic mapping can help the government understand how policies have been implemented and update land-use guidelines. In addition, for GAP related to planting density, identifying regions where cashews have been planted with below optimal GAP planting density guidelines can help best direct needed resources, including the provisioning of new cashew nurseries that are a core part of Benin's national cashew strategy and are crucial to increasing cashew production. In Benin and many other countries in sub-Saharan Africa, the traditional method of gathering information about cashew plantation distribution and GAP related to tree density is through extensive field data collection by conducting in-person farm visits and field surveys, which are inefficient and laborious on a large scale. Efficient and low-cost measures are urgently needed to assist the implementation of development programs on a national scale (Burke and Lobell, 2017;Chivasa et al., 2017).
Remote sensing offers a cost-effective and time-saving way to characterize ground objects and track surface changes over time at a large scale. Agricultural remote sensing has made significant contributions to fighting food insecurity through mapping crop types, predicting yields, and monitoring crop diseases/insect pests and more (Atzberger 2013;Maes and Steppe 2019;Nellis et al., 2009;Sishodia et al., 2020). However, research involving remote sensing and tree crops is mostly concerned with mapping tree crops and the effects of deforestation, with an emphasis on oil palm (Cheng et al., 2016;Gutiérrez-Vélez and DeFries, 2013;Xu et al., 2022) and rubber (Dong et al., 2013;Tridawati et al., 2020) trees. Rubber and oil palm trees have relatively prominent spatial and spectral features, which reduces the need for high spatial resolution imagery. For example, the crown diameter of a mature oil palm tree can be ~10 m and they are often grown in commercial farmlands spanning a few kilometers (Lin et al., 2021). In contrast, certain essential global commodity tree crops, such as cashew trees, have smaller crowns and are planted in irregular patterns within small plots (< 5 ha). In general, cashew plantations have received little attention in the past, with the exception of some local-scale studies in a specific community or national park with medium resolution (10-30 m) remote sensing imagery and traditional machine learning methods (Pereira et al., 2022;Singh et al., 2018). Those approaches are not transferable to nationalscale studies and are insufficient to resolve individual fields that are particularly relevant to smallholder management practices. Furthermore, mapping smallholder tree crops remains challenging given the fragmentary landscape and limited satellite observation capabilities.
Although a few studies have targeted smallholder plantation systems and complex landscapes (Ballester-Berman and Rastoll-Gimenez 2021;Descals et al., 2019;Dong et al., 2012;Maskell et al., 2021), the observational ability of medium-resolution spatiotemporal satellite imagery (e.g., MODIS, Landsat, and Sentinel) is limited for small-crown tree crops. Even fewer studies address different intra-class management practices, e.g., coffee sub-categories (Hunt et al., 2020;Kawakubo and Pérez Machado, 2016;Maskell et al., 2021). In recent years, in addition to aerial and drone imagery, as a number of microsatellite constellations have been launched by private aerospace companies such as Planet, Airbus, and DigitalGlobe, more data sources are available for high-resolution mapping (< 3 m). However, only limited small-scale case studies of ~1000 km 2 or less have leveraged high-resolution data to map smallholder tree crops (Burnett et al., 2019;Cui et al., 2022).
To understand smallholder cashew plantations in Benin and fill the gap in large-scale smallcrown tree crop mapping with high-resolution imagery, we employed Planet Basemaps (2.4 m) and aerial imagery (0.5 m) with advanced spatiotemporal deep learning techniques to map cashew plantation distribution and planting density in Benin. Several aspects of smallholder cashew plantations made the two classification tasks challenging. First, cashew trees of different varieties and ages grow in the same plantation with irregular spacing, which results in tremendous intraplantation heterogeneity. Second, there is often large inter-plantation heterogeneity in plantation size, shape, and planting density. In this case, sensors with medium spatial resolution (e.g., Sentinel and Landsat) and traditional machine learning classification algorithms are not sufficient for mapping cashew plantations at the field level. Finer remote sensing products (such as the 2.4-m Planet Basemaps and 0.5-m aerial imagery used here) and advanced deep learning techniques have opened up new possibilities for monitoring smallholder cashew plantations. Third, omnipresent clouds and shadows distort the spectral signal captured by sensors. The daily revisit frequency of Planet imagery helps ensure the availability of cloud-free Planet Basemaps on a monthly basis.
To our knowledge, this is the first-of-its-kind large-scale cashew plantation map leveraging high-resolution remote sensing imagery. In this study, we (i) developed spatiotemporal tree classification algorithms for cashew, (ii) mapped cashew plantation spatial distributions for four years (2015, 2019, 2020, and 2021), (iii) tracked the spatiotemporal changes in cashew plantations, and (iv) distinguished high-and low-density cashew plantation management practices. While this study focuses on cashew tree crops in Benin as a special case, the derived classification algorithm can help inform cashew mapping in other countries or be used for other similar tree crops.

Study area
The Republic of Benin comprises 12 departments (the primary administrative units) and is subdivided into 77 communes. The study area located in central Benin (1-3°E, 7-10°N) is one of the primary cashew-growing regions in West Africa, which spans 12 communes in four departments -Donga, Borgou, Collines, and Zou. (Fig. 1(a)). Heterogeneous landscapes are prevalent here ( Fig. 1(b)), and cashew trees are typically cultivated in smallholder plantations of less than 5 ha. The study region has a tropical savanna climate, and its typical yearly temperature ranges from 24 to 31 °C. In the study region, the dry season lasts from November to April, while the rest of the year makes up the rainy season (Table 1). Drought conditions are concentrated between December and January during the dry season, while the wettest months occur from June to September during the rainy season. The planting time for cashew tree seedlings is during the rainy season, mainly between July and August. The cashew tree blossoms and produces fruit mainly during the dry season. The peak flowering period for the cashew tree lasts from December through January and cashew nut harvest typically takes place from February through March.

Planet Basemaps
Because the study region is located in the tropics, the image quality from optical sensors with low revisit frequencies is impaired significantly by year-round clouds and shadows. Furthermore, high spatial resolution is needed for mapping tasks because of the prevalence of irregular smallholder farms of less than 5 ha in the study region. For these reasons, we employed the monthly Planet Basemaps product to map cashew trees during less cloudy months (November to May) each year (2019-2021). Planet Basemaps is a 4-band (blue, green, red, and near infrared) surface reflectance (SR) product composed of images captured by the PlanetScope microsatellite constellation ( Fig. S1(b)-(c)). The PlanetScope consists of hundreds of microsatellites carrying three generations of sensors (Dove Classic, Dove-R, and SuperDove), and the raw imagery it captures has a resolution of about 3 m spatially and 1 day temporally (Planet, 2022a). To develop the Planet Basemaps, raw imagery has undergone radiometric correction, geometric correction, orthorectification, and quality imagery selection. To reduce the variance across images and mitigate the effects of the atmosphere, the Planet Basemaps have also been standardized to a mature MODIS SR product (Planet, 2022b). The Planet Basemaps are organized in tiles, with 569 tiles required each month to fully cover the study region. Two levels of the Planet Basemaps are available: Zoom Level 15 (4.77 m) and Zoom Level 16 (2.38 m), and we opted for the latter given its better observation capability.

Aerial imagery
Aerial imagery was employed for cashew plantation mapping in 2015 due to the fact that Planet Basemaps are not available prior to 2019. The aerial imagery came from a project supervised by the Benin government for the preservation and development of gallery forests and digital base mapping production. A fixed wing aircraft, a Piper PA-31 Navajo, was used for imagery collection in May 2015. A multispectral camera (UltraCam Eagle Mark 3) with a focal length of 40 mm was mounted on the airplane platform to collect ground information in four spectral bands (blue, green, red, near infrared) from 3 kilometers above the ground ( Fig. S1(a)-(b)). Radiometric correction, geometric correction, and orthorectification have been applied to develop a SR product that is organized in grids. The SR product has a spatial resolution of 0.5 m, and 2031 tiles were required to fully cover the study region. Airbus RGB imagery with manually delineated labels for comparison.

Training data
Our chosen region for training the classification algorithms is an area of 1,000 km 2 (Fig. 2), where the heterogeneous landscapes and irregular smallholder cashew plantations are appropriate for training and selecting the optimal deep learning model. For this training region, we launched a ground survey through the TechnoServe BeninCajù program to collect detailed land cover types, and aggregated them into four categories of cashew plantations, mixed trees/grassland, built-up land, and cropland/others. Based on this survey map, we manually delineated ground truth polygons of the four categories using 0.5-m aerial imagery in 2015 and Airbus imagery in 2020 for mapping cashew plantation distributions in 2015 and from 2019 to 2021, respectively. The training region in Fig. 2 shows labeled examples from Airbus imagery. The mixed trees/grassland class includes mixed scattered trees, grassland, and gallery forest, while the cropland/others class mainly includes cropland and bare land. Since there is some mismatch when the training labels based on 0.5-m imagery are directly applied to the 2.4-m Planet Basemaps, a resampling was applied.
Because resampling from the manual ground truth to the Planet-based ground truth would cause a mixture of boundary pixels between two land cover types, we performed a 2-pixel erosion for each class, then relabeled eroded pixels as the cropland/others class and removed connected pixel clusters of less than 30 pixels.
For mapping cashew planting density, we defined high-density and low-density cashew plantations as having greater than or less than 100 trees/ha, respectively. In production practice, optimal yield is achieved in high-density plantations with a planting density between 100 and 180 cashew trees/ha. Low-density planting (< 100 trees/ha) cannot fully exploit the productive potential of cropland, while a density of over 180 trees/ha can lead to declining marginal productivity due to quality issues caused by disease and infestation. Although here we used a threshold of 100 trees/ha to distinguish between low-density and high-density cashew plantations, future studies will use a more in-depth classification that includes "very-high-density" plantations above 180 trees/ha.

Validation data and sampling design
To generate unbiased area estimates with uncertainties, we adopted a kind of probability sampling design -cluster sampling (Olofsson et al., 2014;Song et al., 2017). The cluster sampling method consisted of two phases: the first phase performed a stratified random sampling method at the primary sampling unit level -cluster, and the second phase implemented a simple random sampling method at the secondary sampling unit level -pixel. In the first step, the study region was divided into 5 by 5 km clusters consisting of 1663 clusters, and 5-km clusters ensured a large number of smallholder fields to be contained. Among them, 241 clusters were discarded because they were on the study area boundary, and 1422 clusters were retained for probability sampling.
We acquired the 30-m cropland extent map of continental Africa in 2015 (Xiong et al., 2017) to calculate cropland percentage for each cluster. Then, clusters were sorted according to the cropland intensity and divided into three strata with equal sizehigh, medium, and low cropland intensity strata (Fig. 2). The reason we used cropland percentage as our basis for stratification is that there is no historical cashew plantation map and there are often more cashew plantations with more staple croplands in our study region. For each stratum, 40 sample clusters were randomly selected guided by affordability. In the second step, 10 pixels were selected using simple random sampling within each first-step sample cluster. A total of 1,200 sample pixels were selected for visual interpretation for each year. Five kinds of auxiliary information were alternatively displayed to help visually interpret sample pixels: 2.4-m temporal Planet Basemaps from 2019-2021, 0.5-m aerial imagery in 2015, high-resolution imagery from Google Earth Pro, 11,397 field-collected cashew plantation boundaries, and more than 1,100 field-collected and visually interpreted samples labeled by TechnoServe. Cluster sampling method can significantly reduce the workload of visual interpretation by limiting sample pixels into sample clusters.
Furthermore, as the cashew tree is a perennial plant, we used 196 cashew plantation ground truth samples with the same exact location across the four years to validate the consistency of our classification maps in each of the four years. We also used 348 field-collected samples with known planting density to assess the accuracy of cashew plantation planting density. We preplanned the locations of validation points prior to the ground survey based on transportation accessibility and widespread distribution of the samples. Enumerators then used GPS devices to locate the predetermined samples and noted the planting density. (4). This study proposes tree crop mapping algorithms for cashew plantations that address both the spatial distribution and planting density in two stages ( Fig. 3). At the distribution mapping stage, the SpatioTemporal Classification with Attention (STCA) model was developed and U-Net (Ronneberger et al., 2015) was applied to map cashew plantation distributions for multi-temporal imagery from 2019 to 2021 and for mono-temporal imagery in 2015, respectively. At the planting density mapping stage, the Clustering Augmented Self-supervised Temporal Classification (CASTC) model was developed to map cashew planting density for multi-temporal imagery in 2021.

Tree crop mapping algorithms for cashew plantations
The training and testing steps using multi-temporal Planet Basemaps were conducted in a NVIDIA V100 GPU with 32 GB memory. Training and testing using mono-temporal aerial imagery were conducted in two NVIDIA V100 GPUs with 64 GB memory. For both sets of imagery, we used the same PyTorch deep learning framework.

Smoothing SR differences between tiles
Each Planet Basemap is a composite image made up of images from different microsatellites in the PlanetScope constellation, and therefore frequently has issues with inconsistent SR for the same ground object between tiles, even after the correction based on the monthly MODIS product (Houborg and McCabe 2018;Rao et al., 2021). In addition, various levels of clouds and shadows also lead to multiform SR for the same ground object in different regions, despite the superior image quality of Planet Basemaps relative to the SR products of Sentinel-2 and Landsat imagery in the tropics. Some studies have used other relatively mature SR products to correct Planet imagery to address these problems. For example, Sentinel-2 and Landsat SR products were used to stretch the histograms of Planet data (Jain et al., 2016;Rao et al., 2021). However, this method is unsuitable for our study region because Sentinel-2 and Landsat data have much more cloud coverage than Planet Basemaps in the tropics. Given that a small area has a high chance of being repeatedly captured by the same sensor under similar cloud and shadow conditions, we split our study region into many small areas. To smooth SR differences between tiles, we then performed a normalization on each of them using Eq. (1) (Graves and Schmidhuber 2005) to extract phenological changes (Ghosh et al., 2021a). To better aggregate the information for each time step, we further added the attention mechanism to aggregate the hidden representations over the time series based on their contribution to the classification performance (Luong et al., 2015;Jia et al., 2019). The parameter set of STCA was trained to minimize the objective function of pixel-wise cross entropy on the limited number of manually annotated labeled patches: probability value for each class was obtained by taking the average of ten runs, as demonstrated in Eq. (3): where denotes the ℎ patch and f( ; θ ) are the predicted probabilities of each run r on the ℎ patch. Moreover, the standard deviation across the runs gives an estimate of the uncertainty unc in the prediction (Eq. (4)). In this study, the dropout rate is 0.3.
(4) Next, each pixel was allocated a class using a region growing strategy. Traditionally, for deep learning multi-classification tasks, each pixel is assigned to the class with the highest Softmax value, and we refer to this strategy as "Argmax" prediction. However, with Argmax prediction, misclassification between spectrally similar classes and fragmentation of fields can easily occur.
In our study, cashew plantations share similar spectral features with gallery forests and other tree crops, e.g., for some pixels which should be gallery forests or other tree crops, the highest Softmax value may be for cashew plantations. In this case, the Argmax prediction would wrongly classify these pixels as cashew plantation. In addition, the space between cashew trees in plantations may be identified as other land cover types, resulting in unrealistically fragmented fields. Therefore, the region growing strategy was chosen to process the pixel-wise Softmax output instead of directly taking Argmax prediction as a classification result. Specifically, step 1 in Fig. 4(a) shows the original Softmax output for one class, where darker blue colors represent larger Softmax values.
As shown in step 2, pixels having a Softmax value greater than 0.8 were assigned as seed pixels, and those between 0.4 and 0.8 were assigned as neighboring pixels. In step 3, neighboring pixels with a seed pixel in their neighborhood were reassigned as seed pixels.
Step 4 shows the region growing result generated by seed pixels. The neighboring pixels in case 1 would not be maintained in the end, which reduces misclassification, while the neighboring pixels in case 2 would be maintained to ensure the integrity of plots.
We employed the U-Net model to create the cashew plantation map for 2015 using aerial imagery. Each image tile was cut into image patches of 256 by 256 pixels before model training.
The U-Net model has been prevalent in the crop classification domain (M and Jayagopal 2021; Wei et al., 2019;Zou et al., 2021) because of its advantages for segmentation tasks, including the combined use of global location and context, fewer training samples, and good performance (Alom et al., 2018;Ronneberger et al., 2015). The same Monte Carlo dropout method was applied to create an average over ten predictions and express the prediction uncertainty. In this study, we used five convolution layers with the kernel of 3 by 3 pixels. Maxpool and transposed convolution layers were employed with the kernel of 2 by 2 pixels. The dropout rate is set to 0.3. Then, the region growing strategy was applied to produce the final cashew plantation map for 2015. 2

Mapping cashew planting density in plantations (tree-spacing practices)
Field-collecting tree planting density data on a large scale is a difficult task requiring a substantial commitment of time and resources, which renders supervised learning unrealistic.
Therefore, we need a learning strategy to categorize plantations into high-or low-planting density without the need of labels. Clustering is one such widely used unsupervised learning strategy.
However, directly clustering the spectral bands of mono-temporal remote sensing imagery can lead to suboptimal results for several reasons. First, the clustering using spectral values of a single pixel can be noisy due to the spatial noise of the sensors. Second, clustering using all spectral values of all the pixels in an image patch can lead to very high dimensions and thus lead to issues like correlated attributes and inaccurate calculated distances. Third, multi-temporal remote sensing imagery includes more information to depict cashew tree crops than mono-temporal imagery. Thus, to address these challenges, we extracted abstract featuresembeddingfrom the satellite imagery performance (Ghosh et al., 2022).
In this study, the Clustering Augmented Self-supervised Temporal Classification (CASTC) model was leveraged to distinguish two kinds of cashew plantation tree-spacing practices (highdensity versus low-density cashew plantations) using the temporal Planet Basemaps in 2021.
Specifically, the encoder and decoder parts in the autoencoder structure have a similar architecture to STCA without the skip connections. Removing skip connections can help the model to extract quality encoded spatiotemporal vectorsembeddingthat fully capture representative features without the assistance from the skip connections. The embedding output by the encoder is fed into an LSTM-based sequence decoder that generates a sequence of vectors, and a convolutional decoder then reconstructs back the input time-series satellite imagery based on the vector sequence.
Model training consists of two phases: the first phase involves model initialization, and the second phase is model refinement with optimized clustering objectives. In the first phase, the autoencoder generated the embeddings, which were then subjected to the K-means clustering algorithm to generate initial cluster centroids. In the second phase, the cluster centroids obtained from the first phase were refined by matching soft assignment to target distribution by Kullback-Leibler (KL) divergence (Joyce, 2011) minimization. The soft assignment measures the similarity between an embedding and a cluster centroid with the t-distribution ( Van der Maaten et al., 2008) to generate the probability of assigning the embedding to the cluster: where is the probability of assigning the ℎ embedding to the ℎ cluster, h( ; θ ℎ ) is the ℎ embedding, is the ℎ cluster centroid, α is the degree of freedom, and K is the number of cluster centroids. To promote model learning from high-confidence embeddings, the target distribution was computed by skewing the soft assignment to drive the embedding closer to the cluster with highest : Then, we matched the soft assignment to the target distribution by minimizing the KL divergence to refine cluster centroids and encoder: where is the number of image patches in the training set.
Before model training, each image tile was cut into image patches of 32 by 32 pixels. After training the model, we had ten clusters, and the model was used to assign a cluster for each image patch in the training set. For the image patches for each cluster, we then visually inspected the corresponding high-resolution Airbus satellite imagery and assigned the image patches as high density or low density (step 1 in fig. 4(b)). We applied this model to the entire study region and kept only the cashew plantation region using a cashew plantation distribution mask. Each separate cashew plantation was given a density score from 0 to 1 (step 2 in Fig. 4(b)) to indicate the ratio of high-density image patches according to Eq. (4). Then, a density score threshold of 0.5 was applied to distinguish between high-density and low-density plantations (step 3 in Fig. 4(b)).

Validation of cashew plantation classification maps
Two accuracy assessment methods were performed on the cashew plantation distribution in two dimensions: space and time. In the spatial dimension, we verified the accuracy for each year using 1,200 ground truth samples. A confusion matrix, overall accuracy (OA), user's accuracy (UA), and producer's accuracy (PA) were used to evaluate the accuracy. In the time dimension, we verified the consistency of the classification results from 2015 to 2021 using 196 cashew plantation samples located in the same place across years. Note that built-up areas were ignored in our accuracy assessment, as this information was derived from existing products and was not the focus of this study. To assess the accuracy of cashew plantation tree-spacing practices, 348 points with known planting density were used.

Benchmarking with other classification approaches
The performance of STCA was compared with three deep learning methods: 3D-CNN (Ji et al., 2018), U-Net with ConvLSTM (Ghosh et al., 2021a;Shi et al., 2015), and CALD (Jia et al., 2019), all of which were designed to learn spatiotemporal information and have been shown to perform well in crop classification tasks. We did not include traditional machine learning methods for comparison with STCA because they do not take spatial information into account in their classification, and the CALD approach has been shown to perform much better on cropland mapping than RF and SVM (Jia et al., 2019). Compared to the traditional CNN network, 3D-CNN exploits additional temporal information by conducting convolution in the time dimension to learn temporal information. U-Net with ConvLSTM utilizes the U-Net architecture but substitutes the convolution layers with ConvLSTM layers, which can capture spatial and temporal information with CNN and LSTM. CALD leverages a context-aware LSTM to capture temporal information and further learn spatial information from neighboring pixels. We compared the classification performance of these four approaches within the training region. UA and PA were used to assess classification performance for each class.
We also compared the performance of CASTC with two other standard self-supervised classification methods, i.e., Autoencoder with K-means (Ghosh et al., 2022) and Colorization with K-means (Vincenzi et al., 2021). Autoencoder is a standard spatiotemporal STCA architecture without the skip connections. Colorization is a self-supervised learning technique with two independent branches taking in the NIR and RGB channels. Both of the branches are trained by the autoencoder separately, and we averaged their respective embeddings from the final layer as final embeddings. K-means is performed on the embeddings from the final layer of the encoder for both methods. To compare the clustering performance of the three approaches at the embedding level, we measured the inter-and intra-cluster differences using two metrics, the Separability Index (SI) and Coefficient of Variation (CV). SI (Somers and Asner et al., 2013) was used to measure the inter-class difference (Eq. (10)): where i and j represent different embedding clusters; μ and μ refer to the mean value of cluster i and cluster j, respectively; σ and σ represent the standard deviation of cluster i and cluster j, respectively. The numerator can reflect the disparity between different clusters, while the denominator can indicate the degree of concentration within clusters (Hu et al., 2019;Yin et al., 2020). A larger SI indicates greater dissimilarity between the embeddings in the two clusters. CV was used to measure intra-cluster differences, and it is unitless (Eq. (11)):

= (11)
where i represents a cluster, and σ and μ refer to the mean value and standard deviation of a cluster, respectively. A smaller CV indicates a greater concentration of embeddings in the cluster.
Then, we compared the distribution of SI and CV for the three approaches. The accuracy assessment results in Table 2 Fig. 6(a) illustrates the classification comparison between CASTC and the two other benchmark self-supervising approaches. In terms of SI, although CASTC has a slightly lower median value than Colorization with K-means, the first quantile and the maximum are noticeably higher, which indicates much greater divergence of some cluster pairs. Autoencoder with K-means has the poorest clustering performance. In the comparison of CV, the maximum and median of CASTC are the lowest among the three methods, although the minimum is slightly higher than Colorization with K-means. Colorization with K-means has the greatest median and maximum values of the three approaches, indicating that more than half of the clusters it formed were highly dispersed. As shown in Fig. 6(b), the image patches from two clusters formed by CASTC show obvious differences in cashew tree density. High-density clusters stand out in sharp contrast to lowdensity clusters. However, for Autoencoder and Colorization with K-means, high-density and low-density cashew plantations are mixed in the same cluster, indicating that the two methods perform poorly in this task.

Accuracy assessment for cashew plantation distribution and plantation density
For each of the four years, the OA of the cashew plantation map was close to or exceeded 80% (Table S1-4) with 1,200 validation points. The cropland/others class was most frequently misclassified with cashew plantations. A small amount of Mixed trees/grassland labels were also found in points predicted to be cropland/others. The UA of cashew plantation was greater 0.8 and PA exceeded 0.9 from 2019 to 2021. The lowest UA in 2015 than other year was expected given that no temporal information could be used for the mono-temporal imagery. In terms of classification consistency, 83.7% of the samples were consistently classified as cashew plantation across the years, which demonstrated the stability of the classification along the time dimension (Table S6). The OA for cashew plantation density mapping was 77.9% (Table S7). Low-density cashew plantations were better categorized than high-density plantations, possibly because the number of high-density plantation samples, 128, was smaller than that of low-density samples, 220.

Cashew plantation expansion from 2015 to 2021
The classification results for the four years indicate that the area of cashew plantations almost doubled from 262 ± 34 kha to 514 ± 121 kha between 2015 to 2021. This confirms public statistics from the Benin Ministry of Agriculture, Livestock and Fisheries that the area under cultivation increased by 71% from 286 kha in 2016 to 488 kha in 2020 (MAEP-Benin, 2020). During the last seven years, there were clear signs of growth in west and east Collines, west Donga, and south Zou, as shown in Fig. S2. Fig. 8

Planting density map of cashew plantations for 2021
The map of cashew plantation planting density scores is shown in Fig. 9(a). A threshold planting density score of 0.5 was selected to distinguish high-density and low-density cashew plantations ( Fig. 9(b)), and details are shown for three sample sites in Fig. 9(c). As shown in Fig.   9(d), the proportion of high-density cashew plantations relative to the total reveals that 5 communes exceed 50%, namely N'Dali, Parakou, Bantè, Ouèssè, and Savalou. In the Tchaourou commune, nearly half of the area planted with cashew trees already consists of high-density cashew plantations. These six communes are all located in the Borgou and Collines departments, which means these two departments, especially Borgou, have a relatively mature cashew cultivation industry. On the other hand, although the three communes of Savè, Bassila, and Djougou have relatively large cashew plantation areas, the tree density is low, which presents an opportunity to increase cashew planting density (and therefore cashew production) on existing plantations.

The added value of Planet Basemaps in large-scale smallholder crop mapping
Although some previous studies have focused on smallholder crop mapping, the limited spatial and temporal resolution of the available remote sensing data was not adequate for creating field-level maps. Prior to the Sentinel mission launch, Landsat and MODIS data were used to map smallholder farms (Jain et al., 2013;Schneibel et al., 2017). Later, Sentinel-1 and 2 were widely used for crop functional mapping of all kinds, especially in Africa (Jin et al., 2019;Lambert et al., 2018;Masiza et al., 2020). However, the highly fragmented fields and frequent cloud coverage in some regions can cause satellite imagery with medium spatial/temporal resolution to lose its efficiency. Sentinel-2 imagery cannot adequately depict smallholder cashew plantation field boundaries, given its insufficient spatial and temporal resolution and the degradation of image quality by clouds and shadows ( Fig. S3(a)). Recently, researchers have realized the advantages of the high spatial and temporal resolution in Planet's microsatellite constellation for crop mapping.
Some promising smallholder crop mapping results using Planet Daily Scenes SR data (3 m) have been published, although they consist solely of small-scale (<1000 km 2 ) applications (Rafif et al., 2021;Rao et al., 2021). However, such daily images are still affected by clouds and shadows to varying degrees ( Fig. S3(b)). In comparison, Planet Basemaps generally contain less noise ( Fig.   S3(c)). This is because the monthly Planet Basemaps go through more post-processing steps than Planet Daily Scenes. First, the cloud cover and image sharpness are used as quality metrics to determine the best imagery in each month, and the imagery that ranks highest in the weighting of these metrics is selected. Then, the selected imagery is normalized to a monthly MODIS SR target to minimize variability between scenes and reduce atmospheric effects. Finally, all images processed for Planet Basemaps are manually inspected for quality (Planet, 2022b). To summarize, Planet Basemaps enable field-specific, and even sub-field, crop monitoring.
We compared the classification performance for Sentinel-2 Level-2A, Planet Daily Scenes, and Planet Basemaps in the training region (Fig. 10). The classification result generated by Sentinel-2 was only able to identify a small subset of cashew plantations. Planet Daily Scenes were able to capture more cashew plantations than Sentinel-2, although some pixels were misclassified as cashew plantations. By comparison, Planet Basemaps classified cashew plantations more accurately. The classification accuracy assessment using F1 scores also showed that Planet Basemaps produced superior classification results compared to the other two methods (Table 4).

Uncertainty analysis of classification results
Because the study region is located in the tropics, clouds and shadow weaken the observational capability of optical sensors. At the same time, the Planet microsatellite constellation consists of many satellites, which inevitably causes differences in sensor characteristics, thus affecting the ability to obtain consistent SR for a large region to some extent. Additionally, other factors such as satellite product versions, atmospheric and directional corrections, and BRDF effects can also cause classification uncertainty (Zeng et al., 2022). All of these factors not only have impacts on the direct monitoring ability for cashew plantations, but also introduce inter-class similarity and intra-class differences, resulting in poor classification performance. In order to explore the impacts of these factors on the classification results, we employed an uncertainty mask generated by the Monte Carlo dropout technique (Eq. (4)) to filter out the pixels impaired by these factors. Fig. 11(a) shows the real surface conditions with satellite imagery. The pixel-wise classification uncertainty mask from ten runs with Monte Carlo dropout (Fig. 11(c)) was applied to the averaged classification results of the ten runs ( Fig. 11(b)) with a threshold of 0.06 to generate a cashew plantation map without high-uncertainty pixels ( Fig. 11(d)). The threshold can be adjusted case by case.
We took the classification result in 2021 as an example and divided the study region into three parts based on the degree of cashew plantation area decline (Fig. 11(e)) after applying the uncertainty mask. Communes are relatively spatially concentrated for each class. There are four communes in the southeast of the study region that experienced less decline than other communes (i.e., Djidja, Dassa-Zoumè, Savè, and Glazoué) among which Savè declined the least. On the other hand, the three northwestern communes of Djougou, Bassila, and Bantè declined more than other communes, among which Bassila declined the most.

Policy guidance for the cashew industry in Benin
The findings of this study help document the progress of a major governmental initiative

Limitations and future work
The combination of tree crop classification algorithms for cashew and high-resolution imagery demonstrated the power of accurately mapping the distribution and planting density of cashew plantations, upon which we monitored the dynamics of cashew areas under cultivation from 2015 to 2021. The proposed classification algorithms will allow rapid mapping of cashew plantations going forward. However, some limitations remain to be optimized in the future. First, because of year-round clouds and shadows in the tropics and differences in sensor characteristics of the satellite constellation, intra-class spectral inconsistency is still an issue for some regions, even after Planet Basemaps have been normalized to a monthly MODIS SR product. This is common in the remote sensing field, and solutions are limited. A potential method is using domain adaptation, including invariant feature selection, representation matching, adaptation of classifiers, and selective sampling (Elshamli et al., 2017;Martini et al., 2021;Tuia et al., 2016). Recently, the new generation of PlanetScope instruments with 4 newly added bands (coastal blue, yellow, a second green, and red edge spectral bands) has launched and started publishing imagery (Planet, 2022c).
The new analytic product is calibrated to Sentinel-2 and has improved alignment, enabling accurate time-series analysis and machine learning models (Planet, 2022d). In future classification map updates, we will consider these new methods and inputs. Second, although the 2.4-m Planet Basemaps were leveraged to distinguish high-and low-density cashew plantations using the threshold value of 100 trees/ha, remote sensing imagery with higher spatial resolution can help differentiate additional levels of planting density (e.g., "very-high-density" plantations above 180 trees/ha) as a result of finer observational ability. In addition, more information about GAP adoption in cashew plantations can be captured, such as whether trees are pruned (i.e., tree crowns are not touching each other), which is important for improving cashew tree yields. In the future, more agricultural practices will be mapped.
With further work and close coordination between researchers and field teams, we will geographically expand the modeling techniques in this study to other cashew growing regions, and then to other major smallholder tree crops such as mango, avocado, shea, and macadamia, helping to improve the livelihoods of millions of smallholder farmers globally. In addition, based on tree crop maps, we can further understand the contribution of tree crops to carbon stocks and the benefits of expanding tree crop planting area for carbon sequestration, which could complement the research on the role of non-forest trees in mitigating climate change (Brandt et al., 2020;Mugabowindekwe et al., 2022;Skole et al., 2021).

Conclusion
In this study, we mapped the spatial distribution of cashew plantations from 2015 to 2021 and cashew planting density in 2021 with our developed tree crop mapping algorithms. Combining high-resolution Planet Basemaps and aerial imagery, even with limited ground truth labels, the STCA and U-Net showed promising performance in mapping cashew plantation locations in each of the four years. The methods and data sources used allowed us to achieve this result even in the face of difficult challenges that included heterogeneous landscapes and irregularly-planted smallholder farms, similar spectral signatures between cashew and other trees, pervasive and yearround clouds and shadows, and frequent land-use changes. We found that cashew plantation areas in Benin expanded nearly doubled since 2015 to 514±121 kha in 2021. With the self-supervised learning model CASTC, the cashew plantation planting density map provided important information to assist in identifying regions with the greatest need for guidance on tree-spacing practices. Although the tree crop classification algorithms in this study were designed for mapping cashew plantations in Benin, they can be adapted in the future for other cashew growing regions and to map the distribution and planting density of other smallholder tree crops.