Predicting rice yield 🌾 using temporal vegetation indices with Random Forest algorithm

This study predicts rice yield in Jhapa district from vegetation indices observed during the rice growing season (July to October). For each month, four mean vegetation index values were used to train a Random Forest model, separately for NDVI and EVI. Each model was validated on the years left out of training. The EVI-based model was found to be more robust for yield prediction.


Introduction

Rice is a staple food crop of Nepal. Among rice, wheat, and maize, rice remains the most imported food crop in Nepal, with imports valued at NPR 5.78 billion in fiscal year 2020/21, decreasing slightly to NPR 4.75 billion in 2021/22. Rice imports were NPR 32.21 billion in fiscal year 2018/19, up from NPR 28.90 billion in 2017/18, which points to the scale of rice imports over the years. Additional economic, agricultural, and environmental constraints, namely the migration of youths out of the country, the decrease in agricultural land, and challenging climate scenarios for growing rice, have further jeopardized food security in the country. The country's goal of self-sufficiency in rice production therefore still seems difficult to reach in the current context.


One step towards self-sufficiency in rice production is timely and accurate statistical information on rice yield, which supports economic and agricultural planning at the national level. Traditional yield estimation approaches are often time-consuming and tedious, and can be prone to bias. By comparison, combining satellite-based remote sensing spectral information with machine learning algorithms can be a suitable approach to rice yield prediction, because spectral information collected over the crop growing season across many years can be related to the crop's growing condition.


This motivates us to estimate rice yield at the district level. Our primary objective is to apply the Random Forest algorithm to predict rice yield in Jhapa District, Nepal, using the Normalised Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI) together with historical yield data, and to evaluate how effective NDVI and EVI each are for rice yield prediction.



Methods

  • Study Area:

Jhapa is the easternmost district of Nepal and consists of fertile Terai plains (Fig. 1). It is one of the major rice-growing districts of Nepal.


Fig. 1: Study Area

  • Rice Yield Data:

The rice yield data for Jhapa District were accessed through Nepalindata, an interactive portal that provides a wide range of statistical information, including yield-related data. The yield data are shown in Fig. 2.



Fig. 2: Rice Yield trends in Jhapa District (screenshot from nepalindata)

  • Satellite Data:

MODIS Terra Daily NDVI and MODIS Terra Daily EVI were used as the vegetation indices. Both are available in the Google Earth Engine catalogue at the following links.

- https://developers.google.com/earth-engine/datasets/catalog/MODIS_MOD09GA_006_EVI

- https://developers.google.com/earth-engine/datasets/catalog/MODIS_MOD09GA_006_NDVI


Both datasets are derived from the MODIS/006/MOD09GA surface reflectance composites and are available on a daily basis from 2000 onward.
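
For readers unfamiliar with the two indices, the sketch below shows the standard NDVI and EVI formulas applied to surface reflectance. It is only an illustration of the definitions: the function names and the sample reflectance values are hypothetical, and the Earth Engine products listed above already supply the computed indices, so this step was not necessarily run by hand in this study.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index from surface reflectance (0-1 scale)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index using the standard MODIS coefficients
    (gain 2.5, aerosol terms 6 and 7.5, canopy background term 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical reflectances for a green crop canopy (illustrative values only).
nir, red, blue = 0.45, 0.05, 0.03
print(round(ndvi(nir, red), 3))
print(round(evi(nir, red, blue), 3))
```

EVI includes the blue band and a background term, which is why it tends to saturate less than NDVI over dense canopies.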

The analysis calculated mean vegetation indices from July to October, the rice growing period in the district. For each month, four mean values were computed, one for each of the periods Day 1-8, 9-16, 17-24, and 25-31. The spatial mean of each index was determined using the cropland class of MODIS Land Cover Type 1 as a mask to isolate the rice-growing areas. This process was repeated annually from 2001 to 2021 for both vegetation indices. All remote sensing computations were carried out in Google Earth Engine.
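
The temporal binning described above can be sketched in plain Python as follows. This is a minimal illustration, not the Earth Engine script used in the study: it assumes the cropland-masked spatial mean has already been exported as one value per day, and the names `period_of` and `seasonal_features`, as well as the synthetic series, are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

GROWING_MONTHS = (7, 8, 9, 10)  # July to October


def period_of(day):
    """Map a day of month to one of four periods: 1-8, 9-16, 17-24, 25-31."""
    return min((day - 1) // 8, 3) + 1


def seasonal_features(daily_values):
    """Average daily index values into (month, period) bins.

    daily_values: dict mapping datetime.date -> spatial-mean index value
    (None marks a missing day and is skipped).
    Returns a dict keyed by (month, period) holding the mean of each bin.
    """
    bins = defaultdict(list)
    for d, value in daily_values.items():
        if d.month in GROWING_MONTHS and value is not None:
            bins[(d.month, period_of(d.day))].append(value)
    return {key: mean(vals) for key, vals in sorted(bins.items())}


# Synthetic daily series for one season (made-up values, not real MODIS data).
start = date(2001, 7, 1)
series = {start + timedelta(days=i): 0.3 + 0.002 * i for i in range(123)}
features = seasonal_features(series)
print(len(features))  # number of (month, period) bins for one year
```

With four months and four periods per month, each year contributes 16 values per index, which would serve as the predictors for that year.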


Two models, one based on EVI and one based on NDVI, were constructed with the Random Forest algorithm using the randomForest package in the R programming environment. The number of trees was tuned by testing values from 10 to 1000 in steps of 10 and selecting the value that maximised the explained variance. Data from 2001 to 2018 were used to train the models, and data from 2019 to 2021 were used for validation.
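
The year-based split and the tree-count search can be outlined as below. This Python sketch only mirrors the bookkeeping; it does not fit a Random Forest, and the study itself used R. The names `split_by_year`, `best_tree_count`, and `toy_score` are hypothetical, and `toy_score` is a placeholder standing in for the variance explained that a fitted model would report.

```python
TRAIN_YEARS = range(2001, 2019)         # 2001-2018 inclusive
VALID_YEARS = range(2019, 2022)         # 2019-2021 inclusive
TREE_GRID = list(range(10, 1001, 10))   # 10, 20, ..., 1000


def split_by_year(records):
    """Split per-year records into training and validation lists."""
    train = [r for r in records if r["year"] in TRAIN_YEARS]
    valid = [r for r in records if r["year"] in VALID_YEARS]
    return train, valid


def best_tree_count(score_for):
    """Return the tree count in TREE_GRID with the highest score.

    score_for: callable taking a tree count and returning the variance
    explained by a model fitted with that many trees.
    Ties resolve to the smallest count, since max() keeps the first maximum.
    """
    return max(TREE_GRID, key=score_for)


# One record per year; real records would also carry the predictors and yield.
records = [{"year": y} for y in range(2001, 2022)]
train, valid = split_by_year(records)
print(len(train), len(valid), len(TREE_GRID))


def toy_score(n_trees):
    """Placeholder score peaking at an arbitrary 200 trees (not a real result)."""
    return -abs(n_trees - 200)


print(best_tree_count(toy_score))
```

In practice `score_for` would train a model on the training years only, so that the validation years stay unseen until the final evaluation.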



Results:


For the NDVI-based model, the optimal number of trees was 540, which gave a variance explained of 0.73. In contrast, the EVI-based model achieved its best result with 10 trees, explaining 0.86 of the variance.


In validation, the NDVI-based model gave an MAE, RMSE, and RE of 904.78 kg/ha, 915.11 kg/ha, and 0.20, respectively. The EVI-based model gave 753.77 kg/ha, 757.35 kg/ha, and 0.17, respectively. The EVI-based model therefore performed better than the NDVI-based model on all three measures.
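
The three validation measures can be computed as in the sketch below. MAE and RMSE follow their usual definitions. The text does not define RE, so the code assumes it is the mean absolute error relative to the observed yield; that definition is an assumption, and the yield values are made up for illustration.

```python
from math import sqrt


def mae(observed, predicted):
    """Mean absolute error, in the units of the yield (e.g. kg/ha)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error, in the units of the yield (e.g. kg/ha)."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed (assumed definition of RE)."""
    return sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


# Made-up yields in kg/ha for three validation years (not the study's data).
observed = [4000.0, 4200.0, 4500.0]
predicted = [4400.0, 4000.0, 4800.0]
print(mae(observed, predicted))
print(round(rmse(observed, predicted), 2))
print(round(relative_error(observed, predicted), 4))
```

RMSE is never smaller than MAE. The two are close in both models reported above, which suggests that the errors were of similar size across the three validation years.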



Conclusion:

Remote sensing combined with machine learning can predict rice yield in a timely and accurate manner compared with traditional methods. Such approaches offer valuable insights for rice yield estimation in Nepal.



Supplementary Materials:

  • Vegetation indices charts: for July to October of each year, the indices were computed using the cropland class of MODIS Land Cover Type 1 as a yearly mask to filter the data.





