A Data-driven Model for Sustainable Deployment of Climate Smart Agriculture Practices Among Smallholder Farmers in Kakamega

Kenya’s agriculture is dominated by millions of smallholder farmers who produce over 75 per cent of the national agricultural output. Smallholder farmers, however, are the most vulnerable to climate change because socio-economic, demographic and policy factors limit their capacity to adapt. To mitigate the negative effects of climate change on smallholder farmers, development partners and government departments have developed and promoted numerous interventions in the form of Climate Smart Agriculture (CSA) technologies. Not all targeted smallholder farmers, however, participate in and adopt these technologies at the ideal rates and intensity, which leads to dis-adoption and abandonment. This study, therefore, sought to develop a data-driven model for the sustainable deployment and adoption of CSA practices among smallholder farmers in Kakamega County. The study employed a mixed methods research design. Through a quantitative survey of 428 smallholder CSA technology adopters and dis-adopters, the study investigated the major socio-economic and biophysical characteristics associated with the different farmer categories. Supervised machine learning with the Scikit-learn library for the Python programming language was used to build, pilot and review Decision Tree and Random Forest Classifier models for the sustainable deployment and adaptation of CSA practices among Kakamega County's smallholder farmers. Nineteen key variables were identified for the development of a predictive model for CSA technology adoption. A predictive tool was developed and piloted among 15 smallholder CSA farmers. The classifier model produced a Mean Squared Error of 0.16. The proposed model predicted smallholder farmer adoption with an accuracy of 89.53 per cent on the test data and 90.0 per cent on the pilot data.
This study, therefore, proposes a new model for the optimal selection of Climate Smart Agriculture intervention beneficiaries.


I. INTRODUCTION
Smallholder agriculture is a term used to describe rural producers, predominantly in developing countries, who farm mainly with family labour and for whom the farm is the principal source of income (Cornish, 1998). According to Kenya's Ministry of Agriculture, smallholder farmers are those who work on and own land ranging from 0.5 to 5 hectares. Kenya's agriculture is estimated to be dominated by 4.5 million smallholder farmers who produce more than 75 per cent of the country's agricultural output (Kirimi et al., 2011). The contribution of smallholder farmers to agricultural development should not be underestimated, as they play a significant role in the food security of both the country and the African continent. Available reports indicate that smallholder farmers produce over 80 per cent of the food produced in Africa (Hlophe-Ginindza & Mpandeli, 2021). In addition, smallholder farmers produce for their own households, thereby reducing the burden on the government to provide food for them.
Kenyan smallholder farmers face several challenges. First, because of their small landholdings, they produce only enough food to feed their families and have little to sell. As a result, their ability to generate income is reduced and their poverty levels rise. Second, smallholder farmers cannot obtain agricultural credit to improve their farming practices because they lack adequate data to support their creditworthiness (Maru et al., 2018). Third, because most smallholder farmers live in remote rural areas, they lack access to the infrastructure and other services that would enable them to reach farm inputs and agricultural markets (Aaron, 2012). Fourth, smallholder farmers face pest and disease outbreaks, droughts, and a scarcity of arable land on which both to farm and to live (Hlophe-Ginindza & Mpandeli, 2021). Lastly, smallholder farmers face the major challenge of climate change.
The Kenya Climate Change Act of 2016 defines climate change as the "change in climate systems which is caused by significant changes in the concentration of greenhouse gases as a consequence of human activity and which in addition to natural climate change that has been observed during a considerable period" ("Climate Change Act," 2016). This definition attributes climate change primarily to human activity. Climate change, therefore, concerns long-term changes in weather patterns around the world caused by the concentration of greenhouse gases (GHGs), primarily from human activities. A report by the Kenya Agricultural Research Institute (KARI, 2009) indicates that zones currently considered semi-arid may become arid, or too dry for any agricultural activity to take place. Climate change is, therefore, expected to cause losses in the production of basic staples such as maize and beans and of livestock products, which may in turn lower food accessibility and per capita calorie availability.
Climate change studies have identified rising temperatures, more variable rainfall, and shifts in the onset and cessation of rains as some of the major challenges facing agriculture today (Harvey & Pilgrim, 2011). In addition, high temperatures and drought conditions have been reported to harm maize and bean production, flowering, and yields in many tropical countries (Eitzinger et al., 2013). Climate change has also been reported to harm tropical agricultural production through higher incidences of pests and diseases; ClimateChange.ie (2017) associates the invasion of fall armyworms and other pests in Africa with climate change.
In sum, climate change has affected smallholder agriculture negatively through unpredictable weather and intensified drought cycles, which make farming unpredictable and reduce agricultural productivity (ClimateChange.ie, 2017). As a result, smallholder farmers must adopt coping strategies such as sustainable agriculture, climate-smart agriculture (CSA), precision agriculture, and other interventions.
To counter these challenges, Climate Smart Agriculture (CSA) interventions have been developed to increase smallholder farmers' resilience to climate change, reduce greenhouse gas (GHG) emissions, and increase agricultural productivity (FAO, 2020). CSA has been described as an approach that combines various sustainable methods to address a specific community's climate challenges (Rainforest-Alliance, 2020). Whereas sustainable agriculture focuses on producing crops and livestock with minimal environmental impact, CSA aims to help those who manage agricultural systems respond effectively to climate change. CSA practices can thus be defined as agricultural practices that consider both resilience and adaptation to climate change.
The implementation of CSA practices among smallholder farmers, however, has not achieved the intended goals, because current practices do not consider individual farm-level data and socio-economic characteristics during the design and implementation of interventions. Individual smallholder farms differ in their management practices, soil characteristics, and other farm-level characteristics. Without information, insights, and data-driven decisions, interventions lead to losses and reduced yields, forcing some smallholder farmers to abandon CSA practices once the supporting projects wind up. Data-driven agriculture informs smallholder farmers' critical economic decisions on what to produce, how much to produce, and when to produce it. This study, therefore, designed a data-driven model for the deployment and adaptation of CSA practices among smallholder farmers in Kakamega County.
Many studies have been conducted to model agricultural production. Johann et al. (2016) estimated soil moisture content using an autoregressive error function; the model is suitable for estimating soil moisture in controlled systems under no-till machinery. In a similar study, Chen et al. (2014) designed a Wireless Sensor Network (WSN) to monitor multi-layer soil temperature and moisture in a farmland field, both to improve water utilization and to collect basic data for research on variations in soil water infiltration for intelligent precision irrigation. Muangprathub et al. (2019) developed a WSN-based model for optimally irrigating crops. In this model, a soil moisture sensor monitors the field and is connected to a control box. A web-based application manages crop data and field information and applies data mining to predict suitable temperature, humidity and soil moisture for the optimal future management of crop growth. A mobile smartphone app then controls crop watering.
Another notable model developed in the recent past is the Climate Smart Village (CSV) approach by Aggarwal et al. (2018). This model provides a means of conducting agricultural research for development by testing, through participatory methods, technological and institutional options for dealing with climate variability and climate change.
According to Aggarwal et al. (2018), an ideal CSV approach gives guidance, before and during the planting season, on the most suitable CSA practices, technologies, services, processes, and institutional options, taking into account the availability of resources such as capital, labour and markets.
The Climate Smart Decision Support System for analysing the water demand of a large-scale rice irrigation scheme is one of the models developed to inform Climate Smart Agricultural decisions. Rowshon et al. (2019) applied this model to evaluate the impacts of climate change on irrigation water demand and other key hydro-climatic parameters in the Tanjung Karang Irrigation Scheme in Malaysia for the period 2010-2099. By analysing the water demand of a large-scale rice irrigation scheme, the model helps promote adaptation and mitigation strategies that can lead to more sustainable water use at the farm level.
Ascough II et al. (2002) developed the Great Plains Framework for Agricultural Resource Management (GPFARM) to provide crop and livestock management support at the whole-farm level in the Great Plains of the United States. This DSS provides producers, consultants, action agencies, and scientists with information for making management decisions that promote sustainable agriculture. GPFARM contains risk analyses that combine projected crop yield and animal production data with concurrent environmental impact data. Another DSS, developed by Bseiso et al. (2015), targets greenhouse farmers in low-resource settings. It provides farmers with slides of decision information that can only be read on printed paper or in PDF format, which suggests that the tool could be converted into an app. […] (2014) present a climatic monitoring system for farmers. Using an integrated WSN weather station, farmers can display measurements of temperature, humidity, wind and solar radiation, which allow the DSS to calculate the daily water requirement precisely. Another DSS, by Panchard et al. (2007), is known as Commonsense net. It is a wireless sensor network for resource-poor agriculture in the semi-arid areas of developing countries and aims to improve resource-poor farmers' farming strategies in the face of highly variable conditions. The risk management strategies it supports include choice of crop varieties, planting and harvesting, pest and disease control, and efficient use of irrigation water.

II. METHODOLOGY
Primary Data Collection
Primary data were collected from 428 smallholder farmers in Kakamega County (182 adopters and 246 dis-adopters). The purpose of the models was to aid decision-making by predicting, from the variables identified in the study, which smallholder farmers would be CSA adopters and which would be CSA dis-adopters.
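As a quick sanity check on the sample composition, the sketch below computes the class shares from the reported counts (182 adopters, 246 dis-adopters) and the accuracy of a trivial baseline that always predicts the majority class. The baseline is an illustrative addition, not part of the original analysis; any useful classifier should beat it.

```python
# Reported sample composition: 182 adopters, 246 dis-adopters.
counts = {"adopter": 182, "dis-adopter": 246}
total = sum(counts.values())

# Share of each farmer category in the sample.
shares = {label: n / total for label, n in counts.items()}

# A naive baseline that always predicts the most common category is
# correct exactly as often as that category occurs in the data.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total, majority_label, round(baseline_accuracy, 4))
print({label: round(share, 3) for label, share in shares.items()})
```

With these counts the majority-class baseline is about 57.5 per cent, which puts the test accuracies of roughly 85 per cent reported later in context.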

Machine Learning Tools
The Google Colaboratory notebook was used for model fitting and testing. The pandas, NumPy, Matplotlib, Scikit-learn and Seaborn Python libraries were used in the modelling and were imported into the Colaboratory notebook as shown below:
import pandas as pd                  # data frames
import numpy as np                   # numerical arrays
import matplotlib.pyplot as plt      # plotting
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
import seaborn as sns

Importing the Data into the Notebook
This involved loading the dataset into a pandas DataFrame using the read_csv function, as follows:
df = pd.read_csv("/content/WholeData Set_610 Variables.csv")

Splitting the Data into Training and Test Data Sets
The data were randomly split into two datasets: 70 per cent for training the models and 30 per cent for testing them, as follows:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Fitting the Models
Model fitting enables the models to learn patterns from the training data so that they can make accurate predictions on new data similar to that on which they were trained. The models were defined and fitted as shown in Table 1 below.
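Table 1 is not reproduced here, so the fitting step is sketched below under stated assumptions: the survey file is replaced by a synthetic data set of the same size (428 farmers) generated with `make_classification`, the 18 predictor columns have hypothetical names, and the target is coded 1 and 2 to match the `labels = [1, 2]` argument used for the confusion matrix. Hyperparameters are scikit-learn defaults apart from `random_state`, which is an assumption rather than the study's configuration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data: 428 farmers, 18 predictors.
# Column names are hypothetical placeholders, not the study's variable codes.
features, target = make_classification(
    n_samples=428, n_features=18, n_informative=8, random_state=0
)
feature_cols = [f"var_{i}" for i in range(features.shape[1])]
X = pd.DataFrame(features, columns=feature_cols)
y = pd.Series(target + 1, name="category")  # recode 0/1 as 1/2

# 70/30 split with a fixed seed, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit both classifiers on the training portion only.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on seen data
    test_acc = model.score(X_test, y_test)     # accuracy on held-out data
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

With 428 rows and `test_size=0.3`, scikit-learn rounds the test set up to 129 farmers and leaves 299 for training; 129 is also the total of the confusion-matrix counts reported in the Discussion. Passing `stratify=y` to `train_test_split` would additionally keep the adopter/dis-adopter shares equal in both parts.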

Making Predictions on the Test Data Set
The fitted models were then used to make predictions on the test data as follows:
y_pred = model.predict(X_test)

Comparison of the Actual and Predicted Values
The actual values were compared with the values predicted by each ML model as follows:
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

Model Evaluation
The models were evaluated using the following metrics. Confusion Matrix; this metric was used as the basis for measuring recall, precision, specificity and accuracy, and AUC-ROC curves were also used. The confusion matrix was computed for each model as follows:
metrics.confusion_matrix(y_test, y_pred, labels=[1, 2])
The other metrics are described in Table 2, below.
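To make the metric definitions concrete, the function below computes the usual confusion-matrix metrics from raw counts. It is a plain-Python illustration, not the study's code. Applied to the Decision Tree counts reported in the Discussion (45 true positives, 11 false positives, 7 false negatives, 66 true negatives), it reproduces the reported accuracy (0.860), precision (0.80) and F1 score (0.833); the Random Forest counts (45, 13, 7, 64) likewise give 0.845, 0.78 and 0.818.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)    # share of predicted positives that are correct
    recall = tp / (tp + fn)       # sensitivity: share of actual positives found
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts as reported in the Discussion section.
dt = confusion_metrics(tp=45, fp=11, fn=7, tn=66)  # Decision Tree
rf = confusion_metrics(tp=45, fp=13, fn=7, tn=64)  # Random Forest
for model_name, scores in (("Decision Tree", dt), ("Random Forest", rf)):
    print(model_name, {k: round(v, 3) for k, v in scores.items()})
```

Under these textbook definitions recall works out to 45/52 ≈ 0.865 for both models, while specificity is 66/77 ≈ 0.857 for the Decision Tree and 64/77 ≈ 0.831 for the Random Forest.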

Plotting the Actual and Predicted Values and the Identification of Important Features
The actual and predicted values were plotted, and the important features identified, as shown in Table 4 below.
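Table 4 is not reproduced here. One common way to rank predictors from a fitted tree-based model in scikit-learn is its `feature_importances_` attribute (impurity-based, mean decrease in impurity). The sketch below assumes that approach, uses synthetic data with hypothetical column names, and keeps the top k predictors. This is one plausible way to shortlist variables; the study does not state its exact selection rule.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical column names (not the study's variables).
features, target = make_classification(
    n_samples=428, n_features=25, n_informative=6, random_state=1
)
cols = [f"var_{i}" for i in range(features.shape[1])]
X = pd.DataFrame(features, columns=cols)

model = RandomForestClassifier(random_state=0).fit(X, target)

# Impurity-based importances are normalized to sum to 1;
# sort them from most to least important.
importances = pd.Series(model.feature_importances_, index=cols)
ranked = importances.sort_values(ascending=False)

top_k = 18  # illustrative cut-off
shortlist = ranked.head(top_k).index.tolist()
print(ranked.head(5))
```

`ranked.plot.barh()` would give a bar chart of the same ranking. Impurity-based importances can favour high-cardinality numeric variables, so permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check.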

Visualizing the Random Forest and the Decision Tree Classifier Models
Tree visualization was used to illustrate how the underlying variables predict the chosen target and to highlight key insights about the Random Forest Classifier and the Decision Tree Classifier.
The Gini index was used to measure node impurity in the Classification and Regression Tree (CART) algorithm. The resulting trees were visualized as shown in Table 5 below.
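The Gini impurity of a node is G = 1 − Σ p_k², where p_k is the proportion of samples of class k at that node; CART picks the split that most reduces the sample-weighted impurity of the child nodes. The functions below illustrate the formula on the class counts of the full sample (182 adopters, 246 dis-adopters). The split counts are made up for illustration and are not taken from the study's trees.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_child_gini(children):
    """Sample-weighted Gini impurity of the child nodes produced by a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


root = [182, 246]  # adopters, dis-adopters (full sample)
print(round(gini(root), 4))

# Hypothetical split of the root into two children (counts are made up,
# but each class total still matches the root).
split = [[150, 40], [32, 206]]
decrease = gini(root) - weighted_child_gini(split)
print(round(decrease, 4))
```

The root impurity is close to the two-class maximum of 0.5 because the sample is nearly balanced. In scikit-learn, `criterion="gini"` is the default for `DecisionTreeClassifier`, and a fitted tree can be drawn with `sklearn.tree.plot_tree` or printed with `sklearn.tree.export_text`.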

Rapid Prototyping of the Data-Driven Model
This step involved the development of a data-driven prototype that predicts whether a smallholder farmer will adopt or dis-adopt CSA technologies. Prototyping is the first stage of product development, and it gives potential users an idea of what the final product will look like. The prototype was used to simulate a real situation on the ground. Its main aim was to attract and inform potential users about a product they could invest in before allocating resources to, and implementing, CSA technologies in Kakamega County. The following steps were followed in this process.

Development of a data collection guide
An online data collection tool was developed for the top 18 variables identified in Objective 2 as the most important influences on the adoption or dis-adoption of CSA technologies in Kakamega.

Primary data collection
A random sample of 15 smallholder farmers (8 adopters and 7 dis-adopters) was drawn from Butere Subcounty. Their farm biophysical and socioeconomic data were collected on the top 18 variables identified in Objective 2.

Fitting the model
The Google Colaboratory notebook was used for model fitting and testing. The prediction capabilities of the model were tested as follows.
Importing the data into the notebook; the dataset was loaded into a pandas DataFrame using the read_csv function as follows:
df_test = pd.read_csv("/content/drive/MyDrive/Model_Testing_15092022.csv")
Defining the X and y variables; this step used all 18 variables and the resultant secondary independent variables. The dependent variable, V12, was defined as the smallholder farmer categorization into adopters and dis-adopters. The independent variables comprised the 18 important variables under investigation and the secondary independent variables resulting from the data collection exercise. The independent variables (X) and the dependent variable (y) were then defined as follows.
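The code that defined X and y for the pilot is not reproduced in the text. A minimal sketch of the idea, assuming the pilot sheet holds the same predictor columns as the training data plus the V12 category: select the predictors in the training column order before calling `predict`, because recent scikit-learn versions check feature names (and their order) against those seen during fit. All data and all column names other than V12 are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
feature_cols = [f"var_{i}" for i in range(18)]  # hypothetical names

# Hypothetical training data: 428 farmers, target V12 coded 1/2.
train = pd.DataFrame(rng.normal(size=(428, 18)), columns=feature_cols)
train["V12"] = np.where(train["var_0"] + train["var_1"] > 0, 1, 2)

model = DecisionTreeClassifier(random_state=0)
model.fit(train[feature_cols], train["V12"])

# Hypothetical pilot sheet: 15 farmers, columns stored in a different order.
pilot = pd.DataFrame(rng.normal(size=(15, 18)), columns=feature_cols[::-1])
pilot["V12"] = np.where(pilot["var_0"] + pilot["var_1"] > 0, 1, 2)

# Select the predictors in training order; the target stays separate.
X_pilot = pilot[feature_cols]
y_pilot = pilot["V12"]
pilot_pred = model.predict(X_pilot)

n_correct = int((pilot_pred == y_pilot.to_numpy()).sum())
print(f"{n_correct} of {len(pilot)} pilot farmers predicted correctly")
```

Selecting columns by name also raises a `KeyError` if a variable was left out of the pilot data collection form, which surfaces the problem before any prediction is made.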

Data-Driven Prototype Evaluation and Piloting
This step involved a focus group discussion with key stakeholders in the CSA ecosystem to obtain their input into the model development process. The discussion brought out the potential users' expectations of the model and the challenges it was meant to solve, and it was also used to determine whether the model was useful to potential users and to gauge its user-friendliness. Model AUC-ROC graphs; Figure 1, below, depicts the AUC-ROC graphs for the models.

Figure 1: AUC-ROC graphs (panels: Decision Tree Classifier, Random Forest Classifier)
Area Under the Curve (AUC); as depicted in Figure 1 (above) and Table 9 (below), the models produced AUCs of 0.89 and 0.91 for the Decision Tree Classifier and Random Forest Classifier, respectively. Training Accuracy; as shown in Table 9, the models had training accuracies of 0.943 and 0.996 for the decision tree and random forest classifiers, respectively.
Prediction Accuracy; the models' prediction accuracy was tested on 30 per cent of the data and, as Table 9 shows, was 0.860 and 0.845 for the decision tree and random forest classifiers, respectively. Precision; as shown in Table 9, the evaluation gave precisions of 0.80 and 0.78 for the decision tree and random forest classifiers, respectively. Recall; both classifiers had a recall of 0.86, as indicated in Table 9. Specificity; the evaluation gave a specificity of 0.865 for both the decision tree and random forest classifiers, as shown in Table 9. F1 Score; the models had F1 scores of 0.833 and 0.818 for the decision tree and random forest classifiers, respectively, as indicated in Table 9.
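Unlike accuracy, the AUC is computed from predicted class probabilities rather than hard labels. The sketch below shows one way to obtain the ROC curve and AUC with scikit-learn. It uses synthetic data and assumes the 1/2 class coding with class 2 treated as the positive class; the study does not state which class it treated as positive.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, target recoded to 1/2.
X, y = make_classification(n_samples=428, n_features=18, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# Columns of predict_proba follow model.classes_ (here [1, 2]),
# so column 1 holds the probability of class 2, treated as positive.
scores = model.predict_proba(X_test)[:, 1]

# ROC curve points and the trapezoidal area under them.
fpr, tpr, thresholds = roc_curve(y_test, scores, pos_label=2)
roc_auc = auc(fpr, tpr)
print(f"AUC = {roc_auc:.3f} (roc_auc_score: {roc_auc_score(y_test, scores):.3f})")
```

Plotting `fpr` against `tpr` with Matplotlib gives the kind of ROC graph shown in Figure 1. Because the score must belong to the positive class, swapping the class treated as positive without also swapping the probability column would turn an AUC of A into 1 − A.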
Classification Report; a classification report was used to measure the quality of the predictions from each classification algorithm in terms of how many predictions were correct and how many were wrong. Table 10, below, depicts the model classification report. Plotting the Actual vs Predicted Values; the actual and predicted values were plotted together to visualize and analyse how closely the model predictions matched the actual data. As depicted in Figure 2 below, the actual and predicted distributions closely matched for both the decision tree classifier and the random forest classifier. Tables 12 and 13, below, depict the ranking of the important features in the Decision Tree and Random Forest Classifier, respectively.
Figure 3, below, depicts the key features graphically, from the most important to the least important.
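Table 10 is not reproduced here. The sketch below rebuilds label arrays that match the Decision Tree confusion matrix reported in the Discussion and passes them to scikit-learn's `classification_report`. The coding 1 = adopter and 2 = dis-adopter, with adopters as the positive class, is an assumption made for illustration. The report lists precision, recall and F1 for each class; the recall shown for the negative (dis-adopter) class equals the specificity of the positive class.

```python
import numpy as np
from sklearn.metrics import classification_report

# Assumed coding: 1 = adopter (positive), 2 = dis-adopter (negative).
# Counts: 45 TP, 7 FN, 11 FP, 66 TN (Decision Tree, Discussion section).
y_true = np.array([1] * 45 + [1] * 7 + [2] * 11 + [2] * 66)
y_pred = np.array([1] * 45 + [2] * 7 + [1] * 11 + [2] * 66)

names = ["adopter", "dis-adopter"]
print(classification_report(y_true, y_pred, labels=[1, 2], target_names=names))

# The same report as a nested dict, convenient for programmatic checks.
report = classification_report(
    y_true, y_pred, labels=[1, 2], target_names=names, output_dict=True
)
```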

IV. DISCUSSION
The ML model predictions were compared with the actual values to determine their predictive accuracy, and the confusion matrix was used to visualize the performance of the ML algorithms. The Decision Tree Classifier had a prediction accuracy of 86.05 per cent, with 45 true positives, 11 false positives, 7 false negatives and 66 true negatives. The Random Forest Classifier, on the other hand, had a prediction accuracy of 84.50 per cent, with 45 true positives, 13 false positives, 7 false negatives and 64 true negatives. The AUC values of 0.89 and 0.91 indicate that both models separated adopters from dis-adopters well. The high training accuracies show that both models fitted the training data closely. The prediction accuracies indicate that the models generalized well, given that the test data were completely new to them. The precision values indicate that most farmers the models assigned to the positive class truly belonged to it. The recall of both models indicates that they identified most actual positives, although each model still produced 7 false negatives. The specificity of the models implies that they could correctly predict 86.5 per cent of the smallholder CSA adopting farmers.
The Mean Absolute Errors, Mean Squared Errors, Root Mean Squared Errors and accuracies indicate that the models made few errors and had high accuracy, and were therefore good models for predicting the adoption of CSA practices among smallholder farmers in Kakamega County. The visualization of the classifiers shows the relative importance of the different variables in predicting the farmer categorization. For the decision tree, the most important variable is V160.
Variables V161 and V162 are at the second level of importance, followed by variables V167, V164, V51 and V163. For the Random Forest, on the other hand, the most important variable, at the first level, is V138. Variables V163 and V51 are at the second level of importance, while variables V129, V161, V141 and V167 are at the third level. The ML model was piloted with 15 randomly selected smallholder CSA farmers from a new area and correctly predicted 12 of them.
This suggests that, given a new data set, the ML model could accurately predict smallholder farmers' ability to adopt CSA technologies. The pilot Precision, Recall, Specificity and F1 scores showed that the models had a high level of prediction precision and were good models for predicting whether smallholder farmers would adopt or dis-adopt CSA technologies.

Conclusion
Using the Random Forest and Decision Tree classifiers, the study found that it was possible to predict which smallholder farmers would be CSA technology adopters and which would be dis-adopters. These findings can go a long way towards addressing the dis-adoption of CSA technologies by smallholder farmers. The models can guide extension officers and policy makers on the right interventions for smallholder farmers in Kakamega County. Ultimately, data-driven agriculture will inform smallholder farmers' critical economic decisions on what to produce, how much to produce, and when to produce it.

Recommendations
This

Figure 2: Actual vs Fitted Values for Adoption vs Dis-adoption (panels: Decision Tree Classifier, Random Forest Classifier)
Figure 3: Important Features (panels: Decision Tree Classifier, Random Forest Classifier)
Figure 4: Decision Tree Visualization

Table 1: Fitting the Models

Table 2: Model Evaluation Using Various Metrics
Training Accuracy: the model accuracy obtained when the model is applied to the training data, that is, when the model is tested on the examples from which it was constructed.
Prediction Accuracy: the ratio of the number of correct predictions to the total number of predictions made.
Precision: the proportion of predicted positives that are actually positive (sensitivity, or recall, is instead the proportion of observed positives that are predicted to be positive).
Classification Report: a classification report was used to measure the quality of predictions from a classification algorithm in terms of how many predictions were correct and how many were wrong. The Classification Report was developed for the models as follows:
Model accuracy was given by the number of classifications that a model predicted correctly divided by the number of predictions made. Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were used as additional measures of the models' prediction error. The model accuracy, using the various approaches, was computed as follows:
Link: https://journals.kabarak.ac.ke/index.php/kjri/authorDashboard/submission/314 Vol 13 | Issue 1 | Nov. 2023 77

Table 3: Computing Model Accuracy

Table 4: Plotting the Actual and Predicted Values and the Identification of Important Features
Splitting the data into training and test data sets; the data were split into two datasets, 70 per cent for training the model and 30 per cent for testing it. The data were split as follows:
Comparing the actual and predicted values; 15 smallholder CSA farmers were sampled, of whom eight were adopters and seven were dis-adopters. The actual values and the predicted values were compared as follows:
df=pd.DataFrame
Model evaluation; the model was evaluated using the metrics in Table 6, below.
model to learn from the training data and make accurate predictions when given new data. The fitted models were then used to make predictions on the test data as follows:
y_pred = model.predict(X_test)

ISSN NO: 2410-8383
The participants included university academic staff and students, research organizations, county government agricultural extension staff, smallholder CSA farmers, and organizations promoting CSA technologies among smallholder farmers in Kakamega County. A demonstration was conducted to show the workings of the data-driven model for the deployment and adaptation of CSA practices among Kakamega County's smallholder farmers. Dummy farmer biophysical and socioeconomic data were used to predict the possibility of adoption of CSA technologies. The objective of this exercise was to elicit feedback on the applicability and suitability of the data-driven model for the deployment and adoption of CSA practices among Kakamega County's smallholder farmers.
Decision Tree Classifier and Random Forest Classifier models for the prediction of adoption or dis-adoption of CSA practices were considered for prediction and behaviour analysis. The models were evaluated using the following metrics: Confusion Matrix; Table 8, below, depicts the model confusion matrix.

Table 10: Model Classification Report
Mean Squared Error (MSE); as indicated in Table 11 below, the models had MSEs of 0.1395 and 0.1550 for the Decision Tree Classifier and Random Forest Classifier, respectively. Root Mean Squared Error (RMSE); as depicted in Table 11 below, the RMSEs for the Decision Tree Classifier and Random Forest Classifier were 0.3735 and 0.3937, respectively. Accuracy; as depicted in Table 11 below, the accuracy values for the Decision Tree Classifier and Random Forest Classifier were 90.31 per cent and 89.53 per cent, respectively.
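Because the two farmer categories are coded as consecutive integers (1 and 2), every misclassification contributes an error of exactly 1, so MAE and MSE both equal the misclassification rate and RMSE is its square root. The sketch below assumes that 1/2 coding and reproduces the reported MSE and RMSE values from the confusion-matrix counts given in the Discussion: 18 errors for the Decision Tree (11 false positives plus 7 false negatives) and 20 for the Random Forest (13 plus 7), out of 129 test farmers.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def error_metrics(n_errors: int, n_total: int) -> dict:
    """MAE, MSE and RMSE for 1/2-coded labels with n_errors misclassifications."""
    y_true = np.ones(n_total)
    y_pred = np.ones(n_total)
    y_pred[:n_errors] = 2  # each wrong label is off by exactly 1
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }


dt = error_metrics(n_errors=11 + 7, n_total=129)  # FP + FN, Decision Tree
rf = error_metrics(n_errors=13 + 7, n_total=129)  # FP + FN, Random Forest
print(dt)
print(rf)
```

For binary labels these error measures therefore carry the same information as accuracy: with this coding, MSE = 1 − accuracy.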

Table 11: Model Accuracy Using Different Approaches

Table 17: Model Classification Report
study considered the adoption of bundled CSA technologies among smallholder farmers in Kakamega County. A study targeting commercial and large-scale farmers in Kakamega and other areas is encouraged, as it would enhance the findings of this study and support the United Nations Sustainable Development Cooperation Framework principle of Leaving No One Behind. Future research should also model the adoption of CSA technologies using larger samples covering bigger regions, such as the former Western Province or the Western Region including the former Nyanza Province. The adoption of individual CSA technologies may be influenced by biophysical and socio-economic characteristics specific to each technology. For this reason, future studies and the development of models for the sustainable deployment of specific CSA technologies should be considered. Future studies may also focus on seasonal and crop-specific CSA technologies.