The Use of Artificial Neural Network and Advanced Statistics to Model Sediment Yield on a Large Scale: Example of Morocco

Abdelali Gourfi ✉ Lahcen Daoudi, Abdelhafid El Alaoui El fels, Abdellatif rafik, Salifou Noma Adamou and Ayoub Lazaar ESO-Angers, UMR 6590 CNRS, Université d'Angers, France Laboratory of Georessources, Geoenvironnement and Civil Engineering (L3G), Faculty of Sciences and Techniques, Cadi Ayyad University, B.P. 549, Marrakech, Marocco International Water Research Institute (IWRI), Mohammed VI Polytechnic University, Ben Guerir 43150, Morocco Dept. of Geology, Faculty of Sciences, Mohammed First University, Oujda, Morocco ✉Corresponding Author: Abdelali Gourfi, E-mail: gourfia@gmail.com

concentration in river systems and present, thus, a real simple solution for modelling soil erosion and understanding controlling factors (Liu et al., 2013a;Bhattacharjee and Tollner 2016).
Due to its geographical position in the northwest of Africa, Morocco is characterized by great diversity in terms of climate, vegetation, morphology and geology; this situation makes it even more difficult to model and predict SSY for sustainable management of this phenomenon. In light of these concerns, this study's main objectives are i) application of statistical analysis to understand factors controlling their distribution iii) testing the feasibility of ANN to predict SSY.
With a total of 152 dams, Morocco is ranked among the African countries with the highest amount of dams (Alhassan 2009;Wisser et al., 2013). However, the management of water resources in dams is important, especially with the Kingdom's accentuated aridity and drought. In contrast, the soil erosion by water represents the principal cause of silting of dams, causing a dramatic decrease in water storage capacity (Gourfi et al., 2018). In this study, we expect that the detailed statistical sensitivity analysis and the neural network could provide a better understanding of the dynamics and distribution of soil erosion and contribute to identifying strategies for better sustainable management of water and soil resources.

Study area
The Morocco kingdom (710 850 km2) is located in the northwest of the African continent. (21 N to 36 N, 1W to 17W). It is classified as one of the most diversified countries in terms of rainfalls (Fig. 1a), structural domains ( Fig. 1b), elevations (Fig. 1c) and land cover (Fig. 1d). Morocco can be classified into four main areas: 1) the domain to the North (Rif), composed of friable rocks of various ages, dense and permanent vegetation cover, well developed thick soil, and a humid climate, 2) the Meseta area is typically characterized by gentle to moderate slopes, the Mediterranean to semi-arid climate and thick soils, 3) the domain of the Atlas Mountains is marked by high altitudes with very steep slopes, a semi-arid climate, a medium vegetation cover and undeveloped soils, and 4) the Anti-Atlas and Saharan regions are distinguished by a desert climate, very poor soils, bare areas and sparse desert steppes.

Methodology
The understanding of factors controlling the sediment yields (SY) and area-specific sediment yield is done following five main steps: (1) estimation of the observed SSY and SY based on the bathymetric surveys of the studied check dams, (2) the collection of a database gathering the parameters reported as the most influential SSY and SY (3) performing the PLS model in order to develop the regression equations relation SY and its most correlated variables (4) perform a detailed sensitivity analysis of RNN in order to identify parameters that are important for predicting the SSY, (5) processing, calibration and validation of recurrent neural network. 42 of the 152 dams counted in Morocco have been selected because of the presence of sedimentation data, based on data from Gourfi et al. (2018) (Fig. 2).

SY and SSY calculation
The formula suggested by Verstraeten and Poesen (1999) was used to determine the area-specific sediment yield: Where SSY is the area-specific sediment yield (t·ha -1 ·year -1 ), dBD represents the dry sediments bulk density (t·m -3 ), which was taken to be equal to 1.3 t·m -3 in Morocco's reservoir sediments (Alahiane et al., 2016;Elmouden et al., 2017;Lahlou 1988), STE is the trap efficiency (%), SV is the measured sediment volumetric (m 3 ·year -1 ) and CA the basin area (ha).
The usual relation proposed by Brown (1943), relatively easy to apply and suitable for large reservoirs (eq. 2), was used to calculate STE: Where STE is the trap efficiency (%), C is the reservoir capacity (m 3  For each outlet (dam), the basin boundaries were delineated through the digital elevation model at a resolution of 30 meters using the GIS environment.

Potential predictor variables
An extensive database of 14 quantitative variables was used. These variables are described in the literature to have the greatest influence on sediment yield and soil erosion processes ( Properties of soil, including sand, silt, clay, coarse fragment volumetric (CFV), soil organic carbon stock (SOCS), bulk density and absolute deep to bedrock, were obtained from the Soil Grids database at 250 meters (Hengl et al., 2015) (https://soilgrids.org).

Artificial neural network
The artificial neural network (ANN) is an approach adaptable to predicting and modelling nonlinear behaviour; it has received a great deal of attention in the last few decades as an effective computation tool in forecasting hydrology and sediment delivery processes (Liu et al., 2013b).
Artificial neural networks (ANN) are rarely used in soil erosion modelling (Jain, 2001;Nagy et al., 2002;Bhattacharjee and Tollner, 2016), although their application is extremely beneficial in many other disciplines. Compared with other types of ANNs, the multilayer perceptron (MLP) or feedforward network is less complex, and it is the most widely used type in the ANN family. It covers the majority of scientific and industrial applications. It is a layer-propagation network model ( Figure. 3) in which neurons are classified into several successive layers: an input layer, one or more intermediate layers, also called hidden layers, and finally, an output layer. MLP neural network modelling aims to find the best structure of the connection weights between neurons to associate an adequate response to each input configuration. The modelling is done in two steps. First of all, a learning phase (calibration) is done where the network must find by itself the underlying regularities of the data; it is a matter of determining the weight values, which enable it to obtain the best performance for a given criterion by minimizing the error between the observed and simulated data, this calibration is obtained by the back-propagation algorithm of the error gradient. The second phase is validation. In this phase, we test the model's ability to simulate data outputs that were taken different from those taken in the first step.

Sensitivity analysis of ANN and predictions accuracies
The main objective of sensitivity analysis is to get knowledge of the relationship among output and input parameters in a model by ranking the input variables and quantifying their influence on output variables prediction (Bhattacharjee and Tollner, 2016). In this study, the sensitivity analysis is estimated using the "Neural Net Tools" package in R (Beck, 2018), based on the Garson's Algorithm (Garson, 1991).
The method proposed by (Garson 1991) is based on partitioning the weights of neural network connections. In order to quantify the influence of inputs on the output, the algorithm reports the relative importance of each input variable expressed as a percentage. The algorithm uses absolute values of connection weights, and it can be applied only to neural networks with one output variable.
The predictions accuracies of all regression equations were evaluated using two common criteria; the coefficient of determination (R 2 ) and Nash-Sutcliffe efficiency coefficient (NSE). The NSE and R 2 are calculated as ( Nash and Sutcliffe, 1970;Bennett et al., 2013;): N is the considered number of data points; Oi and Pi are, respectively, the observed and predicted value, and O and P are their means; The closer the R 2 and NSE approach to 1, the more the model is efficient.

Spatial distribution of sediment yield SY and area-specific sediment yield SSY
The distribution map of SY (Mt 3 ·year -1 ) and SSY (t·ha -1 ·year -1 ) was illustrated in Fig. 4. Results show a mean SY of 1.96 Mt·year -1 , recording as the highest value of SY 15.08 Mt 3 ·year -1 , the lowest value recorded is 0.04 Mt 3 ·year -1 . The average SSY is 6.40 t·ha -1 ·year -1 , the lowest value is 1.02 t·ha -1 ·yr -1 , and the highest value is 55.25 t·ha -1 ·year -1 . The majority of watersheds experiencing high SSY are located in the North; these watersheds are generally characterized by a Mediterranean climate, dense vegetation and located in the Rif; the presence of the friable lithology in these areas induces spectacular gully soil erosion (Gourfi et al., 2018).

Correlation analysis results
The correlation between Area-Specific Sediment Yield, Sediment Yield and catchment properties shows information about relations existing between some variables (Fig. 5), most relevant remains the existing relation between rainfall and soil organic carbon stock (r=0.88), the observed sediment yield and catchment area (r=0.74), silt content and NDVI (r=0.78).
A set of founded high correlations between some variables show the existing collinearity between them. For example, the Aridity index, Rainfall, NDVI and Soil organic carbon stock are highly correlated, also, drainage network length, catchment area and drainage density.
Only a few factors are meaningfully correlated with SY; drainage network length and basin area were the most explanatory variables correlated with SY (0.74 and 0.73, respectively). The SSY shows as a most important positive correlation with the NDVI and SOCS (0.56 and 0.53 respectively), a moderate negative correlation is observed between SSY and drainage density, drainage network and catchment area (-0.55, -0.53 and -0.52, respectively).

Major groups of watersheds in Morocco
The Principal Component Analysis (PCA) application decorticates the watershed in Morocco into five major categories. Each category is characterized by a vegetation cover, a climate and geography. The PCA also allowed the understanding of parameters having a close relationship with these groups (Fig.6).
The PCA, including the studied variables, identified three main end members (Fig. 6A). The first end member is positively loading correlated to the first principal component of the PCA (PC1 axis, 35.9 % of the variance), presenting high positive loadings for rainfall, SOCS, NDVI, aridity index (AI), and silt. The second end member has a negative correlation for the first principal component and positive loadings for the second principal component (PC2 axis, 25.6 % of the variance) and includes absolute deep to bedrock (ADB), drainage network length (DNL), catchment area (Area) and sand. The third major pool is negatively correlated with the first and second principal components and includes coarse fragment volumetric (CFV), elevation, and slope.
Results of observations in PCA (Fig. 6B) show five major populations. The first population concerns basins of the Pre-Saharan area and Saharan domain. It is characterized by gentle slopes, poor vegetation cover and low precipitations volumes, with a close variation to the sand fraction variable. The second group is related to watersheds in the mountains of the High Atlas. They are characterized by steep slopes, hard lithology and high annual precipitations volumes. Those watersheds have a close variation with Slope, CFV and elevation variables. The third population regroups to watersheds located at the north part, characterized by steep slopes, high precipitation rates, soft lithology and dense vegetation, with a close variation to the variables aridity index, silt fraction, NDVI, SOCS and rainfall. The fourth population is related to basins in the occidental Meseta characterized by flatlands, medium precipitation rates and vegetation cover. The last group (population 5) is related to watersheds located at the Oriental Meseta characterized by flatlands, low precipitation rates and vegetation cover. The last two groups (groups 4 and 5) have a close variation with drainage density, catchment area, absolute depth to bedrock, and drainage network length.

PLS application results
The application of the correlation analysis (Fig. 5) shows that drainage network length and catchment area have the closest relation to sediment yield, with respective correlations of r=0.73 and r=0.74. Based on the model efficiency and fit, the developed regression equations by the partial least squares Regression model can provide accurate predictions of sediment yield in relation to these two variables (

Neural network application
After several tests and after having obtained the maximum value of Nash and correlation performance criteria, the architecture of the neural network model most relevant for the simulation of area-specific sediment (SSY) is of type [15-5-1] ( Figure 8A). The model is thus, composed of an input layer of 15 neurons, a hidden layer containing five neurons and an output layer containing a single neuron. In the calibration phase, the dams with references from 1 to 20 were taken; in this step, the results of comparing the observed SSY and simulated ones by the application of the AAN are shown in figure 8B; the predictions show very good results with a Nash criterion of NSE=0.97, and a correlation coefficient of R 2 =0.97.
The analysis of the results in this phase ( Figure 8C) shows a good simulation of SSY (NSE= 0.93; R 2 =0.93). This ability validates the applicability of this model for the prediction of area-specific sediment.
The performed sensitivity analysis using Garson's Algorithm (Garson, 1991), the most important input variables are area-specific sediment prediction, drainage network length (DNL), Sand, Aridity index, and Elevation are the most (Fig. 8D). Conceptual, traditional empirical and physics-based or regression models didn't prove the ability to estimate processes controlling sediment yield due to unfeasible required data and insufficient systems knowledge and often do not represent all involved soil erosion processes. Generally, most experiential models are based on RUSLE (Revised Universal Soil Loss Equation) (Renard et al., 1997) and were developed to predict soil losses generated by rill and/or sheet erosion. As sediment deposition is not considered (Wischmeier 1976 As mentioned by (De Vente et al., 2013), a great cause why some models still provide acceptable estimates of SY after the calibration is that these over-estimate hillslope compensate for undescribed erosion processes. In Morocco, the sediment yield rates in reservoirs are mainly related to the catchment areas and the drainage networks. The founded results go with those of (Milliman and Syvitski, 1992), which found that the catchment area has the strongest influence on sediment yield in most rivers. This is especially due to a general increase in catchment size with flow magnitude and the cumulative sediment delivery at the downstream outlet. Also, many other researchers have described this relation (    In fact, the only available study that tried to model area-specific sediment yield at the national scale is that of (Gourfi et al., 2018). This study showed the inability of the RUSLE model to predict SSY. Indeed, by comparing observed SSY with modelled ones obtained using RUSLE ( Figure.   The analyses of ordinary conditions, presence of gully erosion conditions, and absence or poor soil horizon conditions ( Figure. 12) in relation to environmental diversity entities shows that area with ordinary conditions represent most of the cases and compose the totality of Occidental Meseta and Saharan areas, in this respect, the RUSLE model can be applied for valuable estimations for SSY in these areas. However, we recommend its application cautiously in Oriental Meseta; in contrast, it would be better not to apply it in the North and High Atlas mountains areas, dominated by gully erosion or/and poor or absence of soil horizon. Fig. 12. Number of watersheds compared to geomorphic zones according to ordinary, gully erosion abundance and poor or absence of soil horizon areas

Neural network application
The description and modeling of SSY in all its forms is still a major challenge worldwide (Poesen, Torri, and Van Walleghem 2011; Wainwright et al., 2008). This work proposes the application of artificial neural networks of the black box type as a simple approachable to estimate SSY. The approach results give satisfactory results. This study will not be only a solid dataset for responsible regional policy and decision-makers, a benchmark indicator on sediment yield and area-specific sediment yield in Morocco, but also more comprehensive and accurate than those derived in previous assessments.
As the determination of area-specific sediment yield, most affecting factors always remain a challenging issue (De Vente et al., 2013), the sensitivity analysis of artificial network was applied. However, to our knowledge, only very few studies have applied ANN to predict sediment yield. Moreover, (Liu et al., 2013a)adopted ANN in order to forecast suspended sediment concentration in a river system and prove its effectiveness in predicting the phenomenon. (Kisi et al., 2006) and (Rajaee et al., 2009) and support the conclusion that ANN handles nonlinear and nonstationary intervention of SSY controlling variables better than many other models.
However, the ANN predictions give very satisfactory results compared to observed outputs; on the other hand, drainage network length (DNL), sand, aridity index (AI) and elevation are the most significant input variables contributing to the outputs.

Conclusion
This study constructed a database of sediment yield and area-specific sediment yield for the most important Moroccan dams catchments. Additionally, a dataset of most SY and SSY controlling variables was constructed in order to emphasize most controlling factors a set of statistical processing coupled with the application of ANN to reach this objective. Concerning SY, results highlighted the good linear relation found between SY and catchment size and drainage network length. Furthermore, the modelling SSY remains a difficult issue worldwide, especially in an area like Morocco characterized by a large landscape diversity. Some areas are characterized by the absence of soil horizon, and others are dominated by landslides and gully erosion. RUSLE model didn't show the ability to model SSY in these conditions, but nevertheless, the model still gives good results under ordinary conditions. In the light of these concerns, the sensitivity analysis of ANN was applied; ANNs gave very good results and represented an efficient solution to the very diversified landscape of the area, where drainage network length, aridity index and elevation remain the most significant factors infecting SSY. The application of ANNs able to assess sediment yield in the light of the difficulty to estimate this one, especially in a country known for great morphological, climatic and vegetation diversity.