1. Introduction
As China actively promotes the energy revolution and implements the “dual-carbon” goal, a new type of power system has emerged to shoulder the tasks of carbon peaking and carbon neutrality and to realize the new development concept. Accurate power load forecasting can significantly improve power utilization efficiency and reduce carbon emissions [1]. However, electricity is difficult to store in large quantities for long periods and to supply on demand, which makes it difficult to maintain a balance between supply and demand in power system operation [2]. How to maintain the flexibility, stability, economy and security of the future power grid has therefore become an urgent issue.
Precise power load forecasting is vital for the effective operation and planning of power systems. It not only guarantees the stability of the power system but also empowers power grid companies to strategically organize power generation schedules, ensuring ample power supply to cater to diverse user demands. This approach helps minimize power disruptions due to inadequate supply and enhances the overall reliability of power provision [3]. Through accurate load forecasting, power grid companies can rationally plan and dispatch power generation resources to avoid resource wastage and cost increase. Accurate forecasts can provide valuable information to electric utilities, power plants, and grid operators to help them formulate reasonable generation plans, optimize power dispatch, and improve the reliability and economics of the power supply [4].
2. Background
2.1. Basic Methodology and Shortcomings of Existing Studies
In recent years, scholars have proposed various methods to forecast the electricity load. A common data-processing step before power load forecasting is data reconstruction. Scholars worldwide have developed various decomposition algorithms for power load time series, such as wavelet decomposition [5], wavelet packet transform [6], empirical mode decomposition (EMD) [7,8], and ensemble empirical mode decomposition (EEMD) [9]. Data reconstruction is used to reduce the impact of intermittency and randomness in the original data on the prediction results. Most of these algorithms decompose the original data and then apply neural network models to predict the subcomponents, which in turn improves the accuracy of the prediction.
There are three types of methods for power load forecasting: time series analysis methods, statistical methods, and machine-learning methods. Time series analysis methods mainly include the exponential smoothing modeling method [10] and the Fourier expansion model [11]. These methods consider the trend, seasonality, periodicity, and randomness of historical load data, but their ability to model nonlinear relationships is limited. Statistical methods analyze and model historical load data to predict future power loads and mainly include the vector autoregressive model [12], the multiple linear regression model [13] and uncertainty analysis theory [14]. However, these methods have limited ability to model not only complex nonlinear relationships but also external factors such as weather changes and holidays, which may reduce the accuracy of the load forecast. Machine-learning methods mainly include grayscale projection and random forest algorithms [15], deep belief network prediction [16], multi-kernel support vector machine algorithms [17], and LSTM models [18]. These methods can capture more complex patterns and relationships and can handle nonlinear relationships, large-scale data, and multidimensional features, but they struggle to extract valid information from discontinuous data.
Deep learning (DL) is an important branch of machine learning that has received much attention in recent years and is widely applied in different engineering fields. Deep-learning models perform well in learning feature representations and modeling complex data relationships, effectively predicting complex associations hidden in big data [19]. DL models include convolutional neural networks (CNNs), recurrent neural networks (RNNs) and fully connected neural networks (FCNNs). In an FCNN, each neuron in one layer is connected to every neuron in the next layer. It is suitable for structured data (e.g., tabular data) and simple classification or regression tasks, but it has a large number of parameters, is prone to overfitting, and performs poorly on high-dimensional data (e.g., images, time series). A CNN uses convolutional and pooling layers to extract local features and is suitable for processing image and video data, but its structure is relatively complex, it requires more computational resources, and it does not perform well on time series data or natural language-processing tasks; thus, it is less suited to learning from sequential data than an RNN. The RNN greatly improves prediction over the FCNN and CNN because its self-cycling structure captures the temporal dependencies between sequential data. In addition, long short-term memory (LSTM), a significant representative of RNNs, has been widely used in problems such as load forecasting. Compared with a plain RNN, LSTM better prevents gradient explosion and vanishing when handling temporal dependencies, and it has been widely used in forecasting and other time series-related fields owing to its long-term memory and strong ability to learn nonlinear data.
The learning effect of a prediction model is often affected by the quality of the input data; therefore, appropriate processing of the data is key to improving the accuracy of the prediction model. Some common data decomposition methods have been introduced above, among which EMD is widely used due to its multi-resolution and adaptive nature [20]. However, EMD may suffer from mode mixing and blurred feature extraction when constructing intrinsic mode functions. EEMD alleviates this problem by introducing noise-assisted signal processing [21]. Although EEMD improves on the mode-aliasing problem of EMD, determining the amplitude of the white noise added in EEMD is still challenging [22]. CEEMDAN is adaptive and suitable for decomposing complex signals that require high adaptability and high resolution, but it is computationally complex and sensitive to noise [23]. These EMD variants add white noise during operation to suppress mode mixing, but after repeated mixing some noise remains in the signal. The empirical wavelet transform (EWT) is an innovation based on a filter structure closely related to wavelet theory, which fundamentally solves the mode-mixing problem [24]. The core of the EWT lies in the design of adaptive filters, which requires determining the filter boundaries from the spectral characteristics of the signal. This process not only relies on the choice of parameters but may also require complex algorithms to determine the boundaries automatically, increasing the complexity of the implementation. Moreover, EWT may require more computational resources for filter design and signal reconstruction than EMD variants, which can make it unsuitable for short-term power load forecasting. VMD, in contrast, is adaptive, does not require preset basis functions, and is free of modal aliasing.
Its ability to provide stable modal decompositions helps to extract clear periodic and trending components from power load data, which is crucial for the accuracy of forecasting models. VMD, which is capable of decomposing complex sequences into multiple signals with adjustable amplitude and frequency, overcomes the difficulty of EEMD in distinguishing the frequency characteristics, achieves accurate signal decomposition, and improves the operational efficiency [25]. VMD is more suitable than CEEMDAN and EWT if the signal needs to be processed quickly and the parameters can be chosen rationally, e.g., for applications in short-term power load forecasting.
In the initial stages of deep-learning exploration, the power load was primarily seen as a regression task, with reliance on conventional statistical models and artificial neural networks. However, the autoregressive nature, dynamism, and weather-related aspects of power load data make it challenging to achieve precise load predictions solely through technologies like artificial neural networks. These methods struggle to effectively capture the complexities of nonlinear time series, such as the power load, making it hard for a singular model to meet the precision standards for load forecasting. Employing a blend of diverse models stands as a practical approach to enhance the precision of power load predictions.
In the literature [26], a power load-forecasting method based on the combination of composite adaptive filtering and a fuzzy BP neural network is suggested, and the proposed short-term power load algorithm is validated on real power load data. In addition, the literature [27] presents an electricity consumption prediction model based on a Bayesian regularized BP neural network and shows that it outperforms a BP neural network trained with the general gradient descent method for electricity consumption prediction in Xinjiang. However, BP neural networks suffer from overfitting and poor generalization ability, and the above methods do not consider temporal correlation. A study [28] used multilayer perceptron neural networks, an adaptive network-based fuzzy inference system (ANFIS) and SARIMA to predict power loads, and then proposed a direct and accurate algorithm to estimate the corresponding weight coefficient of each model. Despite the successful application of many optimization algorithms to determine the weighting coefficients of individual models, there is no evidence that optimization algorithms perform better in all cases. The deep RNN and single-layer RNN proposed in the literature [29] were applied to two different load datasets, and the results show good prediction ability. When the time series is long, the traditional recurrent neural network (RNN) implements short-term memory through neurons with self-feedback and is theoretically capable of handling time series of arbitrary length. However, traditional RNNs sometimes encounter gradient explosion and vanishing, which is especially evident when dealing with long sequences.
To address this issue, long short-term memory (LSTM) has been suggested as an enhanced version of the RNN, incorporating a gating mechanism to effectively tackle the challenges of gradient vanishing and explosion [30]. In the literature [25], a hybrid network for short-term load prediction is introduced, merging variational modal decomposition (VMD), a gated recurrent unit (GRU), and a temporal convolutional network (TCN). The VMD is applied to break down the original, noisy electrical load sequence into fundamental IMF components with diverse frequencies and amplitudes. A joint forecasting approach then uses the GRU and TCN networks to track high-frequency and low-frequency load variations, respectively, and finally the high- and low-frequency prediction results of the two networks are reconstructed. However, the prediction error of the proposed method remains large, with an accuracy still below 90%. In the literature [31], a VMD-IWOA-LSTM approach for short-term load forecasting (STLF) is proposed, which also applies VMD to decompose the historical electricity load signals and optimizes the LSTM parameters with the improved whale optimization algorithm (IWOA); the prediction results of the components are then summed to obtain the final prediction. The proposed method has strong anti-interference performance. However, the VMD parameters must be set manually, which increases the workload, and the IWOA-optimized LSTM takes longer to train.
2.2. Innovative Aspects of the Study
To overcome the constraints of prior research and enhance the forecast precision, this study merges data decomposition with deep-learning architectures, specifically LSTM models. Leveraging LSTM’s capability to preserve historical data for assessing uncertainties arising from model misclassifications and data noise, it surpasses the drawbacks of conventional gradient-based networks such as sluggish performance and numerous input parameters. Consequently, the forecast accuracy is enhanced. The key contributions of this study encompass the following:
Most existing methods use the meteorological features of numerical weather prediction as input features. The input scheme proposed in this study, based on multiple correlated variables and the target variable, instead takes into account the historical patterns of both the correlated features and the target variable, which improves the prediction accuracy of the model.
To address the shortcomings of the traditional DBO algorithm in power load forecasting, namely its time-consuming nature, reduced search capability, tendency to fall into local optima and low convergence accuracy, the algorithm is improved with a lens-imaging inverse-learning strategy, a spiral search strategy, an optimal-value-guiding strategy, and a dynamic position-update weight-coefficient strategy, forming the multi-strategy MODBO-LSTM algorithm. Its optimization results are compared with those of various other optimization algorithms (FOA, SSA, GWO, etc.). The simulation findings indicate that the MODBO-LSTM algorithm outperforms the conventional DBO-LSTM approach, exhibiting superior prediction accuracy.
A hybrid MODBO-VMD-MODBO-LSTM (MVMO-LSTM) model is proposed. Previous studies had to reset the number of decompositions whenever the center frequency domain was determined and seldom explored the values of the penalty factor and the number of decomposition layers. The multi-strategy improved dung beetle algorithm optimizes not only the penalty factor and the number of decomposition layers of the VMD but also the optimal number of hidden units, the maximum training epochs and the initial learning rate of the LSTM. Together, these two improvements effectively help the LSTM learn and adapt to the complexity and seasonal changes of the power load data, improving the prediction accuracy and making the results more reliable and practical.
3. Methodologies
3.1. Variational Modal Decomposition Algorithm (VMD)
Variational mode decomposition (VMD) is a signal-adaptive processing technique that decomposes a non-smooth, nonlinear signal into several discrete intrinsic mode functions (IMFs) [32]. Compared with traditional modal decomposition methods such as EMD and EEMD, the greatest advantage of VMD lies in its adaptive tuning: a parameter controls the number of decomposed modes, and this adaptivity makes VMD more flexible when pre-processing signals of different nature and complexity [33]. Suppose the preprocessed signal consists of K bandwidth-limited modal components ${u}_{k}(t)$, the center frequency of each IMF is denoted by ${\omega}_{k}$, and the sum of the modes approximates the original signal within a certain error tolerance. The VMD is then constructed as follows:
By applying the Hilbert transform, the analytic signal of ${u}_{k}(t)$ is obtained and its one-sided spectrum is calculated; this spectrum is then multiplied by an exponential term tuned to the estimated center frequency of each modal function, shifting each mode to its corresponding baseband [34]. The squared L2-norm of the gradient of the demodulated signal is computed to estimate the bandwidth of each modal function. This process establishes the constrained variational problem, leading to the following mathematical model:
$$\begin{array}{c}G=\underset{\left\{{u}_{k}\right\},\left\{{\omega}_{k}\right\}}{\mathrm{min}}\left\{{\displaystyle \sum _{k=1}^{K}}{\Vert {\partial}_{t}\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_{k}(t)\right]{e}^{-j{\omega}_{k}t}\Vert}_{2}^{2}\right\}\\ \mathrm{s}.\mathrm{t}.\ {\displaystyle \sum _{k=1}^{K}}{u}_{k}=h(t)\end{array}$$
where $G$ is the minimum of the sum of the estimated bandwidths of the ${u}_{k}(t)$, $K$ is the number of modal decompositions, ${\partial}_{t}$ denotes the partial derivative with respect to $t$, $\delta (t)$ is the Dirac distribution function, “*” is the convolution operator, and $h(t)$ is the original load signal [35].
Introducing the Lagrange multiplier $\lambda (t)$ and the quadratic penalty factor $\alpha $, the constrained problem is transformed into an unconstrained one, and the augmented Lagrangian $L$ is obtained as shown below, where $\left\{{u}_{k}(t)\right\}$ and $\left\{{\omega}_{k}\right\}$ are the sets of decomposed modal components and center frequencies:
$$L\left(\left\{{u}_{k}\right\},\left\{{\omega}_{k}\right\},\lambda \right)=\alpha {\displaystyle \sum _{k=1}^{K}{\Vert {\partial}_{t}\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_{k}(t)\right]{e}^{-j{\omega}_{k}t}\Vert}_{2}^{2}}+{\Vert h(t)-{\displaystyle \sum _{k=1}^{K}{u}_{k}\left(t\right)}\Vert}_{2}^{2}+\left\langle \lambda (t),h(t)-{\displaystyle \sum _{k=1}^{K}{u}_{k}(t)}\right\rangle$$
The modal components and center frequencies are solved iteratively by the alternating direction method of multipliers (ADMM) to find the “saddle point” of the augmented Lagrangian. The updated ${u}_{k}$, ${\omega}_{k}$ and $\lambda $ are:
$${\widehat{u}}_{k}^{n+1}\left(\omega \right)=\frac{\widehat{f}\left(\omega \right)-{\displaystyle \sum _{i\ne k}}{\widehat{u}}_{i}\left(\omega \right)+\frac{\widehat{\lambda}\left(\omega \right)}{2}}{1+2\alpha {\left(\omega -{\omega}_{k}\right)}^{2}}$$
$${\omega}_{k}^{n+1}=\frac{{\int}_{0}^{\infty}\omega |{\widehat{u}}_{k}^{n+1}(\omega ){|}^{2}\mathrm{d}\omega}{{\int}_{0}^{\infty}|{\widehat{u}}_{k}^{n+1}(\omega ){|}^{2}\mathrm{d}\omega}$$
$${\widehat{\lambda}}^{n+1}(\omega )={\widehat{\lambda}}^{n}(\omega )+\gamma \left[\widehat{f}(\omega )-\sum _{k=1}^{K}{\widehat{u}}_{k}^{n+1}(\omega )\right]$$
where $\gamma $ is the update step of the multiplier. The iteration stops when the following convergence criterion, with tolerance $\epsilon >0$, is satisfied:
$$\sum _{k}{\Vert {\widehat{u}}_{k}^{n+1}-{\widehat{u}}_{k}^{n}\Vert}_{2}^{2}/{\Vert {\widehat{u}}_{k}^{n}\Vert}_{2}^{2}<\epsilon$$
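For intuition, the ADMM update loop above can be sketched in a few lines of Python. This is a simplified, hypothetical single-sided implementation: it uses a plain FFT without the boundary mirroring and Hermitian-symmetry handling of production VMD codes, and the defaults for `alpha`, `gamma` and `tol` are illustrative rather than the values used in this study.

```python
import numpy as np

def vmd(signal, K=3, alpha=2000.0, gamma=0.1, tol=1e-7, max_iter=500):
    """Simplified VMD sketch: frequency-domain ADMM updates of the modes,
    center frequencies, and Lagrange multiplier, as in the equations above."""
    T = len(signal)
    f_hat = np.fft.fft(signal)
    freqs = np.fft.fftfreq(T)                    # normalized frequency axis
    u_hat = np.zeros((K, T), dtype=complex)      # modal spectra
    omega = np.linspace(0.05, 0.45, K)           # initial center frequencies
    lam = np.zeros(T, dtype=complex)             # multiplier spectrum
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter-like mode update around omega_k
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # center frequency: power-weighted mean over positive frequencies
            half = slice(0, T // 2)
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
        # dual ascent on the reconstruction constraint
        lam = lam + gamma * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:                          # convergence criterion above
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))  # back to the time domain
    return modes, omega
```

The stopping test mirrors the criterion above: the relative change of the modal spectra between consecutive iterations falls below the tolerance.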
3.2. Long Short-Term Memory
LSTM (long short-term memory) is a deep-learning model commonly used to process sequence data and is particularly good at capturing long-term dependencies. It controls the flow of information through gating mechanisms, including input gates, forget gates, and output gates, so as to efficiently process sequence data while avoiding vanishing or exploding gradients [36]. LSTM has achieved a wide range of applications in natural language processing, time series prediction and other fields [37]. In LSTM, the cell state acts like a conveyor belt that passes information along the chain structure with minimal changes, so that information flows through almost unchanged [38]. Gates control the inflow and outflow of information; each is composed of a sigmoid neural network layer and a pointwise multiplication operation and decides which information is important enough to retain and which should be forgotten. Its structure is shown schematically in Figure 1, and the formulas for each variable are as follows:
$$\left\{\begin{array}{c}{f}_{t}=\sigma ({W}_{f}\cdot [{h}_{t-1},{x}_{t}]+{b}_{f})\\ {i}_{t}=\sigma ({W}_{i}\cdot [{h}_{t-1},{x}_{t}]+{b}_{i})\\ {C}_{t}^{\prime}=\mathrm{tan}\mathrm{h}({W}_{C}\cdot [{h}_{t-1},{x}_{t}]+{b}_{C})\\ {C}_{t}={f}_{t}\ast {C}_{t-1}+{i}_{t}\ast {C}_{t}^{\prime}\\ {o}_{t}=\sigma ({W}_{o}\cdot [{h}_{t-1},{x}_{t}]+{b}_{o})\\ {h}_{t}={o}_{t}\ast \mathrm{tan}\mathrm{h}({C}_{t})\end{array}\right.$$
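The gate equations above translate directly into numpy. The sketch below runs one forward step at a time with hypothetical toy dimensions and random weights; a trained LSTM would of course learn the weight matrices rather than draw them at random.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing the gate equations above.
    Each W[k] maps the concatenated [h_{t-1}, x_t] to a gate pre-activation."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate cell state C'_t
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state C_t
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate
    h_t = o_t * np.tanh(c_t)                     # new hidden state
    return h_t, c_t

# toy dimensions: 2 input features, 3 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 2, 3
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):             # run a short sequence
    h, c = lstm_step(x, h, c, W, b)
```

Because the output gate is a sigmoid and the cell state passes through tanh, every hidden activation stays strictly inside (−1, 1).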
3.3. Dung Beetle Algorithm for Multi-Strategy Optimization
3.3.1. Dung Beetle Optimizer
The dung beetle optimizer (DBO) is a novel algorithm introduced by Jiankai Xue and Bo Shen in 2022 [39]. It draws inspiration from the dung beetle’s activities, such as ball-rolling, dancing, breeding, foraging, and stealing. This algorithm combines global exploration and local exploitation, leading to rapid convergence and precise results.
The specific strategy improvements of multi-strategy optimization of the dung beetle optimizer proposed in this study are as follows:
Dung Beetle Rolling Balls
Dung beetles will make a ball of dung and roll it to a desired location, and during the rolling process, dung beetles keep the ball of dung rolling in a straight line by, for example, the position of the Sun or the direction of the wind [40]. The updating of the position of the dung beetle during the ball-rolling behavior can be expressed as:
$$\begin{array}{c}{x}_{j}\left(t+1\right)={x}_{j}\left(t\right)+\alpha \times k\times {x}_{j}\left(t-1\right)+b\times \mathsf{\Delta}x\end{array}$$
$$\mathsf{\Delta}x=\left|{x}_{j}\left(t\right)-{x}_{w}\right|$$
where $t$ represents the current iteration number, ${x}_{j}\left(t\right)$ denotes the position information of the $j$th dung beetle at the $t$th iteration, $\alpha $ is a natural coefficient indicating whether the beetle deviates from the original direction and is assigned −1 or 1 by a probabilistic method, $k\in (0,0.2)$ denotes the deflection coefficient, $b\in (0,1)$ denotes a constant ($k$ and $b$ are set to 0.1 and 0.3, respectively), ${x}_{w}$ denotes the global worst position, and $\mathsf{\Delta}x$ is used to simulate the change of light intensity.
When a dung beetle encounters an obstacle, it performs a behavior called dancing, by which it reorients itself to go around the obstacle. A tangent function can be used to simulate the dancing behavior of the dung beetle to obtain a new rolling direction. Once the dung beetle has successfully determined a new direction, it will continue to roll forward. The dancing behavior of a dung beetle can be defined as follows:
$${x}_{j}\left(t+1\right)={x}_{j}\left(t\right)+\mathrm{tan}\left(\beta \right)\left|{x}_{j}\left(t\right)-{x}_{j}\left(t-1\right)\right|$$
where $\beta \in (0,\pi ]$ denotes the angle of deflection; the position of the dung beetle is not updated when $\beta $ is equal to 0, $\frac{\pi}{2}$ or $\pi $.
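A minimal sketch of the ball-rolling and dancing updates, using the parameter values stated above ($k$ = 0.1, $b$ = 0.3) and treating the ±1 choice of $\alpha$ as a fair coin flip (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def roll_ball(x_t, x_prev, x_worst, k=0.1, b=0.3):
    """Ball-rolling update: alpha is +/-1 by a coin flip,
    delta_x = |x_t - x_worst| mimics the change of light intensity."""
    alpha = 1 if rng.random() < 0.5 else -1
    delta_x = np.abs(x_t - x_worst)
    return x_t + alpha * k * x_prev + b * delta_x

def dance(x_t, x_prev):
    """Dancing update: tan(beta) re-orients the beetle around an obstacle;
    the position is left unchanged at the degenerate angles 0, pi/2, pi."""
    beta = rng.uniform(0, np.pi)
    if np.isclose(beta, 0) or np.isclose(beta, np.pi / 2) or np.isclose(beta, np.pi):
        return x_t
    return x_t + np.tan(beta) * np.abs(x_t - x_prev)
```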
Breeding Dung Beetles
In nature, dung beetles roll their dung balls to a safe place and bury them to provide a safe environment for their offspring. Therefore, the selection of appropriate oviposition sites is crucial for dung beetles. Inspired by this scenario, a boundary selection strategy is proposed to model the female dung beetle’s choice of spawning behavior [41]. The strategy is:
$$L{b}^{n}=max\left({x}_{best}\times \left(1-R\right),Lb\right)$$
$$U{b}^{n}=min\left({x}_{best}\times \left(1+R\right),Ub\right)$$
where ${x}_{best}$ represents the current local optimal position, while $U{b}^{n}$ and $L{b}^{n}$ represent the upper and lower bounds of the spawning area, respectively. $Ub$ and $Lb$ represent the upper and lower bounds of the optimization problem, respectively, where $R=1-t/{T}_{max}$ and ${T}_{max}$ denotes the maximum number of iterations.
From the above equation, it can be seen that the boundary range of the spawning area is dynamically changing, and this change is mainly affected by the R value. Therefore, the position of the breeding sphere changes dynamically during the iterative process. The iterative process is represented as follows:
$${B}_{j}\left(t+1\right)={x}_{best}+{b}_{1}\times \left({B}_{j}\left(t\right)-L{b}^{n}\right)+{b}_{2}\times \left({B}_{j}\left(t\right)-U{b}^{n}\right)$$
where ${b}_{1}$ and ${b}_{2}$ represent two independent random vectors of size $1\times D$, and $D$ denotes the dimension of the optimization problem. ${B}_{j}\left(t\right)$ is the position information of the $j$th brood ball at the $t$th iteration.
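The spawning-boundary and brood-ball updates can be sketched as below. Two details are assumptions for illustration: the upper bound is computed with a (1 + R) factor, as in the original DBO formulation, so the spawning region genuinely brackets the local best, and the random vectors b1, b2 are drawn uniformly from (0, 1).

```python
import numpy as np

rng = np.random.default_rng(2)

def spawn_bounds(x_best, lb, ub, t, t_max):
    """Spawning region around the local best; it shrinks as R = 1 - t/T_max decays."""
    R = 1 - t / t_max
    lb_n = np.maximum(x_best * (1 - R), lb)
    ub_n = np.minimum(x_best * (1 + R), ub)
    return lb_n, ub_n

def breed(B_t, x_best, lb_n, ub_n):
    """Brood-ball position update with two independent random vectors b1, b2."""
    b1, b2 = rng.random(B_t.shape), rng.random(B_t.shape)
    return x_best + b1 * (B_t - lb_n) + b2 * (B_t - ub_n)
```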
Foraging Dung Beetles
Upon hatching, young dung beetles tend to seek food in the most favorable foraging zone. It is therefore crucial to define an optimal foraging area, updated dynamically, to direct the dung beetles toward food sources. The formulas for locating the region and updating positions are as follows:
$$L{b}^{l}=max\left({X}_{best}\times \left(1-R\right),Lb\right)$$
$$U{b}^{l}=min\left({X}_{best}\times \left(1+R\right),Ub\right)$$
$${x}_{j}\left(t+1\right)={x}_{j}\left(t\right)+{C}_{1}\times \left({x}_{j}\left(t\right)-L{b}^{l}\right)+{C}_{2}\times \left({x}_{j}\left(t\right)-U{b}^{l}\right)$$
where ${X}_{best}$ denotes the current global optimal position, i.e., the best foraging position, $L{b}^{l}$ and $U{b}^{l}$ denote the lower and upper bounds of the optimal foraging area, respectively, ${C}_{1}$ is a random number obeying a normal distribution, and ${C}_{2}$ is a random vector within $\left(0,1\right)$.
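A sketch of the baseline foraging update. As above, the upper bound is written with a (1 + R) factor so the foraging region brackets the global best; this, and the uniform draw for $C_2$, are illustrative assumptions.

```python
import numpy as np

def forage(x_t, X_best, lb, ub, t, t_max, rng):
    """Foraging update: the optimal foraging area shrinks toward the global
    best as R = 1 - t/T_max decays; C1 ~ N(0,1), C2 ~ U(0,1)."""
    R = 1 - t / t_max
    lb_l = np.maximum(X_best * (1 - R), lb)
    ub_l = np.minimum(X_best * (1 + R), ub)
    C1 = rng.normal(size=x_t.shape)              # normal random number
    C2 = rng.random(x_t.shape)                   # uniform random vector in (0,1)
    return x_t + C1 * (x_t - lb_l) + C2 * (x_t - ub_l)
```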
Stealing Dung Beetles
The thieving dung beetle steals dung balls from other dung beetles, and the thieving dung beetle’s location is updated below:
$${x}_{j}\left(t+1\right)={X}_{best}+M\times g\times \left(\left|{x}_{j}\left(t\right)-{x}_{best}\right|+\left|{x}_{j}\left(t\right)-{X}_{best}\right|\right)$$
where ${X}_{best}$ is the optimal foraging location as mentioned above, where ${x}_{j}\left(t\right)$ denotes the location of the $j$th thief in the $t$th iteration, $g$ is a random vector of size $1\times D$ obeying a normal distribution, and $M$ denotes a constant.
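The stealing update is a one-line move around the global best; the value of the constant $M$ below is an illustrative choice, not one specified in the text.

```python
import numpy as np

def steal(x_t, x_best_local, X_best_global, g, M=0.5):
    """Stealing update: the thief jumps around the global best X_best_global,
    scaled by constant M and a normal random vector g of size 1 x D."""
    return X_best_global + M * g * (
        np.abs(x_t - x_best_local) + np.abs(x_t - X_best_global)
    )
```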
3.3.2. Multi-Strategy Optimized Dung Beetle Algorithm
- (a)
Lens-Imaging Inverse-Learning Strategy
In the late stage of iteration, the DBO algorithm has difficulty avoiding local optima, resulting in poor convergence accuracy, slow convergence speed and other defects. To overcome these shortcomings, and inspired by Long Wen et al. [42], who found that individuals in the late stage of the grey wolf optimization algorithm all cluster around the decision-level region, leading to poor population diversity, this study introduces a reverse-learning strategy combined with the lens-imaging principle to initialize the population matrix of dung beetles. The steps of reverse learning based on the lens-imaging principle are as follows:
Step 1: Initialize the population matrix of the dung beetle and select $N$ matrix elements as the population vectors ${x}_{i,j}$.
Step 2: Generate the refractive reverse population ${x}_{i,j}^{\ast}$, where the specific strategy is expressed in the following equation:
$${x}_{i,j}^{\ast}=\frac{{u}_{j}+{l}_{j}}{2}+\frac{{u}_{j}+{l}_{j}}{2k}-\frac{{x}_{i,j}}{k}$$
where ${u}_{j}$ and ${l}_{j}$ are the maximum and minimum values of the $j$th dimension in the search space, and $k$ is the scaling factor of the lens-imaging principle. The two populations from steps 1 and 2 are merged and sorted by fitness value, and the best $N$ population vectors are taken as the initialized population.
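The two steps can be sketched as follows, assuming (as is usual for minimization problems) that a smaller fitness value is better; the scaling factor $k$ = 1.5 is an illustrative choice.

```python
import numpy as np

def lens_opposition(pop, lj, uj, k=1.5):
    """Lens-imaging reverse population: x* = (u+l)/2 + (u+l)/(2k) - x/k."""
    mid = (uj + lj) / 2
    return mid + mid / k - pop / k

def init_with_lens(pop, fitness_fn, lj, uj, k=1.5):
    """Merge the original and reversed populations, then keep the best N."""
    rev = lens_opposition(pop, lj, uj, k)
    both = np.vstack([pop, rev])
    fit = np.apply_along_axis(fitness_fn, 1, both)
    order = np.argsort(fit)                      # ascending: smaller fitness wins
    return both[order[: len(pop)]]
```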
- (b)
Spiral Search Strategy
In the dung beetle optimization algorithm (DBO), the second step is reproduction. If chicks are reproduced only within the current spawning area, the population converges rapidly in a short period of time, but population diversity decreases, which easily causes the algorithm to fall into a local optimum. Therefore, this study draws on the prey-encircling behavior of the whale optimization algorithm [43], in which individual whales update their positions relative to the prey along a spiral during the iteration process; this spiral search strategy not only ensures the convergence speed of the algorithm but also increases the diversity of individuals. The whale prey-encircling formula is as follows:
$$X\left(t+1\right)=\left|{x}_{best}(t)-X(t)\right|\cdot {e}^{cl}\cdot \mathrm{cos}\left(2\pi l\right)+{x}_{best}\left(t\right)$$
where $c$ defines the shape of the spiral and $l$ is a random number in $[-1,1]$. The strategy is easily affected by these parameters: a larger $c$ can cause the algorithm to decay too quickly, leading to a local optimum, while a smaller $c$ can cause the algorithm to converge slowly. To solve this problem, the dynamic spiral-search shape parameter $r$ is introduced:
$$r={e}^{c\cdot \mathrm{cos}\left(\frac{\pi \cdot t}{MaxIter}\right)}$$
The updated formula for breeding dung beetles is as follows:
$${B}_{j}\left(t+1\right)={x}_{best}+{e}^{rl}\cdot \mathrm{cos}\left(2\pi l\right)\times {b}_{1}\times \left({B}_{j}\left(t\right)-L{b}^{n}\right)+{e}^{rl}\cdot \mathrm{cos}\left(2\pi l\right)\times {b}_{2}\times \left({B}_{j}\left(t\right)-U{b}^{n}\right)$$
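A sketch of the spiral-search breeding update. Two details are assumptions on our part: the spiral position parameter $l$ is drawn uniformly from $[-1, 1]$ as in the whale optimization algorithm, and the random vectors b1, b2 are uniform in (0, 1).

```python
import numpy as np

def spiral_breed(B_t, x_best, lb_n, ub_n, t, t_max, rng, c=1.0):
    """Breeding update wrapped in a log-spiral whose shape r varies with the
    iteration: r = exp(c * cos(pi * t / t_max))."""
    r = np.exp(c * np.cos(np.pi * t / t_max))    # dynamic spiral-shape parameter
    l = rng.uniform(-1, 1)                       # spiral position, as in WOA
    spiral = np.exp(r * l) * np.cos(2 * np.pi * l)
    b1, b2 = rng.random(B_t.shape), rng.random(B_t.shape)
    return x_best + spiral * b1 * (B_t - lb_n) + spiral * b2 * (B_t - ub_n)
```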
- (c)
Optimal Value Guidance Strategy
In the third phase of the dung beetle algorithm, the foraging phase, the generation of candidate solutions is influenced by two random numbers ${K}_{1}$ and ${K}_{2}$ (denoted ${C}_{1}$ and ${C}_{2}$ above), where ${K}_{1}$ is a random number obeying a normal distribution and ${K}_{2}$ is a random variable within $\left(0,1\right)$. This makes the probabilities of generating better and worse candidate solutions equal. Inspired by the particle swarm optimization (PSO) algorithm [44], in which each particle updates its velocity and position according to the best value it has found and the current global best value, the generation of candidate solutions is guided here by introducing the current optimal value.
The updated formula is shown in the following equation:
$${x}_{j}\left(t+1\right)={x}_{j}\left(t\right)+{K}_{1}\times \left({x}_{j}\left(t\right)-L{b}^{l}\right)+{K}_{2}\times \left({x}_{j}\left(t\right)-U{b}^{l}\right)+\lambda \left({x}_{best}-{x}_{j}\left(t\right)\right)$$
$$\lambda ={e}^{\frac{t}{MaxIter}-1}$$
where ${x}_{best}-{x}_{j}\left(t\right)$ represents the direction from the current position to the optimal position, and $\lambda $ controls how strongly the optimal value influences the update, so that a better result is produced with greater probability. As the iteration count $t$ increases from 0 toward $MaxIter$, the new position is increasingly influenced by the current optimal position, which improves the exploitation capability.
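The guided foraging update adds one best-directed term to the baseline formula; the foraging bounds are passed in precomputed here for brevity.

```python
import numpy as np

def guided_forage(x_t, x_best, lb_l, ub_l, t, t_max, rng):
    """Foraging update pulled toward the current best by lambda = e^(t/T - 1),
    which grows from about 0.37 toward 1 over the run."""
    K1 = rng.normal(size=x_t.shape)              # normal random number
    K2 = rng.random(x_t.shape)                   # uniform random variable in (0,1)
    lam = np.exp(t / t_max - 1)                  # optimal-value guidance weight
    return x_t + K1 * (x_t - lb_l) + K2 * (x_t - ub_l) + lam * (x_best - x_t)
```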
- (d)
Positional Update Dynamic-Weighting Coefficient Strategy
The last stage of the dung beetle optimization algorithm is the stealing stage, which can be viewed as a local search strategy mimicking the behavior of dung beetles robbing each other of food resources while searching for food. Approaching the global optimal solution too early in the iteration leads to an insufficient search scope and entrapment in local optima, causing the search to fail. To overcome this drawback, this study introduces dynamic weight coefficients into the position update, which better incorporates the influence of stealing behavior on the search process and thus improves the global search capability and convergence of the algorithm.
The improved formula for updating the location of the stealing dung beetle is as follows:
$$\left\{\begin{array}{l}{k}_{1}=1-\frac{{t}^{3}}{{T}^{3}}\\ {k}_{2}=\frac{{t}^{3}}{{T}^{3}}\\ {x}_{j}\left(t+1\right)={k}_{1}\times {X}_{best}+{k}_{2}\times g\times \left(\left|{x}_{j}\left(t\right)-{x}_{best}\right|+\left|{x}_{j}\left(t\right)-{X}_{best}\right|\right)\end{array}\right.$$
The weight coefficient ${k}_{1}$ is larger in the early iteration period, allowing the dung beetle to explore more optimal regions in the search space, while in the later period, the weight coefficient ${k}_{2}$ gradually increases as the stealing dung beetle approaches the optimal region, allowing the stealing dung beetle to develop in the neighborhood of the optimal region, thus improving the algorithm’s ability to balance the global exploration with the local development.
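The cubic time-varying weights can be sketched as below; at $t = 0$ the update returns the global best exactly ($k_1 = 1$, $k_2 = 0$), and late in the run the random stealing term dominates.

```python
import numpy as np

def weighted_steal(x_t, x_best_local, X_best_global, t, T, rng):
    """Stealing update with cubic time-varying weights k1, k2."""
    k1 = 1 - (t / T) ** 3                        # large early: explore widely
    k2 = (t / T) ** 3                            # large late: exploit near the best
    g = rng.normal(size=x_t.shape)               # normal random vector
    return k1 * X_best_global + k2 * g * (
        np.abs(x_t - x_best_local) + np.abs(x_t - X_best_global)
    )
```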
4. Electricity Load-Forecasting Model
The Algorithmic Flow of the Model
This study is based on the superposition of the dung beetle algorithm with multi-strategy optimization (MODBO), variational mode decomposition (VMD), and long short-term memory (LSTM). An overview of the entire process is depicted in Figure 2. The detailed execution steps of the proposed power load prediction model are outlined as follows:
5. Experimentation and Analysis
5.1. Source of Data
The dataset selected for this study is the public Spanish energy consumption dataset. Load data for representative days in each of the four seasons of 2018 were selected. The first 70% of each season's data (spring, summer, fall, and winter) was used for training and the remaining 30% for testing. To mitigate the impact of differing feature scales on the model's training and fitting speed, the historical data must be normalized. The normalization process is given in Formula (25).
$${\mathrm{X}}_{\mathrm{norm}}=\frac{\mathrm{X}-{\mathrm{X}}_{\mathrm{m}}}{{\mathrm{X}}_{\mathrm{M}}-{\mathrm{X}}_{\mathrm{m}}}$$
where ${\mathrm{X}}_{\mathrm{norm}}$ is the normalized value, $\mathrm{X}$ is the original value in the dataset, ${\mathrm{X}}_{\mathrm{m}}$ is the minimum value, and ${\mathrm{X}}_{\mathrm{M}}$ is the maximum value in the dataset.
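Formula (25) is standard min-max scaling; a minimal sketch in Python is shown below (the sample load values are hypothetical):

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization to [0, 1], Formula (25): (X - X_m) / (X_M - X_m)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

load = np.array([22.0, 30.0, 26.0, 38.0])   # hypothetical load values
norm = min_max_normalize(load)               # -> [0.0, 0.5, 0.25, 1.0]
```

The inverse transform (multiply by the range and add the minimum) is applied to the model outputs to recover forecasts in the original units.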
5.2. Prediction Error Analysis
Load forecasting is a process of approximating the actual power load as closely as possible; however accurate, a forecast cannot avoid errors entirely, so error evaluation indices are needed to measure this degree of approximation scientifically and to assess the predictive ability of the proposed model. The hybrid model proposed in this paper includes a signal-processing stage, for which metrics such as the PRD, SNRout, and SNRimp are commonly used. In the forecasting field, however, model quality is generally evaluated with metrics such as the RMSE, MAE, and ${R}^{2}$, so these three metrics are adopted in this paper [45]. The root mean square error (RMSE) and mean absolute error (MAE) characterize the prediction accuracy: the smaller the value, the higher the accuracy [46]. The coefficient of determination ${R}^{2}$ measures the proportion of the total variance explained by the model; the closer its value is to 1, the stronger the model's explanatory power, making it an important indicator of overall goodness of fit. The three metrics are defined below:
$$MAE=\frac{1}{n}\sum _{i=1}^{n}\left|{y}_{ir}-{y}_{ip}\right|$$
$$RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{({y}_{ir}-{y}_{ip})}^{2}}$$
$${R}^{2}=1-\frac{{\displaystyle \sum _{i=1}^{n}}{\left({y}_{ir}-{y}_{ip}\right)}^{2}}{{\displaystyle \sum _{i=1}^{n}}{\left({y}_{ir}-\overline{{y}_{ir}}\right)}^{2}}$$
where ${y}_{ir}$ is the true value, ${y}_{ip}$ is the predicted value, and $n$ is the number of samples. Values of $MAE$ and $RMSE$ closer to 0 are better, while values of ${R}^{2}$ closer to 1 are better.
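The three metrics follow directly from their definitions; the sketch below is an illustrative NumPy implementation, not the code used in the experiments:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large residuals more than MAE."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

For a perfect forecast, MAE and RMSE are 0 and ${R}^{2}$ is exactly 1, matching the interpretation given above.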
5.3. Experimental Environment
In this experiment, the simulation device is a Colorful (Seven Rainbow) desktop computer running Windows 11, equipped with an Intel(R) Core(TM) i5-13400F processor, an NVIDIA GeForce RTX 3070 GPU, and 32 GB of RAM; the experiments were run in the MATLAB R2022b environment.
5.4. Analysis of Experimental Arithmetic
5.4.1. Comparison Results of the MODBO in Test Functions
The CEC2017 test function set [47] is widely used and is recognized as challenging for optimization algorithms. All functions in the set undergo rotations and shifts, which intensifies the complexity of the search. The set contains 29 test functions: F1 and F3 are unimodal functions with no local minima, only a global minimum, which test an algorithm's convergence ability; F4–F10 are multimodal functions with local extrema, which test the ability to escape local optima; F11–F20 are hybrid functions comprising three or more CEC2017 benchmark functions, where each subfunction is allocated a specific weight; F21–F30 are composition functions incorporating at least three hybrid functions or rotated/shifted CEC2017 benchmark functions, with each subfunction having an additional bias value and a weight, which further increases the optimization difficulty. (The F2 function of the original set was officially removed owing to its instability.) In this study, F1–F7 are used as a representative subset to verify the feasibility of the improved algorithm.
To verify the feasibility of the dung beetle algorithm for multi-strategy optimization, the algorithm is compared with the original dung beetle algorithm. The test set of CEC2017 is selected and the comparison results are shown below.
As shown in Figure 3, the performance curve of the MODBO algorithm has a steeper slope, indicating faster convergence, and the MODBO algorithm achieves better fitness than the DBO algorithm for the same number of iterations. After 500 iterations, the final fitness value of the MODBO algorithm is lower than that of the DBO algorithm. On the CEC2017 test functions, therefore, the proposed MODBO algorithm converges faster, is more stable, reaches a lower final fitness value, and performs better than the DBO algorithm.
5.4.2. Comparison Results of the MODBO in Load Data
To further validate the multi-strategy optimized dung beetle algorithm, this study applies the MODBO to forecasting of real power loads and compares it with the fruit fly optimization algorithm, the grey wolf optimization algorithm, the sparrow search algorithm, and the original dung beetle algorithm. To avoid chance errors, data from all four seasons are used in the experiment. The comparison results are shown in Figure 4.
As shown in Figure 4, the MODBO algorithm converges faster than the traditional intelligent algorithms and achieves better fitness for the same number of iterations. After 100 iterations, its final fitness value is lower than those of the compared algorithms. This again verifies that the proposed MODBO converges faster, is more stable, reaches lower final fitness values, and outperforms traditional intelligent algorithms.
The effectiveness of VMD depends on the choice of parameters. The penalty factor $\alpha $ is a balancing parameter that controls the completeness of the decomposition and the loss of band information; if $\alpha $ is chosen too high, band information is lost or duplicated. Therefore, the optimal parameter combination $[\mathrm{K},\alpha ]$ must be found to avoid over- or under-decomposition. The center-frequency observation technique is commonly used to find the optimal value of K; however, this method can only determine the number of modes K, not the penalty factor $\alpha $. In general, $\alpha $ lies in the range 1000–3000, with a usual value of 2000. The four-season VMD results obtained by the center-frequency method are shown in Figure 5.
Table 1 shows the results of using the MODBO algorithm to determine the K-value and the center frequency method to identify the VMD.
Figure 5 shows the effect of the VMD. From left to right, the decomposition is shown for the spring, summer, autumn, and winter data. The mode number K calculated by the center-frequency method determines the number of IMF components for each season; the first plot for each season is that season's raw data. For example, spring has five IMF components and summer has six. These intrinsic mode functions (IMFs) are obtained by solving a variational problem that makes the bandwidth of each mode as small as possible. In VMD, each IMF represents a specific frequency component of the signal, usually ordered from the highest frequency to the lowest. IMF1, the first mode, typically contains the highest-frequency content of the signal and captures its fast-changing parts, such as noise or high-frequency oscillations. IMF2 is usually lower in frequency than IMF1 and captures smoother, but still relatively fast, variations. IMF3 is usually lower in frequency than both and may contain more fundamental trends or cyclical components. The exact meaning and significance of each IMF depend on the characteristics of the original signal and the purpose of the analysis. A main advantage of VMD is its ability to adaptively extract modes with clear physical significance from the signal, which makes it very effective for complex signals and, in particular, for analyzing non-stationary signals.
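As a rough illustration of the center-frequency observation idea discussed above, the sketch below counts the dominant spectral peaks of a synthetic three-component signal using NumPy's FFT. The signal, the threshold, and the peak rule are illustrative assumptions; this heuristic is not the full VMD procedure.

```python
import numpy as np

def dominant_mode_count(signal, threshold=0.1):
    """Count spectral peaks exceeding `threshold` times the strongest peak.

    A rough proxy for the center-frequency observation method used to pick
    the VMD mode number K (an illustrative heuristic, not VMD itself).
    """
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0                                # ignore the DC component
    # a bin is a local peak if it exceeds both of its neighbours
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
    strong = spectrum[1:-1][is_peak] > threshold * spectrum.max()
    return int(np.sum(strong))

# synthetic "load" signal sampled at 1 kHz with three frequency components
fs = 1000
t = np.arange(0, 1, 1 / fs)
sig = (np.sin(2 * np.pi * 5 * t)
       + 0.8 * np.sin(2 * np.pi * 40 * t)
       + 0.5 * np.sin(2 * np.pi * 120 * t))
```

For this signal the heuristic suggests K = 3, a starting value that would then be refined, together with $\alpha $, by the MODBO search described in this paper.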
5.4.3. Comparison of Four-Season Load-Forecasting Models without Weather
For this study, one week of data was selected in each of the spring, summer, fall, and winter seasons of 2018: April 1–7 for spring, August 11–17 for summer, November 11–17 for fall, and January 11–17 for winter. The prediction accuracy of the MVMO-LSTM model proposed in this research was compared with that of the conventional models. Figure 6 and Table 2 show the evaluation results; the performance metrics of the improved model proposed in this study are shown in bold.
(a) The table compares the MVMO-LSTM with each algorithm from top to bottom; the metrics for the four seasons are compared from left to right, and "Target" denotes the original data. Compared with the MVMO-LSTM, the LSTM and the combinations of LSTM with various optimization algorithms perform less well in the spring, autumn, and winter seasons, with some indicators slightly lower in the early period, being susceptible to environmental changes. The average RMSE, MAE, and R^{2} of the LSTM over the seasons were 1.0959 kW, 0.9388 kW, and 84.72%, respectively. Smoothing the data with VMD reduces noise interference; as a result, the average RMSE, MAE, and R^{2} of the MVMO-LSTM were 0.2530 kW, 0.2195 kW, and 99.27%, respectively. On these three indicators, the multi-strategy dung beetle algorithm is more advantageous than the combined models based on other algorithms.
(b) Most of the other conventional algorithms optimizing the LSTM yield RMSE values in the range of 0.3676 kW to 1.0519 kW. As Table 2 shows, these models also track the actual values reasonably closely. Compared with the LSTM optimized by other conventional algorithms, however, the MVMO-LSTM integrated prediction model achieves R^{2} above 99% on the different datasets, demonstrating strong stability and adaptability.
5.4.4. Comparison of Four-Season Load Forecasting Models with Weather
This experiment implements load forecasting under conditions that include weather characteristics: temperature, humidity, atmospheric pressure, and load values are used as inputs. The performance metrics of the improved model proposed in this study are shown in bold.
(a) Unlike the experiment without weather, the Figure 7 experiment is a multiple-input, single-output experiment. Meteorological features such as the temperature and humidity are fed into the model as additional feature variables, and feature selection or dimensionality reduction techniques are used to retain the features with the greatest impact on power load forecasting, thus reducing the effect of weather disturbances on the forecasting results. The table again compares the metrics of the MVMO-LSTM with each algorithm from top to bottom, with the metrics of the four seasons compared from left to right; the "Target" is the raw data.
(b) From Figure 7 and Table 3, it can be seen that as the number of input conditions increases, the accuracy of the load prediction degrades relative to the prediction without weather features. Nevertheless, the four-season averages of the RMSE, MAE, and R^{2} of the MVMO-LSTM are still 0.3250 kW, 0.2828 kW, and 98.85%, compared with 0.2530 kW, 0.2195 kW, and 99.27% for the MVMO-LSTM without weather. It can be concluded that weather has a tangible influence on short-term power load forecasting. Even so, the hybrid model proposed in this paper remains superior on all indicators when compared with models combined with ordinary intelligent algorithms: the RMSE and MAE of each season are lower, and the R^{2} is closer to 1.
(c) Figure 8 compares the four-season prediction model errors: the left half of the figure shows the comparison without input features such as weather, and the right half shows the comparison with them. The closer the error is to 0, the better the model's prediction. The figure shows that, whether the MVMO-LSTM is applied to load data with or without weather input features, its error is closer to 0 than that of the other algorithms.
(d) The MVMO-LSTM model performs best on this dataset, with a smaller average error and more stable prediction capability. This highlights the model's significant advantage in processing the current dataset and brings its predictions closer to the actual historical data.
5.4.5. Comparison of Different Signal Decomposition Algorithms
To avoid chance results, this experiment also uses one week of data in each of the four seasons, without weather and other influencing factors, and compares the proposed system with different signal decomposition algorithms. The experimental results show that the MVMO-LSTM prediction system outperforms the other traditional signal decomposition algorithms. Figure 9 and Table 4 summarize the performance metrics of the proposed hybrid model; further details are given below. The performance metrics of the improved model proposed in this study are shown in bold.
- (1)
Based on Figure 9 and Table 4, the proposed hybrid prediction system is more efficient than the LSTM models based on signal decompositions such as VMD, with mean evaluation metrics of RMSE = 0.1564 kW, MAE = 0.1216 kW, and R^{2} = 99.20%. In addition, for winter, the metrics of the VMD-based model were superior to those of the other four hybrid models, with RMSE = 0.4572 kW, MAE = 0.3871 kW, and R^{2} = 96.02%.
- (2)
In addition, the CEEMDAN-LSTM outperforms the other three hybrid models in spring, summer, and fall, with three-season averages of RMSE = 0.4012 kW, MAE = 0.3045 kW, and R^{2} = 97.89%. In this study, the multi-strategy optimized dung beetle algorithm optimizes the K value and penalty factor of the VMD as well as the parameters of the LSTM. With these optimizations, the MVMO-LSTM hybrid model is optimal on all performance metrics across the experimental comparisons; the prediction model proposed in this study thus offers better potential.
6. Conclusions
In this study, we developed the MVMO-LSTM model by combining the advantages of variational mode decomposition (VMD), long short-term memory (LSTM), and the multi-strategy improved dung beetle optimization algorithm. In particular, using the multi-strategy optimized dung beetle algorithm to tune the VMD parameters intelligently mitigates signal noise effects, and using it to optimize the LSTM significantly enhances the prediction accuracy by speeding up convergence and steering clear of local optima. Three assessment metrics, the MAE, the RMSE, and the R^{2}, are used to evaluate the predictions of the proposed model and other widely used models. Finally, the integrated model was compared with alternative predictive models and tested to validate the accuracy and consistency of its forecasts. In conclusion, the following points are established:
- (1)
The model proposed in this study achieves significant accuracy in power load forecasting.
- (2)
The predictions of this hybrid model are more robust and efficient than a single model or a model with only one fusion algorithm.
- (3)
A hybrid MODBO-VMD-MODBO-LSTM model is proposed. Previous studies needed to reset the decomposition number once the center frequency domain was determined, and they seldom explored the penalty factor together with the number of decomposition layers. The multi-strategy improved dung beetle algorithm optimizes not only the penalty factor and the number of decomposition layers of the VMD but also the optimal number of hidden units, maximum training epochs, and initial learning rate of the LSTM. Together, these two improvements effectively help the LSTM learn and adapt to the complexity and seasonal changes of power load data, improving the prediction accuracy and making the results more reliable and practical.
Therefore, more efforts are needed to further improve the performance of power load-forecasting models and expand their application scenarios.
Author Contributions
All of the authors contributed extensively to this work. Software, conceptualization, methodology and writing—original draft preparation, J.C.; Reviewing and software, writing—review and editing, L.L.; Data curation and visualization, K.G.; supervision, S.L.; funding, D.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Natural Science Foundation of Fujian Province (Grant Nos. 2022N5020 and 2022H6005).
Data Availability Statement
The experimental dataset in this paper is the Spanish Electricity Load Dataset, which is available from the authors upon request.
Acknowledgments
Our profound appreciation goes out to the editors and reviewers whose meticulous effort and insightful comments helped make this publication stronger.
Conflicts of Interest
The authors state that they have no financial or personal ties to other parties that might potentially affect the results presented in this study.
References
- Wu, C.; Yao, J.; Xue, G. Load prediction of integrated energy system based on MMoE multi-task learning and long short-term memory network. Electr. Power Autom. Equip. 2022, 42, 78–87. [Google Scholar]
- Eren, Y.; Küçükdemiral, İ. A comprehensive review on deep learning approaches for short-term load forecasting. Renew. Sustain. Energy Rev. 2024, 189, 114031. [Google Scholar] [CrossRef]
- Cheng, X.; Wang, L.; Zhang, P.; Yan, Q.; Shi, H. Data Characteristics and Short-term Forecasting of Regional Power Load. Dianwang Jishu/Power Syst. Technol. 2022, 46, 1092–1099. [Google Scholar] [CrossRef]
- Zhao, Y.; Guo, N.; Chen, W.; Zhang, H.; Guo, B.; Shen, J.; Tian, Z. Multi-step ahead forecasting for electric power load using an ensemble model. Expert Syst. Appl. 2023, 211, 118649. [Google Scholar] [CrossRef]
- Zhu, H.; Li, X.; Sun, Q.; Nie, L.; Yao, J.; Zhao, G. A power prediction method for photovoltaic power plant based on wavelet decomposition and artificial neural networks. Energies 2015, 9, 11. [Google Scholar] [CrossRef]
- Wang, J.-Z.; Wang, Y.; Jiang, P. The study and application of a novel hybrid forecasting model—A case study of wind speed forecasting in China. Appl. Energy 2015, 143, 472–488. [Google Scholar] [CrossRef]
- Prasad, R.; Ali, M.; Kwan, P.; Khan, H. Designing a multi-stage multivariate empirical mode decomposition coupled with ant colony optimization and random forest model to forecast monthly solar radiation. Appl. Energy 2019, 236, 778–792. [Google Scholar] [CrossRef]
- Ren, Y.; Suganthan, P.N.; Srikanth, N. A novel empirical mode decomposition with support vector regression for wind speed forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2014, 27, 1793–1798. [Google Scholar] [CrossRef]
- Lin, G.; Lin, A.; Cao, J. Multidimensional KNN algorithm based on EEMD and complexity measures in financial time series forecasting. Expert Syst. Appl. 2021, 168, 114443. [Google Scholar] [CrossRef]
- Ostertagova, E.; Ostertag, O. Forecasting using simple exponential smoothing method. Acta Electrotech. Et Inform. 2012, 12, 62. [Google Scholar] [CrossRef]
- Rizwan, M.; Alharbi, Y.R. Artificial Intelligence Based Approach for Short Term Load Forecasting for Selected Feeders at Madina Saudi Arabia. Int. J. Electr. Electron. Eng. Telecommun. 2021, 10, 300–306. [Google Scholar] [CrossRef]
- Xin, A.; Zhiyu, Z.; Yanping, W.; Hongzhi, Z. Bidding strategy for time-shiftable loads based on autoregressive integrated moving average model. Autom. Electr. Power Syst. 2017, 41, 26–31. [Google Scholar]
- Mahmud, A. Isolated area load forecasting using linear regression analysis: Practical approach. Energy Power Eng. 2011, 3, 547–550. [Google Scholar] [CrossRef]
- Liu, L.; Zhao, Y.; Chang, D.; Xie, J.; Ma, Z.; Sun, Q.; Yin, H.; Wennersten, R. Prediction of short-term PV power output and uncertainty analysis. Appl. Energy 2018, 228, 700–711. [Google Scholar] [CrossRef]
- Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical Load Forecasting Based on Random Forest, XGBoost, and Linear Regression Algorithms. In Proceedings of the 2023 International Conference on Information Technology (ICIT), Orlando FL, USA, 4–6 April 2023; pp. 25–31. [Google Scholar]
- Kong, X.; Zheng, F.; Zhijun, E.; Cao, J.; Wang, X. Short-term load forecasting based on deep belief network. Autom. Electr. Power Syst. 2018, 42, 133–139. [Google Scholar]
- Dai, Y.; Zhao, P. A hybrid load forecasting model based on support vector machine with intelligent methods for feature selection and parameter optimization. Appl. Energy 2020, 279, 115332. [Google Scholar] [CrossRef]
- Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818. [Google Scholar] [CrossRef]
- Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A review of deep learning for renewable energy forecasting. Energy Convers. Manag. 2019, 198, 111799. [Google Scholar] [CrossRef]
- Gao, X.; Li, X.; Zhao, B.; Ji, W.; Jing, X.; He, Y. Short-term electricity load forecasting model based on EMD-GRU with feature selection. Energies 2019, 12, 1140. [Google Scholar] [CrossRef]
- Naz, A.; Javaid, N.; Khalid, A.; Shoaib, M.; Imran, M. Electric Load Forecasting using EEMD and Machine Learning Techniques. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 2124–2127. [Google Scholar]
- Huang, Y.; Hasan, N.; Deng, C.; Bao, Y. Multivariate empirical mode decomposition based hybrid model for day-ahead peak load forecasting. Energy 2022, 239, 122245. [Google Scholar] [CrossRef]
- Ran, P.; Dong, K.; Liu, X.; Wang, J. Short-term load forecasting based on CEEMDAN and Transformer. Electr. Power Syst. Res. 2023, 214, 108885. [Google Scholar] [CrossRef]
- Elouaham, S.; Nassiri, B.; Dliou, A.; Zougagh, H.; El Kamoun, N.; El Khadiri, K.; Said, S. Combination time-frequency and empirical wavelet transform methods for removal of composite noise in EMG signals. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2023, 21, 1373–1381. [Google Scholar] [CrossRef]
- Cai, C.; Li, Y.; Su, Z.; Zhu, T.; He, Y. Short-Term Electrical Load Forecasting Based on VMD and GRU-TCN Hybrid Network. Appl. Sci. 2022, 12, 6647. [Google Scholar] [CrossRef]
- Ge, Q.; Jiang, H.; He, M.; Zhu, Y.; Zhang, J. Power load forecast based on fuzzy BP neural networks with dynamical estimation of weights. Int. J. Fuzzy Syst. 2020, 22, 956–969. [Google Scholar] [CrossRef]
- Yuan, C.; Niu, D.; Li, C.; Sun, L.; Xu, L. Electricity consumption prediction model based on Bayesian regularized bp neural network. In Cyber Security Intelligence and Analytics; Springer: Cham, Switzerland, 2020; pp. 528–535. [Google Scholar]
- Khashei, M.; Chahkoutahi, F. Electricity demand forecasting using fuzzy hybrid intelligence-based seasonal models. J. Model. Manag. 2022, 17, 154–176. [Google Scholar] [CrossRef]
- Shi, H.; Xu, M.; Ma, Q.; Zhang, C.; Li, R.; Li, F. A whole system assessment of novel deep learning approach on short-term load forecasting. Energy Procedia 2017, 142, 2791–2796. [Google Scholar] [CrossRef]
- Muzaffar, S.; Afshari, A. Short-term load forecasts using LSTM networks. Energy Procedia 2019, 158, 2922–2927. [Google Scholar] [CrossRef]
- Zhuang, Z.; Zheng, X.; Chen, Z.; Jin, T. A reliable short-term power load forecasting method based on VMD-IWOA-LSTM algorithm. IEEJ Trans. Electr. Electron. Eng. 2022, 17, 1121–1132. [Google Scholar] [CrossRef]
- Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544. [Google Scholar] [CrossRef]
- Li, W.; Quan, C.; Wang, X.; Zhang, S. Short-Term Power Load Forecasting Based on a Combination of VMD and ELM. Pol. J. Environ. Stud. 2018, 27, 2143–2154. [Google Scholar] [CrossRef]
- Lian, J.; Liu, Z.; Wang, H.; Dong, X. Adaptive variational mode decomposition method for signal processing based on mode characteristic. Mech. Syst. Signal Process. 2018, 107, 53–77. [Google Scholar] [CrossRef]
- ur Rehman, N.; Aftab, H. Multivariate variational mode decomposition. IEEE Trans. Signal Process. 2019, 67, 6039–6052. [Google Scholar] [CrossRef]
- Li, K.; Huang, W.; Hu, G.; Li, J. Ultra-short term power load forecasting based on CEEMDAN-SE and LSTM neural network. Energy Build. 2023, 279, 112666. [Google Scholar] [CrossRef]
- Sheng, Z.; An, Z.; Wang, H.; Chen, G.; Tian, K. Residual LSTM based short-term load forecasting. Appl. Soft Comput. 2023, 144, 110461. [Google Scholar] [CrossRef]
- Liu, S.; Kong, Z.; Huang, T.; Du, Y.; Xiang, W. An ADMM-LSTM framework for short-term load forecasting. Neural Netw. 2024, 173, 106150. [Google Scholar] [CrossRef]
- Xue, J.; Shen, B. Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. J. Supercomput. 2023, 79, 7305–7336. [Google Scholar] [CrossRef]
- Li, Y.; Sun, K.; Yao, Q.; Wang, L. A dual-optimization wind speed forecasting model based on deep learning and improved dung beetle optimization algorithm. Energy 2024, 286, 129604. [Google Scholar] [CrossRef]
- Zhu, F.; Li, G.; Tang, H.; Li, Y.; Lv, X.; Wang, X. Dung beetle optimization algorithm based on quantum computing and multi-strategy fusion for solving engineering problems. Expert Syst. Appl. 2024, 236, 121219. [Google Scholar] [CrossRef]
- Long, W.; Wu, T.-B.; Tang, M.-Z.; Xu, M.; Cai, S.-H. Grey Wolf Optimizer Algorithm Based on Lens Imaging Learning Strategy. Zidonghua Xuebao/Acta Autom. Sin. 2020, 46, 2148–2164. [Google Scholar] [CrossRef]
- Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
- Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408. [Google Scholar] [CrossRef]
- El Khadiri, K.; Elouaham, S.; Nassiri, B.; El Melhaoui, O.; Said, S.; El Kamoun, N.; Zougagh, H. A Comparison of the Denoising Performance Using Capon Time-Frequency and Empirical Wavelet Transform Applied on Biomedical Signal. Int. J. Eng. Appl. 2023, 11, 358. [Google Scholar] [CrossRef]
- Liu, L.; Guo, K.; Chen, J.; Guo, L.; Ke, C.; Liang, J.; He, D. A Photovoltaic Power Prediction Approach Based on Data Decomposition and Stacked Deep Learning Model. Electronics 2023, 12, 2764. [Google Scholar] [CrossRef]
- Wu, G.; Mallipeddi, R.; Suganthan, P.N. Problem Definitions and Evaluation Criteria for the CEC 2017 Competition on Constrained Real-Parameter Optimization; Technical Report; National University of Defense Technology: Changsha, China; Kyungpook National University: Daegu, Republic of Korea; Nanyang Technological University: Singapore, 2017. [Google Scholar]
Figure 1. LSTM schematic diagram.
Figure 2. MODBO-VMD-MODBO-LSTM flowchart.
Figure 3. Comparison plot of the convergence curves for different test functions.
Figure 4. Comparison plot of the convergence curves of different optimization algorithms.
Figure 5. VMD effect diagram.
Figure 6. Comparison of the various models without weather.
Figure 7. Comparison chart of the various models with weather.
Figure 8. Comparison of the forecast model errors for four seasons.
Figure 9. Comparison chart of the various signal decomposition models.
Table 1. Decomposition results of the VMD for each season.
Season | K (MODBO) | $\mathit{\alpha}$ (MODBO) | K | $\mathit{\alpha}$ |
---|---|---|---|---|
Spring | 11 | 292 | 5 | 2000 |
Summer | 10 | 100 | 6 | 2000 |
Autumn | 8 | 223 | 5 | 2000 |
Winter | 8 | 110 | 5 | 2000 |
Table 2. Results of the performance comparison of the four-season combination prediction model without weather.
Model | Spring | Summer | Autumn | Winter | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
RMSE | MAE | R^{2} | RMSE | MAE | R^{2} | RMSE | MAE | R^{2} | RMSE | MAE | R^{2} | |
MVMO-LSTM | 0.2448 | 0.1899 | 0.991 | 0.2596 | 0.2238 | 0.996 | 0.2935 | 0.2832 | 0.991 | 0.2140 | 0.1810 | 0.993 |
LSTM | 1.0860 | 0.8909 | 0.806 | 1.2886 | 1.0724 | 0.883 | 0.5294 | 0.4682 | 0.789 | 1.4795 | 1.3237 | 0.911 |
FOA-LSTM | 0.7702 | 0.6614 | 0.847 | 1.3927 | 1.2455 | 0.895 | 0.5212 | 0.4222 | 0.797 | 1.5234 | 1.3238 | 0.910 |
GWO-LSTM | 0.2696 | 0.2087 | 0.985 | 0.5308 | 0.4506 | 0.983 | 0.3013 | 0.2755 | 0.986 | 0.3688 | 0.3367 | 0.977 |
SSA-LSTM | 0.2776 | 0.2159 | 0.981 | 0.5172 | 0.4299 | 0.986 | 0.2467 | 0.2156 | 0.988 | 0.5416 | 0.4342 | 0.971 |
DBO-LSTM | 0.2819 | 0.2183 | 0.980 | 0.7232 | 0.4382 | 0.937 | 0.3126 | 0.2978 | 0.984 | 0.3545 | 0.3157 | 0.978 |
Table 3. Results of the performance comparison of the four-season combination prediction model.
Model | Spring RMSE | Spring MAE | Spring R² | Summer RMSE | Summer MAE | Summer R² | Autumn RMSE | Autumn MAE | Autumn R² | Winter RMSE | Winter MAE | Winter R²
---|---|---|---|---|---|---|---|---|---|---|---|---
MVMO-LSTM | 0.2582 | 0.2013 | 0.988 | 0.2785 | 0.2409 | 0.993 | 0.3026 | 0.2914 | 0.990 | 0.4607 | 0.3977 | 0.983
LSTM | 1.1098 | 0.9024 | 0.794 | 1.2886 | 1.0724 | 0.883 | 0.5338 | 0.4738 | 0.775 | 2.5911 | 2.4433 | 0.637
FOA-LSTM | 0.8111 | 0.6895 | 0.833 | 1.5098 | 1.3224 | 0.883 | 0.6940 | 0.6007 | 0.757 | 2.3165 | 2.1342 | 0.701
GWO-LSTM | 0.2904 | 0.2250 | 0.979 | 1.0904 | 0.9146 | 0.943 | 0.3816 | 0.2546 | 0.979 | 0.7142 | 0.5931 | 0.956
SSA-LSTM | 0.3927 | 0.3159 | 0.971 | 0.5899 | 0.4781 | 0.980 | 0.3194 | 0.2934 | 0.985 | 0.7074 | 0.5911 | 0.958
DBO-LSTM | 0.3462 | 0.2738 | 0.975 | 0.5156 | 0.4382 | 0.989 | 0.3457 | 0.3323 | 0.981 | 0.6812 | 0.5638 | 0.961
Table 4. An assessment of the predictive efficacy of the proposed model and four hybrid models.
Model | Spring RMSE | Spring MAE | Spring R² | Summer RMSE | Summer MAE | Summer R² | Autumn RMSE | Autumn MAE | Autumn R² | Winter RMSE | Winter MAE | Winter R²
---|---|---|---|---|---|---|---|---|---|---|---|---
MVMO-LSTM | 0.1502 | 0.1192 | 0.9932 | 0.1732 | 0.1289 | 0.9907 | 0.1013 | 0.0893 | 0.9955 | 0.2008 | 0.1488 | 0.9879
EMD-LSTM | 0.4724 | 0.3615 | 0.9311 | 0.5024 | 0.3915 | 0.9393 | 0.4354 | 0.3315 | 0.9406 | 0.5324 | 0.4447 | 0.9328
VMD-LSTM | 0.4230 | 0.3393 | 0.9532 | 0.4328 | 0.3422 | 0.9482 | 0.4302 | 0.3306 | 0.9582 | 0.4572 | 0.3871 | 0.9602
CEEMDAN-LSTM | 0.3827 | 0.2868 | 0.9752 | 0.4137 | 0.3163 | 0.9772 | 0.4073 | 0.3103 | 0.9843 | 0.4617 | 0.4016 | 0.9561
EEMD-LSTM | 0.4109 | 0.3190 | 0.9572 | 0.4435 | 0.3493 | 0.9637 | 0.4144 | 0.3190 | 0.9573 | 0.4704 | 0.4245 | 0.9477
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).