AI-Powered Bidding Strategy Enables Smarter Power Trading in Data-Scarce Markets
In the rapidly evolving landscape of electricity markets, where strategic bidding can significantly influence profitability, a new algorithmic approach is offering a lifeline to power generators operating under severe information constraints. Developed by researchers at Shanghai Jiao Tong University and North Carolina State University, this novel method leverages a refined form of reinforcement learning to help generators optimize their bidding behavior—even when historical data is sparse and competitor actions are opaque.
This breakthrough, detailed in a recent paper published in Automation of Electric Power Systems, addresses a critical gap in existing literature: the assumption that market participants have access to rich, high-quality data about rivals and system dynamics. In reality, especially during the early phases of electricity market liberalization—as seen in China’s ongoing power sector reforms—such information is often unavailable or unreliable. Traditional game-theoretic models that require full knowledge of competitors’ cost structures or bidding histories become impractical. Similarly, advanced deep reinforcement learning techniques, while powerful, demand extensive training data and computational resources that may not be feasible for real-time bidding decisions.
The team, led by Qiangang Jia, Sijie Chen, Yiyang Li, Zheng Yan, and Chengke Xu, proposes a computationally lightweight yet highly adaptive solution: the Practical Reinforcement Learning Automata (PRLA). Built upon the theoretical foundation of Continuous Action Reinforcement Learning Automata (CARLA), PRLA dramatically reduces computational overhead by discretizing the probability density function used to select bidding actions. This innovation sidesteps the need for complex symbolic integration and iterative equation solving that previously hindered CARLA’s practical deployment.
At its core, PRLA operates on a simple yet powerful principle: learn by doing. A generator begins with no prior assumptions about the market—its initial bidding strategy is uniformly random across a feasible range. After each market clearing event, the algorithm receives feedback in the form of profit (or loss). It then evaluates this outcome against recent historical performance, using a normalized reinforcement signal to adjust the likelihood of choosing similar bids in the future. Crucially, the update mechanism doesn’t just reinforce the exact bid that yielded high profit; it also boosts the probability of nearby bids through a symmetric Gaussian neighborhood function. This “neighborhood reinforcement” ensures exploration while steadily converging toward optimal behavior.
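The loop described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' exact formulation: the class name, parameter names (`alpha`, `sigma`, `window`), and the min-max normalization of profit are assumptions chosen to match the prose (uniform initial distribution, sliding-window reward normalization, symmetric Gaussian neighborhood reinforcement over a discretized probability density).

```python
import numpy as np

class PRLASketch:
    """Hypothetical sketch of a PRLA-style learner over a discretized bid range."""

    def __init__(self, bid_min, bid_max, n_bins=200, alpha=0.1, sigma=0.5, window=20):
        self.bids = np.linspace(bid_min, bid_max, n_bins)  # candidate bids
        self.pdf = np.full(n_bins, 1.0 / n_bins)           # uniform prior: no market knowledge
        self.alpha = alpha      # learning rate (illustrative value)
        self.sigma = sigma      # Gaussian neighborhood width, in bid units
        self.window = window    # sliding window of recent profits
        self.history = []

    def choose_bid(self, rng):
        # Sample a bid from the current discrete probability distribution.
        return rng.choice(self.bids, p=self.pdf)

    def update(self, bid, profit):
        # Normalize the reward against recent performance (sliding window).
        self.history.append(profit)
        recent = self.history[-self.window:]
        lo, hi = min(recent), max(recent)
        beta = 0.0 if hi == lo else (profit - lo) / (hi - lo)  # reinforcement in [0, 1]
        # Reinforce the chosen bid AND its neighbors via a symmetric Gaussian bump,
        # so nearby bids also gain probability ("neighborhood reinforcement").
        bump = np.exp(-0.5 * ((self.bids - bid) / self.sigma) ** 2)
        self.pdf += self.alpha * beta * bump
        self.pdf /= self.pdf.sum()  # renormalize to a valid distribution
```

Because the density lives on a fixed grid, the update is a vectorized array operation and the renormalization is a single sum, which is the computational shortcut that replaces CARLA's symbolic integration and iterative equation solving.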
One of the paper’s most significant conceptual contributions is its modeling of the bidding process as a repeated game rather than the more commonly assumed Markov game. Markov games presume that the current market state—such as locational marginal prices (LMPs)—depends heavily on past states and actions, a condition that holds in systems dominated by inflexible generation, or with high renewable penetration and storage constraints. However, in markets with abundant flexible thermal units, today’s LMP is primarily determined by today’s bids and demand, with minimal dependence on yesterday’s prices. In such environments, the repeated game framework is not only more accurate but also far less demanding in terms of state representation and memory requirements.
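The repeated-game premise—that each round's price falls out of that round's bids and demand alone—can be made concrete with a toy single-round clearing. Note this is a simplified uniform-price merit-order auction for illustration only; the paper's test system uses LMPs on a three-node network, which involves network constraints not modeled here.

```python
def clear_market(bids, capacities, demand):
    """Toy uniform-price clearing: stack offers in merit order until demand is met.

    The price is set by the marginal (last accepted) bid. No network model:
    purely illustrative of a memoryless, single-round clearing.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i])  # cheapest first
    dispatch = [0.0] * len(bids)
    remaining = demand
    price = 0.0
    for i in order:
        take = min(capacities[i], remaining)
        dispatch[i] = take
        remaining -= take
        if take > 0:
            price = bids[i]  # marginal accepted bid sets the uniform price
        if remaining <= 0:
            break
    return price, dispatch
```

Nothing in this function depends on yesterday's state: given today's bids, capacities, and demand, the outcome is fully determined, which is exactly why a stateless repeated-game learner suffices in flexible-thermal-dominated markets.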
The researchers validated PRLA through extensive simulations on a three-node test system, modeling both steady-state and dynamic market conditions. In the steady-state scenario, two generators fixed their bids at the theoretical Nash equilibrium values, while the third used PRLA to learn its optimal strategy. The algorithm converged precisely to the analytical solution—31.0 USD/MWh—within approximately 130 iterations, demonstrating its ability to identify equilibrium behavior without any knowledge of competitors’ strategies.
More impressively, in the non-stationary environment where all three generators simultaneously employed PRLA, the system still converged to a stable outcome. The learned bids—31.1, 15.1, and 22.9 USD/MWh—deviated by less than 4% from the Nash equilibrium benchmarks. This result is particularly noteworthy because it shows that PRLA is not only effective in isolation but also robust in multi-agent settings where every participant is actively learning and adapting. Such emergent coordination without explicit communication or centralized control mirrors real-world market dynamics and underscores the algorithm’s practical relevance.
The implications for emerging electricity markets are profound. In regions where market institutions are nascent and data infrastructure is underdeveloped, generators often resort to conservative or heuristic bidding strategies that leave significant revenue on the table. PRLA offers a principled, automated alternative that requires only three pieces of information after each clearing: the generator’s own bid, the resulting dispatch volume, and the profit earned. No knowledge of rivals’ bids, system topology, or even aggregate demand is necessary. This minimal data footprint makes PRLA uniquely suited for early-stage markets like those being rolled out across China’s provinces under the national power sector reform initiative launched in 2015.
Moreover, the algorithm’s design prioritizes deployability. By avoiding deep neural networks and complex optimization subroutines, PRLA can run efficiently on standard computing hardware—critical for real-time bidding in fast-paced spot markets. Its parameters, such as the Gaussian width and learning rate, are intuitive and require minimal tuning. The use of a sliding window for historical profit data ensures that the algorithm remains responsive to changing market conditions, such as shifts in demand patterns or the entry of new competitors.
From a regulatory perspective, the adoption of such adaptive bidding algorithms raises important questions about market stability and fairness. If all participants use similar learning mechanisms, could the market oscillate or converge to undesirable equilibria? The paper’s results are reassuring: convergence to near-Nash outcomes suggests that PRLA promotes stable, efficient market operation. However, the authors acknowledge that future work must explore more complex scenarios, including the integration of renewable energy with uncertain output, multi-period bidding with unit commitment constraints, and the potential for strategic manipulation through algorithmic collusion.
The research also opens avenues for extending PRLA beyond the supply function model used in the study. While supply functions—where generators submit a linear marginal cost curve—are common in many markets, others use block or stepwise bidding formats. The core learning mechanism of PRLA, being model-agnostic in its reward processing, could potentially be adapted to these alternative frameworks with minimal modification.
For power generators, the message is clear: intelligent bidding no longer requires big data or supercomputers. With PRLA, even in the fog of market uncertainty, a path to optimal profitability can be learned through disciplined, incremental experimentation. As electricity markets worldwide continue to liberalize and digitize, such lightweight, robust learning algorithms will become indispensable tools for competitive survival.
This work exemplifies the growing synergy between artificial intelligence and energy economics. By grounding machine learning in realistic market assumptions and computational constraints, the Shanghai Jiao Tong and NC State team has delivered not just a theoretical advance, but a practical instrument for market participants navigating the complexities of modern power trading.
Authors: Qiangang Jia, Sijie Chen, Yiyang Li, Zheng Yan, Chengke Xu
Affiliations: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA
Published in: Automation of Electric Power Systems, Vol. 45, No. 6, March 25, 2021
DOI: 10.7500/AEPS20200701002