This article shows how to quickly compute the maximum cumulative payoff over all root-to-leaf paths in a multi-layer tree with 3 choices per layer and 100 layers in total: a top-down, layer-by-layer update based on dynamic programming, with O(n) time complexity — far better than the O(3¹⁰⁰) of brute-force path enumeration.
The problem is, at its core, a state-dependent tree optimization: each node's immediate payoff depends not only on its own choice (1/2/3) but also on its parent's choice (a transition dependency). One therefore cannot simply take the per-layer maximum independently; instead, the algorithm must track, along the path dimension, the best cumulative payoff "ending with each choice".
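To see concretely why per-layer greedy fails here, consider a small hand-constructed counterexample (the 3×3 matrix below is purely illustrative, not part of the original problem): greedy grabs an immediate reward of 5 and lands in a dead-end state, while tracking per-action cumulative totals takes a cheap first step to unlock a larger reward.

```python
# Hypothetical payoff matrix, chosen to break per-layer greedy:
payoff = [
    [0, 5, 1],   # rewards after action 0
    [0, 0, 0],   # action 1 is a dead end: all future rewards are 0
    [0, 0, 10],  # action 2 unlocks a large reward
]

# Per-layer greedy over two transitions, starting from action 0
prev, greedy_total = 0, 0
for _ in range(2):
    a = max(range(3), key=lambda x: payoff[prev][x])
    greedy_total += payoff[prev][a]
    prev = a

# DP over the same two transitions: keep the best total "ending with a"
dp = payoff[0][:]                                  # best totals after step 1
dp = [max(dp[p] + payoff[p][a] for p in range(3))  # best totals after step 2
      for a in range(3)]

print(greedy_total, max(dp))  # greedy gets 5, DP gets 11
```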
The core idea is dynamic programming with a layer-by-layer (BFS-style) sweep: maintain dp[a] = the maximum cumulative payoff of any path ending with action a at the current layer, and update each layer as new_dp[a] = max over prev_a of { dp[prev_a] + payoff(prev_a → a) }. Since each layer depends only on the previous one, space can be reduced to O(1) — just two length-3 arrays updated alternately:
import numpy as np

def max_cumulative_payoff(num_layers, payoff_matrix):
    """
    payoff_matrix: 3x3 array, payoff_matrix[prev_a][a] = reward for choosing `a` after `prev_a`
    Returns: maximum total payoff from root to any leaf
    """
    # dp[a] = max cumulative payoff ending with action `a` at current layer
    dp = np.array([0.0, 0.0, 0.0])
    for layer in range(1, num_layers):  # layer 0 is root (no action taken yet)
        new_dp = np.full(3, -np.inf)
        for prev_a in range(3):
            for a in range(3):
                candidate = dp[prev_a] + payoff_matrix[prev_a][a]
                if candidate > new_dp[a]:
                    new_dp[a] = candidate
        dp = new_dp
    return np.max(dp)
# Example: payoff_matrix[i][j] = reward for choosing j after i
payoff_matrix = np.array([
    [12,  6, 10],  # after action 0
    [10, 24, 14],  # after action 1
    [ 6, 10, 30],  # after action 2
])
print(max_cumulative_payoff(num_layers=100, payoff_matrix=payoff_matrix))
# 99 transitions × 9 (prev_a, a) pairs — O(n) in the depth; prints 2970.0
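If the optimal path itself is needed, not just its value, the same DP can record, at each layer and for each action, which previous action achieved the maximum, then backtrack from the best final action. A possible sketch — the function name, the backpointer bookkeeping, and the broadcast-based update replacing the double loop are my additions, not part of the original code:

```python
import numpy as np

def max_payoff_with_path(num_layers, payoff_matrix):
    """Same DP as before, plus backpointers to recover one optimal path."""
    dp = np.zeros(3)
    parents = []  # parents[t][a] = best previous action when ending with `a`
    for _ in range(1, num_layers):
        # cand[prev_a, a] = dp[prev_a] + payoff_matrix[prev_a, a], via broadcasting
        cand = dp[:, None] + payoff_matrix
        parents.append(cand.argmax(axis=0))
        dp = cand.max(axis=0)
    # Backtrack from the best final action
    a = int(dp.argmax())
    path = [a]
    for par in reversed(parents):
        a = int(par[a])
        path.append(a)
    path.reverse()
    return float(dp.max()), path

payoff_matrix = np.array([[12, 6, 10], [10, 24, 14], [6, 10, 30]])
best, path = max_payoff_with_path(100, payoff_matrix)
# With this matrix, staying on action 2 (reward 30 per step) dominates,
# so the optimal path is all 2s and best = 99 * 30 = 2970.0
```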
In summary, this method captures the state dependence exactly at minimal computational cost, and is the standard, asymptotically optimal strategy for this class of deep, transition-dependent decision-tree problems.