I am seeking some reference on already existing work for the following problem. Given an $n$-dimensional square matrix $A=DP$ where $D$ is a diagonal and $P$ is a permutation matrix (think of Gaussian algorithm) I would like to find a sub-rectangle (very specific sub-matrix) of $A$ that has maximum sum. Kadane's $2d$ algorithm provides $\mathcal{o}(n^3)$ solution. However my matrix $A$ is sparse and even nicer than that so my feeling that this can be done faster. Actually, I have a feeling that the lower bound on this computation should be something like $c_{\epsilon}n^{1+\epsilon}$ but I my coworkers will be happy with anything that beats $n^2\log(n)$ which is what we think we have right now (we are verifying the result).
Edit: We are actively discussing the problem in the office right now and it seems that theoretical complexity of $n^2\log(n)$ can be "easily" improved at least computationally to something like $n^{1+\epsilon}\log(n)$ in particular if we allow approximate solutions which true maximum to something like $1+\epsilon$. We have no clue how to prove this right now.