Questions about the execution speed and memory usage of algorithms, data structures, languages and libraries.
2
votes
2 answers
37 views
Speedup prediction of rotating mask filter
I am trying to do a speedup analysis of the rotating mask filter (section 4.2.3).
Let $N^2$ be the number of pixels in the image and let $m^2$ be the size of the neighborhood of a given pixel. What I have for my ...
2
votes
3 answers
92 views
How do I get reliable timing data for time spent in function calls in my code?
This question is a follow-up to "Fortran: Best way to time sections of your code?".
If I want to time functions in my code, I know I could use gprof or kcachegrind. I also know that the results from ...
3
votes
1 answer
130 views
Efficient CFD programming techniques
I'm trying to build a highly efficient CFD software package for solving combustion problems. I've finished writing the core, which implements the mathematical model, and now I'm concerned about code performance. ...
6
votes
1 answer
124 views
What is the impact of C++11 move semantics in the context of scientific computing?
C++11 introduces move semantics which can, for example, improve code performance in situations where C++03 would need to perform a copy construction or copy assignment. This article reports that ...
13
votes
0 answers
124 views
How does the performance of Python/NumPy array operations scale with increasing array dimensions?
How do Python/NumPy arrays scale with increasing array dimensions?
This is based on some behaviour I noticed while benchmarking Python code for this question: How to express this complicated ...
3
votes
3 answers
108 views
Evaluate the sum
I want to evaluate the sum $\sum_{k=1}^\infty \left(\frac{i+1}{\sqrt{2}}\right)^k \cdot k^{-\alpha}$, where $i=\sqrt{-1}$ and $\alpha\in\left[\frac{3}{4},1\right]$, to 8 digits of accuracy.
If I am willing to expend up to a ...
10
votes
4 answers
163 views
Calculation of the sparsity structure for finite element matrices
Question: What methods are available to accurately and efficiently calculate the sparsity structure of a finite element matrix?
Info: I'm working on a Poisson Pressure Equation solver, using ...
2
votes
0 answers
61 views
Why is CUSP library performance worse than PETSc (GMRES, 200 iterations)?
I wanted to compare the speeds of the GMRES implementations in the CUSP and the PETSc libraries.
The matrix (A) used for testing was a 3D Laplacian matrix obtained by using the 7-point stencil on a ...
9
votes
1 answer
120 views
Statistical models for local memory/compute, network latency, and bandwidth jitter in HPC
Parallel computation is frequently modeled using a deterministic local rate of computation, latency overhead, and network bandwidth. In reality, these are spatially variable and non-deterministic. ...
6
votes
1 answer
122 views
Literature references for modeling current and future energy costs of floating-point operations and data transfers
I am searching for the most important literature and slide references for modeling current and future energy costs of floating-point operations and data transfers across the CPU, memory, network, and ...
6
votes
2 answers
264 views
Memory usage in Fortran when using an array of derived type with pointer
In this sample program I'm doing the same thing (at least I think so) in two different ways. I'm running this on my Linux PC and monitoring the memory usage with top. Using gfortran I find that in the ...
4
votes
1 answer
140 views
How to get sparse complex matrices from my code to PETSc efficiently
What is the most efficient way to get a complex sparse matrix from my Fortran code to PETSc? I understand that this is problem dependent, so I tried to give as many relevant details as possible below.
...
5
votes
1 answer
80 views
Open source implementation of rational approximation to a function
I am looking for an open source implementation (any of Python, C, C++, or Fortran is fine) of rational approximation to a function, something along the lines of the article [1].
I give it a function and it gives me ...
4
votes
1 answer
62 views
Efficient way to find max height repetitive sub-trees in an object tree
I am trying to solve the problem of finding a max repetitive sub-tree in an object tree.
By an object tree I mean a tree where each leaf and node has a name. Each leaf has a type and a value of that ...
8
votes
2 answers
203 views
Fastest way to find eigenpairs of a small nonsymmetric matrix on a GPU in shared memory
I have a problem where I need to find all positive (as in the eigenvalue is positive) eigenpairs of a small (usually smaller than 60x60) nonsymmetric matrix. I can stop calculating when the eigenvalue ...