Allocate n-dimensional arrays
Long time ago I parallelized a program written in C++ (Well, not really C++: the only C++ thing in it was the usage of cout in stead of printf). As usual, I looked first at possibilities to optimize the serial version of the program.
Profiling learnt me that the program spent most of it's time in computing addresses in a 6-dimensional array using a hand-written alogorithm, so it seemed that using a normal C syntax for that array, like
A[i][j][k][l][m][n] = 10
based on pointers, would speed-up the program considerably.
I learned soon that hand-writing code to create pointers, pointers to pointers, pointers to pointers to pointers and so on, is not easy to do correctly. (Indeed, in another program I found a mistake in hand-written code to create a 2-dimensional array). So I decided to create a function that does the job. Originally the function I created was named 'matalloc', but since there are very many matalloc's around on the web, I changed the name in wv_matalloc.
BTW: indeed the program ran several factors faster with my modification.
wv_matalloc is based on the usage of pointers to pointers to ... Sometimes I get the impression that real programmers do not like this for performance reasons.
Suppose, we have a matrix A with dimensions M and N. Using wv_matalloc we access element i,j with:
int M=100; int N=50; double **A = wv\_matalloc(sizeof(double),0,2,M,N) int i,j; for (i=0; i<M; i++) for (j=0; j<N; j++) A[i][j] = i*j;
Real programmers do it as follows:
int M=100; int N=50; double *A = malloc(sizeof(double)*M*N); int i,j; for (i=0; i<M; i++) for (j=0; j<N; j++) A[i*N+j] = i*j;
It seems to me that in the 6-dimensional case it requires a REAL programmer to do this reliably.
In the tar.gz file under downloads, a program (matmulcompare.c) is included to measure the difference in performance using a naive implementation of a matrix multiplication. The program source contains some explanation, results and compile instructions.
In short: for gcc both methods have the same efficiency. For icc (the Intel compiler) the real-programmers approach is much more efficient, but then, for matrix multiplication you should use something from Lapack or so, and you can still use wv_matalloc to access your data in a convenient way.
Example using wv_matalloc and Lapack
The tar.gz file under downloads contains an example (dgesvtest.c) how to use wv_matalloc and Lapack to solve a system of linear equations.