Quickjump to:

- How to iterate over all non-zero elements?
- How to speed up matrix-vector-operations?
- How to fill a sparse matrix?


Sample code: ex_sparse1.cpp, ex_sparse1.cpp.html

Basically you have to use one of the two iterators returned by begin1() or begin2() for the outer loop. These iterators are dense for all matrix types. Only the nested iterators of the inner loop are either dense, packed or sparse depending on the matrix type.

```cpp
typedef spm_t::iterator1 i1_t;
typedef spm_t::iterator2 i2_t;

for (i1_t i1 = M.begin1(); i1 != M.end1(); ++i1) {
  for (i2_t i2 = i1.begin(); i2 != i1.end(); ++i2)
    cout << "(" << i2.index1() << "," << i2.index2()
         << ":" << *i2 << ") ";
  cout << endl;
}
```

Unfortunately, the Microsoft VC 7.1 compiler cannot handle the nested iterators, so you have to use this workaround:

```cpp
for (Matr::iterator1 it1 = a.begin1(); it1 != a.end1(); ++it1)
#ifndef BOOST_UBLAS_NO_NESTED_CLASS_RELATION
  for (Matr::iterator2 it2 = it1.begin(); it2 != it1.end(); ++it2)
#else
  for (Matr::iterator2 it2 = begin(it1, ublas::iterator1_tag());
       it2 != end(it1, ublas::iterator1_tag()); ++it2)
#endif
    *it2 = 0.;
```

Here is another example of matrix iterators: iterator_chaos.cpp, iterator_chaos.cpp.html

Sample code: ex_residual.cpp, ex_residual.cpp.html

Assume you have to compute the residual of a system of linear equations:
`r = b - Ax`, where `A` is an m-by-n matrix, `x` is a vector of size n,
and `b` is the right-hand-side vector of size m.

The first guess is: `r = b - prod(A,x)`. This works pretty well.

Because the expression `b - prod(A,x)` does not contain any reference to `r`,
you can tell this to the library:
`noalias(r) = b - prod(A,x)` or `r.assign(b - prod(A,x))`

Next, you can use the expression as an initializer for `r`, which slightly reduces the
execution time: `vector<double> r(b - prod(A, x));`

After reading the documentation about `axpy_prod`, the next try is:
`r = b; axpy_prod(A,-x,r,false);`

This should give a large speedup. If that is still not enough, it is even possible to write a
specialized residual routine based on `axpy_prod` (from `operation.hpp`).
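To see why a specialized routine can win, here is a sketch of a hand-written residual loop over a matrix in compressed row storage (three arrays: values, column indices, row start pointers). This is plain standard C++, not uBLAS code, and the name `residual_crs` is illustrative; it makes a single pass over the nonzeros with no temporaries, which is roughly what a specialized `residual` can exploit:

```cpp
#include <cstddef>
#include <vector>

// Sketch: r = b - A*x for a matrix A in compressed row storage.
// val/col hold the nonzeros and their column indices;
// ptr[i] .. ptr[i+1] delimit row i within val/col.
void residual_crs(const std::vector<double>& val,
                  const std::vector<std::size_t>& col,
                  const std::vector<std::size_t>& ptr,
                  const std::vector<double>& x,
                  const std::vector<double>& b,
                  std::vector<double>& r)
{
    for (std::size_t i = 0; i + 1 < ptr.size(); ++i) {
        double s = b[i];
        for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k)
            s -= val[k] * x[col[k]];   // one pass, no temporaries
        r[i] = s;
    }
}
```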

```
size 4000000
  r=Ax-b                            : 0.28 (40)
  noalias(r)=b-Ax                   : 0.27 (40)
  vector<double> r(b - prod(A, x))  : 0.27 (40)
  axpy_prod(A,-x,r,false)           : 0.15 (40)
  residual(A,x,b,r,row_major_tag()) : 0.14 (40)
size 10000000
  r=Ax-b                            : 0.68 (63.2456)
  noalias(r)=b-Ax                   : 0.69 (63.2456)
  vector<double> r(b - prod(A, x))  : 0.69 (63.2456)
  axpy_prod(A,-x,r,false)           : 0.38 (63.2456)
  residual(A,x,b,r,row_major_tag()) : 0.34 (63.2456)
```

using: g++ 4.3.2 (64 bit) on an AMD Athlon(tm) 64 X2 Dual Core Processor 4800+

There are different ways of filling a sparse matrix. The simplest is to fill it row by row or column by column using `push_back(i,j,value)`. Sample code: ex_fill1.cpp, ex_fill1.cpp.html

```cpp
for (size_t i = 0; i < A.size1(); ++i) {
  if (i >= 1)          A.push_back(i, i-1, -1.0);
  A.push_back(i, i, 4.0);
  if (i+1 < A.size2()) A.push_back(i, i+1, -1.0);
}
```

Much more often the matrix entries are computed in a nearly random order, so you have to choose the matrix type carefully. I will briefly explain the differences. Assume A is an M-by-N row-major matrix with at most K nonzeros per row:

`sparse_matrix:` This type is an implementation of a
`std::map<size_t, std::map<size_t, double> >`. Thus insertion of an element takes O(log(M)+log(K)) operations plus storage allocation, which should be (amortized) constant time. The disadvantage is slower traversal through all elements, several levels of indirection (pointers) and noncontiguous storage.
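The map-of-maps layout can be mimicked directly with the standard library. This sketch is not uBLAS code, but it shows the logarithmic insertion and the pointer-chasing traversal described above:

```cpp
#include <cstddef>
#include <map>

// Sketch of sparse_matrix's underlying layout: an outer map from row
// index to an inner map from column index to value. Insertion costs
// two logarithmic lookups; traversal hops through tree nodes.
using MapMatrix = std::map<std::size_t, std::map<std::size_t, double>>;

double sum_all(const MapMatrix& A)
{
    double s = 0.0;
    for (const auto& row : A)                 // ordered by row index
        for (const auto& entry : row.second)  // ordered by column index
            s += entry.second;
    return s;
}
```

Inserting is simply `A[i][j] += value`; both lookups allocate the node on first access.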

`compressed_matrix:` This type is an implementation of the compressed row storage scheme known from Netlib (www.netlib.org) and widely used by FORTRAN linear algebra libraries. In short: we have a vector of values, a vector of column indices corresponding to these values, and a vector of pointers indicating where each row starts. Inserting one element takes O(log(M)+log(K)) operations plus O(M*K) storage operations if the values and column indices have to be moved one position. Adding elements in order via
`push_back(i,j,value)` is constant time. Updating an existing element takes only O(log(M)+log(K)) operations. The advantage is very fast element traversal, which gives optimal performance for linear algebra routines.
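The three arrays can be written down for a small example. This is a plain-C++ illustration of the storage scheme (the names `val`, `col`, `ptr` and the lookup function `at` are illustrative, not uBLAS API):

```cpp
#include <cstddef>
#include <vector>

// Compressed row storage of the 3x3 tridiagonal matrix
//   [ 4 -1  0 ]
//   [-1  4 -1 ]
//   [ 0 -1  4 ]
// val: the nonzero values, row by row
// col: the column index of each value
// ptr: ptr[i] is the position in val/col where row i starts
const std::vector<double>      val{4,-1, -1,4,-1, -1,4};
const std::vector<std::size_t> col{0, 1,  0,1, 2,  1, 2};
const std::vector<std::size_t> ptr{0, 2, 5, 7};

// Reading element (i,j) means scanning row i's slice of col.
double at(std::size_t i, std::size_t j)
{
    for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k)
        if (col[k] == j) return val[k];
    return 0.0;   // structural zero
}
```

Inserting a new nonzero in the middle forces everything after it in `val` and `col` to shift by one position, which is where the O(M*K) storage cost above comes from.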

`coordinate_matrix:` This type is an implementation of a list of triples (i,j,value) using three vectors. Therefore you can insert elements in random order via
`insert(i,j,value)`, which acts like `A(i,j) += value`. This actually differs from the insert operations of the other matrix types. The price you pay is a `sort()` operation before each element access, and additional storage if you insert many elements more than once. `sort()` sorts all elements and combines multiple insertions of the same element into one. Thus insertion of an element is constant time, plus one `sort()` at the end costing O(M*K*log(M*K)) operations depending on the method. (According to Stroustrup, "The C++ Programming Language": `std::sort()` has average O(n*log(n)) and worst case O(n*n); `std::stable_sort()` has O(n*log(n)*log(n)), or O(n*log(n)) when additional storage is available.) For this type, linear algebra operations are typically slower than for compressed_matrix.
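The sort-and-combine step can be sketched with standard containers. This is a conceptual illustration of what `sort()` does, not the uBLAS implementation; `Triple` and `compress` are made-up names:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of coordinate storage: collect (i,j,value) triples in any
// order, then sort by (i,j) and merge duplicates by adding them up.
struct Triple { std::size_t i, j; double v; };

std::vector<Triple> compress(std::vector<Triple> t)
{
    std::stable_sort(t.begin(), t.end(),
        [](const Triple& a, const Triple& b)
        { return a.i != b.i ? a.i < b.i : a.j < b.j; });
    std::vector<Triple> out;
    for (const Triple& e : t) {
        if (!out.empty() && out.back().i == e.i && out.back().j == e.j)
            out.back().v += e.v;   // combine repeated insertions
        else
            out.push_back(e);
    }
    return out;
}
```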

`generalized_vector_of_vector<double, row_major, vector<coordinate_vector<double> > >:` This type implements a vector of sparse vectors, which gives you direct access to each matrix row. Any `insert(i,j,value)` operation is really an `insert(j,value)` into the i-th vector, so an insert takes O(log(K)) for `sparse_vector`, O(log(K)) plus storage movements for `compressed_vector`, and O(1) plus storage plus a final sort costing M*O(K*log(K)) for `coordinate_vector`. The performance of linear algebra operations is comparable to the corresponding sparse matrix type.
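The key property is that inserting into row i only touches row i's container. A minimal standard-library sketch with one map per row (illustrative names, not the uBLAS types):

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Sketch of the vector-of-sparse-vectors idea: each row is its own
// sparse container, so an insert into (i,j) costs O(log K) in the
// size of row i only, independent of the other rows.
using RowMap = std::map<std::size_t, double>;

struct VovMatrix {
    std::vector<RowMap> rows;
    explicit VovMatrix(std::size_t m) : rows(m) {}

    void insert(std::size_t i, std::size_t j, double v)
    { rows[i][j] += v; }   // accumulating insert, like coordinate types
};
```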

The choice of the best matrix type depends on the access pattern. The sample program below simulates the assembly of a FEM stiffness matrix by randomly adding small blocks to the global sparse matrix. Watch and decide: sparse_fill2.cpp, sparse_fill2.cpp.html

```
$ g++ -I /LOCAL/gwinkler/include/ -o sparse_fill2 sparse_fill2.cpp -DNDEBUG -O3
$ ./sparse_fill2 10000
size 10000
sum(column(A,0)) 18 test_matrix_fill_insert coordinate_matrix: 0.25
sum(column(A,0)) 18 test_matrix_fill_insert vector_of_coordinate_vector: 0.12
sum(column(A,0)) 18 test_matrix_fill_plus_assign vector_of_compressed_vector: 0.14
$ ./sparse_fill2 100000
size 100000
sum(column(A,0)) 9 test_matrix_fill_insert coordinate_matrix: 3.06
sum(column(A,0)) 9 test_matrix_fill_insert vector_of_coordinate_vector: 1.37
sum(column(A,0)) 9 test_matrix_fill_plus_assign vector_of_compressed_vector: 1.65
```

using: g++ 4.3.2 (64 bit) on an AMD Athlon(tm) 64 X2 Dual Core Processor 4800+