I've been thinking for a while now that I should maybe re-organise some of my work code. The question turns out to be slightly less trivial than I thought...
Essentially, I have a library of useful graph/network analysis and simulation tools, written in C. There are currently three ways you can represent a graph in this library:
...though I might want to add ones later. Currently you chose between these at compile time (it's a switch in the Makefile), and then
The various header files then
The various functions I've written call MAT_SET and so on without having to care what type of graph type you're using (since generally it's only an efficiency issue); in the odd case that a function needs a particuar representation, it can check what #define you used (again, this is currently a compile-time thing).
There are a few obvious problems. Firstly, for this library to be more generally useful than for just me (which I think would be a good aim), people shouldn't have to re-build the entire library frequently (consider multi-user systems). Secondly, it is occasionally useful to be able to transform between the different structures (for example, reading in some data formats is best done using an adjacency-list representation), and only with a bit of hackery can you do this currently.
I can think of a few possibilities for going ahead, and I was also wondering what my readers (many of whom are far better at this coding malarky than I am) thought:
Firstly, I could carry on as I am doing, which is the past of least effort currently. I'm already finding myself having to do unnecessary work as a result, though. I'd like to find something better
Secondly, I could have different types for each data type, so you could say
If I seem a little obsessed with all this, it makes a noticable difference to run-time which data structure type I use for different algorithms - some graph algorithms are very expensive, and I'm working with large graphs, hence the desire to get this sort of thing right.
[if anyone wants to see the complete header files, then do ask, they just seem a bit large to dump into LJ]
Essentially, I have a library of useful graph/network analysis and simulation tools, written in C. There are currently three ways you can represent a graph in this library:
- an n-by-n array of integers
- an n-by-n array of bytes
- an n-array of adjacency lists
...though I might want to add ones later. Currently you chose between these at compile time (it's a switch in the Makefile), and then
matrix.h does things like:#ifdef S_MATRIX #include "smatrix.h" #elif defined I_MATRIX #include "imatrix.h" ...
The various header files then
#define various useful functions, eg:/*is bit x of a set?*/ #define MAT_TEST(m,a,x) (m[(a)][SM_BYTE(x)] & SM_BIT(x)) /*set bit x of a*/ #define MAT_SET(m,a,x) (m[(a)][SM_BYTE(x)] |= SM_BIT(x))
The various functions I've written call MAT_SET and so on without having to care what type of graph type you're using (since generally it's only an efficiency issue); in the odd case that a function needs a particuar representation, it can check what #define you used (again, this is currently a compile-time thing).
There are a few obvious problems. Firstly, for this library to be more generally useful than for just me (which I think would be a good aim), people shouldn't have to re-build the entire library frequently (consider multi-user systems). Secondly, it is occasionally useful to be able to transform between the different structures (for example, reading in some data formats is best done using an adjacency-list representation), and only with a bit of hackery can you do this currently.
I can think of a few possibilities for going ahead, and I was also wondering what my readers (many of whom are far better at this coding malarky than I am) thought:
Firstly, I could carry on as I am doing, which is the past of least effort currently. I'm already finding myself having to do unnecessary work as a result, though. I'd like to find something better
Secondly, I could have different types for each data type, so you could say
SMAT_TEST or IMAT_TEST, and so on. You'd still need some mechanism for chosing the default, though. Also, you'd need some mechanism for a library function to know which sort of structure it had been passed. Maybe the matrix_t structure could be a union of the other types, with a flag to say which it was? That has the problem that the TEST and SET functions are often macros. I guess you could compile your code three times into one vast library, and then have a piece of code to work out what type of graph type you were using, and call the relevant version. This all seems terribly complicated, though...If I seem a little obsessed with all this, it makes a noticable difference to run-time which data structure type I use for different algorithms - some graph algorithms are very expensive, and I'm working with large graphs, hence the desire to get this sort of thing right.
[if anyone wants to see the complete header files, then do ask, they just seem a bit large to dump into LJ]
There are 7 comments on this entry.