emperor: (lego scholar)
posted by [personal profile] emperor at 11:39am on 30/05/2005
I've been thinking for a while now that I should maybe re-organise some of my work code. The question turns out to be slightly less trivial than I thought...

Essentially, I have a library of useful graph/network analysis and simulation tools, written in C. There are currently three ways you can represent a graph in this library:

  1. an n-by-n array of integers

  2. an n-by-n array of bytes

  3. an n-array of adjacency lists


...though I might want to add ones later. Currently you chose between these at compile time (it's a switch in the Makefile), and then matrix.h does things like:
#ifdef S_MATRIX
#include "smatrix.h"
#elif defined I_MATRIX
#include "imatrix.h"
...

The various header files then #define various useful functions, eg:
/*is bit x of a set?*/
#define MAT_TEST(m,a,x) (m[(a)][SM_BYTE(x)] & SM_BIT(x))
/*set bit x of a*/
#define MAT_SET(m,a,x) (m[(a)][SM_BYTE(x)] |= SM_BIT(x))


The various functions I've written call MAT_SET and so on without having to care what type of graph type you're using (since generally it's only an efficiency issue); in the odd case that a function needs a particuar representation, it can check what #define you used (again, this is currently a compile-time thing).

There are a few obvious problems. Firstly, for this library to be more generally useful than for just me (which I think would be a good aim), people shouldn't have to re-build the entire library frequently (consider multi-user systems). Secondly, it is occasionally useful to be able to transform between the different structures (for example, reading in some data formats is best done using an adjacency-list representation), and only with a bit of hackery can you do this currently.

I can think of a few possibilities for going ahead, and I was also wondering what my readers (many of whom are far better at this coding malarky than I am) thought:

Firstly, I could carry on as I am doing, which is the past of least effort currently. I'm already finding myself having to do unnecessary work as a result, though. I'd like to find something better

Secondly, I could have different types for each data type, so you could say SMAT_TEST or IMAT_TEST, and so on. You'd still need some mechanism for chosing the default, though. Also, you'd need some mechanism for a library function to know which sort of structure it had been passed. Maybe the matrix_t structure could be a union of the other types, with a flag to say which it was? That has the problem that the TEST and SET functions are often macros. I guess you could compile your code three times into one vast library, and then have a piece of code to work out what type of graph type you were using, and call the relevant version. This all seems terribly complicated, though...

If I seem a little obsessed with all this, it makes a noticable difference to run-time which data structure type I use for different algorithms - some graph algorithms are very expensive, and I'm working with large graphs, hence the desire to get this sort of thing right.

[if anyone wants to see the complete header files, then do ask, they just seem a bit large to dump into LJ]
Mood:: 'thoughtful' thoughtful

Reply

This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting

October

SunMonTueWedThuFriSat
      1
 
2
 
3
 
4
 
5
 
6
 
7
 
8
 
9
 
10
 
11
 
12
 
13
 
14
 
15
 
16
 
17
 
18
 
19
 
20
 
21
 
22
 
23
 
24
 
25
26
 
27
 
28
 
29
 
30
 
31