emperor: (Default)
posted by [personal profile] emperor at 06:41pm on 15/05/2009
Suppose for a moment, I have a 13.5 million-point data file that I want to plot as a scatter-plot. gnuplot will happily produce a plot for me, although it takes a little while, and I need a fair bit of RAM.

The resulting .ps file is 212M, however, and makes evince thrash for absolutely ages before displaying it. Converting it to PDF just makes matters worse (up to 275M). I want to be able to include this figure in a pdfLaTeX document.

Annoyingly, runes of the form time gs -r600 -sDEVICE=pdfwrite -sOutputFile=foo-600.pdf -dNOPAUSE -dSAFER -dBATCH foo.ps actually make matters worse - the resulting PDF is even bigger!

Surely there must be an easy way to simplify PS or PDF files? Presumably I could force gs to rasterize by outputting to PNG and then converting back to PDF (how?), but that seems a deeply ugly hack...
Mood: confused
There are 11 comments on this entry.
 
posted by [identity profile] purplepiano.livejournal.com at 06:04pm on 15/05/2009
I thought pdfLaTeX accepted png/jpg/gif figures?
emperor: (Default)
posted by [personal profile] emperor at 06:07pm on 15/05/2009
It accepts at least some of them, yes, but in the past I've found PDF figures work better.
 
posted by [identity profile] covertmusic.livejournal.com at 06:13pm on 15/05/2009
Normalize the data on the way in? You can work out which points are repeated pretty easily, I'd have thought - cast everything to 300dpi output resolution and remove the dupes...

Dunno how many points that saves you, though :)
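
A minimal sketch of that idea in Python, assuming the data is available as (x, y) pairs and that the plot area is some known number of pixels across (say 6in x 4in at 300dpi, so 1800 x 1200). The function name and its parameters are made up for illustration; nothing here is a gnuplot feature.

```python
def dedupe_to_pixels(points, width_px, height_px):
    """Keep one representative point per output pixel.

    points: iterable of (x, y) floats.
    width_px, height_px: size of the plot area in pixels (an assumption
    about the final figure, e.g. 1800 x 1200 for 6in x 4in at 300dpi).
    """
    points = list(points)
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    # Fall back to a span of 1.0 if every point shares a coordinate.
    xspan = (max(xs) - xmin) or 1.0
    yspan = (max(ys) - ymin) or 1.0

    seen = {}
    for x, y in points:
        ix = min(int((x - xmin) / xspan * width_px), width_px - 1)
        iy = min(int((y - ymin) / yspan * height_px), height_px - 1)
        # The first point to land in a pixel wins; later ones are dropped.
        seen.setdefault((ix, iy), (x, y))
    return list(seen.values())
```

At most width_px * height_px points can survive, so the output is bounded by the resolution rather than by the size of the input. How much that saves depends entirely on how clustered the data is.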
 
posted by [identity profile] 3c66b.livejournal.com at 06:26pm on 15/05/2009
Basically your deeply ugly hack is the way I reduce sizes of PS files for posting on arXiv.org, which is picky about such things.

I use the shell script below and fiddle the resolution to trade off between appearance and file size... (needs Ghostscript and Netpbm... and the ppmtopgm makes it b/w, which you may not want)

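#!/bin/bash
# Usage: <this script> input.ps output.ps [resolution in dpi]
# Rasterises input.ps with gs, crops the border, converts to greyscale,
# and wraps the bitmap back up as run-length-encoded PostScript.

res=${3:-150}   # resolution in dpi; defaults to 150

gs -q -dNOPAUSE -sDEVICE=ppmraw -sOutputFile=- -r"${res}x${res}" - < "$1" | pnmcrop | ppmtopgm | pnmtops -noturn -rle > "$2"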
 
posted by [identity profile] foradan.livejournal.com at 07:09pm on 15/05/2009
Can you make a 2d histogram with a grid covering your scatter plot, colouring each grid cell by how many points are contained in it? That would make a much smaller output file, even for grids of the order of 100x100 cells.
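
A rough sketch of the binning in plain Python, assuming the points are (x, y) pairs and that the grid simply spans the range of the data. The name histogram_2d and the 100x100 default are illustrative choices, not taken from any library.

```python
def histogram_2d(points, nx=100, ny=100):
    """Count points per cell of an nx-by-ny grid spanning the data range.

    Returns a list of ny rows, each a list of nx integer counts;
    row 0 holds the smallest y values.
    """
    points = list(points)
    counts = [[0] * nx for _ in range(ny)]
    if not points:
        return counts
    xmin = min(p[0] for p in points)
    xmax = max(p[0] for p in points)
    ymin = min(p[1] for p in points)
    ymax = max(p[1] for p in points)
    # Fall back to a span of 1.0 if every point shares a coordinate.
    xspan = (xmax - xmin) or 1.0
    yspan = (ymax - ymin) or 1.0

    for x, y in points:
        # Points on the upper edge are clamped into the last cell.
        ix = min(int((x - xmin) / xspan * nx), nx - 1)
        iy = min(int((y - ymin) / yspan * ny), ny - 1)
        counts[iy][ix] += 1
    return counts
```

The result is only nx * ny numbers however many points went in. Written out as whitespace-separated rows, a grid like this should be something gnuplot can draw with its matrix "with image" style, though I have not checked that against a 13.5 million-point input.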

 
posted by [identity profile] foradan.livejournal.com at 07:11pm on 15/05/2009
Oh yes, if you know python, try matplotlib. It will generate png output (or ps or pdf, or several others).

 
posted by [identity profile] arnhem.livejournal.com at 07:52pm on 15/05/2009
Isn't it intrinsically misleading to display the scatter plot at a comparatively low resolution for that many points, as it gives no real indication of how dense the points are once they hit "more than one per pixel" (for any given resolution at which you view)?

There seem to be two problems you might want to solve here:

a) a pdf file that can be arbitrarily zoomed in on demand, in which all 13.5 million points are represented. This is unavoidably going to be large and unwieldy.

b) a summary graph at a resolution that you think is suitable. In this case, surely it's desirable to find a third dimension (grey level, or colour) that conveys the density of points in a particular pixel location.

You may want to do both; paper copies need the latter; electronic copies might want the former (but maybe not - it depends whether detailed information is useful or whether it's _only_ the summary information that's worth presenting).

Then again, perhaps you can get a halfway house; in which you use grey levels or colour coding to represent density of points per pixel, but with a pixel resolution that is only moderately greater than the default that you render it at (so you can zoom in a bit, but not hugely ...)
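
To make the grey-level idea concrete, here is a small Python sketch. It assumes the per-pixel counts have already been computed as a list of rows (row 0 being the smallest y values), and it uses a logarithmic scale, which is my choice rather than anything required; the function name counts_to_pgm is hypothetical.

```python
import math


def counts_to_pgm(counts, path):
    """Write a grid of per-pixel counts as a binary PGM (P5) image.

    Empty pixels stay white; denser pixels get darker on a log scale,
    so that a few very dense pixels do not wash out everything else.
    """
    height = len(counts)
    width = len(counts[0]) if height else 0
    peak = max((c for row in counts for c in row), default=0)
    # Fall back to 1.0 when the grid is empty or all zeros.
    scale = math.log1p(peak) or 1.0

    pixels = bytearray()
    # PGM stores the top row first, so emit the largest y values first.
    for row in reversed(counts):
        for c in row:
            darkness = math.log1p(c) / scale  # 0.0 (empty) to 1.0 (peak)
            pixels.append(255 - round(255 * darkness))

    with open(path, "wb") as fh:
        fh.write(b"P5\n%d %d\n255\n" % (width, height))
        fh.write(pixels)
```

The file size is fixed by the chosen pixel grid, not by the number of points, and rendering the grid at a few times the displayed size gives the "zoom in a bit, but not hugely" behaviour. A linear scale would work too if the densities do not vary by orders of magnitude.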
emperor: (Default)
posted by [personal profile] emperor at 09:22pm on 15/05/2009
Hm, I think b) might be Too Much Work for what isn't a very important figure. I'd have to write some sort of binning code, which would probably have to work out what was a sensible scale to bin at...
 
posted by [identity profile] queex.livejournal.com at 09:10pm on 15/05/2009
Write it out to a suitably large png file. For all practical purposes it's just as good.
 
posted by [identity profile] wellinghall.livejournal.com at 10:58am on 20/05/2009
I don't know, but I'd like to see it.
 
posted by [identity profile] caliston.livejournal.com at 10:45pm on 21/05/2009
Don't try plotting it in TeX with pgfplots. I have a mere 20,000 point curve I'm plotting (Matlab->matlab2tikz->LaTeX+tikz+pgfplots) and I've spent the day recompiling TeX to increase the hardcoded memory limits. Sigh. TeXLive only needs 2GB of Stuff to function properly.
