January 2007


Well, it seems Ed has tagged me in this meme, which I suspect may be a very subtle Google Ads pyramid scheme started by someone at the top of the chain. Anyway, I'll oblige and tell you five things about me that you don't (need to) know. As this is a GI blog I've kept that theme and given you a little geocoding puzzle to solve.

1)   I was born somewhat prematurely in 1973, and so far I seem to be the only Dominic Stubbins in the world (well, on Google anyway). I was born here:

 osgb1000005733056

2)  I don’t have fully opposable thumbs, but despite that I grew up, and developed my geeky tendencies, here:

 -50784,5804480

3)   I first came across digital mapping systems while doing work experience at a sweatshop digitising OS Landline maps for British Gas. It wasn’t an auspicious start: my main job was to refill the industrial-size pen plotters used for test plots, and I didn’t realise it was important that the correct coloured pens were in the right caddy. That was here:

tqsrsrsrrrtsqrstrqqt

4)  I first used GIS in anger at university, mainly Unix-based ArcInfo, back before there was a distinction between workstation and desktop, here:

13/4059/2591

5) I’m getting married in a few weeks, and after that you might be able to locate me somewhere in

GB 100105

Lat/longs on a postcard please. Most of them should be fairly obvious, especially if you cheat and use a search engine, although I guess that shows the importance of metadata. The first one is interesting, as TOIDs are supposed to be the new way of sharing information, but I wonder how much it costs to geocode the single TOID I’ve given to a lat/long without purchasing full MasterMap coverage.

I believe the correct etiquette is to pass these tags along, so, in no particular order:

Matt, Art and Andrea


We tend to take storage for granted these days, as you can pick up a 1/2 TB disk for about £100, which, considering the first computer I owned was a ZX81 with no storage other than a cassette drive and an 8K ROM, is pretty amazing. However, there are still plenty of ways to use storage up. One that works quite well is to format your drive with the wrong block size and then store an ArcGIS Server map cache on it.

My laptop is currently running Windows Vista, but for ArcGIS Server work I tend to use a Windows 2003 virtual server. I was experimenting with performance, using different image cache tile sizes and swapping between fused and layer caches. One interesting thing this highlighted was that the disk block size can have a huge effect on the amount of storage a map cache takes up.

The virtual server I was using had a default virtual hard disk with a block size of 4K. This is fairly standard, although you can vary it when formatting the disk; for example, my media server at home has 64K blocks, as this gives better performance when accessing large files.

The default tile size in ArcGIS Server when creating a map cache is 512×512 pixels, and for the vector data and rendering I was using these came out at about 5-10 KB each. To compare performance I created the same cache with 128×128px tiles. I was expecting the total size of the cache to be slightly bigger, because there would be 16 times as many images, but each would be 16 times smaller in pixel area and maybe 10-15 times smaller in file size. There is some overhead in image headers, and the PNG compression might be slightly less efficient, but for the same data at the same cache levels the information content, and hence the total file size, should be much the same regardless of tile size.
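The tile arithmetic above can be sketched in a few lines of Python. The 64K×64K-pixel extent is an invented figure purely for illustration, and the helper name is mine, not anything from ArcGIS Server:

```python
# Back-of-the-envelope tile counts for one cache level.
# The extent here is hypothetical; only the 512 vs 128 ratio matters.

def tile_count(extent_px, tile_px):
    """Tiles needed to cover a square extent of extent_px pixels per side."""
    per_side = -(-extent_px // tile_px)  # ceiling division
    return per_side * per_side

extent = 65536  # a hypothetical 64K x 64K pixel cache level

n512 = tile_count(extent, 512)  # 128 x 128 tiles
n128 = tile_count(extent, 128)  # 512 x 512 tiles

print(n512, n128, n128 // n512)  # -> 16384 262144 16
```

As expected, quartering the tile edge gives 16 times as many files, which is exactly why per-file overheads start to matter.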

In reality, what I found was that the 128-pixel tiles took up almost 10 times the total storage of the 512-pixel tiles. The reason was that the disk block size was too large. The average 128px tile was about 400 bytes, but with the block size at 4096 bytes each file occupies at least a whole block on disk, which means most of the space in each block is wasted. My data was an extreme case: it was very sparse, so a lot of the small tiles were completely empty but still occupied 4K on disk. This is compounded if you build a separate cache for each layer with small tile sizes, as each tile is much less likely to contain any data but will still occupy a whole block. If, on the other hand, your data contains a lot of detailed imagery, smaller tile dimensions will not be as inefficient, because most tiles will contain information regardless of how small they are. They will still take up plenty of space, but at least you will be using it efficiently. The key is matching the block size to your smallest tiles so that they don't waste too much space.

So the moral of this story is that, to make the best use of the storage you have, you should format the disk with a block size suited to the data you will store on it. If you expect a lot of small, and possibly blank or near-blank, cache tiles, use a small block size such as 512 bytes. If your data is dense and your tiles larger, you can get away with a larger block size. Normally you don't have to worry about the inefficiencies of disk block size, as the numbers are so small, but if you are building industrial-size map caches the number of tile images can run into the hundreds of millions. Use the right block size and the cache can provide huge performance and usability gains; use the wrong one and those gains come at the cost of a lot of wasted storage.
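To put rough numbers on that moral, here is a hypothetical cache, with all figures invented for illustration: a million 128px tiles, 60% of them empty, the rest around 400 bytes, stored on disks formatted with different block sizes (the allocation helper repeats the same whole-block rounding assumption as before):

```python
import math

def allocated(size_bytes, block_bytes):
    # Round each file up to whole blocks; even an empty tile takes one block.
    return max(1, math.ceil(size_bytes / block_bytes)) * block_bytes

# Hypothetical sparse cache: 1,000,000 tiles, 60% empty, 40% ~400 bytes.
tiles = [0] * 600_000 + [400] * 400_000

for block in (512, 4096, 65536):
    total = sum(allocated(t, block) for t in tiles)
    print(f"{block:>6}-byte blocks: {total / 2**30:.2f} GiB on disk")
```

With these made-up numbers the same half-gigabyte or so of tiles balloons to several gigabytes on 4K blocks, and to something absurd on the 64K blocks that suit my media server, which is the whole point: the right block size depends on what you are storing.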