Well, it seems Ed has tagged me in this scheme, which I’m beginning to wonder is a very subtle Google Ads pyramid scheme started by someone at the top of the chain. Anyway, I’ll oblige and tell you five things about me that you don’t (need to) know. As this is a GI blog I’ve kept to that theme and given you a little geocoding puzzle to solve.

1)   I was born somewhat prematurely in 1973, and so far I seem to be the only Dominic Stubbins in the world (well, on Google anyway). I was born here:

 osgb1000005733056

2)  I don’t have fully opposable thumbs, but despite that I grew up, and developed my geeky tendencies, here:

 -50784,5804480

3)   I first came across digital mapping systems while doing work experience at a sweatshop digitising OS Land-Line maps for British Gas. I didn’t have an auspicious start: my main job was to refill the industrial-size pen plotters used for test plots, and I didn’t realise it was important that the correctly coloured pens went in the right caddy. That was here:

tqsrsrsrrrtsqrstrqqt

4)  I first used GIS in anger at university, mainly Unix-based ArcInfo, back before there was a distinction between Workstation and Desktop, here:

13/4059/2591

5) I’m getting married in a few weeks and after that you might be able to locate me somewhere in

GB 100105

Lat/longs on a postcard please. Most of them should be fairly obvious, especially if you cheat and use a search engine, although I guess it shows the importance of metadata. The first one is interesting, as TOIDs are supposed to be the new way of sharing information, but I wonder how much it would cost to geocode the single TOID I’ve given to a lat/long without purchasing full MasterMap coverage.

I believe the correct etiquette is to pass these tags along, so, in no particular order:

Matt, Art and Andrea


We tend to take storage for granted these days, as you can pick up a 1/2 TB disk for about £100, which, considering the first computer I owned was a ZX81 with no storage other than a cassette drive and an 8K ROM, is pretty amazing. However, there are still plenty of ways to use storage up; one that works quite well is to format your drive with the wrong block size and then store an ArcGIS Server map cache on it.

My laptop is currently running Windows Vista, but for ArcGIS Server work I tend to use a Windows 2003 virtual server. I was experimenting with performance, using different cache tile sizes and swapping between fused and layer caches. One interesting thing this highlighted was that the disk block size can have a huge effect on the amount of storage taken up by the map cache.

The virtual server I was using had a default virtual hard disk with a block size of 4K. This is fairly standard, although you can vary it when formatting the disk; for example, my media server at home has 64K blocks, as this gives better performance when accessing large files.

The default tile size in ArcGIS Server when creating a map cache is 512×512 pixels; for the vector data and rendering I was using, these came out at about 5-10 KB each. To compare performance I created the same cache with 128×128-pixel tiles. I was expecting the total size of the cache to be slightly bigger, because there would be 16 times as many images, but each would cover 16 times fewer pixels and be maybe 10-15 times smaller as a file. There is some overhead in image headers, and the PNG compression might be slightly less efficient, but for the same data at the same cache levels the information content and total file size should be much the same, regardless of tile size.

In reality, the 128-pixel tiles took up almost 10 times as much total storage space as the 512-pixel tiles. The reason was that the disk block size was too large. The average size of each 128px tile was about 400 bytes, but with the block size at 4096 bytes each file occupies at least a whole block on disk, which means most of the space in each block is wasted. The data I was using was an extreme case: it was very sparse, so a lot of the small tiles were completely empty but still occupied 4K on disk. This is compounded if you build a separate cache for each layer with small tile sizes, as each tile is even less likely to contain any data but will still occupy a whole block. If, on the other hand, your data contains a lot of detailed imagery, smaller tile dimensions will not be as inefficient, as most tiles will contain information regardless of how small they are. They will still take up plenty of space, but at least you will be using it efficiently. The key is choosing a block size that is just large enough for the small tiles without wasting too much space.
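As a back-of-the-envelope illustration (a minimal sketch in Python, not anything ArcGIS-specific; the tile counts and average sizes below are made-up numbers in the spirit of the figures above), you can estimate a cache’s on-disk footprint by rounding every tile file up to a whole number of blocks:

```python
import math

def on_disk_size(file_sizes, block_size):
    """Disk space actually consumed when every file is rounded up
    to a whole number of blocks (ignoring filesystem metadata)."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)

# Invented but representative numbers: one cache of 10,000 512px tiles at ~8 KB
# each, versus the same extent as 160,000 (16x as many) 128px tiles at ~400 bytes.
big_tiles = [8 * 1024] * 10000
small_tiles = [400] * 160000

for block in (512, 4096, 65536):
    big = on_disk_size(big_tiles, block) / 2**20
    small = on_disk_size(small_tiles, block) / 2**20
    print(f"block {block:>5} B: 512px cache = {big:7.1f} MB, 128px cache = {small:8.1f} MB")
```

With these invented numbers the small-tile cache balloons to around eight times its logical size on a 4K-block disk, and would be far worse still on 64K blocks, while a 512-byte block size keeps both caches close to the size of the data they actually contain.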

So the moral of this story is that, to make best use of the storage you have, you should format the disk with a block size suited to the data you will store on it. If you expect a lot of small, and possibly blank or near-blank, cache tiles, use a small block size such as 512 bytes. If your data is dense and your tiles larger, you can get away with a larger block size. Normally you don’t really have to worry about the inefficiencies of disk block size, as the numbers involved are so small, but if you are building industrial-size map caches the number of tile images can run into the hundreds of millions. Use the right block size and the cache can provide huge performance and usability gains; use the wrong block size and those gains will come at the cost of a lot of wasted storage.


One of the tasks I often get involved in is system sizing and capacity planning for GIS systems. The key reference for this is the excellent system sizing document produced twice a year by Dave and his group at ESRI. The paper is pretty long and can be kind of daunting when you first see it, but understanding it all is vital if you are going to undertake any detailed sizing work.

I had been planning to put together a post about which bits of the document are key if you want to do a rough sizing exercise. So I just had a look back at it and noticed it has been updated with an October 2006 edition. There have been some pretty fundamental changes to bring it in line with some of the new tools being developed to help the process, and to make it more up to date.

The key chapters, 7 and 8, have been completely rewritten and updated. With the release of 9.2, the new document covers sizing and performance details as well as licensing mechanisms and recommended configurations. A new hardware baseline has been included, and it also takes account of multicore hardware. The ArcGIS Server sections are much more comprehensive than in the last edition, and I suspect there is still a bit more to come to help with some of the more complex aspects of 9.2 sizing. All I have to do now is find the time to digest the whole document, maybe on the flight back from Barcelona tomorrow.

This is obviously the week for conferences: as well as the ESRI European User Conference happening in Athens, it’s the first week of the Microsoft Tech-Ed conference. This year is a little different, with the conference smaller but spread over two weeks and split into two streams: this week is the developer conference and next week is the IT Forum. I think I preferred it as it used to be, with developers and system admins mixed together, as I tend to be involved in both areas; however, two weeks is much too long to justify being at a conference, so I chose the developer half this year.

Barcelona is a great city and so far the weather has been pretty good. Microsoft certainly know how to look after delegates, with an endless supply of free doughnuts and sugared beverages!

The key things being talked about here are .NET 3.0 and related technologies such as LINQ, WPF and WCF, along with the launch of Office 2007 and Windows Vista. LINQ in particular looks like a really cool and very useful framework.

Unlike previous Tech-Ed events there are no MapPoint or spatial-related sessions, but there is a Virtual Earth stand where they are showing off the new Virtual Earth 3D ActiveX component. It does have a certain wow factor seeing it embedded in a browser, even if it is IE only. There was also an interesting talk about Windows Live mashups, though it focused mainly on things other than maps, such as contacts. Nothing very new, although there was a hint that some kind of online storage service may be coming, as well as a possible link-up between Virtual Earth 3D and the new MS Flight Simulator.

It’s certainly a great place to find out what’s coming that will impact Windows projects over the next couple of years.

Well, with the UK mashup event coming up I was having a look around to try and find some interesting mashups to explore, but it seems there is a lot of similarity. Yes, there are some cool applications with nice interfaces and some interesting datasets, but 99% of them are just data-overlay applications. The mashup phenomenon has been going on for getting on for two years now, and the initial excitement and innovation seems to be wearing off a bit. Most mashups follow one of two recipes:

  1. Find an interesting list of something on the web, process the list format, geocode it and display the points using a GYM (Google/Yahoo/Microsoft) API.
  2. Take a homegrown list of something, optionally geocode it, then display the points using a GYM API.

There are some exceptions to this, for example combining several different services such as calendaring, events and mapping; however, most spatial mashups are still just pushpin maps of locations. These are fine if you are really interested in the data, but many mashups seem to exist just for the sake of doing a mashup. There are two things that would make mashups that much more interesting and useful, and hopefully we will start to see them appearing more often.

The first has already started to happen, which is to open up the data to other users. When you think about mashups, you would expect that people using, for example, Google Maps and some point data would now have made that point data available for all to see, but often this is not really the case. The point data rendered on the map is often no more usable or accessible to other applications than it was before. Yes, you can visualise it against a map background, but you can’t easily access the raw data; for example, to use it in a Spatial Analyst process to find the density surface of those locations. What would be great is if, when creating mashup sites, authors made the data available as a service, not just as an end-user application. This can be achieved in several ways. The simplest and most common is to make the same data available in a “standard” format; the most popular of these are GeoRSS and KML, followed by GML and the other OGC formats. This is now pretty common, and there are many sites that reformat data into these forms as part of a mashup. The other approach is to create a service or API to access the data you are publishing or consuming as part of the mashup. This is much rarer, as it is a pretty complex process involving server-based software or processes. However, there are a few tools that can help. A great example is Dapper, which allows you to create a web service out of any third-party website: it maps the site to an XML schema which can in turn be transformed into any common format, such as a Google Maps URL or iCalendar, or into a custom format such as KML or GML. This is an amazingly powerful, if slightly contentious, tool that allows you to create some interesting applications. I can see people using it to generate KML services from pretty much any website containing locations.
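As a rough sketch of the first approach (hand-rolled for illustration rather than using any particular feed library; the place names and coordinates are invented), the same point data a mashup plots as pushpins can also be served as GeoRSS or KML with a few lines of serialisation:

```python
from xml.sax.saxutils import escape

# Hypothetical point data that a mashup might already be plotting as pushpins.
places = [
    {"name": "Sample office", "lat": 51.5072, "lon": -0.1276},
    {"name": "Sample depot",  "lat": 53.4808, "lon": -2.2426},
]

def to_georss(items):
    """Serialise points as a minimal GeoRSS-Simple RSS feed."""
    entries = "".join(
        "<item><title>{0}</title><georss:point>{1} {2}</georss:point></item>".format(
            escape(p["name"]), p["lat"], p["lon"])
        for p in items)
    return ('<?xml version="1.0"?>'
            '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">'
            '<channel><title>Mashup points</title>' + entries + '</channel></rss>')

def to_kml(items):
    """Serialise the same points as a minimal KML document."""
    placemarks = "".join(
        "<Placemark><name>{0}</name>"
        "<Point><coordinates>{1},{2}</coordinates></Point></Placemark>".format(
            escape(p["name"]), p["lon"], p["lat"])  # KML wants lon,lat order
        for p in items)
    return ('<?xml version="1.0"?>'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + placemarks + '</Document></kml>')

print(to_georss(places))
print(to_kml(places))
```

Anything that understands GeoRSS or KML, from Google Earth to a desktop GIS, can then consume the data directly rather than scraping it off the map page.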

Once services and data are available, the next step is to do more analysis of that data. Currently, mashups combine data so the user can, for example, look for patterns in the geographic data that they would never have noticed before. What would be even better would be for the systems to look for patterns and relationships rather than leaving it up to the users. Spatial analysis is something we are all comfortable with on the desktop, but mashups are still stuck in visualisation mode. This is understandable, as many of the tools to do this analysis have not been available in the web environment. However, if data is provided as GML, KML or a WFS service, then users who have the tools could use it in much more innovative ways. With standards support built into ArcGIS, users could feed online data sources into many of the ArcGIS extensions, such as Spatial Analyst or Geostatistical Analyst. At 9.2, with ArcGIS Server geoprocessing support, organisations can begin to make analysis services available as well as just raw data. For example, there are many sites displaying houses for sale along with information about nearby schools and amenities; what would be great would be for users to input their preferences about the importance of certain features, such as schools, transport links, house prices, distance to shops and pollution, and for the site to calculate the ideal location based on those preferences. This is a classic Spatial Analyst problem of overlaying different surfaces to come up with ideal locations. Another example is this story about using Google Earth to search for potential archaeological sites; why not combine this with an automated feature-extraction model to search for specific features within the imagery? An example of this is documented in this paper, which searched for road junctions in online raster maps. Hopefully, with the release of ArcGIS Server 9.2, we will start to see more of this type of analysis capability move to the web and become incorporated into some new and innovative mashups.
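To make the house-hunting example a little more concrete, here is a minimal weighted-overlay sketch (plain Python/NumPy rather than Spatial Analyst; the criterion grids, scores and weights are entirely invented): each criterion is scored on a common 1-9 scale, weighted by the user’s preferences, and the cell with the highest combined score is the “ideal” location.

```python
import numpy as np

# Invented 1-9 suitability scores for a tiny study area, one grid per criterion
# (a real workflow would derive these from distance, price or pollution surfaces).
criteria = {
    "schools":   np.array([[9, 7, 3], [6, 5, 2], [4, 3, 1]]),
    "transport": np.array([[2, 5, 8], [3, 6, 9], [1, 4, 7]]),
    "price":     np.array([[4, 4, 6], [5, 7, 8], [6, 8, 9]]),
}

# User-supplied importance weights (normalised below so they sum to 1).
weights = {"schools": 0.5, "transport": 0.3, "price": 0.2}

def weighted_overlay(criteria, weights):
    """Combine the criterion grids into a single suitability surface."""
    total = sum(weights.values())
    return sum(criteria[name] * (w / total) for name, w in weights.items())

suitability = weighted_overlay(criteria, weights)
best_cell = np.unravel_index(np.argmax(suitability), suitability.shape)
print(np.round(suitability, 2))
print("Best cell (row, col):", best_cell)
```

A real service would derive each grid from proper surfaces and run at a sensible resolution, but the combination step really is this simple, which is what makes it such an obvious candidate for an ArcGIS Server geoprocessing service behind a mashup.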


It seems the mapping mashup phenomenon is really beginning to take off in the enterprise; the recent story about a legal software provider integrating Google Maps into their software is one example. This is nothing particularly new: software vendors have been integrating with third-party services for some time; for example, the Searchflow conveyancing website was integrated, maps and all, into a number of third-party legal software products. The difference now appears to be the simplicity with which this can be done without having to build a relationship between the different service suppliers or the end users. This ease of use has done a great deal to promote the use of geography and spatial visualisation of data, but it also seems to be causing some confusion in the commercial sector.

I have seen several ITTs recently for enterprise GIS systems that request the use of “Google satellite imagery” in the solution. These requirements are driven either by the organisation already having a web application that displays Google imagery, or by someone having seen Google Maps and wanting the data in their applications. There are two main issues that organisations looking to use so-called “Google data” in enterprise applications need to consider, and both seem to be legal rather than technical.

The first issue is the use of the Google Maps API on a non-public site. The Google terms and conditions seem to require any service using the API to be publicly available. If it isn’t, you can use the enterprise edition of Google Maps. This is not free, and although they have just announced availability in Europe, it doesn’t include the UK yet. I’m not sure whether this means you can’t do commercial mashups at all, or whether they will allow the non-commercial API to be used until the enterprise edition is available in the UK. The reasoning behind having a commercial service is pretty obvious. As our mapping agencies are fond of mentioning, collecting and maintaining high-quality spatial data is an expensive business, and why would users go to, say, DigitalGlobe or even the OS when it seems they can get essentially the same data for free from Google? The availability of free applications and data has meant that the perceived value of the data is essentially nil, whereas the reality is very different. Someone is paying for the data, presumably Google, but increasingly it is going to be “enterprise” users.

The second issue with integration is the requirement to use the API rather than access the data directly. This is obviously driven by the same data-licensing issues, but it puts a pretty big hurdle in the way of organisations who wish to use the data in a non-web environment. Sometimes we may be guilty of assuming that everyone just uses web applications these days, but there are many, many users working with desktop applications for data maintenance, analysis and publishing, and these users are not going to be able to use the Google services.

Google, however, are not the only service provider that can deliver mapping data; there are free services and APIs from Yahoo, Microsoft and of course ESRI, but again these all have “commercial” versions or restrictions. For example, the Yahoo Maps API does not really distinguish between commercial and non-commercial use, but it does have some technical restrictions, such as not using it with GPS-derived data (see 1.f(vi) in the terms). Yahoo also have an Image API that lets you access the map images directly without a web browser client, which makes integration a bit simpler.

Microsoft have a dedicated commercial API, but don’t limit what you can do with the public one, as long as you go through the API.

The ArcWeb public services are designed just for that, i.e. non-commercial use. There are also terms available for organisations who wish to use the services in commercial applications.

I’m no lawyer, so I wouldn’t take any notice of my interpretation of the Ts & Cs. The thing to remember is that while there are many services delivering high-quality data and imagery to end users free of charge, the same is not necessarily true if you wish to make use of those services in a commercial setting. You should take great care examining the terms of use before relying on these services for enterprise mashups.


OK, so I’m officially useless at this blogging lark; it’s been a while, and although I’ve been a little busy, that’s still no excuse. There have been a bunch of events that I failed to get to: I managed to miss both the AGI conference and the Oracle Spatial user group by being in Redlands, catching up with the PLTS team and getting a final sneak preview of 9.2.

Talking of which, if you haven’t been on the beta program you now have a chance to get a look at 9.2 in all its glory. ESRI (UK) are running a round of tech update sessions across the country where you will get to see all the cool new stuff. You can register on the website here; the dates and locations are:

  • Oct 31st Edinburgh
  • Nov 9th Manchester
  • Nov 14th Birmingham
  • Nov 16th London

The agenda is still to be confirmed, but as it’s all about 9.2 you can probably expect to see lots about ArcExplorer and ArcGIS Server. I would really recommend you try to get to one of these events, as 9.2 really is more than a “dot” release. It introduces some fundamentally new ways of sharing GIS data and functionality around organisations, so if you are involved in planning, architecting, managing or building GIS systems it’s worth seeing what’s new.

A couple of other events to note: if you’re in Birmingham for the 14th, you could stay an extra day and catch the second day of the Oracle Spatial stream at the Oracle UK User Group.

Also, the UK geospatial mashup event is happening on October the 20th, along with an evening of musical mashups the night before, where you can pit your DJ skills against our very own DJ CharlesK.