Time goes by and I don’t have enough free time to write on this blog. It is crazy how life changes and how many things get thrown at you.
That being said, recently I helped a friend with some format problems, and I thought it was a perfect excuse to prepare a new blog post.
Overview of what to do
The situation is as follows. A friend of mine is doing an architecture project and she found a 3D dataset of the city she is working on. This is an official dataset of Ljubljana, capital of Slovenia.
The dataset is in KML (Keyhole Markup Language) with 3D buildings, and once opened in Google Earth it looks very cool. As everybody probably knows, Google Earth is not a favorite tool for architecture (mainly because it does nothing); that title goes to the famous AutoCAD.
She wants to get that 3D model into AutoCAD so that she can prepare more impressive renders of the project. Overall this is not imperative, but it will make her project look sharper.
Format Wars
There is a lot of Autodesk software; it is like a small web of pieces. Some will support GIS formats, some won’t. In this case there is no way to open that KML with AutoCAD, so we need to transform it into something that can be read: another nice format, DXF (Drawing Exchange Format).
A little background on KML
KML is an XML format used mainly by Google to store geographic information; it became an OGC (Open Geospatial Consortium) standard in 2008. It supports 3D data either as COLLADA models bundled alongside the document or as 3D polygons with X, Y, Z points.
KML can be stored as a zipped file with the KMZ extension. That is simply the base XML file (doc.kml) plus any resources it references, for example 3D COLLADA models.
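Since a KMZ is just a zip archive, it can be inspected without Google Earth. The following is a minimal sketch using only the Python standard library; the function names are my own, and it assumes the root document is the first .kml entry in the archive (usually doc.kml).

```python
import io
import zipfile
from typing import List


def list_kmz_entries(kmz_bytes: bytes) -> List[str]:
    """Return the names of all files stored in a KMZ (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        return archive.namelist()


def read_root_kml(kmz_bytes: bytes) -> str:
    """Return the text of the first .kml entry, which is usually doc.kml."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".kml"):
                # Undecodable bytes are replaced instead of raising an error.
                return archive.read(name).decode("utf-8", errors="replace")
    raise ValueError("no .kml entry found in the archive")
```

With a real file this would be called as read_root_kml(open("3DRaba.kmz", "rb").read()).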
KML calling home
From that situation one can imagine that it will not be difficult to transform this. It is a standard and it is well documented. But surprise, surprise, it is not as trivial as I would like it to be.
The first surprise I encounter is that, even though Google Earth shows a lot of polygons lying around, the KML file inside the KMZ only weighs 5 KB. It’s an XML file with barely 1000 lines; that can’t contain all the information being presented in Google Earth. But it is presented! How?!
Apparently, this is thanks to network links: a KML tag that points to extra networked data that has to be downloaded.
<Link>
  <href>http://urbanizem.ljubljana.si/Googleearth/3Draba/BarjeCrnaVas.kmz</href>
  <refreshInterval>0</refreshInterval>
  <viewRefreshMode>onRegion</viewRefreshMode>
</Link>
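To see every file the root document pulls in, the hrefs can be collected programmatically. This is a rough sketch with the standard library XML parser; the function names are mine, and it assumes the links sit in a Link (or the older Url) element directly under a NetworkLink. Namespaces are stripped so that it does not depend on which KML version the file declares.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def network_link_urls(kml_text: str) -> List[str]:
    """Collect the href of every Link/Url found directly under a NetworkLink."""
    root = ET.fromstring(kml_text)
    urls = []
    for element in root.iter():
        if local_name(element.tag) != "NetworkLink":
            continue
        for link in element:
            if local_name(link.tag) not in ("Link", "Url"):
                continue
            for child in link:
                if local_name(child.tag) == "href" and child.text:
                    urls.append(child.text.strip())
    return urls
```

The resulting list is the set of KMZ files that would have to be fetched separately.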
So we have this 5 KB KML file that downloads extra KMZ files to render the result. This is where my optimism starts to drop: I don’t believe that GDAL will be able to handle this. Soon enough I get my answer.
> ogrinfo 3dRaba.kmz
INFO: Open of `3DRaba.kmz'
using driver `LIBKML' successful.
1: Obmocja
2: 3D raba
3: Raba Stavbe
4: Raba Območja
5: Pot_Raba
It lists the layers. I double-check the KML file for this layer data. Unluckily for me, most of the information I want is stored under the “3D raba” folder tag, nested inside other folder tags. I am not sure how the libKML driver handles this, but I did not find a way to go down to sublayers.
ogr2ogr -skipfailures -f "DXF" 3Draba.dxf 3DRaba.kmz "3D raba"
ogr2ogr will generate an empty DXF file. 😦
In summary, the KML that we downloaded has no real data; it calls its mum for the extra information and then presents it. GDAL does not handle that well (at least not directly).
Getting that data
This should not be a big deal: we have the URL to the data files. It is just a little bit of extra work; we can try to download those files separately and convert them one by one. Hopefully with a plain data file we’ll achieve success.
First things first. Go to that URL with Firefox, or with a plain old wget:
wget http://urbanizem.ljubljana.si/Googleearth/3Draba/BarjeCrnaVas.kmz
But we are not welcome…
2014-09-29 10:53:18 ERROR 403: Forbidden.
This is forbidden for us, but not for Google Earth, as it happily presents all those nice polygons on my screen. I suspect this has something to do with the request headers; I could try to tell the server that I am Google Earth. Or I could search my local filesystem for the temporary files it downloaded. The second option seems easier.
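For the record, the first option could look roughly like the sketch below. I did not verify that the server actually filters on the User-Agent header, and the header value here is made up: the real one would have to be copied from a request captured from Google Earth. The function names are also mine.

```python
import urllib.request

# Assumption: a placeholder User-Agent, NOT the string Google Earth really sends.
HYPOTHETICAL_USER_AGENT = "GoogleEarth/7.1 (hypothetical)"


def build_request(url: str, user_agent: str = HYPOTHETICAL_USER_AGENT) -> urllib.request.Request:
    """Prepare a GET request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, destination: str) -> None:
    """Fetch url with the custom header and save the body to destination."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        with open(destination, "wb") as output:
            output.write(response.read())
```

If the server still answers 403, the check is on something else (a referer, a cookie, the client address) and the header alone will not help.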
To make matters worse, the files are stored with beautiful names such as khTemp_140.kmz. At least I found one. Before I continue my filesystem scouting (or use something like Wireshark or Fiddler) I will try to convert that file. If I can do that, then I’ll spend some quality time looking for the right files.
The file does not have any fancy models stored, just another doc.kml file that I rename to data.kml.
Encodings
Nothing is ever as straightforward as it should be. Slovenian is not an ASCII-only language, and I get some strange errors when trying to read the file. Since we won’t care about the names, we’ll strip any non-ASCII characters from the file.
perl -pi -e 's/[[:^ascii:]]//g' data.kml
This may be a little harsh, but it will do the trick. I use a Perl regex substitution to remove anything that is not ASCII.
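Roughly the same thing in Python, as a sketch that writes to a copy instead of editing in place (the function names are mine). It works on raw bytes, so a multi-byte UTF-8 character disappears completely: a name like “Območja” ends up as “Obmoja”.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte outside the 7-bit ASCII range (0x00-0x7F)."""
    return bytes(byte for byte in data if byte < 0x80)


def strip_file(source: str, destination: str) -> None:
    """Write an ASCII-only copy of source to destination."""
    with open(source, "rb") as handle:
        cleaned = strip_non_ascii(handle.read())
    with open(destination, "wb") as handle:
        handle.write(cleaned)
```

Keeping the original file around makes it easy to go back if the names turn out to matter later.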
Converting plain data
I have to check if this is really a plain dataset with no network links. Grepping
and counting the matching lines shows me that there are no links, and a lot
of polygons:
> grep -i link data.kml | wc -l
0
> grep -i polygon data.kml | wc -l
12112
Before venturing to convert anything, we’ll get some information about this
file. The libKML driver reads it and lists 30 different layers.
> ogrinfo data.kml
INFO: Open of `data.kml'
using driver `LIBKML' successful.
1: Features
2: BC
3: CDc
...
30: ZS
For each layer I can print how many polygons it has, and more importantly, review that those are indeed polygons with 3 coordinates:
> ogrinfo data.kml BC
(lots of output)
OGRFeature(BC):32
(more output on that feature)
POLYGON ((14.543736182 46.080990304000103 4.14001465500002,14.543713817 46.080997581000098 4.14001465500002,14.5437137750001 46.080997596 4.14001465500002,14.543713739000101 46.080997611 4.14001465500002,14.5437137050001 46.080997628 4.14001465500002,....
We are clearly on the right track; this is the file that we want. Trying the transformation again with ogr2ogr, we get a slightly better result than with the root document.
> ogrinfo data.kml BC | grep -i count
34
> ogr2ogr -skipfailures -f "DXF" BC.dxf data.kml BC
> ogrinfo BC.dxf entities | grep -i count
34
With GDAL we generated a DXF that contains the same number of entities as the original data layer. There is a big limitation with the DXF driver in GDAL, and that is that it does not support layers. But it seems that layer by layer we get something.
The Z conundrum
I was happy too soon. When comparing the data from the DXF with the data in the KML I see that we lost some information in the process. And I am not talking about feature names and extra data (as the DXF driver won’t store those), I am talking about the Z information. The DXF contains exactly the same polygons but without the Z coordinate.
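One way to check this without opening AutoCAD is to scan the DXF text directly. An ASCII DXF is a sequence of line pairs, a numeric group code followed by its value; group code 30 holds the Z of a point and 38 the elevation of an LWPOLYLINE. The sketch below (function names are mine) collects those values from the ENTITIES section only. Which entity types GDAL writes depends on the driver version, so treat it as a quick diagnostic, not a validator.

```python
from typing import Iterable, Iterator, List, Tuple

# Group codes that carry a height: 30 = Z of a point, 38 = LWPOLYLINE elevation.
Z_GROUP_CODES = {30, 38}


def dxf_pairs(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (group code, value) pairs from the lines of an ASCII DXF file."""
    iterator = iter(lines)
    for code_line in iterator:
        value_line = next(iterator, None)
        if value_line is None:
            break
        code_text = code_line.strip()
        if code_text.lstrip("-").isdigit():
            yield int(code_text), value_line.strip()


def entity_z_values(lines: Iterable[str]) -> List[float]:
    """Collect the Z/elevation values found inside the ENTITIES section."""
    values = []
    in_entities = False
    previous = None
    for code, value in dxf_pairs(lines):
        if code == 2 and previous == (0, "SECTION"):
            # The pair right after "0 SECTION" names the section.
            in_entities = value == "ENTITIES"
        elif code == 0 and value == "ENDSEC":
            in_entities = False
        elif in_entities and code in Z_GROUP_CODES:
            values.append(float(value))
        previous = (code, value)
    return values
```

Running entity_z_values(open("BC.dxf", errors="replace")) and getting an empty list, or only zeros, confirms that the heights were lost on the way.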
When checking the DXF writer source, it seems that it should be writing that third coordinate as long as the input is a polygon25D. It would seem that the KML reader did not read everything as 2.5D polygons.
Since I find no easy way to force that directly from ogr2ogr, I get out my Python toolbox to do the job. But of course, the Python bindings simply bind directly to the OGR drivers, so I have a similar problem.
Getting deeper. CPP time
Time to get into some serious work. I download the GDAL toolset source and build it for myself. When this is done I can change how the DXF driver works and at least solve my problem.
Developers know the drill: configure, make, test, change, make, test, change until I get what I want.
The DXF driver is doing something that I don’t quite understand: it uses the AutoCAD HATCH entity to store the polygons. Apparently, that does not support a Z coordinate, which makes my efforts from the command line totally unsuccessful.
If the hatch can’t support a Z value, why use the hatch? A direct approach that writes the output I expect is to change the function being used, from writeHatch to writePolyline. This won’t be a filled polygon, but at least I’ll get a result.
We can override the HATCH using a GDAL runtime option called DXF_WRITE_HATCH. The beauty of that option is that it is not documented anywhere, except if you read the code (as seen on Stack Exchange). I also added some code changes myself to be sure that I always get a Z value (at least 0). My command from bash gets bigger:
DXF_WRITE_HATCH=NO ogr2ogr -f "DXF" CU.dxf data.kml CU
From here, I finally got a DXF from a subset of the dataset! Yahoo 😀
What next?
From this point, I have to go back to what I left. This is only one layer of one subdataset of the original dataset. Three levels deep, that is nothing.
Finding the subdataset files will be a manual process. There are only 4 or 5 files to find, so I hope it won’t take long.
To extract each layer into its own DXF, a small bash script will suffice: start with an ogrinfo, parse the layer names, and run ogr2ogr with each of those layer names.
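The same loop can be sketched in Python instead of bash, which makes layer names with spaces or accents a bit less painful. The function names are mine. It passes DXF_WRITE_HATCH through ogr2ogr's generic --config switch, which should be equivalent to setting the environment variable, and it assumes the ogrinfo listing has lines like “2: 3D raba”, optionally followed by a geometry type in parentheses as newer GDAL versions print.

```python
import re
import subprocess
from typing import List

# Matches "2: 3D raba" and also "2: 3D raba (3D Polygon)".
LAYER_LINE = re.compile(r"^\s*\d+:\s+(.+?)(?:\s+\([^()]*\))?\s*$")


def parse_layer_names(ogrinfo_output: str) -> List[str]:
    """Extract the layer names from the summary listing printed by ogrinfo."""
    names = []
    for line in ogrinfo_output.splitlines():
        match = LAYER_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def build_command(source: str, layer: str) -> List[str]:
    """Build the ogr2ogr call that writes one layer to its own DXF file."""
    # Anything outside a conservative character set becomes an underscore.
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", layer)
    return ["ogr2ogr", "--config", "DXF_WRITE_HATCH", "NO",
            "-f", "DXF", f"{safe_name}.dxf", source, layer]


def convert_all(source: str) -> None:
    """List the layers of source with ogrinfo and convert them one by one."""
    listing = subprocess.run(["ogrinfo", source], capture_output=True,
                             text=True, check=True).stdout
    for layer in parse_layer_names(listing):
        subprocess.run(build_command(source, layer), check=True)
```

Calling convert_all("data.kml") would then leave one DXF per layer in the current directory. A layer whose real name ends in parentheses would be truncated by the regex, so the parsed names are worth a quick look before trusting the output.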
Closing
This has been a long road, and my brute-force approach is clearly not the way to go. Maybe the KML reader has an option to specify 2.5D geometries, or the DXF writer has a specific option that I did not find. The sad part is that it took me quite a while and some programming knowledge to be able to transform a KML to a DXF using GDAL. I know it is part of the game, but still, a simple “export” turned into a small nightmare.
I am not totally sure about this, but maybe there is a piece of software that lets you do this more easily, though it probably won’t be free.
Another comment is about the DXF format. Oh dear lord, so many things that I don’t know. I have no idea what half of the options refer to (I have never been an AutoCAD user myself), and there are a lot of unknown concepts. It would take a good deal of time with an AutoCAD user to decipher it, and maybe improve the driver. Don’t get me wrong, it is awesome to have this driver up and running, but it did not serve my purpose easily. It lacks at least some basic layer support.
I love GDAL, and the great team of people that make it possible! We have a long road ahead.
Numbers for fun
In this HTML document
- dxf appears 33 times
- kml appears 36 times
- gdal appears 15 times
- autoCad appears 15 times
References
Ljubljana 3D model
GDAL DXF driver
GDAL libKML driver
GIS Stack Exchange
DXF format specs
Thanks to this post, DXF_WRITE_HATCH is now documented in the GDAL DXF driver documentation: http://www.gdal.org/drv_dxf.html