In my previous post I worked with a KML that was split across several networked KMZ files. This was particularly annoying because I had to keep track of the files on my computer.
I am ashamed: this should have been automated! Since a friend prepared a nice script for it, here it is.
Simple idea
We have the URLs of the KMZ files, but we can't download them directly because the server will refuse our request.
The server only accepts connections from Google Earth, so we have to craft a request that tells the server we are Google Earth.
Get the User-Agent
The User-Agent is the HTTP request header that tells the server which client is talking to it: Firefox? Chrome? Or Google Earth?
With the help of Wireshark we can monitor the packets going through the network. The task is to capture the ones generated by Google Earth, which shows that the user agent is:
GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;kml:2.2;client:Free;type:default
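As a minimal sketch of how that string can be used, the header can be attached to a request object before anything is sent. This version assumes Python 3, where `urllib.request` takes the place of the `urllib2` module used by the script further down. The helper name `build_request` and the example URL are hypothetical, and nothing here contacts a server.

```python
import urllib.request

# User-Agent string captured from Google Earth with Wireshark (copied verbatim).
GOOGLE_EARTH_UA = ('GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;'
                   'kml:2.2;client:Free;type:default')


def build_request(url, user_agent=GOOGLE_EARTH_UA):
    """Return a Request that identifies itself with the given User-Agent."""
    req = urllib.request.Request(url)
    req.add_header('User-Agent', user_agent)
    return req


# Hypothetical URL, used only to show the header being set.
req = build_request('http://example.com/tiles/part1.kmz')
# urllib stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header.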
The script
I did not have the pleasure of writing this script; the credit goes to fvila. But since it has been very valuable for retrieving everything, I include it all here.
But let's have a quick look at what it uses:
    from zipfile import ZipFile
    from os.path import join, isfile, isdir
    import urllib2
    import sys
It imports some standard libraries to do the basic job: zipfile to open the KMZ archives, os.path to handle file paths, urllib2 to download the remote KMZ files, and sys for a basic set of command-line parameters.
After that, apart from the basic script handling, the meat of it is in the function downloadKMLresources. There is nothing out of the ordinary in this function: it uses urllib2 to download a file, with the small trick of changing the user agent:
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;kml:2.2;client:Free;type:default')
The full script, ready to download the networked KMZ files referenced by any KMZ file, follows:
    from zipfile import ZipFile
    from os.path import join, isfile, isdir
    import urllib2
    import sys

    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET


    def usage():
        print "Usage: %s [KMZ file] [destination]" % sys.argv[0]
        exit(-1)


    def downloadKMLresources(kmz_file, dest):
        kmz = ZipFile(kmz_file, 'r')
        kml = kmz.open('doc.kml')

        xml = ET.ElementTree(file=kml)
        root = xml.getroot()
        elements = root.findall('.//{http://www.opengis.net/kml/2.2}href')
        urls = [e.text for e in elements if e.text[-3:] == 'kmz']

        downloaded_files = []
        for url in urls:
            name = join(dest, url[url.rfind('/')+1:])
            downloaded_files.append(name)
            print "Creating %s..." % name
            req = urllib2.Request(url)
            req.add_header('User-Agent', 'GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;kml:2.2;client:Free;type:default')
            response = urllib2.urlopen(req)
            contents = response.read()
            with open(name, 'wb') as f:
                f.write(contents)
            response.close()

        return downloaded_files


    if __name__ == '__main__':
        if len(sys.argv) != 3:
            usage()

        kmz_file = sys.argv[1]
        destination = sys.argv[2]

        if not(isfile(kmz_file)) or not(isdir(destination)):
            usage()

        downloadKMLresources(kmz_file, destination)
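The script above is written for Python 2. As a rough sketch of how its link-extraction step might look in Python 3, the following reads doc.kml from a KMZ and keeps the href values that end in kmz. The function name `find_kmz_urls` and the sample links are made up for illustration, and the guard against empty href elements is an addition that the original script does not have.

```python
import io
import xml.etree.ElementTree as ET
from zipfile import ZipFile

# Namespace used by KML 2.2 documents, as in the original script.
KML_NS = '{http://www.opengis.net/kml/2.2}'


def find_kmz_urls(kmz_file):
    """Return the href values in doc.kml that point to kmz files."""
    with ZipFile(kmz_file) as kmz:
        with kmz.open('doc.kml') as kml:
            root = ET.parse(kml).getroot()
    hrefs = (e.text for e in root.iter(KML_NS + 'href'))
    # Skip empty <href/> elements, whose text is None.
    return [h for h in hrefs if h and h.endswith('kmz')]


# Build a tiny in-memory KMZ with made-up links to exercise the function.
sample_kml = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    '<NetworkLink><Link><href>http://example.com/a.kmz</href></Link></NetworkLink>'
    '<NetworkLink><Link><href>http://example.com/icon.png</href></Link></NetworkLink>'
    '<NetworkLink><Link><href/></Link></NetworkLink>'
    '</Document></kml>'
)
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('doc.kml', sample_kml)
buffer.seek(0)

print(find_kmz_urls(buffer))
```

Each returned URL could then be downloaded with a request that carries the Google Earth User-Agent.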
Conclusion
This was the last piece I needed to automate the whole process described in the KML hell post.
Once the KMZ files are downloaded, I can process each one separately:
- Unzip the kmz
- Copy the doc.kml with an appropriate name
- Remove any non-ASCII characters
- Run through my slightly modified gdal
- GDAL reprojection from WGS 84 to UTM zone 33N. I did not mention that in my previous post, but it is quite important.
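The first three steps of this list could be sketched in Python 3 roughly as follows. The function name `extract_ascii_kml`, the output naming scheme (KMZ base name plus `.kml`) and the assumption that doc.kml is UTF-8 encoded are illustrative choices, not part of the original workflow. The GDAL steps are left out because they depend on the modified GDAL build.

```python
import os
import tempfile
from zipfile import ZipFile


def extract_ascii_kml(kmz_path, dest_dir):
    """Extract doc.kml from a KMZ, drop non-ASCII characters, and save it
    as <kmz basename>.kml inside dest_dir. Returns the new file's path."""
    base = os.path.splitext(os.path.basename(kmz_path))[0]
    out_path = os.path.join(dest_dir, base + '.kml')
    with ZipFile(kmz_path) as kmz:
        raw = kmz.read('doc.kml')
    # Decode as UTF-8 (an assumption), then discard anything outside ASCII.
    text = raw.decode('utf-8', errors='replace')
    ascii_bytes = text.encode('ascii', errors='ignore')
    with open(out_path, 'wb') as f:
        f.write(ascii_bytes)
    return out_path


# Small self-contained demo with a made-up KMZ.
with tempfile.TemporaryDirectory() as tmp:
    kmz_path = os.path.join(tmp, 'zone_a.kmz')
    with ZipFile(kmz_path, 'w') as archive:
        archive.writestr('doc.kml', '<name>Café número 1</name>')
    result = extract_ascii_kml(kmz_path, tmp)
    with open(result, 'rb') as f:
        cleaned = f.read()
    print(os.path.basename(result), cleaned)
```

Naming each output after its KMZ keeps the extracted files from overwriting one another, since every archive stores its content under the same doc.kml name.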