Retrieve KML links


In my previous post I worked with a KML that was split across several networked kmz files. This was particularly annoying because I had to keep track of the files on my computer.

I am ashamed: this should have been automated! And since a friend prepared a nice script for it, here it is.

Simple idea

We have the URLs of the kmz files, but we can’t download them directly because the server will refuse our request.

The server only accepts connections from Google Earth, so we have to craft the request so that it tells the server we are Google Earth.

Get the User-Agent

The User-Agent is an HTTP request header that tells the server which client is talking to it: Firefox? Chrome? Or Google Earth?

With the help of Wireshark we can monitor the packets going through the network. The task is to capture the ones generated by Google Earth and read the User-Agent they carry, which is:

GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;kml:2.2;client:Free;type:default

The script

I did not have the pleasure of writing this script; the credit goes to fvila. But since it has been very valuable for retrieving everything, I am putting it all here.

But let’s have a quick look at what it uses:

from zipfile import ZipFile
from os.path import join, isfile, isdir
import urllib2
import sys

It imports some standard libraries to do the basic job: zipfile to unpack the kmz files, os.path to handle the file paths, urllib2 to download the remote kmz files, and sys for a basic set of command-line parameters.

After that, apart from the basic argument handling, the meat of it is in the function downloadKMLresources. There is nothing out of the ordinary in this function: it uses urllib2 to download each file, with the small trick of changing the User-Agent:

req = urllib2.Request(url)
req.add_header('User-Agent',
               'GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;kml:2.2;client:Free;type:default')
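
The snippet above is Python 2. In Python 3, urllib2 was split into urllib.request and urllib.error, so a rough equivalent might look like the sketch below. The helper names build_request and download and the constant GOOGLE_EARTH_UA are made up for illustration; the User-Agent string is the one captured above. Nothing is sent over the network until urlopen is called.

```python
import urllib.request

# User-Agent string captured from Google Earth with Wireshark (copied verbatim from the post).
GOOGLE_EARTH_UA = ('GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;'
                   'kml:2.2;client:Free;type:default')


def build_request(url, user_agent=GOOGLE_EARTH_UA):
    """Return a Request that presents itself with the given User-Agent."""
    req = urllib.request.Request(url)
    req.add_header('User-Agent', user_agent)
    return req


def download(url, dest_path):
    """Fetch url while claiming to be Google Earth and save the body to dest_path."""
    with urllib.request.urlopen(build_request(url)) as response:
        contents = response.read()
    with open(dest_path, 'wb') as f:
        f.write(contents)
```

Building the request separately from sending it makes it easy to inspect the headers before anything goes out.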

The full script, ready to download the networked kmz files referenced by any kmz file, follows:

from zipfile import ZipFile
from os.path import join, isfile, isdir
import urllib2
import sys

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

def usage():
	print "Usage: %s [KMZ file] [destination]" % sys.argv[0]
	sys.exit(1)

def downloadKMLresources(kmz_file, dest):
	kmz = ZipFile(kmz_file, 'r')
	kml = kmz.open('doc.kml')
	xml = ET.ElementTree(file=kml)
	root = xml.getroot()

	elements = root.findall('.//{http://www.opengis.net/kml/2.2}href')
	# Skip empty <href> elements (e.text is None) and keep only links to kmz files
	urls = [e.text for e in elements if e.text and e.text.endswith('kmz')]
	downloaded_files = []

	for url in urls:
		name = join(dest, url[url.rfind('/')+1:])
		downloaded_files.append(name)
		print "Creating %s..." % name

		req = urllib2.Request(url)
		req.add_header('User-Agent',
		               'GoogleEarth/7.1.2.2041(X11;Linux (3.13.0.0);en;kml:2.2;client:Free;type:default')

		response = urllib2.urlopen(req)
		contents = response.read()
		with open(name, 'wb') as f:
			f.write(contents)

		response.close()

	return downloaded_files

if __name__ == '__main__':
	if len(sys.argv) != 3:
		usage()

	kmz_file = sys.argv[1]
	destination = sys.argv[2]

	if not(isfile(kmz_file)) or not(isdir(destination)):
		usage()

	downloadKMLresources(kmz_file, destination)
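
The script above targets Python 2. To see the link extraction step in isolation, here is a minimal Python 3 sketch that does no network access at all. The names list_kmz_links and make_sample_kmz, and the example.com URLs, are hypothetical and exist only for the demo. Unlike the original, it matches on the '.kmz' suffix (with the dot) and ignores empty href elements.

```python
import io
import xml.etree.ElementTree as ET
from zipfile import ZipFile

KML_NS = '{http://www.opengis.net/kml/2.2}'


def list_kmz_links(kmz_file):
    """Return the text of every <href> in doc.kml that points to a .kmz file."""
    with ZipFile(kmz_file, 'r') as kmz:
        with kmz.open('doc.kml') as kml:
            root = ET.parse(kml).getroot()
    # Empty <href/> elements have text None, so filter them out first.
    hrefs = (e.text.strip() for e in root.iter(KML_NS + 'href') if e.text)
    return [h for h in hrefs if h.endswith('.kmz')]


def make_sample_kmz():
    """Build a tiny KMZ in memory (made-up content, for the demo only)."""
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        '<NetworkLink><Link><href>http://example.com/a/part1.kmz</href></Link></NetworkLink>'
        '<NetworkLink><Link><href>http://example.com/a/part2.kmz</href></Link></NetworkLink>'
        '<Style><IconStyle><Icon><href>http://example.com/icon.png</href></Icon></IconStyle></Style>'
        '</Document></kml>'
    )
    buf = io.BytesIO()
    with ZipFile(buf, 'w') as kmz:
        kmz.writestr('doc.kml', doc)
    buf.seek(0)
    return buf
```

Calling list_kmz_links(make_sample_kmz()) should return the two kmz URLs and leave out the icon link.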

Conclusion

This was the last piece I needed to automate the whole process described in the KML hell post.
Once each kmz file is downloaded, I can process it separately:

  • Unzip the kmz
  • Copy the doc.kml with an appropriate name
  • Remove any non-ASCII characters
  • Run it through my slightly modified GDAL
  • Reproject with GDAL from WGS 84 to UTM zone 33N. I did not mention that in my previous post, but it is quite important.
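
The first three steps, plus the reprojection call, could be sketched in Python 3 roughly as follows. This is only an illustration under several assumptions: the function names are made up, doc.kml is assumed to be UTF-8, and the command uses the stock ogr2ogr tool rather than the modified GDAL mentioned above. EPSG:4326 is WGS 84 and EPSG:32633 is WGS 84 / UTM zone 33N. The command is only built here, not executed.

```python
import os
from zipfile import ZipFile


def extract_ascii_kml(kmz_path, out_dir):
    """Unzip doc.kml from kmz_path, drop non-ASCII characters, save as <kmz name>.kml."""
    with ZipFile(kmz_path, 'r') as kmz:
        raw = kmz.read('doc.kml')
    # Assumes doc.kml is UTF-8; anything outside ASCII is simply discarded.
    text = raw.decode('utf-8', errors='ignore')
    ascii_bytes = text.encode('ascii', errors='ignore')
    # Name the output after the kmz so the files do not overwrite each other.
    stem = os.path.splitext(os.path.basename(kmz_path))[0]
    out_path = os.path.join(out_dir, stem + '.kml')
    with open(out_path, 'wb') as f:
        f.write(ascii_bytes)
    return out_path


def reprojection_command(src_kml, dst_path):
    """Build (but do not run) an ogr2ogr call from WGS 84 to UTM zone 33N."""
    return ['ogr2ogr', '-s_srs', 'EPSG:4326', '-t_srs', 'EPSG:32633',
            dst_path, src_kml]
```

The returned list can be passed to subprocess.run once GDAL is installed; note that ogr2ogr takes the destination before the source.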

