I was kind of bored while doing some work, and just last week I was discussing my vague-places project with a colleague.
This project had been forgotten over time, but today I blew the dust off, recovered it, and updated the Europe DBpedia map.
Most of you won’t be interested in the full story, so here is a set of results. If something piques your interest (like why Portugal has almost no points), just keep reading 😀
Fixes and improvements
Coming back to my project, I had to fix a couple of issues (check the last commits). They were mostly about incorrect class checking and a change made by DBpedia. In the past I used the
dbpedia-owl tag, now it is called
This was the only fix needed to make it work, but I also added an improvement. The Python script was designed as a kind of interactive application: it prints all its feedback to stdout. A nice new option,
--reportFile, lets us dump the results to a report file instead of just printing them.
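For the curious, here is a minimal sketch of how an option like this can be wired up with argparse. It is not the actual code from vagueplaces.py: only the --reportFile flag name comes from the script, while build_parser and write_report are hypothetical names made up for the example.

```python
import argparse


def build_parser():
    """Build a parser with only the report option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Report-file option sketch")
    parser.add_argument(
        "--reportFile",
        default=None,
        help="write the report to this file instead of printing it",
    )
    return parser


def write_report(lines, report_path=None):
    """Write the report lines to report_path, or to stdout if it is None."""
    if report_path is None:
        for line in lines:
            print(line)
        return
    with open(report_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Calling write_report(lines, args.reportFile) after parsing keeps the old interactive behaviour when the flag is missing.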
Cmaking alpha shaper
Alpha shaper is a small application generating the alpha shape of the resulting points. It uses CGAL, and I’m happy to see there’s a CMakeLists.txt file. I only had to reinstall some dependencies (CGAL library) and it was good to go. It’s not very difficult to build.
I want to retrieve as many points in Europe as possible, so the idea is to OR together various names of common places. I’m using: city, town, village, capital, coastal. This differs from the retrieval of the full dataset, and will of course return fewer points.
python vagueplaces.py --query city town village capital coastal --live --alpha 1 data.csv --reportFile alpha1_report.txt
I ran this command with different alphas: 2, 5, and 10.
Additionally, the program seems to recommend an alpha of ~1900, so we’ll try that too.
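To give an idea of what ORing the place names means on the query side, here is a small sketch that builds a SPARQL FILTER from a list of keywords. It is an assumption about the shape of the query, not a copy of what vagueplaces.py sends: the ?abstract variable and the build_keyword_filter name are hypothetical.

```python
def build_keyword_filter(keywords, variable="?abstract"):
    """Return a SPARQL FILTER that matches any of the keywords.

    Each keyword becomes a case-insensitive regex() test on the given
    variable, and the tests are joined with the SPARQL OR operator (||).
    """
    if not keywords:
        raise ValueError("at least one keyword is needed")
    clauses = ['regex({}, "{}", "i")'.format(variable, word) for word in keywords]
    return "FILTER (" + " || ".join(clauses) + ")"


if __name__ == "__main__":
    print(build_keyword_filter(["city", "town", "village", "capital", "coastal"]))
```

Every extra keyword can only add points to the result, which is why a longer list gives a denser dataset.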
First, let’s have a look at the resulting Europe dataset.
At a quick glance we see that there’s a lot of density in Europe as a whole. Eastern countries are even denser than countries like Portugal or France. But overall it seems like a good dataset to work with.
There are some outliers in this dataset, as the whole-world map shows. The most jarring one is the long trail of points in Russia. The others are (I assume) colonies of France, the UK, the Netherlands, etc.
So far I’m happy with the result, and already surprised that this thing is still working :-D. Now it’s time to check the results from the alpha shaper provided by CGAL. In my previous post about Europe I only talked about the points, but now we’ll also check the polygons.
A picture is worth a thousand words.
The results are not heartbreaking, but not what I expected. Alphas of 10 and 5 approach the Europe polygon. I’m not showing the result of the alpha 1900 run as it is almost a convex hull.
Alpha 2 gets closer to what Europe looks like. But landmasses like the UK are still recognized as part of the same polygon. I can live with that, since a smaller alpha should take care of it. Worse, there are some artifacts that should not be there. Just look at the alpha 2 and alpha 10 representations of the points in Russia: those points should belong to the same polygon, but instead we get separate islands.
Seeing the results, I reduced the alpha to 1, expecting a sharper polygon. But this is what I get:
I did not dive into why alpha 1 is giving me these results. This is something I have in my mind as PENDING. I understand that lower alphas can produce more islands, but this result points to something wrong in my approach.
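As a reminder of why lower alphas produce more islands, here is a minimal sketch of the filtering step behind an alpha shape. It assumes the triangles of a Delaunay triangulation are already available, and it assumes alpha is compared against the squared circumradius, which is how I understand CGAL's convention. The function names are hypothetical and this is not the alpha shaper's code.

```python
import math


def squared_circumradius(a, b, c):
    """Squared radius of the circle through the 2D points a, b and c."""
    ab2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    bc2 = (b[0] - c[0]) ** 2 + (b[1] - c[1]) ** 2
    ca2 = (c[0] - a[0]) ** 2 + (c[1] - a[1]) ** 2
    # Twice the signed area of the triangle.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if cross == 0:
        return math.inf  # collinear points: no finite circumcircle
    # R = |ab||bc||ca| / (4 * area), so R^2 = ab2 * bc2 * ca2 / (4 * cross^2).
    return ab2 * bc2 * ca2 / (4.0 * cross ** 2)


def filter_triangles(triangles, alpha):
    """Keep only the triangles whose squared circumradius is at most alpha."""
    return [t for t in triangles if squared_circumradius(*t) <= alpha]


if __name__ == "__main__":
    small = ((0, 0), (1, 0), (0, 1))
    large = ((0, 0), (10, 0), (0, 10))
    print(len(filter_triangles([small, large], alpha=1)))
    print(len(filter_triangles([small, large], alpha=100)))
```

Triangles that span sparse areas have large circumcircles, so a small alpha drops them and the remaining pieces fall apart into islands. Whether that alone explains my alpha 1 result is still pending.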
Since I have the dataset here, I want to produce a map similar to the one I published in 2012, for comparison. I did not try to reproduce the same map (in color scheme and style), but it presents a similar dataset.
The main difference between the maps is how the data was obtained. The 2012 map presents all Wikipedia points with x,y data and
yago:EuropeanCountries, while this 2017 one is generated with a sparser query on the abstract of the points. This causes the new dataset to have fewer points than the one used in 2012.
I had a little fun doing this, and my mind started bubbling with improvements and a more “live” view. But I always think of new stuff to do and most of it does not get done 😛