QGIS TimeManager part 1

I found an open source project I would like to contribute to. It is called TimeManager and it enables QGIS to animate the progression of geospatial data through time. It is tremendously useful for getting insight into the sort of data we use at work. Before I knew about it, I would create a layer for each timestamp, hide all the others, capture the screen, repeat this for every timestamp, and then stitch the captures into an animation with ImageMagick. (This was back in September, of course.) With TimeManager, you just load a layer that has a time field and the plugin takes care of the rest. I generated a tiny set of data to test it with:

longitude,latitude,date
1.00000, 5.00000001,2014-01-01
1.00000, 5.00000021,2014-01-02
1.00000, 5.00000031,2014-01-03
1.00000, 5.00000020,2014-01-04
1.00001, 5.00000015,2014-01-05

And this is how it works:

[Screenshots of the TimeManager animation]
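For the record, the test layer can also be loaded from the QGIS Python console instead of clicking through the GUI. A minimal sketch, assuming QGIS 2.x and its delimited-text provider (the file path and layer name are made up); TimeManager then just needs to be pointed at the date column:

from qgis.core import QgsVectorLayer, QgsMapLayerRegistry

# Build a delimited-text URI pointing at the test CSV (path is hypothetical).
uri = ("file:///tmp/points.csv"
       "?delimiter=,&xField=longitude&yField=latitude")
layer = QgsVectorLayer(uri, "test points", "delimitedtext")
assert layer.isValid()
QgsMapLayerRegistry.instance().addMapLayer(layer)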


Avro and other data serialization formats

Today I wrote an Avro schema and handled Avro data in both Python and Java; it seems sane enough, unlike Protocol Buffers and its weird C++-to-Java bugs. Here is a collection of interesting links.
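To give the flavor of the Python side, here is a minimal sketch using the avro package (Python 2 era API; the Measurement record and file name are made up, not my actual schema):

import json
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# A toy schema, inlined as JSON (a real one would live in schema.avsc).
schema = avro.schema.parse(json.dumps({
    "namespace": "example", "type": "record", "name": "Measurement",
    "fields": [{"name": "station", "type": "string"},
               {"name": "value", "type": "double"}],
}))

# Write a container file with one record, then read it back.
writer = DataFileWriter(open("measurements.avro", "wb"), DatumWriter(), schema)
writer.append({"station": "A1", "value": 3.14})
writer.close()

reader = DataFileReader(open("measurements.avro", "rb"), DatumReader())
for record in reader:
    print(record)
reader.close()

A nice touch compared to Protocol Buffers: the schema travels inside the container file, so the reader needs no compiled classes.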

Also, compiling a schema to Java:

java -jar avro-tools.jar compile schema schema.avsc .


Daily log

Today I continued my RTSP scraping project. I wrote a function to query the scraped data within a time range and location. I toyed with the idea of using Python's parallelism modules, but then decided to just launch a separate process for each non-overlapping part of the data I want to analyze, since the data is not going to grow anytime soon.
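A minimal sketch of that idea, with a hypothetical analyze(start, end) worker (the window boundaries are made up too):

from multiprocessing import Process

def analyze(start, end):
    # Hypothetical worker: query and analyze the scraped rows in [start, end).
    pass

# One OS process per non-overlapping time window.
windows = [("2014-01-01", "2014-02-01"),
           ("2014-02-01", "2014-03-01")]
procs = [Process(target=analyze, args=w) for w in windows]
for p in procs:
    p.start()
for p in procs:
    p.join()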

Databases are not exactly made for concatenating strings, so the stored procedure in which I automated a select-and-copy-to-disk statement was very annoying to write.
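The same select-and-copy-to-disk step can also be driven from the client side, which avoids building SQL strings inside the database. A sketch with psycopg2 (the connection string, table, and columns are made up):

import psycopg2

conn = psycopg2.connect("dbname=scrape")  # connection details are an assumption
cur = conn.cursor()

# COPY (SELECT ...) TO STDOUT runs server-side but streams to the client,
# so it needs no superuser rights or server-side file paths.
sql = """COPY (SELECT * FROM events
               WHERE ts >= '2014-01-01' AND ts < '2014-02-01')
         TO STDOUT WITH CSV HEADER"""
with open("events.csv", "w") as f:
    cur.copy_expert(sql, f)
conn.close()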

Other than that, I spent the day debugging and refactoring a MapReduce job, which we can only run from within a client's building (a 5-minute walk from the office). It was cool to go inside the company, and also to see how deserted it looked at 6 in the afternoon. Takeaway from the debugging session: I should really handle my exceptions better.

I turned the bits of code I wrote to do regex-like matching on lists and, more generally, ordered sequences of data (as opposed to mere strings) into a Python class. I really think this is a cool idea. I also worked on my verification module and added some more empirical filtering to detect the events I am trying to detect. So far the output looks reasonable.
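Roughly the idea, as a sketch (the real class differs, and the dip example is made up): the pattern is a sequence of predicates instead of characters, matched element-wise against any ordered sequence.

class SeqPattern(object):
    """Match a pattern of predicates against an ordered sequence,
    the way a regex matches characters in a string."""

    def __init__(self, predicates):
        self.predicates = predicates  # one callable per "character"

    def search(self, seq):
        # Return the start index of the first match, or -1.
        n = len(self.predicates)
        for start in range(len(seq) - n + 1):
            window = seq[start:start + n]
            if all(p(x) for p, x in zip(self.predicates, window)):
                return start
        return -1

# e.g. find a dip followed by two recovered values
pattern = SeqPattern([lambda x: x < 10,
                      lambda x: x >= 10,
                      lambda x: x >= 10])
print(pattern.search([42, 7, 11, 13]))  # -> 1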

Tomorrow I need to parse the data from the MapReduce job. Also, one of these days I need to either bite the bullet and manually transform an Excel file from a client into a format we can query, or think of a smarter way to deal with the fuzziness of the whole thing automatically. And I really need to understand how git works for somewhat complicated cases, because right now I do not really get it and I feel bad about myself.


PostGIS convex hull

Quantum GIS has been pissing me off for the last week with its slow rendering times. I should probably switch to something leaner. For the time being I will just export what I want as an image, for instance the convex hulls of the area codes that I want to keep in mind.

select area_code,
       ST_ConvexHull(ST_Collect(geom))
from stations
group by area_code
having count(*) > 10;

The interesting thing here is that ST_ConvexHull is not an aggregate, but ST_Collect is, as explained in the PostGIS manual. I like the manual; it usually has exactly the clarifications about functions that I need to make good decisions.
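And since the plan is to dodge slow rendering by exporting images, here is a sketch of drawing those hulls without QGIS at all, using psycopg2, shapely, and matplotlib (the connection string and output file are assumptions):

import psycopg2
import matplotlib.pyplot as plt
from shapely import wkt

conn = psycopg2.connect("dbname=gis")  # connection details are an assumption
cur = conn.cursor()
cur.execute("""
    select area_code, ST_AsText(ST_ConvexHull(ST_Collect(geom)))
    from stations
    group by area_code
    having count(*) > 10""")

for area_code, hull_wkt in cur:
    hull = wkt.loads(hull_wkt)
    xs, ys = hull.exterior.xy  # with >10 points the hull is a polygon
    plt.plot(xs, ys, label=str(area_code))

plt.legend()
plt.savefig("hulls.png")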


Learning moar SQL and regexp

Today I woke up at 5:30 and wisely chose not to go back to sleep, because then the whole day would have gone to the dogs. Anyway, it's 10 now and I have already achieved some things. I parsed the source code of an internal partner webpage (faster than waiting for them to give us the data in the format we want, and they have some WEIRD data formats in there), which means I finally got to dip my feet into regular expressions, something I had been putting off in favor of lame substring() and split() calls all over the place [eww]. Anyway, that was fun. I like the lookbehind operator, as explained by Gwen.
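For reference, here is what a lookbehind looks like in Python's re module; the snippet and the sample markup are invented, not the partner page:

import re

html = '<td class="sensor-id">A17</td><td class="lon">6.14</td>'

# (?<=...) is a lookbehind: it anchors the match right after the given
# text without consuming it, so only the value itself is captured.
sensor_id = re.search(r'(?<=sensor-id">)[^<]+', html).group()
lon = re.search(r'(?<=lon">)[^<]+', html).group()
print(sensor_id, lon)  # A17 6.14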

After parsing the data I loaded it into the database using this tidy script.


drop table if exists sensors;

CREATE TABLE sensors (
  id varchar(50),
  lon double precision,
  lat double precision);

\copy sensors(id,lon,lat) FROM 'sensors.csv' DELIMITER ',' CSV HEADER;

alter table sensors add column the_geom geometry;
update sensors set the_geom=ST_SetSRID(ST_MakePoint(lon,lat),4326); -- lon/lat are plain WGS84
CREATE INDEX idx_sensors_geom ON sensors USING GIST(the_geom); -- create index, just in case

Now I can analyze those things in Quantum GIS, for which I ordered a book by Anita Graser yesterday; I hope to get some pointers on batch processing and things like that.

At the moment I am struggling to make my solution to a variant of "HOW DO I RETURN THE ROW HAVING THE MAX OF ITS GROUP, LOL?" finish in reasonable time, because I have "big data" and a couple of joins, and it really sucks. It is weird that this is so hard to do in SQL; it is not exactly an obscure thing that nobody needs. It would also be cool to get query time estimates out of Postgres, so that I know a query is doomed without waiting on it.
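For the record, the two standard shapes of that query in Postgres, here driven from psycopg2 (the measurements table and its columns are made up):

import psycopg2

conn = psycopg2.connect("dbname=gis")  # connection details are an assumption
cur = conn.cursor()

# Postgres-only shortcut: DISTINCT ON keeps the first row per group
# according to the ORDER BY, i.e. the row with the max value.
cur.execute("""
    select distinct on (area_code) *
    from measurements
    order by area_code, value desc""")

# Portable alternative: rank the rows inside each group with a window
# function and keep rank 1.
cur.execute("""
    select * from (
        select m.*,
               row_number() over (partition by area_code
                                  order by value desc) as rn
        from measurements m) ranked
    where rn = 1""")

As for knowing a query is doomed in advance: plain EXPLAIN (without ANALYZE) prints the planner's cost estimates without actually running the query, which at least flags the hopeless ones.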

I also need to write a simple MapReduce job, which should probably take precedence. And I want to learn what stored procedures are exactly (yesterday I wrote my first SQL functions, which is really good for my sanity: goodbye to copy-pasting 5 lines of SQL from txt files whose location I forget the next day).


First steps with pgrouting

I installed pgRouting today (not especially painful) and loaded some data from both osm2pgsql and osm2pgrouting. The tables generated by the two are not 100% compatible. The worst thing is that the points in the planet_osm_point table are not usable as vertices for pgRouting's graph algorithms. So, for a point from planet_osm_point (generated by osm2pgsql), you need to find the closest vertex in the ways table (generated by osm2pgrouting). Ah, and the geometries are in different SRIDs, but I suppose I will fix that manually later, like so:

update planet_osm_point set way=ST_Transform(way,4326); -- standard SRID, not the weird 900913 one osm2pgsql uses

After about half an hour of mildly frustrated hacking around (I have no muscle memory for these libraries yet, since I am new to the whole thing), the solution was to find the target (or source, it makes no difference) osm_id of the street closest to the point corresponding to a city, like so:

select line.target as nearest_street, pt.osm_id as city
from planet_osm_point as pt,
     ways as line
where pt.name='Genève' and pt.place='city'
order by ST_Transform(line.the_geom,900913) <-> pt.way -- transform to match pt.way's SRID
limit 1;
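Once a vertex id comes back for each city, it should plug straight into the routing call. A sketch, assuming pgRouting 2.x (the function signature changed between versions, and the two vertex ids are made up):

import psycopg2

conn = psycopg2.connect("dbname=routing")  # connection details are an assumption
cur = conn.cursor()

geneva_vertex = 1234  # hypothetical: the target id returned by the query above
zurich_vertex = 5678  # hypothetical: same query with name='Zürich'

cur.execute("""
    select seq, id1 as node, id2 as edge, cost
    from pgr_dijkstra(
        'select gid as id, source, target, cost from ways',
        %s, %s, false, false)""", (geneva_vertex, zurich_vertex))
for row in cur:
    print(row)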

Yet another issue with the OpenStreetMap data is the fuzziness of the names. I could not find the city of Lucerne under either "Lucerne" or "Luzern", and there is stuff in the names like:

 Peney-Dessous
 Aire-la-Ville
 Peney
 Les Clos
 Café de Peney
 Restaurant 2
 022
 021
 020
 017

OK, whatever that means. I ran this query to try to find Zurich:

select name,place from planet_osm_point where name like 'Z%ri%';

and sure enough, there is both Zurich and Zürich with an umlaut on the u, and only the latter is listed as a city in the place column (there is also a Zürich with nothing in the place column at all). Thankfully, I just want one point inside Zurich, so any of these will do.
