Today I continued my rtsp scraping project. I wrote a function to query the data scraped so far by time range and location. I toyed with the idea of using Python’s parallel-processing modules, but then I decided to just launch a separate process for each non-overlapping part of the data I want to analyze, since the data is not gonna grow anytime soon.
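A minimal sketch of what such a query and the non-overlapping split could look like. The `Record` layout, the bounding-box convention, and the names `query` and `split_range` are all assumptions made for illustration; the real scraped data and function may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout; the real scraped data may differ.
    timestamp: datetime
    lat: float
    lon: float


def query(records, start, end, bbox):
    """Return records with start <= timestamp < end that fall inside bbox.

    bbox is (min_lat, min_lon, max_lat, max_lon), an assumed convention.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        r for r in records
        if start <= r.timestamp < end
        and min_lat <= r.lat <= max_lat
        and min_lon <= r.lon <= max_lon
    ]


def split_range(start, end, parts):
    """Split [start, end) into `parts` non-overlapping half-open windows."""
    step = (end - start) / parts
    edges = [start + i * step for i in range(parts)] + [end]
    return list(zip(edges[:-1], edges[1:]))
```

Because the windows are half-open, no record is counted twice, so each window can be handed to its own process (for example as command-line arguments) without any coordination between them.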
Databases are not exactly made for concatenating strings, so the stored procedure in which I managed to automate a select and copy-to-disk statement was very annoying to write.
Other than that, I spent the day debugging and refactoring a MapReduce job, which we can only run from within a client’s building (a 5-minute walk from the office). It was cool to go inside the company and also to see how deserted it looked at 6 in the afternoon. Takeaway from the debugging session: I should really handle my exceptions better.
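One possible way to act on that takeaway, sketched under assumptions: the helper `safe_map` and the choice of `ValueError` and `KeyError` as the "expected" errors are hypothetical and not taken from the actual job. The idea is to catch only the errors that malformed input is known to cause, log them with enough context to debug later, and let everything else fail loudly.

```python
import logging

logger = logging.getLogger("mapper")


def safe_map(map_fn, records):
    """Apply map_fn to each record; log and skip records that raise."""
    results, failures = [], 0
    for index, record in enumerate(records):
        try:
            results.append(map_fn(record))
        except (ValueError, KeyError) as exc:
            # Catch only the errors expected from malformed input;
            # anything else still propagates and stops the run.
            failures += 1
            logger.warning("record %d skipped: %r (%s)", index, record, exc)
    return results, failures
```

Returning the failure count alongside the results makes it easy to spot a run where suspiciously many records were dropped.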
I turned the bits of code I wrote for regex-like matching on lists and, more generally, ordered sequences of data (as opposed to mere strings) into a Python class. I really think this is a cool idea. I also worked on my verification module and added some more empirical filtering to detect the events I am trying to detect. So far the output looks reasonable.
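To illustrate the idea of regex-like matching over arbitrary sequences, here is a small sketch. It is not the class described above: the name `SeqPattern`, the use of predicates as "character classes", and the four quantifiers are assumptions chosen to keep the example short.

```python
class SeqPattern:
    """Regex-like pattern over any ordered sequence of items.

    Each step is (predicate, quantifier), where quantifier is one of
    "1" (exactly one), "?" (zero or one), "*" (zero or more), "+" (one or more).
    """

    _BOUNDS = {"1": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

    def __init__(self, steps):
        self.steps = [(pred, self._BOUNDS[q]) for pred, q in steps]

    def _match_at(self, seq, pos, step):
        """Return the end index of a match starting at pos, or None."""
        if step == len(self.steps):
            return pos
        pred, (lo, hi) = self.steps[step]
        # Count how many consecutive items the predicate accepts (up to hi).
        count = 0
        while (pos + count < len(seq)
               and (hi is None or count < hi)
               and pred(seq[pos + count])):
            count += 1
        # Greedy: try the longest run first, then backtrack to shorter ones.
        for n in range(count, lo - 1, -1):
            end = self._match_at(seq, pos + n, step + 1)
            if end is not None:
                return end
        return None

    def search(self, seq):
        """Return (start, end) of the first match, or None if there is none."""
        for start in range(len(seq) + 1):
            end = self._match_at(seq, start, 0)
            if end is not None:
                return start, end
        return None


# One or more negative readings followed by a single reading above 10.
pattern = SeqPattern([(lambda x: x < 0, "+"), (lambda x: x > 10, "1")])
print(pattern.search([1, -2, -3, 15, 4]))  # (1, 4)
```

Using predicates instead of literal items means the same matcher works on numbers, tuples, or whole event records, which is the main advantage over flattening the data into a string and running an ordinary regex.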
Tomorrow I need to parse the data from the MapReduce job. Also, one of these days I need to either bite the bullet and manually transform an Excel file from a client into a format we can query, or think of a smarter way to automatically deal with the fuzziness of the whole thing. Also, I really need to understand how git works for somewhat complicated cases, because right now I do not really get it and I feel bad about myself.