A Night at the Museum

2016-11-30

Hacking the Stacks

On the weekend of November 18th-20th, I participated in the American Museum of Natural History's Hack the Stacks hackathon. Here's a little bit about what I worked on that weekend.

Making an AMNH API Portal

When I got to the hackathon, I decided to join the AMNH API Portal team, which took on a challenge that organizations in all sectors face: creating a layer for calling multiple APIs simultaneously and joining the responses into single entries. I chose this challenge because:

  1. It was clearly important to the Library.
  2. It was focused on data processing, a skill I've been building up a lot recently.
  3. It required minimal knowledge of specific libraries, languages, or frameworks that I wouldn't have time to learn over the course of a weekend.

The AMNH has five APIs that staff and patrons routinely call for information on research subjects. Currently each API has its own portal, requiring the user to search five times and then manually join the results into a coherent entry on the topic.

Over the course of the weekend, we created a rudimentary system for hitting all five APIs at the same time and merging the responses into a single result.
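To make the idea of a portal layer concrete, here is a minimal Python sketch of fanning one search term out to several sources at once and joining the answers into a single entry. It is not the team's actual code: `search_all` is a hypothetical helper, and `fake_catalog` and `fake_archives` are made-up stand-ins for two of the five AMNH APIs (real versions would make HTTP requests).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# A "source" is any function that takes a search term and returns a list of results.
Source = Callable[[str], list]

def search_all(term: str, sources: Dict[str, Source]) -> Dict[str, Any]:
    """Query every source concurrently and join the responses into one entry."""
    combined: Dict[str, Any] = {"query": term}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit one request per source so they all run at the same time.
        futures = {name: pool.submit(fetch, term) for name, fetch in sources.items()}
        for name, future in futures.items():
            try:
                combined[name] = future.result()
            except Exception as exc:
                # One failing API shouldn't sink the whole entry.
                combined[name] = {"error": str(exc)}
    return combined

# Hypothetical stand-ins for two of the APIs.
def fake_catalog(term: str) -> list:
    return [f"Book on {term}"]

def fake_archives(term: str) -> list:
    return [f"Field notes on {term}"]

entry = search_all("dinosaurs", {"catalog": fake_catalog, "archives": fake_archives})
# {'query': 'dinosaurs', 'catalog': ['Book on dinosaurs'], 'archives': ['Field notes on dinosaurs']}
```

Keying the merged entry by source name keeps track of where each piece came from, which makes it easier to deduplicate or rank results later.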

Flipping through the Library Catalog

My piece of the work involved figuring out how to get results from one of the data sources, the AMNH Library Catalog. The Catalog includes metadata for individual books and journal articles.

Working with the Catalog data, which is based on Sierra (a common software solution for library catalog APIs), I was able to create a JSON object for each entry in the Catalog and store each entry as a separate file, to be uploaded into an Elasticsearch system set up by a teammate.
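A one-file-per-entry layout can be sketched in a few lines of Python. This assumes each entry is already a dict with a unique `"id"` field to use as the file name; `write_entries` and the sample records are hypothetical, and real Catalog entries would carry many more fields.

```python
import json
import tempfile
from pathlib import Path

def write_entries(entries, out_dir):
    """Write each catalog entry to its own JSON file, named after its record id."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for entry in entries:
        # Assumption: every entry has a unique "id" field usable as a file name.
        path = out / f"{entry['id']}.json"
        path.write_text(json.dumps(entry, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths

# Hypothetical records standing in for parsed Catalog entries.
sample = [
    {"id": "1000001", "title": "Example title A"},
    {"id": "1000002", "title": "Example title B"},
]
with tempfile.TemporaryDirectory() as tmp:
    names = [p.name for p in write_entries(sample, tmp)]
# names == ["1000001.json", "1000002.json"]
```

Separate files keep each record independent, so a failed upload into the search index only means retrying one small file rather than one large dump.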

While writing the script itself wasn't that difficult, there were two subjects I needed to learn a lot about on the fly (late at night): OAuth2 and threading.

OAuth2 is a common standard for authorizing requests, and I'd worked with OAuth from a web developer's perspective, but fiddling with headers and figuring out the correct form for requests was an annoyance. The steps for getting the access token needed to make a request to the library catalog were as follows:

  1. Get an auth code, which is client_key:client_secret encoded in Base64.
  2. Make a POST request to the library catalog's URL, specifying the kind of credentials requested in the header, and the application type and the auth code in the body of the request.
  3. Parse the response, which is a JSON object, for the access token.
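The three steps above can be sketched with Python's standard library. This is an illustration, not the request we actually sent: it follows the standard OAuth2 client-credentials layout (encoded key in a Basic `Authorization` header, `grant_type=client_credentials` in a form-encoded body), which may not match the exact header/body split the catalog expected. The token URL and the helper names are hypothetical, and the network call is left commented out.

```python
import base64
import json
import urllib.parse
import urllib.request

def make_auth_code(client_key: str, client_secret: str) -> str:
    """Step 1: Base64-encode "client_key:client_secret"."""
    raw = f"{client_key}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def build_token_request(token_url: str, auth_code: str) -> urllib.request.Request:
    """Step 2: build (but don't send) the POST request for an access token.

    Assumes the standard OAuth2 client-credentials form; adjust if the
    catalog expects the fields in different places.
    """
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {auth_code}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

def parse_access_token(response_body: str) -> str:
    """Step 3: pull the access token out of the JSON response."""
    return json.loads(response_body)["access_token"]

code = make_auth_code("user", "pass")  # "dXNlcjpwYXNz"
req = build_token_request("https://catalog.example.org/token", code)  # hypothetical URL
# with urllib.request.urlopen(req) as resp:
#     token = parse_access_token(resp.read().decode("utf-8"))
```

The token is then sent on each catalog request, typically as an `Authorization: Bearer <token>` header.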

I'd never used threading before, but I wanted to learn more about it, since the need to speed up processes comes up time and again (sometimes vectorized code only goes so far). A teammate and I set up a pool of six workers to make requests to the Catalog. While there's some future work to be done around exception handling (our threads had a tendency to die), it was a great introduction to parallelized code, a key topic in data science.
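Here is a minimal sketch of a six-worker pool using Python's `concurrent.futures`. It's an assumption that this resembles our setup (I'm not claiming it's the code we ran), and `fetch_record` / `fake_fetch` are hypothetical stand-ins for a Catalog request. One relevant detail: with `ThreadPoolExecutor`, an exception raised inside a task is stored on its `Future` instead of killing the worker, so collecting failures per record is one way to approach the exception-handling problem.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(record_ids, fetch_record, max_workers=6):
    """Fetch many records with a pool of worker threads.

    Returns (results, failures): successful responses keyed by record id,
    and the exception raised for each record that failed.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_record, rid): rid for rid in record_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:
                # The failure is recorded; the worker thread stays alive.
                failures[rid] = exc
    return results, failures

# Hypothetical flaky fetcher: every fifth record "times out".
def fake_fetch(record_id):
    if record_id % 5 == 0:
        raise ConnectionError(f"record {record_id} timed out")
    return {"id": record_id}

results, failures = fetch_all(range(1, 11), fake_fetch)
print(len(results), sorted(failures))  # 8 [5, 10]
```

Threads suit this job because each worker spends most of its time waiting on the network, so Python's global interpreter lock isn't the bottleneck.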

End Notes

In the end, I think what was most valuable for me about the hackathon was being able to work with a team and to feel like I made a contribution. I was by far the least experienced technologist on my team, which consisted of five software developers and me. Still, I had enough knowledge to be able to successfully figure out how to work with the Library Catalog and get the data my team needed. More importantly, I enjoyed working on a team. Because I was able to successfully contribute, I was respected and treated like an equal. It was fun to work with a smart, positive, interesting group on a technical challenge. I want to thank Joseph Spens, Evan Hammer, Jesse Lee, Alex Washburn, and Tom Lavenziano for being just generally awesome.

Finally, it was a big bonus for me that I was able to help out a cultural organization I've always loved (my mom noted that eight-year-old me would have killed to sleep under the dinosaurs on the 4th floor of the museum, which I got to do on Saturday night).

Hello, dino!

Overall, a fantastic weekend.


Tags: museums, data science, nyc, python
