Something I think about off and on whenever I code is the state of documentation for the tooling we use to actually deploy applications to the web. This weekend at hackNY proved to be a case in point for my theory that documentation around software deployment is kinda terrible.
For the project our team worked on, I was responsible for the backend. This was an API built in Flask that used scikit-learn to serve a machine learning model I trained on the income data of refugees who entered the United States. The model itself wasn't that bad to create: I prototyped the design in R, then used Flask to let users make POST requests to it. The trouble came when deploying the API to the internet.
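For context, the shape of that backend was roughly the following. This is a minimal sketch, not the actual project code: the feature name, the toy training data, and the stand-in LinearRegression model are all illustrative.

```python
from flask import Flask, request, jsonify
from sklearn.linear_model import LinearRegression

app = Flask(__name__)

# Stand-in for the real model trained on the refugee income data.
# Fit on toy, perfectly linear data purely for illustration.
model = LinearRegression()
model.fit([[0], [1], [2], [3]], [0.0, 1.0, 2.0, 3.0])

@app.route("/predict", methods=["POST"])
def predict():
    # The client POSTs a JSON body; "years_in_us" is a hypothetical feature.
    payload = request.get_json()
    features = [[payload["years_in_us"]]]
    prediction = float(model.predict(features)[0])
    return jsonify({"predicted_income": prediction})
```

Locally something like this can be served with `flask run`; on a PaaS it would typically sit behind gunicorn instead.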
I initially used Heroku, a PaaS (Platform as a Service), to deploy the Flask API to the internet. Unfortunately, Heroku took an extreme dislike to the file structure I was uploading. One strange problem was that even though I used run.py to keep the app in an app folder, the way the directories were read meant that pandas refused to read the CSV file the model was trained on.
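The usual culprit with this kind of failure is that a relative path resolves against the process's working directory, which on a PaaS is not necessarily the folder the code lives in. One common fix is to build the path from the file's own location. A sketch, with a hypothetical filename:

```python
from pathlib import Path
import pandas as pd

# Anchor paths to this file's directory rather than the current working
# directory, so they resolve the same way locally and on the platform.
BASE_DIR = Path(__file__).resolve().parent

def load_training_data(name: str = "income.csv") -> pd.DataFrame:
    # "income.csv" is an illustrative name, not the project's actual file.
    return pd.read_csv(BASE_DIR / name)
```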
After around seven deployments and many passes through the Heroku logs, I decided to use Render, another PaaS, albeit a much newer one. The virtue of Render, compared to Heroku, is that I could run everything with just a requirements.txt file and keep the CSV file in the main directory.
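For reference, the setup really can be that small. A requirements.txt for this kind of stack might look like the following; the packages match the post, but the pinned versions are illustrative rather than the ones I actually used:

```
flask==2.0.1
gunicorn==20.1.0
pandas==1.3.0
scikit-learn==0.24.2
```

Render then just needs a start command along the lines of `gunicorn app:app`, with no platform-specific config files on top.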
The reason I bring this up is: why do we accept that our technical infrastructure will be so poorly documented?
While StackOverflow is great and all, it is a bit concerning how sparse the documentation is for debugging and deploying applications that stray from the defaults these services expect.
This concern also ties into some snooping I did recently with networking tools on an application quite a few people use. In its GET requests, the app's API keys are visible on the frontend, and by making further requests, other APIs that require that key are easily callable.
The information that can be pulled from these APIs is somewhat uncomfortable: even if the app in theory needs the data, it should be much harder to reverse engineer than it actually was.
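The standard way to avoid this failure mode is to never ship the key to the browser at all: the frontend talks to your own backend, and only the backend holds the key. A minimal sketch of that proxy pattern in Flask, where the third-party endpoint and environment variable name are hypothetical:

```python
import os

import requests
from flask import Flask, jsonify

app = Flask(__name__)

# The key lives only in the server's environment; it never reaches the client.
API_KEY = os.environ.get("THIRD_PARTY_API_KEY", "")

@app.route("/api/items")
def items():
    # The browser calls this route; the server attaches the secret and
    # forwards the request to the real API on the client's behalf.
    resp = requests.get(
        "https://api.example.com/items",  # hypothetical upstream endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=5,
    )
    return jsonify(resp.json()), resp.status_code
```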
Back to the point about infrastructure: it should not be the case that one has to dive through a small army of blog posts to learn how to use a technical service properly.
And no, I am not telling you which app I reverse engineered. I will give a hint: the parent company got sued recently.
While we have made great progress in infrastructure since 2015, I still believe we have a long way to go in providing good documentation and debugging tools to help developers put applications into production. Considering how tedious it still is to set up CORS and get POST requests working, we still have some ways to go.
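On the CORS point: libraries like flask-cors can handle this in one line, but a hand-rolled version makes it clearer what the browser is actually asking for. A sketch, where the allowed origin is a made-up placeholder:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical frontend origin; in practice this is wherever the UI is hosted.
ALLOWED_ORIGIN = "https://example-frontend.netlify.app"

@app.after_request
def add_cors_headers(response):
    # These headers answer the browser's cross-origin checks (including the
    # OPTIONS preflight) so that a frontend on another domain can POST here.
    response.headers["Access-Control-Allow-Origin"] = ALLOWED_ORIGIN
    response.headers["Access-Control-Allow-Methods"] = "POST, OPTIONS"
    response.headers["Access-Control-Allow-Headers"] = "Content-Type"
    return response

@app.route("/predict", methods=["POST"])
def predict():
    return jsonify({"ok": True})
```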
Back to the Hackathon
The actual hackathon itself was great. It was nice to be part of a team where everyone had their own bit to work on: in my case the backend and machine learning model, and in my partners' case designing the frontend.
It is pretty cool to have more experience deploying machine learning models into the wild as APIs, where anyone (until I take the service down) could potentially use the API. A lot of the time I use machine learning as a form of data analysis, to see if there are interesting trends in the data. Using machine learning to create predictions was a nice change of pace.
The fellows at hackNY were also pretty helpful. One of them tried to help me solve the problem I was having with Heroku, and in general they were a good source of advice, not only on technical problems but on the tech industry as well.
To be honest, one of the things I will miss about college is all the hackathons I have been able to go to in my spare time.