A System for Data Storage and Collection

The full code of the project can be found here: BioLargo on Github

During the course of my bachelor's degree, I decided to take advantage of the Science Internship Program (SIP) that the University of Alberta offered. I was originally looking for a placement that related directly to my major; however, I was offered the opportunity to work at BioLargo Water, which would take advantage of my minor in biological sciences.

In order to develop my skills in the lab, I decided to take advantage of the offer from BioLargo Water, and began to work on the decontamination of water samples with BioLargo's AOS. After a short while, it became apparent that the system BioLargo had been using to store experimental results was leading to a lot of confusion. The status quo consisted of simply writing experiments down in a lab notebook, storing it on a shelf, and then forgetting about it. Obviously not very helpful when you are looking back to validate results!

Initially, I developed a simple solution consisting of an SQLite database with a frontend written in Bottle. This prototype simply accepted CSV files, parsed them, and stored them in a database which could later be searched and sorted to your heart's content. This system had some pitfalls, though. The database was not secure, it could not handle multiple connections at a time, and the system did not handle users, logins, sessions, or any other major feature that you would expect from a production system.
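The core of that prototype was little more than parse-and-insert. A minimal sketch of the idea, using only the standard library (the table layout, function name, and CSV columns here are illustrative assumptions, not the original code):

```python
import csv
import io
import sqlite3

def ingest_csv(conn, experiment_name, csv_text):
    """Parse an uploaded CSV and store each cell against an experiment.

    A hypothetical sketch of the prototype's core; schema and names
    are assumptions for illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(experiment TEXT, field TEXT, value TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for field, value in row.items():
            conn.execute(
                "INSERT INTO results VALUES (?, ?, ?)",
                (experiment_name, field, value),
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
ingest_csv(conn, "aos-run-1", "ph,contaminant_ppm\n7.2,0.05\n6.9,0.03\n")
rows = conn.execute(
    "SELECT value FROM results WHERE experiment = ? AND field = 'ph'",
    ("aos-run-1",),
).fetchall()
```

Searching and sorting then falls out of ordinary SQL queries, which is exactly what made the prototype useful despite its pitfalls.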

The project then sat on the back burner as I continued with my experiments. It achieved what I needed it to do for my own usage, and that seemed to be enough.

During BioLargo's annual meeting, I presented my software to much acclaim. Many people were interested in buying the software, and my employer saw value in continuing the development of the software. The following semester, we brought in another developer to assist in designing and developing a proper solution.

We began by tossing out the software that was already written. It was far too small and not scalable for our needs. If we were going to develop a solution that could handle hundreds of users, a more robust framework was needed.

We chose Django because it handled much of the "behind the scenes" and security work for us. Handling passwords, CSRF, XSS, and every other neat security feature that Django offers made our lives much easier. Later we adopted django-channels; depending on when you are reading this, channels may have been folded into Django proper by now.
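Password handling is a good example of what we got for free: Django's default hasher is PBKDF2 with salting and a constant-time comparison. A stdlib sketch of the same idea (the function names and iteration count here are my own; Django uses a higher default iteration count and its own storage format):

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative; Django's default is considerably higher

def hash_password(password, salt=None):
    """Derive a salted PBKDF2-SHA256 digest, as Django's default hasher does."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-derive and compare in constant time to resist timing attacks."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt, digest = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, digest)
bad = verify_password("wrong guess", salt, digest)
```

Getting details like the salt and the constant-time compare right by hand is exactly the kind of work we were happy to delegate to the framework.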

Another issue that we ran into was generalizing the data. For our own personal use, having a rigid structure in which we stored experiments was something that we could count on. Every experiment tracked the same variables, and the analysis of this data was similar for each experiment. Generalizing the data meant that not only were we going to be dealing with time-series data, such as the data that BioLargo produces, but we would also be handling datasets such as expression maps, sequences, spectra, images, and so on. This was a difficult problem to solve, as SQL databases rely on having a rigid structure in order to keep things fast. We opted to store experimental data in our Postgres database as a jsonb field. It allowed us to find the experiment quickly; however, for experiments with many fields, lookups were not as quick as they could have been.
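The shape of that design is easy to sketch: a fixed table for experiment metadata, plus one JSON column for whatever fields a given experiment happens to have. The snippet below stands in for the real thing with SQLite's `json_extract` in place of Postgres jsonb operators, and the table and column names are assumptions:

```python
import json
import sqlite3

# One metadata table; the 'data' column holds arbitrary per-experiment fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT, data TEXT)"
)

# Two experiments with completely different field sets coexist in one table.
conn.execute(
    "INSERT INTO experiments (name, data) VALUES (?, ?)",
    ("aos-run-1", json.dumps({"ph": 7.2, "contaminant_ppm": 0.05})),
)
conn.execute(
    "INSERT INTO experiments (name, data) VALUES (?, ?)",
    ("rna-seq-1", json.dumps({"organism": "E. coli", "reads": 1200000})),
)

# We can still query into the JSON, at the cost of slower lookups than
# indexed columns would give for experiments with many fields.
acidic = conn.execute(
    "SELECT name FROM experiments WHERE json_extract(data, '$.ph') < 7.5"
).fetchall()
```

This is the trade the prose above describes: finding an experiment by its metadata stays fast, while filtering deep inside large JSON documents is where the performance hit lands.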

We chose jsonb because it allowed flexibility. The typical alternative would be to simply turn the experiment on its side: that is, make every column in the experiment into a row. We did consider doing this, but we believed it would lead to fragmentation and cause the database to swell with tons of redundant information. We took the performance hit and chose jsonb instead.
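For contrast, here is a sketch of the rejected layout, often called entity-attribute-value: every field of every experiment becomes its own row. The table and names are illustrative, not from our codebase:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment_values "
    "(experiment_id INTEGER, attribute TEXT, value TEXT)"
)

# One three-field experiment becomes three rows, each repeating the
# experiment id; a fifty-field experiment becomes fifty. This is the
# redundancy and fragmentation that pushed us toward jsonb instead.
conn.executemany(
    "INSERT INTO experiment_values VALUES (?, ?, ?)",
    [
        (1, "ph", "7.2"),
        (1, "contaminant_ppm", "0.05"),
        (1, "temp_c", "21.0"),
    ],
)
row_count = conn.execute(
    "SELECT COUNT(*) FROM experiment_values WHERE experiment_id = 1"
).fetchone()[0]
```

Reassembling a whole experiment from this layout also takes a join or a pivot per read, which was another strike against it for our query patterns.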

Once we had the rough outline of our database designed and working, we moved forward on implementing features. It was at this point that we began to run into issues with feature creep, as well as with trying to find which features would be best for our customers. Then we had to answer the question: who were our customers? Were we aiming for students? Labs? Professors? Small companies? It was this issue that, looking back, killed both the momentum for the project and the project itself. More time was spent trying to work out the aspects of the business than developing good software. The team worked in circles trying to figure out which product we should create, but in the end developed very little of a saleable product.

Looking back on the project, I don't see any specific "final nail in the coffin" moment; the project suffered death by a thousand cuts. The team was trying to develop a model of a consumer, then develop software for that model, and because we didn't know who our customer was, our product was changing on an almost daily basis, which left it unusable and unfinished. The biggest lesson from this project was to form a really strong idea of who your customer is and what they need to accomplish, then develop the tools to assist them. If the software isn't being designed for a purpose, it is very, very difficult to make a working product. The customer, in our case, would have been how we measured our success, and because we didn't have a customer, we couldn't measure whether our software was actually meeting a need.

By @Charles
Tags: #OSS, #Open Source, #Data Collection, #Python, #Django, #Javascript