Building (Easy-To-Adopt) Software while Doing Visualization Research
Should Ph.D. students focus more on building software people will use?
There is a sense in visualization research (and probably in many other research areas) that building software that a sizable number of people use is not recognized as “research” and, as such, is not a particularly worthwhile activity to pursue during your career (as a Ph.D. student first and as a faculty member later).
Maybe. Maybe not.
Let me start by observing that building software as a simple demonstration of an idea (as opposed to software that is actually used) is not only recognized in research but often also necessary. I can’t think of any applied work in visualization, and probably in CS in general, that does not require building some piece of software. Students (and some faculty) build software all the time anyway. So, what we are talking about here is whether building software that can be easily adopted is worth the extra effort necessary to facilitate adoption. Is it worth going the extra mile to build software that meets the needs of a sizable number of people?
I am not sure how to answer this question, but it seems pretty important to me because if many people voluntarily decide to use a given piece of software, it’s a pretty good indication that the work meets important needs of a sizable number of people.
There are interesting examples of software that was originally developed in research settings and then attracted a large user base. A while back I wrote a tweet asking people to share examples of software developed in research that has a decent user base, and the response was quite remarkable.
There’s a lot out there! I encourage you to take a look at the responses; they give a good sense of just how much.
From what I have heard from colleagues and acquaintances, it seems most researchers implicitly assume that there is only a very high cost in building such software and very few benefits. Is this true? I don’t know … but maybe one useful thing to do is to explore the potential costs and benefits. So, in the next two sections, I will try to list what to me seem to be the most obvious costs and benefits.
Costs
Here are the main costs of building software that can be adopted more easily.
Development. The first big cost that comes to mind is obviously the cost of writing the code. However, it is important to notice that a large part of this cost exists anyway. While building software that is easier to adopt (vs. just a simple demo) may require writing more code, I am not persuaded it necessarily translates to writing a lot more code. As an example, the type of software my students build is really complex and it’s not evident to me that the extra cost necessary for adoption concentrates on the extra coding part. In fact, the constraints necessary to make the software easier to adopt may even lead to reducing the complexity of the software and to discarding functions that are not particularly relevant or necessary.
Usability. Building software that people can use requires devoting way more attention to usability. While this may not necessarily translate into writing a lot more code, it does translate into taking care of multiple aspects that affect the usability of the software. First, the code itself needs to be usable because you may need to implement many modifications over time and you may also need to collaborate with other developers. Second, the interface itself needs to be usable and this may require more effort in designing a usable interface. Third, you need effective documentation and a streamlined installation process. At a minimum, one needs a GitHub page with instructions and code documented well enough for people to get started.
Interestingly, the way one decides to design and implement the software has an impact on how easy this step is. Anecdotally, in our lab, we have been pulling our hair out every time we have built something that requires the user to install and configure a lot of preliminary infrastructure software (Docker, anyone?).
But again, even the cost of having a nice GitHub page already exists. I personally find it really suspicious when I read a paper that does not give access to at least a demo and a nice project page. Nowhere is it stated that you have to have one, but I suspect I am not the only one who thinks this way. So, if you have to create a nice demo and project page anyway, why not do it in a way that makes it easier for people to try out and even adopt your software?
Promotion. Another potential cost is promotion. If you want people to use your software, you also need to promote it somehow. But again, even here, don’t you need to let the world know about your research work anyway? Don’t you need to go to conferences and talk about your work? Why limit yourself to those rare moments? You can talk about your work all the time and work towards finding the niches of people who may really benefit from it. Some people even manage to build a whole community around their software. And sometimes promotion is not about reaching out to as many people as you can, but more about reaching the specific group of users who may benefit from your software. This can have a big cost but it can also have very big rewards, including rewards that impact research in very substantial ways. For example, in visualization research, feedback from users is valued as an important component of the evaluation step.
Maintenance. One more cost that comes to mind is maintenance. And this is probably the hardest one. The more people use a piece of software, the more people come back to you with requests. That type of work does not add anything to a Ph.D. thesis (at least as far as I can tell) and it’s potentially very demanding. However, that cost also scales with the success of a project, and once a project is very popular there are different kinds of forces that could be put in motion. That’s where I have very little experience, so I am just guessing here. But if I were to find myself in that situation I would try to build a financial infrastructure around the project.
Opportunity cost. Finally, the most pernicious cost is opportunity cost. The time spent working on software building could be devoted to other research activities that may eventually make your work more refined and impactful. This cost is hard to quantify and it’s not immediately clear that all the time spent developing software would otherwise be time used productively for research (I know people who find coding a very relaxing activity they do when they are “not working” - unfortunately, I am not that kind of person).
Maybe there are other costs that I am missing but these seem to be the main ones. Most of these costs already exist when someone wants to do quality research. Some others require an additional investment.
Benefits
After talking about costs, let’s try to consider the benefits.
Personal satisfaction. The first and maybe biggest benefit I see is personal satisfaction. The feeling of knowing that the artifact you built is used by hundreds, thousands, or even more people to do something they deem useful or important can be a big source of personal satisfaction. Not the ego-boosting type of satisfaction but the one that stems from knowing you did something useful for someone.
Of course, if the idea of building something people use does not excite you, what I am writing here is beside the point. You just don’t find it attractive as an endeavor and this is fine. But if you feel this could be an actual source of satisfaction, then I would consider this one of the biggest rewards of building software.
Learning and evaluation. But I also see other benefits that are more functional to the actual research one is pursuing. The main benefit is to learn from people what type of problems they have and how they would like to use your tool. In other words, software can be used as a discovery process that leads to better ideas. I personally had countless moments where I learned something really useful or inspiring by observing potential users use the prototypes we built. In this sense, I have grown increasingly annoyed with our fixation on demonstrating that the thing we built is “good”. In other words, I don’t think that observing people use our software is particularly good as a way to validate what we have done. A much better use of software is to use it as a way to better understand what kind of problems people actually have and how they think about a problem. Software can be used as a “probe” to help you discover mental models and latent needs. And these are all examples of very useful knowledge that is without a doubt worth publishing and including in a Ph.D. thesis.
Recognition. The final benefit is the one I feel is maybe the most contentious and the most hidden: recognition. From what I hear, most people believe there is no recognition for having developed successful software, and I am inclined to disagree. Regarding this point, let me start by admitting that you don’t get a Ph.D. for developing software. If you do not have publications, there’s no amount of software that is going to help you with your degree. And I am also willing to recognize that there is nothing in the official descriptions of what a Ph.D. entails that points to the value of software. That said, this does not mean that great software is not recognized by our peers when they have to make decisions about who to hire. On the contrary, showing real-world impact, in addition to having all the rest in place (i.e., papers), always makes an impression on people. Even more so because it’s much rarer than having papers! When a person is invited for an interview for a faculty job, it means the person already has enough papers to get a foot in the door. Once that foot is there, there’s always something else that makes one person’s profile more attractive than another’s. There is a myriad of factors that can play a role there, and this is not the right article to talk about them. But I am absolutely convinced that having done something that had a certain level of resonance beyond papers and citations always makes someone stand out. Always. And showing a considerable user base for something you built certainly qualifies.
What kind of software?
So far I have talked about software as if it were one monolithic thing, but of course, software can take so many different shapes that it does not make much sense to talk about software in general.
When I look at the type of software developed in visualization I see at least the following two broad categories:
Full applications: these typically have a complex user interface, often with many windows to carry out data analysis or presentation tasks. A great example here is Polaris, which was developed as a Ph.D. thesis and later turned into Tableau.
Libraries and frameworks: these include packages and languages that let you carry out specific tasks programmatically. Two very notable examples are D3.js and ggplot2, both developed by students during their Ph.D. (by Mike Bostock and his colleagues at Stanford and by Hadley Wickham at Iowa State).
One interesting problem with visualization software is how easy it is for someone to integrate it into their own environment. Anecdotally, libraries are way easier to try out and integrate than full applications. For full applications to work properly there is a huge overhead that needs to be dealt with.
In my own work I have almost always focused on full applications, and I have found that it is incredibly hard to get them adopted. There are several problems with full applications. One is that they are hard to integrate into existing workflows. People already have their own way of solving problems, and adopting a whole new application requires extra effort. Another big problem is installation: visualization software tends to have a complex architecture that requires following quite a few intricate steps before somebody is able to use it. Finally, visualization researchers tend to have a bias toward very complex novel visualization techniques that people in the real world find really intimidating. Typical tools developed in visual analytics research are a sort of interactive dashboard on steroids: many views, all linked, many innovative visual representations, and lots of options. More than a dashboard, they look like the cockpit of an airplane. In this sense, what gets rewarded in research is exactly what makes visualization applications hard to adopt. Most people do not need that level of complexity and do not have much time to invest in learning a whole new way to look at their data unless it’s really necessary.
Libraries tend to be way easier to use and integrate into existing workflows, but visualization has this sort of “original sin” where you can’t just create a library that spits out numbers. You also need some visuals and some degree of interactivity. I suspect that the success of many visualization libraries is due to the fact that integrating them into existing workflows and “giving them a try” is way easier than for full applications.
An interesting hybrid solution I have seen recently is to build applications that can be launched from Jupyter notebooks or similar environments. This has the advantage that the application can be integrated more easily into existing workflows while retaining the capability of showing more sophisticated visualizations and including some useful degree of interactivity. This is, however, limited to highly technical environments where you can expect your users to already use these platforms, typically scientists and engineers. For less technically savvy people the problems I mentioned above still stand.
Conclusion
I hope I managed to summarize the main costs and benefits associated with building software that is adopted more easily. This post stems from my own frustration with seeing software developed in my lab being used for a few papers and then disappearing forever. In a way, I am writing this for myself to see if I can think about this problem more systematically. If you have any experience or tips to share, please let me know. I’d be happy to hear more about this topic.