My name is Piotr Moczurad and I’m a software developer and a member of the Luna team. We have recently released Luna, a data processing language meant to revolutionize the way people gather, understand and manipulate data, as an Open Source project. Let’s take a quick look at the problems it is trying to solve and how it attempts to do so. This post focuses more on the solutions we propose than on the guiding philosophy behind Luna, which was discussed broadly in the previous post.
Data processing software is all around us. We carry powerful computers in our pockets. We are constantly connected to the Internet, producing unprecedented amounts of data. We have started to take the tech revolution for granted, resting assured that computers will keep becoming more and more powerful and that we will be using the data in new and marvelous ways. The problem is that the scale of the phenomenon appears to be slowly getting the better of us. We already have more software than we are able to comprehend, and even more data that we have yet to process and understand.
As of now, to comprehend our data, we are using tools that themselves incur a cognitive cost. This results in a variety of problems: software is hard to write but also hard to understand. The more complex it gets, the harder it is to have a good overview of what it actually does. We are building upon the trust we have in software and computers, but recent exploits may prove that we have a general problem: the software and hardware we build are simply too complex to understand using the current tooling.
There are numerous domains that benefit from using computer-accelerated methods: physics, medicine, bioinformatics, social sciences or finance, to name only a few. Each domain is unique and has its own set of requirements and challenges. And in each domain there exist wonderful experts, who need to have the tooling powerful enough to help them express their ideas and intuitive enough not to burden them with additional complexity. Even though nowadays virtually every business has massive amounts of data to process, the tooling we are using (or, more precisely, the way we use it) hasn’t undergone any breakthrough changes since the seventies: it is complicated and unintuitive. We are focusing on learning to use the tools, instead of solving the problems at hand.
The computer is, beyond any doubt, an indispensable and incredibly powerful tool. However, it is the human brain that has the “spark”: it creates ideas, invents new things and tells the software what to do. It is the brain that we need to design for, not the machine. User experience researchers have known this for years; now it is time for programming languages to catch up. This is the core philosophy behind Luna, and in the following few paragraphs we will try to break it down piece by piece and back it up with examples.
What is Luna?
Luna is an open source, WYSIWYG data processing language. Its goal is to revolutionize the way people are able to gather, understand and manipulate data.
Luna targets domains where data processing is the primary focus, including data science, machine learning, IoT, bioinformatics, computer graphics or architecture. Each domain requires a highly tailored data processing toolbox and Luna provides both a unified foundation for building such toolboxes and a growing library of existing ones. At its core, Luna delivers a powerful data flow modeling environment and an extensive data visualization and manipulation framework.
Our key goal has always been to change the way people interact with the computer in order to solve problems. We want the languages to change to reflect the way we think, not the other way round! To this end, we have built Luna around four main pillars:
- Data flow graph – You can express the flow of data as a visual graph. It is immediately intuitive and reflects the way we think about the data: flowing through the program, with subsequent transformations and modifications. At each step you can see how the data looks with interactive visualizations, and if you want to try an alternative modification you can branch off of the graph, keeping the original flow clear and intact.
- Adjustable levels of abstraction – Depending on the circumstances, you may want to view the data flow on different levels of abstraction. Sometimes you need to get a bird's-eye view of the flow and sometimes you need to dive into the details. Every component in Luna is built out of other components, without exception. You can always dive all the way down to the desired level of abstraction and fine-tune it to your needs. You can also collapse several connected components into a new, more powerful one and share it with others.
- Data visualization and manipulation environment – From the highest perspective, Luna allows you to visualize and manipulate data using interactive and extensible WYSIWYG components. Moreover, Luna provides a way to easily define new components, modify existing ones and share them with the community. Visualizations are fully interactive: they always reflect the current data flowing through the node.
- Code representation – Luna provides its users with a unique capability to switch between representations: from the data flow graph to code and vice versa. This implies a very important truth: the graph is as powerful as the code, or, to phrase it even more strongly, the graph IS the code. Even though we believe that the visual representation is the natural way of expressing data flows, sometimes a large amount of detail is inherent to the problem and is more easily comprehensible in text form. For that reason, Luna maintains a textual representation of the graph in the form of an elegant and concise functional language. Even though you want your data processing pipeline to be visual and clear, if it involves a fine-tuned implementation of the Fast Fourier Transform, you will be more efficient working on the code. This reflects another important principle of Luna’s design: empower the users rather than enforce a specific philosophy.
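To make the first two pillars a little more concrete, here is a minimal sketch in plain Python. It is not Luna code and does not use any Luna API; the names `run_flow` and `collapse` are hypothetical and exist only for this illustration. It shows a flow whose intermediate values stay inspectable, a branch taken from the middle of the flow without disturbing it, and several nodes collapsed into one higher-level component.

```python
from typing import Any, Callable, List, Tuple

# A node is just a named function from one value to another (illustrative only).
Node = Callable[[Any], Any]
NamedNode = Tuple[str, Node]


def run_flow(value: Any, nodes: List[NamedNode]) -> List[Tuple[str, Any]]:
    """Push a value through the nodes in order, keeping every intermediate result."""
    trace = [("input", value)]
    for name, node in nodes:
        value = node(value)
        trace.append((name, value))
    return trace


def collapse(nodes: List[NamedNode]) -> Node:
    """Collapse several connected nodes into a single higher-level node."""
    def composite(value: Any) -> Any:
        return run_flow(value, nodes)[-1][1]
    return composite


# The main flow: filter, transform, aggregate.
main_flow: List[NamedNode] = [
    ("drop_negatives", lambda xs: [x for x in xs if x >= 0]),
    ("square", lambda xs: [x * x for x in xs]),
    ("total", sum),
]

# Every step of the main flow can be inspected after the run.
trace = run_flow([3, -1, 4], main_flow)

# Branch off after the first step; the main flow is left untouched.
after_filter = trace[1][1]
branch = run_flow(after_filter, [("maximum", max)])

# Collapse the last two nodes into one reusable component.
sum_of_squares = collapse(main_flow[1:])
result = sum_of_squares([1, 2, 3])
```

In this sketch, `trace` plays the role of the per-node visualizations, `branch` is a side experiment that does not alter `main_flow`, and `sum_of_squares` is a component built out of other components.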
Let’s think about how the traits above help Luna address the problems we very briefly outlined at the beginning of the post.
Manageable complexity ⇒ productivity + security
Introducing visual representation and, equally importantly, the ability to view the data flows at different levels of abstraction, lets the creators of the systems express their thoughts more intuitively and focus on the vital aspects of their creation. By taking away the cognitive strain introduced by having to think like a machine, we boost productivity and decrease the probability of making a mistake. Moreover, when you are able to view the flow at several, clearly defined levels of abstraction, you are able to focus on the overall correctness of the logic with no distractions, and on the implementation details only when you need to.
This amounts to higher productivity and, even more importantly, higher correctness and robustness. In a way, Luna is playing on the same team as your brain, which is mutually beneficial. If this argument does not convince you, we strongly recommend reading the fantastic article about the problems of software development and ways of improving the current state of the art or the great piece about software complexity.
Data processing flow that finally looks the part
The problem with the current state of data processing tools is that, even though really ingenious, they look like any other programming language library. And the problem with data processing pipelines is that they look like any other code. For example, a very generic data science workflow in Jupyter is the following:
- read the data from CSV into the data frames,
- calculate and visualize key statistics,
- make some transformations like smoothing, removing outliers, etc.,
- plot again to see if the data looks better,
- train and evaluate several models,
- check and assess the predictions.
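As a rough illustration, the steps above might look like this in a notebook. This is a deliberately small sketch under stated assumptions: the inline `RAW_CSV` data is made up, the standard library stands in for the data frame and plotting libraries a real notebook would use, printed statistics stand in for plots, and a single least-squares line stands in for "several models". The helpers `summarize` and `fit_line` are hypothetical names.

```python
import csv
import io
import statistics

# Made-up inline data standing in for a CSV file on disk.
RAW_CSV = """x,y
1,2.1
2,3.9
3,6.2
4,8.1
5,50.0
6,11.9
"""

# 1. Read the data.
reader = csv.DictReader(io.StringIO(RAW_CSV))
rows = [(float(r["x"]), float(r["y"])) for r in reader]


def summarize(label, points):
    """Auxiliary inspection step: print key statistics of y (stand-in for a plot)."""
    ys = [y for _, y in points]
    print(f"{label}: n={len(ys)} mean={statistics.mean(ys):.2f} "
          f"stdev={statistics.stdev(ys):.2f}")


# 2. Inspect the raw data (auxiliary).
summarize("raw", rows)

# 3. Transform: drop points whose y is far from the median (a crude outlier filter).
median_y = statistics.median([y for _, y in rows])
clean = [(x, y) for x, y in rows if abs(y - median_y) <= 20]

# 4. Inspect again (auxiliary).
summarize("clean", clean)


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# 5. Train a model.
slope, intercept = fit_line(clean)

# 6. Check the predictions: mean absolute error on the cleaned data (auxiliary).
mae = statistics.mean(abs(slope * x + intercept - y) for x, y in clean)
print(f"slope={slope:.3f} intercept={intercept:.3f} mae={mae:.3f}")
prediction = slope * 5 + intercept
```

Even in this tiny example, the inspection calls marked as auxiliary sit between the steps that make up the actual logic, and the two kinds of code are distinguishable only by reading each line.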
One of the most annoying and counterintuitive aspects of this work is, surprisingly, the mix of visualizations and code. You need to quickly visualize some data, but that requires you to write some code in the middle of your logic. That process repeats and you start losing track of what is the actual logic and what is only auxiliary. You try to replay the steps, but you are no longer sure what they actually are.
This is where Luna shines: your pipeline is a simple, readable graph. The visualizations are built-in, so you can reveal them simply by clicking on the pipeline at a given stage. Even if you need to quickly test a hypothesis or serialize data to JSON, you can do it by creating a branch off of the graph: you can do all the processing you want, but it is clear what is the main pipeline and what is not.
All of this is best witnessed by trying it yourself: it is amazing how well the graph-based flow fits into the data-processing setting.
Tearing down communication barriers
Our claim is that the majority of the problems in project management stem from insufficient communication. As described in Edward Yourdon’s Death March, it is often the case that the engineers are keeping the management in the dark. Sure, programmers can be difficult to manage (and I’m saying this as a programmer myself), but the root of the problem lies elsewhere: in the lack of a common language. The programmers focus on implementation details, the management focuses on goals and OKRs, the business focuses on money, the domain experts focus on the problem, and so on. Our dream is to make this barrier disappear: to create a language powerful enough to express everything our brains come up with and at the same time comprehensible even to non-technical users. The different layers of abstraction are a great help here: not everybody needs to know the nitty-gritty details of the code, but everyone should know what the product does!
If you look at all the traits we described earlier, you’ll see that comprehensibility was one of the main goals of Luna. With the dual representation, different layers of abstraction and easy visualization, we hope Luna will become a common language for all IT practitioners, on both the technical and non-technical sides of a project. Ideally, to such an extent that this divide will no longer exist.
Who is Luna for?
Even though Luna is designed to be a very general-purpose tool, there are some areas in which we think it will shine. We cooperated with professionals from different industries and tried to address the needs of many areas. The primary fields in which Luna will be helpful include:
- Data Science: the vast amounts of data and intricate processing workflows are an ideal fit for Luna, as described above.
- Microservices integration: a trendy technology that has never quite gotten the tooling it deserved. Integrating numerous services is a cumbersome task when all you have are XML configs and a piece of cake when you see how all the components interact.
- Internet of Things (IoT): streams of data from different sensors that would otherwise be hard to comprehend become much more manageable when processed using a visual environment. In fact, our very first alpha testers came from the IoT industry.
- Bioinformatics: in bio-tech companies, you have biologists working with computer scientists on complicated pipelines. The ease of communication that Luna provides is bound to increase the productivity of teams.
- Entertainment: Luna was inspired by the VFX industry, and processing graphics, video and sound is still among its target use cases. Node-based programming is the de facto standard in video compositing, but it can be used equally well for processing graphics and sound.
What are the next steps for Luna?
Luna has just become Open Source, so we encourage everyone to contribute! Creating Luna is a huge task that cannot be completed without a great community of contributors. The core Luna team is so small that it is virtually impossible for it to take on all of the tasks. Ever since Luna became open, issues have been created constantly and terrific ideas have been born. We try to resolve issues as they arise, but we will not be able to do so without community support. On the technical side, we use a really interesting stack with Haskell, GHCJS, and React.js, so you will not get bored easily. We encourage you to talk to us on our chat, visit our discussion forum and check out the code on our GitHub. Remember, we need you!
The core Luna team, which is a small group of people passionate about our mission, will continue to work on Luna’s performance and libraries and to provide constant support. Expect Luna to get faster and even more beautiful!
We hope you’ll have a great time using Luna!