In Section 6.1 below we provide answers to the research questions, based on the collected data. The results of applying Vargha and Delaney’s A statistic to the absolute errors of our models’ predictions are given in Table 4. As explained above, the other techniques provided similar, only slightly less accurate, models, and the models obtained with different techniques gave generally concordant indications. Since NN models provided slightly more accurate predictions than the other models, we report only the results from NN models.
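For reference, the A statistic can be computed directly from two samples of absolute errors. The following is a minimal Python sketch (the analysis in the study was done in R, and the error values below are hypothetical):

```python
def vargha_delaney_a(xs, ys):
    """Vargha-Delaney A statistic: the probability that a value drawn
    from xs is greater than one drawn from ys (ties count 0.5)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))

# Hypothetical absolute prediction errors of two models:
errors_nn  = [0.10, 0.20, 0.15, 0.05]
errors_svr = [0.30, 0.25, 0.20, 0.40]

print(vargha_delaney_a(errors_nn, errors_svr))  # → 0.03125
```

A value of 0.5 indicates no difference between the two samples; when the samples are absolute errors, values close to 0 indicate that the first model's errors are stochastically smaller.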
During the experiment sessions, participants could have experienced fatigue or tiredness, which could have affected the time taken to perform tasks. Concerning this issue, none of the participants reported experiencing fatigue or tiredness. Models were built and evaluated via 10-times 10-fold cross validation, i.e., the dataset was split randomly into ten subsets, and each subset was used in turn as a test set to evaluate the model built on the remaining data. The procedure was repeated 10 times to average out the effects of random splitting. To ensure the homogeneity of participants, we involved Master’s students in Computer Science, all having similar levels of knowledge of the coding language and similar levels of programming experience.
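The cross-validation procedure described above can be sketched as follows (a minimal Python illustration of repeated k-fold splitting; the original analysis was carried out in R):

```python
import random

def repeated_kfold_indices(n, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for repeats x k-fold cross
    validation: in each repeat the indices are shuffled and split into
    k disjoint folds, and each fold serves once as the test set."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
            yield train, test

splits = list(repeated_kfold_indices(n=50))
print(len(splits))  # 100 train/test splits in total (10 repeats x 10 folds)
```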
We built models using Support Vector Regression (SVR), Random Forests (RF) and Neural Networks (NN) approaches. The analysis was carried out using the R programming language and environment (R Core Team 2015). At first, we tried building models via Ordinary Least Squares (OLS) regression, both linear and after a log-log transformation, given that the distribution of the data is not normal. Since these models did not prove sufficiently accurate, we then applied more sophisticated Machine Learning techniques. In our empirical study, it is possible that a faulty method \(m_1\) calls a method \(m_2\) that is also faulty.
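As an illustration of the first modeling step, a log-log OLS fit reduces to ordinary linear regression on log-transformed data. The following Python sketch uses hypothetical data (the actual analysis was performed in R):

```python
import math

def fit_loglog_ols(xs, ys):
    """Fit log(y) = a + b*log(x) by ordinary least squares
    (closed-form simple linear regression on log-transformed data)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = my - b * mx
    return a, b

def predict(a, b, x):
    """Back-transform the log-log model to the original scale."""
    return math.exp(a + b * math.log(x))

# Hypothetical data where time grows roughly as size**1.5:
sizes = [10, 20, 40, 80]
times = [31.6, 89.4, 253.0, 715.5]
a, b = fit_loglog_ols(sizes, times)
print(round(b, 2))  # fitted exponent, close to 1.5
```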
Then he showed the prototype to Sean, who did not like it and felt that its response time was bad. Initially, David also considered the Database Subsystem (DBS) and the Hardware Subsystem (HWS) as relevant artifacts, because the quality of FRG depended on their characteristics. Sean, however, dealt only with FRG, because he did not see the other components of the system.
A Guide to Software Understandability: Why It’s Essential To Every Developer
This means we collect data about a predefined set of events, which tends to be about how the system is interacting with the world around it. In such organizations, Understandability takes on an even more powerful form, determining how well engineers can understand how the software operates and how it is being utilized by the application’s customers. When your phone is going off in the middle of the night because something is wrong, your understanding of the application is vital. First of all, you have to use the information you have to verify this is an actual service disruption.
A recent paper by Scalabrino et al. (2021) describes an extensive study in which 444 evaluations concerning 50 methods were provided by 63 professional Java developers and students. Perceived understandability was measured by asking the empirical study participants whether they understood a code snippet. If so, they were asked to answer three confirmation questions, with the purpose of measuring actual understandability.
In particular, models based on structural and textual features are more accurate than the previous ones. In our empirical study, we considered source code measures that have been present in the literature and in practical use for several years. In addition, we considered Cognitive Complexity, a novel measure that was introduced with the purpose of overcoming the pitfalls of existing measures (Campbell 2018). Throughout the paper we name this measure CoCo, to avoid confusion with the actual cognitive complexity, i.e., what CoCo is supposed to evaluate.
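To give a concrete flavor of the measure: CoCo penalizes nested control flow more heavily than flat control flow. The following is a heavily simplified Python approximation for illustration only; the actual definition (Campbell 2018) includes many further rules, e.g., for boolean operator sequences, recursion, and jumps:

```python
import ast

def coco_approx(source):
    """Very rough approximation of Cognitive Complexity: each
    branching/looping construct adds 1 plus its nesting depth.
    This is NOT the full definition, only an illustration."""
    increments = 0

    def visit(node, depth):
        nonlocal increments
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.If, ast.For, ast.While, ast.Try)):
                increments += 1 + depth  # nesting penalty
                visit(child, depth + 1)
            else:
                visit(child, depth)

    visit(ast.parse(source), 0)
    return increments

flat   = "if a:\n    x = 1\nif b:\n    x = 2\n"
nested = "if a:\n    if b:\n        x = 1\n"
print(coco_approx(flat), coco_approx(nested))  # → 2 3
```

Both snippets contain two `if` statements, but the nested version scores higher because the inner branch carries a nesting penalty, which is the key intuition behind the measure.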
But it is difficult to evaluate software understandability, because understanding is an internal process of humans. Therefore, we propose “software overhaul” as a method for externalizing the process of understanding software systems, and we propose a probability model for evaluating software understandability based on it. This paper presented an experiment that evaluated software understandability using this probabilistic model. The model by Buse and Weimer was later simplified by Posnett et al. (2011), who used only three source code measures, namely LOC, entropy, and Halstead Volume (HVOL).
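Two of the three features used by Posnett et al. can be sketched as follows (a minimal Python illustration; the entropy is computed over the snippet’s token distribution, and the Halstead Volume shown here is a crude approximation that does not distinguish operators from operands, as the real measure does):

```python
import math
from collections import Counter

def token_entropy(tokens):
    """Shannon entropy of the token distribution, one of the three
    features (with LOC and Halstead Volume) in Posnett et al.'s model."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

def halstead_volume_approx(tokens):
    """Crude Halstead Volume: N * log2(n), with N total tokens and
    n distinct tokens (the real measure separates operators/operands)."""
    return len(tokens) * math.log2(len(set(tokens)))

snippet_tokens = ["x", "=", "x", "+", "1"]
print(round(token_entropy(snippet_tokens), 3))   # → 1.922
print(halstead_volume_approx(snippet_tokens))    # → 10.0
```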
Counterexamples are likely to contain an event that causes the property violation. They are usually provided as error traces or in a tree-like structure that guides through the system’s source code or operational behavior. In the nonprobabilistic setting, model-checking approaches are prominent for generating counterexamples as a byproduct of the verification process [584]. The generation of counterexamples in the probabilistic setting is, however, more involved [987–989]. In [990], Leitner-Fischer and Leue describe an approach to generate counterexamples based on event variables in a specialized event-order logic and use them to reason about causality, implemented in the tool SpinCause [991]. Causality-based approaches that analyze counterexamples and extract user-understandable visualizations have been suggested only recently [992,993].
Consequently, we designed all code correction tasks in such a way that their actual difficulty lay in identifying the problem (which, in turn, required understanding the code), while correcting the code required little time. Participants confirmed that performing corrections and checking them via the available test cases was actually quite easy and fast, once the problem had been understood. Finally, we interviewed industrial developers from two companies to have their opinions on the study and its validity. The interviewed developers appreciated the idea of correlating code metrics with understandability measures and supported the conclusions of the study.
With respect to the latter point, it is worth noting that none of the metrics that are statistically significant in Trockman et al.’s models belongs to the set of metrics we investigated. Since there is no code metric that appears to be a much better predictor of understandability time than any other code metric, we proceeded to investigate the accuracy of models with multiple independent variables. We must consider that understandability is an external property, i.e., it does not depend exclusively on code properties, but also on additional properties and conditions, with special reference to who has to understand the code.
Specifically, even the newly introduced Cognitive Complexity measure does not seem able to fulfill the promise of providing substantial improvements over existing measures, at least as far as code understandability prediction is concerned. It seems that, to obtain code understandability models of acceptable accuracy, process measures should be used, possibly together with new source code measures that are better related to code understandability. There is consensus that readability is an essential characteristic of code quality, but not about which factors contribute most to human notions of software readability.
- The dynamic activation and deactivation of features can be used to describe adaptive behaviors of feature-oriented systems, e.g., to model adaptive heterogeneous hardware systems [1005] and context-dependent systems [1007,1008].
- Discover the semantic link network of sentences in the original representation.
- Most of them share the idea that causes raise the probabilities for their effects and rely on a formalization using conditional probabilities.
- The more you understand the application you are responsible for, the easier it is to put the pieces together.
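The probability-raising condition mentioned above can be stated as P(E | C) > P(E | ¬C). A minimal Python sketch, with a hypothetical joint distribution over a cause C and an effect E:

```python
def raises_probability(joint):
    """Probability-raising condition: C is a candidate cause of E
    if P(E | C) > P(E | not C). `joint` maps (c, e) -> probability."""
    p_c = joint[(True, True)] + joint[(True, False)]
    p_e_given_c = joint[(True, True)] / p_c
    p_e_given_not_c = joint[(False, True)] / (1 - p_c)
    return p_e_given_c > p_e_given_not_c

# Hypothetical joint distribution where C raises the probability of E:
joint = {(True, True): 0.3, (True, False): 0.1,
         (False, True): 0.1, (False, False): 0.5}
print(raises_probability(joint))  # → True: P(E|C)=0.75 > P(E|¬C)≈0.17
```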
To minimize the impact of developers’ capability and experience on the maintenance time, we selected a set of developers having similar experience and similar capability. Therefore, our results depend almost exclusively on the properties of the code. A study by Scalabrino et al. (2018) took into account textual features, based on source code lexicon analysis, in addition to structural features. Their study considers the datasets collected by Buse and Weimer and by Dorn, along with a new dataset based on a new set of code snippets. Empirical findings show that textual features complement structural ones.
As a result of negotiation, David agreed not to discuss DBS and HWS with Sean. Understandability is the extent to which representations are designed in ways that are clear to their intended audiences. With the rise of Software as a Service (SaaS) and other new software delivery paradigms, many organizations practice total ownership of software, empowering engineers to take responsibility for the application throughout its lifecycle. We take it upon ourselves (or our teams) to change an existing application so that it meets some requirement(s), such as developing a new feature or fixing an existing bug.
On the other hand, we have also challenged the assumption that Understandability can always be improved by tackling complexity and designing and writing better, simpler software. More often than not, we find ourselves boarding the train midway through, with little or no control over how it got there. And so, we must start tracking and managing Understandability as its own key metric, maximizing engineering velocity and quality, even under less than optimal conditions. Last but not least, make sure to have the scaffolding in place to deal with complexity when it arises. Write automated tests in the form of both unit-tests and system-tests to ensure that your engineering team can safely refactor that complexity away. Put in high-quality observability tools to help you gain a high-level understanding of the system.