Arjen Wiersma

education

Today, I took the first step towards starting my master's thesis. I have decided to document my journey and experiences on my blog. Given that the thesis will play a pivotal role in my life for the next nine months, I expect it to be a thrilling journey.

Finding a topic {#finding-a-topic}

As a student at Open Universiteit in The Netherlands, I had the opportunity to work with the Rascal programming language during the Software Quality Management course. I found the language intriguing and enjoyed working with it throughout the module. After consulting with my professor, we brainstormed various possibilities for my master's thesis topic, ultimately settling on the creation of a visualization library for the language.

Although I had to complete my Software Architecture course before embarking on my thesis, I finally had the chance to visit the Center for Mathematics and Informatics (CWI), an important organization in the Netherlands. The CWI has a large department devoted to quantum computing and is responsible for developing and maintaining the Rascal programming language, in addition to being the first entity to connect to the internet and register a “.nl” domain name. Oh, and it is also the place where Python was invented. During my visit, we discussed a plethora of options, including reusable components and compositional visualizations, but ultimately decided to focus on projectional editors.

Projectional editors are a type of text editor that allows users to work with a representation of the code's abstract syntax tree (AST), instead of working with the code directly. This approach enables users to work with a higher-level representation of the code, which can make it easier to comprehend and manipulate complex code structures. Typically, users create and edit code in projectional editors by selecting elements from a palette or menu and positioning them in a visual representation of the code's syntax tree. The editor then automatically generates the corresponding code based on the user's selections. This approach is especially useful for languages with intricate syntax or when developing domain-specific languages.

To create a visualization of source code in the form of a diagram, a domain-specific language (DSL) is required to describe the visualization. A DSL is a programming language designed to solve problems in a specific domain or context. In this case, a DSL would be used to describe the structure and layout of the visualization of the source code.

Currently, there is no existing DSL that can be used to describe the visualization of source code. Therefore, as part of the master's thesis, the focus will be on creating a context-free grammar (CFG) capable of generating the DSL required for the visualization. The CFG will define the syntax and structure of the language, including the rules and patterns used to create the visual representation of the code.

Once the CFG is created, the DSL can be generated and used to create the visualization of the source code in the form of a diagram. This visualization will enable users to reason about their code in a high-level manner, helping to identify and understand complex code structures and relationships.

Context-Free Grammars {#context-free-grammars}

A context-free grammar (CFG) is a formalism used to describe the syntax of a formal language. A CFG consists of a set of rules that define how strings of symbols can be constructed in the language. The rules in a CFG specify how non-terminal symbols can be replaced with sequences of terminal and non-terminal symbols.

A non-terminal symbol represents a syntactic category or type of element in the language, while a terminal symbol represents a basic unit of meaning or a symbol that cannot be further reduced. The rules in a CFG define how non-terminal symbols can be replaced by sequences of terminal and non-terminal symbols, and how these sequences can be combined to form valid strings in the language.

Consider the following CFG:

S -> aSb | ε

In this CFG, S is a non-terminal symbol that represents a string of the form a^n b^n where n >= 0. The rule S -> aSb specifies that a string of the form a^n b^n can be constructed by taking a string of the form a^(n-1) b^(n-1) and appending an a to the beginning and a b to the end. The rule S -> ε specifies that the empty string is also a valid string of the language.

Using this CFG, we can generate the following strings:

ε
ab
aabbb
aaabbbb
aaaabbbbb

To generate the string aaaabbbbb, we can use the following sequence of derivations:

S => aSb          (apply rule S -> aSb)
  => aaSbb        (apply rule S -> aSb)
  => aaaSbbb      (apply rule S -> aSb)
  => aaaaSbbbb    (apply rule S -> aSb)
  => aaaabbbbb    (apply rule S -> ε)

Note that at each step in the derivation, we apply a rule to replace a non-terminal symbol with a sequence of terminal and non-terminal symbols, until we arrive at a string composed entirely of terminal symbols (i.e., the empty string or a string of the form a^n b^n).

Moving forward {#moving-forward}

The preparation phase for the master's thesis at Open Universiteit spans 11 weeks, followed by 22 weeks of research and writing. As I am just starting the preparation phase, I anticipate the topic to evolve over time, and I look forward to sharing my journey with you.

#education

Background {#background}

I have been working on getting my Master's degree in Software Engineering. To do this I have started following courses at the Dutch Open Universiteit. They have a Master's program that takes 2 years and you are able to follow it from home.

As this is my first post on the courses, I will give some background. As a teacher it is recommended to have a Master's degree. As I had a whole lot of work experience my teaching activities were not a problem, but from an overall point of view I was advised to pursue an Master degree. As my highest educational level up to now is a Bachelor I first had to follow a pre-master program before I could start the Master program. I will write about my experiences of the pre-master program later.

Software Quality Management {#software-quality-management}

I just finished the Software Quality Management course. It was a fun course that focused on the maintainability of code and measuring objective quality metrics. As is usual in a Master you read a lot of papers on the topic, from McCabe's complexity measure to Fuggetta et al. on the evolution of the software development process. All these papers were very interesting and have given me insights into the evolution of software development. It was enlightening to offset those papers with my own experience.

Most of the papers are from the late 1990's up to the early 2010's. During that time I worked in Silicon Valley and within the Dutch startup environment. All these companies were on the frontline of new technologies and methodologies. The papers hit home most of the time. Several papers on the implementation of Scrum and how software evolves felt a bit one-sided from an academic point of view, but that did not bother me too much.

The Rascal Programming Language {#the-rascal-programming-language}

One of the more interesting parts of the course was to write our own metric collection and visualization in the Rascal Meta Programming Language. Rascal is a domain specific language that allows you to reason about source code by building models and then querying these models.

In my group we implemented metrics collection for unit size, complexity, duplication and code coverage (testing). These were very interesting problems to attack and the result was a very cool visualization using D3js.

The project hit a happy place for me. As a result I going to work on an extension for Rascal for my Master Thesis.

References {#references}

Some of the articles that were used in the course are listed below.

  • McCabe (1976) A Complexity Measure, IEEE Transactions on Software Engineering.
  • Fuggetta, Di Nitto (2014) Software Process, ACM.
  • Bennett, Rajlich (2000) Software Maintenance and Evolution: A Roadmap, Association for Computing Machinery.

#education