Testing your algo on a java project

When developing algorithms on top of the Moose platform, we can easily hit a wall during testing.

To do functional (and sometimes unit) testing, we need to work on a Moose model. Most of the time we get this model in one of two ways:

  • We produce a model and save the .json to recreate this model in the tests
  • We create a model by hand

But those 2 solutions have drawbacks:

  • A saved JSON file does not follow the evolution of Famix, so the model it produces will not be representative of the latest version of Famix
  • A model created by hand risks not being representative of what we would manipulate in reality. For example, we might not think about setting the stubs or the source anchors

To avoid those drawbacks, I will describe in this article how I manage such test cases, using as an example the tests of a project that builds call graphs of Java projects.

The idea I had for testing call graphs is to store real Java projects in a resources folder in the git repository of the project. Then, we can parse them when launching the tests and manipulate the produced model. This ensures that we always have a model up to date with the latest version of Famix. If the tests break, this means that the Famix model evolved and that our project does not work anymore for this language.

[Diagram of the workflow: create Java project, parse the project, import the model, run tests on the model]

The first step to build tests is to write some example java code.

I will start with a minimal example:

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

I’ll save this file in the git repository of my project under Famix-CallGraph/resources/sources/example1/Main.java.

Now that we have the source code, we need a way to access it in our project.

In order to access our resources, we will use GitBridge.

You can install it by executing:

Metacello new
githubUser: 'jecisc' project: 'GitBridge' commitish: 'v1.x.x' path: 'src';
baseline: 'GitBridge';
load

But we should add it to our baseline:

BaselineOfFamixCallGraph >> #gitBridge: spec
    spec baseline: 'GitBridge' with: [ spec repository: 'github://jecisc/GitBridge:v1.x.x/src' ]

BaselineOfFamixCallGraph >> #baseline: spec
    <baseline>
    spec for: #common do: [
        "Dependencies"
        self gitBridge: spec.
        "Packages"
        spec
            package: 'Famix-CallGraph';
            package: 'Famix-CallGraph-Tests' with: [ spec requires: #( 'Famix-CallGraph' 'GitBridge' ) ]. "<== WE ADD GITBRIDGE HERE!"
    ].
    spec for: #NeedsFamix do: [
        self famix: spec.
        spec package: 'Famix-CallGraph' with: [ spec requires: #( Famix ) ] ]

Now that we have the dependency running, we can use this project. We will explain the minimal steps here, but you can find the full documentation in the GitBridge repository.

The usage of GitBridge begins with the definition of our FamixCallGraphBridge:

GitBridge << #FamixCallGraphBridge
slots: {};
package: 'Famix-CallGraph-Tests'

Now that this class exists we can access our git folder using FamixCallGraphBridge current root.

Let’s add some syntactic sugar:

FamixCallGraphBridge class >> #resources
    ^ self root / 'resources'

FamixCallGraphBridge class >> #sources
    ^ self resources / 'sources'

We can now access our Java projects using FamixCallGraphBridge sources.

This step is almost done, but for our tests to work in a GitHub action (for example), we need two small tweaks.

In our .smalltalk.ston file, we need to register our project in Iceberg (because GitBridge uses Iceberg to access the root folder).

SmalltalkCISpec {
    #loading : [
        SCIMetacelloLoadSpec {
            #baseline : 'FamixCallGraph',
            #directory : 'src',
            #registerInIceberg : true "<== This line"
        }
    ]
}

Also, in our GitHub action, we need to make sure that the checkout action fetches enough history for GitBridge to run, and not the minimal amount (which is the default), by adding a fetch-depth: option.

steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: '0'

Now we need to be able to parse our project. For this, we will use a Java utility that is directly in Moose: FamixJavaFoldersImporter.

We can parse and receive a model doing:

model := (FamixJavaFoldersImporter importFolders: { FamixCallGraphBridge sources / 'example1' }) anyOne.

Now that we can access the model it is possible to implement our tests.

I start with an abstract class:

TestCase << #FamixAbstractJavaCallGraphBuilderTestCase
slots: { #model . #graph };
package: 'Famix-CallGraph-Tests'

Now I will create a TestCase that needs my Java model:

FamixAbstractJavaCallGraphBuilderTestCase << #FamixJavaCHAExample1Test
slots: {};
package: 'Famix-CallGraph-Tests'

And now I will create a setup importing the model and creating a call graph:

FamixAbstractJavaCallGraphBuilderTestCase >> #setUp
    super setUp.
    model := (FamixJavaFoldersImporter importFolders: { self javaSourcesFolder }) anyOne.
    graph := (FamixJavaCHABuilder entryPoints: self entryPoints) build

FamixJavaCHAExample1Test >> #javaSourcesFolder
    "Return the java folder containing the sources to parse for those tests"
    | folder |
    folder := FamixCallGraphBridge sources / 'example1'.
    folder ifAbsent: [ self error: 'Folder does not exist: ' , folder pathString ].
    ^ folder

And now you have your model available for the testing!
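As a minimal sketch of what a first test could look like, the method below checks that the Main class of example1 ended up in the imported model. The selector testModelContainsMainClass is a name chosen for illustration, and the test assumes that the class parsed from Main.java is imported as a FamixJavaClass named 'Main'. It relies on allWithType:, the generic MooseModel query.

```smalltalk
FamixJavaCHAExample1Test >> #testModelContainsMainClass
    "Hypothetical test, the selector is illustrative.
    Assumes the class parsed from Main.java is imported as a FamixJavaClass named 'Main'."
    self assert: ((model allWithType: FamixJavaClass) anySatisfy: [ :class | class name = 'Main' ])
```

The query uses anySatisfy: rather than checking the size of the collection, because the model may also contain other classes referenced by the code, such as String or System.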

I am using this technique to test multiple projects such as parsers or call graph builders. In those projects the tests do not modify the model, and the setup can take time. So I optimize this setup to build the model only once for all the test cases, using a TestResource.

In order to do this, we can remove the slots we added to FamixAbstractJavaCallGraphBuilderTestCase and create a test resource that will hold them:

TestResource << #FamixAbstractJavaCallGraphBuilderTestResource
slots: { #model . #graph };
package: 'Famix-CallGraph-Tests'

Then we can move the setup to this class

FamixAbstractJavaCallGraphBuilderTestResource >> #setUp
super setUp.
model := (FamixJavaFoldersImporter importFolders: { self javaSourcesFolder }) anyOne.
graph := (FamixJavaCHABuilder entryPoints: self entryPoints) build

Personally I’m also adding a tearDown cleaning the vars because TestResources are singletons and I do not want to hold a model in memory all the time.
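Such a tearDown could look like the sketch below. It only resets the two slots declared on the resource and then delegates to the superclass.

```smalltalk
FamixAbstractJavaCallGraphBuilderTestResource >> #tearDown
    "Release the references so that the model and the graph can be garbage collected after the test run"
    model := nil.
    graph := nil.
    super tearDown
```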

Then I’m creating my test resource for the example1 project.

FamixAbstractJavaCallGraphBuilderTestResource << #FamixJavaCHAExample1Resource
    slots: {};
    package: 'Famix-CallGraph-Tests'

FamixJavaCHAExample1Resource >> #javaSourcesFolder
    "Return the java folder containing the sources to parse for those tests"
    | folder |
    folder := FamixCallGraphBridge sources / 'example1'.
    folder ifAbsent: [ self error: 'Folder does not exist: ' , folder pathString ].
    ^ folder

And now we can declare that the TestCase will use this resource:

FamixJavaCHAExample1Test class >> #resources
^ { FamixJavaCHAExample1Resource }

The model then becomes accessible from the test case like this:

FamixJavaCHAExample1Test >> #model
    ^ self resources anyOne current model

Here are a few tricks I use to make setting up my test cases even simpler.

The first one is to detect the Java source folder automatically from the name of the test resource:

FamixAbstractJavaCallGraphBuilderTestResource >> #javaSourcesFolder
    ^ self class javaSourcesFolder

FamixAbstractJavaCallGraphBuilderTestResource class >> #javaSourcesFolder
    "Return the java folder containing the sources to parse for those tests"
    | folder |
    folder := FamixCallGraphBridge sources / ((self name withoutPrefix: 'FamixJavaCHA') withoutSuffix: 'Resource') uncapitalized.
    folder ifAbsent: [ self error: 'Folder does not exist: ' , folder pathString ].
    ^ folder

We can now remove this method from all subclasses! But make sure the name of your source folder matches the name of the test resource ;)
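To see the naming convention at work, the same expression can be evaluated in a Playground on a literal class name. Here the name is written as a string for illustration; in the method it comes from self name.

```smalltalk
"Strip the common prefix and the 'Resource' suffix, then lowercase the first letter"
(('FamixJavaCHAExample1Resource' withoutPrefix: 'FamixJavaCHA') withoutSuffix: 'Resource') uncapitalized.
"=> 'example1', so the sources are expected in resources/sources/example1"
```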

Automatic test resource detection and access

We can do the same with the detection of the test resource in the test case.

FamixAbstractJavaCallGraphBuilderTestCase class >> #resources
    ^ self environment
        at: ((self name withoutSuffix: 'Test') , 'Resource') asSymbol
        ifPresent: [ :class | { class } ]
        ifAbsent: [ { } ]

FamixAbstractJavaCallGraphBuilderTestCase class >> #sourceResource
    ^ self resources anyOne current

FamixAbstractJavaCallGraphBuilderTestCase >> #sourceResource
    "I return the instance of the test resource I'm using to build the sources of a java project"
    ^ self class sourceResource

FamixAbstractJavaCallGraphBuilderTestCase >> #model
    ^ self sourceResource model
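One consequence of the ifAbsent: branch is that a misnamed resource class silently yields an empty resource list, and the failure only shows up later when the model is accessed. A small guard test in the abstract class, sketched here with an illustrative selector, can make that mistake visible with a readable message. The isAbstract override is an assumption about how you may want to keep the abstract class itself, which has no resource of its own, out of the test run.

```smalltalk
FamixAbstractJavaCallGraphBuilderTestCase class >> #isAbstract
    "The abstract class has no matching resource, so it should not be run itself"
    ^ self = FamixAbstractJavaCallGraphBuilderTestCase

FamixAbstractJavaCallGraphBuilderTestCase >> #testResourceFollowsNamingConvention
    "Hypothetical guard test: fails when no class named like the test case,
    with 'Test' replaced by 'Resource', exists in the environment"
    self
        deny: self class resources isEmpty
        description: 'No resource class found for ' , self class name
```

Every concrete subclass inherits the guard, so a new test case whose resource is missing or misspelled fails on this test first.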

Et voilà! Now adding a ready-to-use test case for a new Java project only requires creating a test case:

FamixAbstractJavaCallGraphBuilderTestCase << #FamixJavaCHAExample2Test
slots: {};
package: 'Famix-CallGraph-Tests'

And the resource associated!

FamixAbstractJavaCallGraphBuilderTestResource << #FamixJavaCHAExample2Resource
slots: {};
package: 'Famix-CallGraph-Tests'

Nothing much.

Easily find the sources of the tested project

A last thing I do to simplify things is to implement a method to easily access the sources.

FamixJavaCHAExample1Test >> #openSources
<script: 'self new openSources'>
self resources anyOne javaSourcesFolder openInOSFileBrowser

It is possible to do the same thing for languages other than Java, although the parsing and import step may not work exactly as described in this blog post. But this article is meant to be an inspiration!

I hope this helps improve the robustness of our projects :)

Generation of new FAST-Language metamodel using Pharo-Tree-Sitter project

If you’re here, you’re probably interested in creating a new FAST metamodel and expanding Moose to represent the AST (Abstract Syntax Tree) of an additional language. In this post, we explain to you how to generate a “First version” of a new FAST-Language metamodel using the project Pharo-Tree-Sitter. To be able to understand that, we assume you are already familiar with:

  • Tree-Sitter
  • Pharo-Tree-Sitter
  • FAST
  • Metamodel generators
  • Tree-Sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It is able to parse a large variety of programming languages such as Java, C++, C#, Python and many others.

  • Pharo-Tree-Sitter is a project developed in Pharo that integrates the original Tree-Sitter parsers and allows visualizing their results (such as ASTs) directly in Pharo. It relies on the FFI protocol, which requires the corresponding libraries depending on the OS (.dll, .so, or .dylib) to be present in Pharo’s VM folders. The project supports parsing several languages, and for some of them (like Python, TypeScript, and C), the library generation is automated. You can find more details in the repository’s README. This is the project that we will use to generate a new FAST-Language metamodel, so you need to download it into your Pharo image.

  • FAST means Famix AST. Contrary to Famix, which represents applications at a high abstraction level, FAST uses a low-level representation: the AST. FAST defines a set of traits that can be used to create new metamodels compatible with Moose tools. When developing a new FAST-Language metamodel, you will rely on these FAST traits to structure your metamodel. However, this does not apply to the “First version” described in this post, but rather to the upgraded versions when you evolve and refine it.

  • Metamodel generator is a Pharo library used to create new metamodels such as FAST-Java, Famix-Java, or FAST-Fortran. The generation of any new version of a FAST-Language metamodel can only be achieved through the metamodel generator. As you will see in this post, Pharo-Tree-Sitter enables you to define a new metamodel generator. Once executed, it produces the corresponding FAST-Language metamodel. We will explain this process in more detail in the following sections.

Download Pharo-Tree-Sitter and get the corresponding libraries

First you need to create a Moose image and download Pharo-Tree-Sitter:

Metacello new
baseline: 'TreeSitter';
repository: 'github://Evref-BL/Pharo-Tree-Sitter:main/src';
load.

Once downloaded, you need to make sure that Pharo-Tree-Sitter is able to parse the language that you intend to create the metamodel for. If it is not included, you need to follow the instructions in the readme file of this repository and add the new language. For this blog post we will assume that the language is already supported and we will continue with “Python” 🐍🐍🐍.

To be able to continue, and if this is the first time you’re using this project (Pharo-Tree-Sitter), you need to run the Python tests in the class “TSParserPythonTest” of the package “TreeSitter-Tests”. Running them triggers the download of the original tree-sitter and tree-sitter-python projects from GitHub, the generation of the corresponding libraries, and their move to the VM folder that matches the version of the image you created (for example Moose 12). If you create an image of another version, you need to run the tests again so that the libraries are also moved to the corresponding folder. Now that you have the libraries, you can parse Python code and get an AST, but not yet a FAST-Python model. The next step explains how to get one.

Create the first version of the metamodel (FAST-Python in our example)

Don’t worry, there is not much to do: only a snippet of code needs to be written and executed. But first we have to explain how it works.

This package contains two main classes: “TSFASTBuilder” and “TSFASTImporter”. For our task we will rely on the first one. The second is used to make the transition between an AST generated by TreeSitter and a FAST-Language model.

“TSFASTBuilder” contains a set of methods responsible for generating a new metamodel generator:

  • #tsLanguage: is used to set an instance of TSLanguage, which is TSLanguage python in our case.
  • #createMetamodelGeneratorClass is responsible for creating a new package and a class inside. By default, the class name will be “FASTLanguageNameMetamodelGenerator” which is “FASTPythonMetamodelGenerator” and the package name is “FAST-LanguageName-Model-Generator”. This method also calls another one “typesToReify”, which gets all the symbols from the initial TreeSitter project (using an FFI call), and adds them as slots in the class definition. These symbols represent the nodes of the language in question like “class” for Python.
  • #addPrefixMethodIn: adds #prefix method on the class side of the metamodel generator class. By default it is FASTLanguage.
  • #addPackageNameMethodIn: adds #packageName method on the class side of the metamodel generator class. By default it’s ‘FAST-Language-Model’.
  • #addSubmetamodelsMethodIn: adds #submetamodels method on the class side of the metamodel generator class, and by default it contains FASTMetamodelGenerator.
  • #addDefineClassIn: adds #defineClasses method. In this method slots are defined, starting by #entity then all the symbols imported from TreeSitter.
  • #addDefineTraitsIn: adds #defineTraits method. By default FASTTEntity trait is created.
  • #addDefineHierarchyIn: adds #defineHierarchy method. By default only #entity relation is defined with FASTTEntity.
  • #addDefineRelationsIn: adds #defineRelations method. By default only #entity relations are defined with genericChildren and genericParent.

Voilà, now that you understand how it works, we will show you how to generate one for Python:

tsb := TSFASTBuilder new.
tsb languageName: 'Python'.
tsb tsLanguage: TSLanguage python.
tsb build.

This will generate the metamodel generator. Now that the generator is created you can use it to generate the metamodel:

FASTPythonMetamodelGenerator new generate.

Now you can access the packages and classes created: ‘FAST-Python-Model’ and ‘FAST-Python-Model-Generator’.

From now on you have to handle the metamodel manually. You have to add missing traits (including FAST Traits), properties that should be imported from TreeSitter… You benefit from the importer to handle the parsing on the metamodel side. You can create a package for tools having a #parse method doing this for example:

| parser tsLanguage importer |
Smalltalk image garbageCollect.
parser := TSParser new.
tsLanguage := TSLanguage python.
parser language: tsLanguage.
importer := TSFASTImporter new.
importer tsLanguage: tsLanguage.
importer languageName: 'Python'.
importer originString: string.
^ importer import: (parser parseString: string) rootNode "pay attention to #source: "

You can check FASTTypeScript for more details.

N.B: We recommend you to parse many python examples (you can find a lot in the main project of TreeSitter-Python), using Pharo-Tree-Sitter project. Once parsed you can inspect in Pharo the properties for each node using #collectFieldNameOfNamedChild and find the properties for each one. Then you can add them in #defineRelations of the metamodel.

That’s it for now!

Visualizing java dependencies between microservices

In July, I had to analyze the dependencies between microservices for Berger-Levrault. To do so, I chose to use the Moose tool.

Here is the simple but effective process I followed.

The backend I analyzed follows a common pattern. In the git repository, there is a folder api containing the microservices, and a folder lib with resources for each microservice. There is also an additional project called lib-common.

Thus, the microservice home is composed of a project named api-home and a project named lib-home.

  • src
    • api
      • api-home
        • src/
    • lib
      • lib-home
        • src/
      • lib-common/

We wanted to check that dependencies were correctly implemented in the project:

  • no api project should directly depend on another api (API calls are allowed, but not classic Java dependencies)
  • each api project can depend on its equivalent lib project
  • lib projects can depend on lib-common

Let’s see how to perform this check with Moose.

To perform the analysis, I used Moose and followed these steps:

  1. I installed the latest version of Moose.
  2. I cloned the repository containing the backend to analyze.
  3. I installed the project dependencies with:
    mvn clean install
  4. I used VerveineJ to generate a model of the code. To avoid version issues, I used the Docker version of VerveineJ, which gave me a model.json file:
    docker run -v /path/to/my/project:/src -v /home/badetitou/.m2/repository:/dependency ghcr.io/evref-bl/verveinej:v3.3.1 -alllocals -anchor assoc -format json -o model.json
  5. I loaded the model into a Moose 12 image by dragging and dropping the model.json file into the running Moose image.

Moose provides ready-to-use visualizations to represent dependencies. In my case, I chose to use the Architectural map. This visualization presents the entities of the model (packages, classes, methods) as a tree and displays the associations between them (i.e., the dependencies).

I first asked this visualization to display all the classes. It works, but does not allow us to distinguish the different microservices.

Unhelpful architectural map

The main problem is that too much information is displayed and we cannot see the microservices. To fix this, I used Moose’s tag feature. A tag allows you to associate a color and a name to an entity.

So I tagged the classes of my system depending on their location in the repository.

To do this, in a Moose Playground, I used the following script (adapt it to your context 😉):

model allTaggedEntities do: [ :entity | entity removeTags ].
((model allWithSubTypesOf: FamixJavaType) reject: [ :type | type sourceAnchor isNil ]) do: [ :class |
    class sourceAnchor ifNotNil: [ :sa |
        (sa fileName beginsWith: './services/api-A') ifTrue: [ class tagWithName: 'A' ].
        (sa fileName beginsWith: './services/api-B') ifTrue: [ class tagWithName: 'B' ].
        (sa fileName beginsWith: './services/api-C') ifTrue: [ class tagWithName: 'C' ].
        (sa fileName beginsWith: './libraries/lib-A') ifTrue: [ class tagWithName: 'lib-A' ].
        (sa fileName beginsWith: './libraries/lib-common') ifTrue: [ class tagWithName: 'lib-common' ].
        (sa fileName beginsWith: './libraries/lib-B') ifTrue: [ class tagWithName: 'lib-B' ].
        (sa fileName beginsWith: './libraries/lib-C') ifTrue: [ class tagWithName: 'lib-C' ].
    ]
].
(model allWithSubTypesOf: FamixJavaType) reject: [ :type | type tags isEmpty ]
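The chain of ifTrue: tests can also be written as a table that maps a path prefix to a tag name, which may be easier to adapt to another repository. This is only a sketch that reuses the messages from the script above; the variable name prefixToTag is made up, and the prefixes are the same placeholders that must be adapted to your layout.

```smalltalk
"Each association maps a source path prefix to the name of the tag to apply"
prefixToTag := {
    './services/api-A' -> 'A'.
    './services/api-B' -> 'B'.
    './services/api-C' -> 'C'.
    './libraries/lib-A' -> 'lib-A'.
    './libraries/lib-B' -> 'lib-B'.
    './libraries/lib-C' -> 'lib-C'.
    './libraries/lib-common' -> 'lib-common' }.

"Tag every type that has a source anchor with the tag of each prefix its file name starts with"
((model allWithSubTypesOf: FamixJavaType) reject: [ :type | type sourceAnchor isNil ]) do: [ :type |
    prefixToTag do: [ :association |
        (type sourceAnchor fileName beginsWith: association key)
            ifTrue: [ type tagWithName: association value ] ] ]
```

Adding a microservice then only means adding one association to the table.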

The result is not perfect yet because entities are not grouped by tag. To fix this, simply select the tag to add option in the architectural map settings.

Correct architectural map

You then get a clear visualization of the links between the microservice projects and the libraries they use. We see that no api is linked to an incorrect lib project. We also notice that microservice B is linked to lib-B as well as lib-common. Maybe this link to lib-common should be removed? But that’s another story…

Building a Famix importer with TreeSitterFamixIntegration

Analyzing source code starts with parsing and for this you need semantic understanding of how symbols in the code relate to each other. In this post, we’ll walk through how to build a C code importer using the TreeSitterFamixIntegration framework.

  • Basic knowledge of Famix and Moose.
  • Basic knowledge of what Tree-sitter is.
  • Familiarity with the Visitor design pattern. You can check this blog post which explains the Visitor pattern in the context of tree-sitter ASTs.

The TreeSitterFamixIntegration stack provides tools to ease the development of Famix importers using tree-sitter. This package offers some great features for parsing such as (but not limited to):

  • Useful methods for source management (getting source text, positions, setting sourceAnchor of a famix entity).
  • Error handling to help catch and report parsing issues
  • a better TreeSitter node inspector (which is very helpful when debugging)
  • Utility to efficiently import and attach single-line and multi-line comments to their corresponding entities.
  • Context tracking for symbol scope (no more context push and pop 😁)

There is detailed documentation you can check that explains every feature.

After creating a new Moose image, let’s start by loading the necessary packages.

First, we need to load the C metamodel. This metamodel provides the Famix classes that represent C entities such as functions, structs, variables, etc.

Metacello new
baseline: 'FamixCpp';
repository: 'github://moosetechnology/Famix-Cpp:main';
load

Next, we need to load the TreeSitterFamixIntegration project. It provides both pharo-tree-sitter and SRSymbolResolver.

Metacello new
githubUser: 'moosetechnology' project: 'TreeSitterFamixIntegration' commitish: 'main' path: 'src';
baseline: 'TreeSitterFamixIntegration';
load

Now that we have the necessary packages loaded, we can create our C importer.

Create a new package named Famix-C-Importer.

The minimum classes we will have to create inside are:

  • FamixCImporter: This class will be responsible for importing C files and parsing them using Tree-sitter.
  • FamixCVisitor: This class will walk through the parsed C syntax tree and create Famix entities.
  • FamixCCommentVisitor: This class will handle comments and attach them to the corresponding Famix entities.

The FamixCImporter class is the entry point for our importer. It will handle the parsing of C files into Abstract Syntax Trees (AST).

This class will inherit from FamixTSAbstractImporter (defined in the TreeSitterFamixIntegration project), which provides the necessary methods for importing and parsing C files using Tree-sitter.

FamixTSAbstractImporter << #FamixCImporter
slots: {};
package: 'Famix-C-Importer'

Now, let’s override some methods to set up our importer:

FamixCImporter >> treeSitterLanguage
"Should return a TreeSitter language such as TSLanguage python"
^ TSLanguage cLang

This method returns the Tree-sitter language we want to use for parsing. In this case, we are using the C language. You can find the available languages in the Pharo-Tree-Sitter package.

FamixCImporter >> visitorClass
^ FamixCVisitor

It returns the visitor class that will walk through the parsed syntax tree and create Famix entities. We will define this class later.

FamixCImporter >> importFileReference: aFileReference
    aFileReference isFile
        ifTrue: [
            (self isCFile: aFileReference) ifFalse: [ ^ self ].
            self importFile: aFileReference ]
        ifFalse: [
            aFileReference children do: [ :each |
                self importFileReference: each ] ]

This method calls importFile: on all C files recursively found in a directory. We will add more logic to this method later but for now, it serves as a starting point for our importer.

The isCFile: method checks if the file has a .c or .h extension.

FamixCImporter >> isCFile: aFileReference
^ #( 'c' 'h' ) includes: aFileReference extension
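For reference, extension returns the part of the file name after the last dot, without the dot itself, which is why the literal array contains 'c' and 'h' rather than '.c' and '.h'. The paths below are made up for illustration.

```smalltalk
"extension only looks at the path, so these files do not need to exist"
'src/main.c' asFileReference extension. "=> 'c'"
'include/util.h' asFileReference extension. "=> 'h'"
#( 'c' 'h' ) includes: 'README.md' asFileReference extension. "=> false"
```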

The importFile: method is defined in the FamixTSAbstractImporter class (provided by the TreeSitter-Famix-Integration project). It parses the file content to create an AST and then passes the visitor (the FamixCVisitor that we previously defined) to walk through the AST.

The FamixCVisitor class is responsible for walking through the parsed AST and creating Famix entities. It will inherit from FamixTSAbstractVisitor, which provides the necessary methods for visiting Tree-sitter nodes.

FamixTSAbstractVisitor << #FamixCVisitor
slots: {};
package: 'Famix-C-Importer'

For this class, we will just need to override one method:

FamixCVisitor >> modelClass
^ FamixCModel

It returns the Famix metamodel class that will be used to create Famix entities. In this case, we are using FamixCModel which is in the Famix-Cpp package.

Now that we have our importer and visitor classes set up, we can already test it. To test our importer, we can create a simple C file and import it using the FamixCImporter class.

test.c
#include <stdio.h>

int aGlobalVar = 1;

int main() {
    int aLocalVar;
    aLocalVar = aGlobalVar + 2;
}

To import this file, we can use the following code in the Playground (cmd + O + P to open it):

[Screenshot: Playground code that imports the C project]

Before running the above code, open the Transcript to see the logs (cmd + O + T to open it).

Then select all the code and run it by inspecting it (cmd + I or click the “Inspect” button). You will get something similar to this.

Model inspector

The above screenshot shows what is inside our model. We can see that there is pretty much nothing there yet apart from the SourceLanguages which is added by default by TreeSitterFamixIntegration.

Now if we look at the Transcript, we can see that the importer has imported the file but we didn’t implement the visitor methods yet for every node in the AST, so no Famix entities were created.

transcript log

If you want to inspect the corresponding AST of our test file, you can do something similar to what is in this other blog post on tree-sitter.

translation unit AST

In this section we are going to see some examples of visiting methods for creating compilation unit and function entities.

Let’s go back to our FamixCImporter class and from there we will create a CompilationUnit and HeaderFile entities. We need to do that there because we have to check if the file is a header file or a source file.

FamixCImporter >> importFileReference: aFileReference
    aFileReference isFile
        ifTrue: [
            | fileEntity |
            (self isCFile: aFileReference) ifFalse: [ ^ self ].
            fileEntity := aFileReference extension = 'c'
                ifTrue: [ visitor model newCompilationUnitNamed: aFileReference basename ]
                ifFalse: [ visitor model newHeaderFileNamed: aFileReference basename ].
            visitor
                useCurrentEntity: fileEntity
                during: [ self importFile: aFileReference ] ]
        ifFalse: [
            aFileReference children do: [ :each |
                self importFileReference: each ].
            ^ self ]

We use useCurrentEntity:during: to provide a context for the visitor. This is the same as pushing the fileEntity onto a context stack, visiting the children and then popping it from the stack: during the block, the current entity is the fileEntity.

Now try importing a whole directory containing C files. You should see that the importer creates a FamixCHeaderFile for each header file and a FamixCCompilationUnit for each source file.

To set the source anchor for any Famix entity, we can use the setSourceAnchor: aFamixEntity from: aTSNode method provided by the FamixTSAbstractVisitor class. This method takes a Famix entity and a Tree-sitter node.

We can use it to set the source anchor for our fileEntity. Go to visitTranslationUnit: in the FamixCVisitor class and add the following code:

FamixCVisitor >> visitTranslationUnit: aNode
self setSourceAnchor: self currentEntity from: aNode.
self visitChildren: aNode "for not cutting the traversal"

Now if we import our test.c file again, we will see that the CompilationUnit entity has a source anchor.

Next, we will create FamixCFunction entities for each function declaration in the C file. We will do this in the visitFunctionDefinition: method of the FamixCVisitor class.

But first we need to know where the function name is located to create the FamixCFunction entity. Create the method and put a halt there to inspect the node.

visitFunctionDefinition: aNode
self halt.
self visitChildren: aNode.

function definition ast

If we look at the function definition node, we can see that the function name is in the identifier node, which is a child of the function declarator node.

To get that name, there are two ways:

  • visit the function_declarator until the identifier returns its name using self visit: aNode
  • get it by child field name using aNode _fieldName that returns the child node with the given field name. And you don’t need to implement the _fieldName method because it is already handled by the framework.

For simplicity, and to show other available features in the framework, we will use the second way.

Let’s inspect the function definition node to see what fields it has.

function definition fields

So if we do aNode _declarator, it will return the function declarator node.

function declarator

And if we do aNode _declarator from the function_declarator, it will give us the identifier that we want.

Now we can create the function entity and set its name and source anchor.

visitFunctionDefinition: aNode
    | declaratorNode identifierNode functionName entity |
    declaratorNode := aNode _declarator.
    identifierNode := declaratorNode _declarator.
    functionName := identifierNode sourceText.
    entity := (model newFunctionNamed: functionName) functionOwner: self currentEntity.
    self setSourceAnchor: entity from: aNode.
    self useCurrentEntity: entity during: [ self visitChildren: aNode ]

The self currentEntity returns the compilation unit entity which is the parent of the function entity.

And before visiting the children, we set the current entity to the newly created function entity using useCurrentEntity:during:. This will allow us to create other entities that are related to this function, such as parameters and local variables.

The difference between local and global variables is that local variables are declared inside a function, while global variables are declared outside any function.

To create the variable entities, we will create the visitDeclaration: method in the FamixCVisitor class. This method is called for each variable declaration in the C file.

FamixCVisitor >> visitDeclaration: aNode
    "fields: type - declarator"
    | varName entity |
    self visit: aNode _type.
    varName := self visit: aNode _declarator.
    entity := self currentEntity isFunction
        ifTrue: [
            (model newLocalVariableNamed: varName)
                parentBehaviouralEntity: self currentEntity;
                yourself ]
        ifFalse: [
            (model newGlobalVariableNamed: varName)
                parentScope: self currentEntity;
                yourself ].
    self setSourceAnchor: entity from: aNode.

The visitDeclaration: method does the following:

  1. Visits the variable’s type. This will allow us to parse its type information.
  2. Retrieves the variable name by visiting the declarator field. If the variable is initialized, this will be an init_declarator node; otherwise, it will be an identifier. We should implement visit methods for both cases to extract the name correctly.
     FamixCVisitor >> visitInitDeclarator: aNode
         "fields: declarator - value"
         self visit: aNode _value.
         ^ self visit: aNode _declarator "variable name is in the declarator node"

     FamixCVisitor >> visitIdentifier: aNode
         ^ aNode sourceText "returns the name of the variable"
  3. Creates a variable entity, either a local variable or a global variable, depending on whether the current entity is a function or not.
  4. Sets the source anchor for the variable entity using the setSourceAnchor:from: method.

In this section, we will implement the symbol resolution for our C importer. This will allow us to resolve references to variables and functions in our C code.

As an example, we will resolve the reference to the local variable aLocalVar in the main function, which will be represented as a Famix write access entity.

To create the write access entity, we will implement the visitAssignmentExpression: method in the FamixCVisitor class. This method is called for each assignment expression.

FamixCVisitor >> visitAssignmentExpression: aNode
    "fields: left - right"
    | access leftVarName |
    leftVarName := self visit: aNode _left.
    access := model newAccess
        accessor: self currentEntity;
        isWrite: true;
        yourself.
    self setSourceAnchor: access from: aNode.

Add the following code to the visitAssignmentExpression: method to resolve the variable:

FamixCVisitor >> visitAssignmentExpression: aNode
    "fields: left - right"
    | access leftVarName |
    leftVarName := self visit: aNode _left.
    access := model newAccess
        accessor: self currentEntity;
        isWrite: true;
        yourself.
    self setSourceAnchor: access from: aNode.
    self
        resolve: ((SRIdentifierResolvable identifier: leftVarName)
            expectedKind: {
                FamixCLocalVariable.
                FamixCGlobalVariable };
            yourself)
        foundAction: [ :variable :currentEntity | access variable: variable ].

The resolve: aResolvable foundAction: aBlockClosure method is provided by the FamixTSAbstractVisitor class.

It takes two arguments:

  1. aResolvable: an instance of SRIdentifierResolvable. This resolvable is created with the identifier (the variable name) and the expected kinds of entities (in this case, either a local variable or a global variable). The identifier: method sets the identifier to resolve, and the expectedKind: method sets the expected kinds of entities that can be resolved.
  2. aBlockClosure: a block that will be executed when the resolvable is resolved (we found the variable). In this case we set the variable of the access entity to the resolved variable.

See the SRIdentifierResolvable documentation

The SRIdentifierResolvable is a generic resolver that can be used to resolve identifiers. However, in some cases, we may need to create a custom resolver to handle specific cases. In that case, we can create a class that inherits from SRResolvable and override the resolveInScope:currentEntity: method to implement our custom resolution logic.

For more information about the symbol resolver, you can check the documentation.

The TreeSitterFamixIntegration package provides a utility to parse comments and attach them to the corresponding Famix entities. This is done using the FamixCCommentVisitor class.

Let’s add some comments to our test.c file:

test.c
#include <stdio.h>
int aGlobalVar = 1;
/*
  entry point of our program
*/
int main() {
    int aLocalVar; // a local variable
    aLocalVar = aGlobalVar + 2;
}

To parse comments, we will create the FamixCCommentVisitor class, which inherits from FamixTSAbstractCommentVisitor. We only need to override the visitNode: method.

FamixCCommentVisitor >> visitNode: aNode
    aNode type = #comment ifTrue: [
        (aNode sourceText beginsWith: '/*')
            ifTrue: [ self addMultilineCommentNode: aNode ]
            ifFalse: [ self addSingleLineCommentNode: aNode ] ].
    super visitNode: aNode

We use the addMultilineCommentNode: and addSingleLineCommentNode: methods provided by the FamixTSAbstractCommentVisitor class to add the comment to the model.

For a detailed explanation of how to use the comment visitor, you can check the documentation.

The last thing to do is to use the comment visitor somewhere in our importer. We can do that each time we finish visiting all the children of the translation unit node.

FamixCImporter >> visitTranslationUnit: aNode
    self setSourceAnchor: self currentEntity from: aNode.
    self visitChildren: aNode.
    FamixCCommentVisitor visitor: self importCommentsOf: aNode

We use the visitor: aFamixVisitor importCommentsOf: aNode method to import the comments of the translation unit node.

In this blog post, we have seen how to build a Famix importer for C code using the TreeSitterFamixIntegration framework. We have covered the following topics:

  • Setting up the environment and creating the importer and visitor classes.
  • Creating Famix entities for compilation units, functions, and variables.
  • Implementing symbol resolution for local and global variables.
  • Parsing comments and attaching them to the corresponding Famix entities.

This is just a starting point for building an importer with this stack. You will need to implement more visit methods and tests to handle other entities. The TreeSitterFamixIntegration framework provides many other utilities that we didn’t cover to help you with that.

Parametrics next generation

How do we represent the relation between a generic entity, its type parameters and the entities that concretize it? The Famix metamodel has evolved over the years to improve the way we represent these relations. The last increment is described in a previous blogpost. We present here a new implementation that eases the management of parametric entities in Moose.

The major change between this previous version and the new implementation presented in this post is this: We do not represent the parameterized entities anymore.

What’s wrong with the previous parametrics implementation?

Difference between parametric and non-parametric entities

The major issue with the previous implementation was the difference between parametric and non-parametric entities in practice, particularly when trying to trace the inheritance tree. Here is a concrete example: getting the superclass of the superclass of a class.

  • For a non-parametric class, the sequence is straightforward: ask the inheritance for the superclass, repeat.

Getting super inheritances - Non-parametric entities.

  • For a parametric class (see the short code snippet below), there was an additional step, navigating through the concretization:
import java.util.ArrayList; // public class ArrayList<E> { /* ... */ }
public class MySpecializedList extends ArrayList<String> {}

Getting super inheritances - Parametric entities.

This has caused many headaches for developers who wanted to browse a hierarchy: how do we keep track of the full hierarchy when it includes parametric classes? How do we manage both situations without knowing whether the classes will be parametric or not? The same problem occurred when browsing the implementations of parametric interfaces and the invocations of generic methods.
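To make the extra step concrete, here is a minimal Java sketch. It assumes a simplified picture of the previous implementation in which the inheritance targets a parameterized entity (ArrayList<String>) and only the generic entity (ArrayList<E>) carries the next inheritance. The class OldStyleClass and its fields are hypothetical stand-ins for illustration, not the Famix API.

```java
// Hypothetical, simplified stand-in for a class entity; this is not the Famix API.
final class OldStyleClass {
    final String name;
    OldStyleClass superclass;    // target of the inheritance (may be a parameterized entity)
    OldStyleClass genericEntity; // assumption: set only on parameterized entities

    OldStyleClass(String name) {
        this.name = name;
    }

    // Superclass lookup that handles both cases: hop to the generic entity first when needed.
    OldStyleClass superclassFollowingConcretization() {
        OldStyleClass holder = (genericEntity != null) ? genericEntity : this;
        return holder.superclass;
    }
}

final class HierarchySketch {
    public static void main(String[] args) {
        OldStyleClass abstractList = new OldStyleClass("AbstractList<E>");
        OldStyleClass arrayList = new OldStyleClass("ArrayList<E>");
        arrayList.superclass = abstractList;

        OldStyleClass arrayListOfString = new OldStyleClass("ArrayList<String>");
        arrayListOfString.genericEntity = arrayList;

        OldStyleClass mySpecializedList = new OldStyleClass("MySpecializedList");
        mySpecializedList.superclass = arrayListOfString;

        OldStyleClass parent = mySpecializedList.superclass;
        // "Ask for the superclass, repeat" stops at the parameterized entity.
        System.out.println(parent.superclass); // prints "null"
        // The walk only continues after navigating through the concretization.
        System.out.println(parent.superclassFollowingConcretization().name); // prints "AbstractList<E>"
    }
}
```

Under this assumption, any code that browses a hierarchy has to include this kind of check, whether or not the classes it meets are parametric.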

The naming choices of the previous implementation were a little hard to grasp and did not match the standard vocabulary, especially in Java:

  • A type parameter was named a ParameterType
  • A type argument was named a ConcreteParameterType

Each time there was a concretization, a parametric entity was created. This created duplicates of virtually the same entity: one for the generic entity and one for each parameterized entity. Let’s see an example:

public class MyClass implements List<Float> {
    public List<Integer> getANumber() {
        List<Number> listA;
        List<Integer> listB;
    }
}

For the interface List<E>, we had 6 parametric interfaces:

  • One was the generic one: #isGeneric >>> true
  • 3 were the parameterized interfaces implemented by ArrayList<E>, its superclass AbstractList<E> and MyClass. They were different because the concrete types were different: E from ArrayList<E>, E from AbstractList<E> and Float.
  • 2 were declared types: List<Number> and List<Integer>.

When deciding on a new implementation, our main goal was to create a situation in which the dependencies would work in the same way for all entities, parametric or not. That’s where we introduce parametric associations. These associations only differ from standard associations by one property: they trigger a concretization.

Here are the new Famix metamodel traits that represent concretizations:

Class diagram for Parametric Associations

There is a direct relation between a parametric entity and its type parameters. A concretization is the association between a type parameter and the type argument that replaces it. A parametric association triggers one or several concretizations, according to the number of type parameters the parametric entity has. Example: a parametric association that targets Map<K,V> will trigger 2 concretizations.
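The rule "one concretization per type parameter" can be sketched in a few lines of Java. The classes ParametricEntity, Concretization and ParametricAssociation below are hypothetical stand-ins that only mirror the vocabulary of this post. They are not the Famix API, and the source name "myField" is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins that mirror the vocabulary of this post; this is not the Famix API.
final class ParametricEntity {
    final String name;
    final List<String> typeParameters;

    ParametricEntity(String name, String... typeParameters) {
        this.name = name;
        this.typeParameters = Collections.unmodifiableList(Arrays.asList(typeParameters));
    }
}

// Associates one type parameter with the type argument that replaces it.
final class Concretization {
    final String typeParameter;
    final String typeArgument;

    Concretization(String typeParameter, String typeArgument) {
        this.typeParameter = typeParameter;
        this.typeArgument = typeArgument;
    }
}

// An association whose target is generic: creating it triggers the concretizations.
final class ParametricAssociation {
    final String source;
    final ParametricEntity target;
    final List<Concretization> concretizations;

    ParametricAssociation(String source, ParametricEntity target, String... typeArguments) {
        if (typeArguments.length != target.typeParameters.size()) {
            throw new IllegalArgumentException("expected one type argument per type parameter");
        }
        List<Concretization> triggered = new ArrayList<>();
        for (int i = 0; i < typeArguments.length; i++) {
            triggered.add(new Concretization(target.typeParameters.get(i), typeArguments[i]));
        }
        this.source = source;
        this.target = target;
        this.concretizations = Collections.unmodifiableList(triggered);
    }
}

final class ConcretizationSketch {
    public static void main(String[] args) {
        ParametricEntity map = new ParametricEntity("Map", "K", "V");
        // For example, a field declared as Map<String, Integer>.
        ParametricAssociation typing = new ParametricAssociation("myField", map, "String", "Integer");
        for (Concretization c : typing.concretizations) {
            System.out.println(c.typeParameter + " -> " + c.typeArgument);
        }
    }
}
```

In this sketch the generic entity Map is created once, and every use of it is a new association with its own list of concretizations.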

The parametric entity is the target of the parametric association. It is always generic. As announced, we do not represent parameterized entities anymore. Coming back to the entities’ duplication example above, we now represent only 1 parametric interface for List<E> and it is the target of the 5 parametric associations.

This metamodel evolution is the occasion for another major change: the replacement of the direct relation between a typed entity and its type. This new association is called Entity typing.

Class diagram for Entity Typing

We chose to replace the existing relation with a reified association so that this dependency is represented consistently with the rest of the metamodel.

With this new association, we can now add parametric entity typings.

In a case like this:

public ArrayList<String> myAttribute;

we have an “entity typing” association between myAttribute and ArrayList. This association is parametric: it triggers the concretization of E in ArrayList<E> by String.
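Java's own reflection API exposes the same three pieces of information: the generic class, its type parameter and the type argument. The sketch below is only an analogy on the Java side, not how a Famix model is built. AttributeHolder and EntityTypingSketch are hypothetical names introduced for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

// Hypothetical holder class for the attribute from the example above.
class AttributeHolder {
    public ArrayList<String> myAttribute;
}

final class EntityTypingSketch {
    // Describes the declared type of a field as the generic class followed by
    // "parameter:=argument" pairs, one per type parameter.
    static String describe(Class<?> owner, String fieldName) {
        Field field;
        try {
            field = owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
        Type declared = field.getGenericType();
        if (!(declared instanceof ParameterizedType)) {
            return field.getType().getSimpleName(); // non-parametric typing
        }
        ParameterizedType parameterized = (ParameterizedType) declared;
        Class<?> generic = (Class<?>) parameterized.getRawType();
        Type[] arguments = parameterized.getActualTypeArguments();
        StringBuilder text = new StringBuilder(generic.getSimpleName());
        for (int i = 0; i < arguments.length; i++) {
            text.append(' ')
                .append(generic.getTypeParameters()[i].getName())
                .append(":=")
                .append(arguments[i].getTypeName());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(AttributeHolder.class, "myAttribute"));
        // prints "ArrayList E:=java.lang.String"
    }
}
```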

Type parameters can be bounded:

public class MyParametricClass<T extends Number> {}

In the previous implementation, the bounds of type parameters were implemented as inheritances: in the example above, Number would be the superclass of T. Since then, bounds have been introduced for wildcards. We now have the opportunity to apply them to type parameters as well. In the new implementation, Number is the upper bound of T.
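This matches the vocabulary of Java itself, where reflection reports Number as a bound of T rather than as a superclass. A small sketch, where BoundSketch and firstUpperBound are hypothetical helper names:

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// The bounded parametric class from the example above.
class MyParametricClass<T extends Number> {}

final class BoundSketch {
    // Returns the first upper bound of the first type parameter of the given class.
    static Type firstUpperBound(Class<?> parametricClass) {
        TypeVariable<?>[] typeParameters = parametricClass.getTypeParameters();
        if (typeParameters.length == 0) {
            throw new IllegalArgumentException(parametricClass.getName() + " has no type parameter");
        }
        return typeParameters[0].getBounds()[0];
    }

    public static void main(String[] args) {
        // The bound is attached to the type parameter T.
        System.out.println(firstUpperBound(MyParametricClass.class)); // prints "class java.lang.Number"
        // The class that declares T keeps its own, unrelated superclass.
        System.out.println(MyParametricClass.class.getSuperclass()); // prints "class java.lang.Object"
    }
}
```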

This diagram sums up the new parametrics implementation in Famix traits and Java metamodel. Please note that this is not the full Java metamodel but only a relevant part.

Class diagram for all changes

Should Concretization really be an association?

The representation of parametric entities is a challenge that will most likely continue as Famix evolves. The next question will probably be this one: should Concretization really be an association? An association is the reification of a dependency. Yet, there is no dependency between a type argument and the type parameter it replaces. Each can exist without the other. The dependency is in fact between the source of the parametric association and the type parameter.

With one of our previous examples:

public class MySpecializedList extends ArrayList<String> {}

MySpecializedList has a superclass (ArrayList<E>) and also depends on String, as a type argument. However, String does not depend on E, nor does E depend on String.

The next iteration of the representation of parametric entities will probably cover this issue. Stay tuned!