Blog

Introducing Java initializers

In Java, we can define behavior that is executed exclusively at the initialization of an instance. Until now, our metamodel represented these behaviors as methods. This evolution represents them as Initializers.

We consider as initializers the following elements:

  • Constructors: they are called when creating a new instance. When a constructor does not start with an explicit call to another constructor, it implicitly calls the no-argument constructor of the superclass. Likewise, a class that declares no constructor gets an implicit default no-argument constructor, which calls the no-argument constructor of the superclass. We do not represent implicit constructors or these implicit invocations.
  • Initialization blocks: blocks that are executed when a new instance is created. They are copied by the Java compiler into each constructor and avoid code duplication. We do not represent this implicit invocation.
  • In Famix: the <Initializer> method: we create a method to hold all attribute definitions in a type.

The main motivation for this change is to adapt the metamodel to the needs of building call graphs. Call graphs must be able to create the implicit invocations described above and to distinguish between the 3 types of initializers.

Another motivation is to differentiate between initializers and actual methods. In analyses, we often need to focus on methods and initializers can add noise when treated as actual methods, especially the <Initializer> method.

We introduce FamixJavaInitializer, a subclass of FamixJavaMethod. An Initializer has 2 properties:

  • #isInitializationBlock: boolean, false by default.
  • #isConstructor: boolean, derived. In Java, a constructor is an initializer with the same name as its parent type, with no declared type (or void as declared type).

We do not merge all initializers as we did before, but we still merge similar initializers, that will always be called together. In a Java model, a type (TWithMethods) can define a maximum of 4 initializers (besides constructors):

  • An instance initialization block, that is the merge of all instance initialization blocks.
  • An instance-side <Initializer> method, similar to the one we created before.
  • A static initialization block, merge of all static initialization blocks (isClassSide is true).
  • A static <Initializer> method (isClassSide is true) for static attributes definition.
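To make this list more concrete, here is a small Java sketch. The class and its names (InitializerKindsDemo, TRACE, log) are hypothetical and only serve the illustration. The comments indicate in which initializer each piece of code should end up according to the list above, and the trace records the order in which Java actually runs them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; none of these names come from Famix.
class InitializerKindsDemo {
    // Records the order in which the pieces of initialization code run.
    static final List<String> TRACE = new ArrayList<>();

    // Static attribute definition: belongs to the static <Initializer> method.
    static int created = log("static attribute", 0);

    // Static initialization block: belongs to the merged static initialization block.
    static {
        log("static block", 0);
    }

    // Instance attribute definition: belongs to the instance-side <Initializer> method.
    int id = log("instance attribute", 1);

    // Two instance initialization blocks: merged into one instance initialization block.
    {
        log("instance block 1", 0);
    }

    {
        log("instance block 2", 0);
    }

    // Constructor: same name as its parent type and no declared type.
    InitializerKindsDemo() {
        log("constructor", 0);
    }

    // Appends a label to the trace and returns the given value unchanged.
    static int log(String what, int value) {
        TRACE.add(what);
        return value;
    }

    public static void main(String[] args) {
        new InitializerKindsDemo();
        TRACE.forEach(System.out::println);
    }
}
```

Creating the first instance runs the static parts once, then the instance attribute definitions and instance blocks in textual order, and finally the constructor body. These are exactly the implicit invocations a call graph has to add.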

When inspecting a Java model, initializers can now be found under Initializers and Model initializers.

Speed up models creation: application to JSON/MSE parsing

In order to work with Moose, there is a prerequisite we cannot avoid: we need a model to analyze. This can be achieved in 2 principal ways:

  • Importing an existing JSON/MSE file containing a model
  • Importing a model via a Moose importer such as the Pharo importer or Python importer

While doing this, we create a lot of entities and set a lot of relations. But this can take some time. I found out that this time was even bigger than I anticipated while profiling a JSON import.

Here is the result of profiling the import of a 330 MB JSON file on a 2023 MacBook Pro M1:

Image of a profiling

From this profiling we can see that we spend 351 seconds on this import. We can find more information in this report:

Image of a profiling 2

On this screenshot we can see some noise due to the fact that the profiler was not adapted to the new event listening loop of Pharo. But in the leaves we can also see that most of the time is spent in FMSlotMultivaluedLink>>#indexOf:startingAt:ifAbsent:.

This method is used by a mechanism shared by all instance variables that are FMMany: for those, we do not want duplicated elements. Thus, we check if the collection contains the element before adding it.

But during the import of a JSON file, we should have no duplicates, which makes this check useless. This also explains why we spend so much time in this method: we are always in the worst-case scenario, where no element matches and the whole collection has to be scanned.

In order to optimize the creation of a model when we know we will not create any duplicates, we can disable the check.

For this, we can use a dynamic variable declaring that we should check for duplicated elements by default, but allowing to disable the check during the execution of some code.

DynamicVariable << #FMShouldCheckForDuplicatedEntitiesInMultivalueLinks
    slots: {};
    tag: 'Utilities';
    package: 'Fame-Core'

FMShouldCheckForDuplicatedEntitiesInMultivalueLinks >> #default
    ^ true

And now that we have the variable, we can use it:

FMSlotMultivalueLink >> unsafeAdd: element
    "Check for duplicates only when the dynamic variable asks for it"
    FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value
        ifTrue: [ (self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ] ]
        ifFalse: [ self uncheckUnsafeAdd: element ]

FMMultivalueLink >> unsafeAdd: element
    "Check for duplicates only when the dynamic variable asks for it"
    FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value
        ifTrue: [ (self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ] ]
        ifFalse: [ self uncheckUnsafeAdd: element ]

And the last step is to disable the check during the MSE/JSON parsing:

FMMSEParser >> basicRun
    "A parsed file should contain no duplicates, so the check is disabled while parsing"
    FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [
        self Document.
        self atEnd ifFalse: [ ^ self syntaxError ] ]
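For readers who are more at ease with Java than with Pharo, the same idea can be sketched with a ThreadLocal playing the role of the dynamic variable. This is only an analogy: the classes DuplicateCheck and MultivalueLink below are hypothetical and are not part of Fame or Moose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical Java analogue of the Pharo code above; not part of Fame or Moose.
final class DuplicateCheck {
    // Plays the role of the dynamic variable: true by default, one value per thread.
    private static final ThreadLocal<Boolean> ENABLED = ThreadLocal.withInitial(() -> true);

    static boolean isEnabled() {
        return ENABLED.get();
    }

    // Runs the action with the check disabled, then restores the previous value.
    static <T> T disabledDuring(Supplier<T> action) {
        boolean previous = ENABLED.get();
        ENABLED.set(false);
        try {
            return action.get();
        } finally {
            ENABLED.set(previous);
        }
    }
}

// A multivalued link that refuses duplicates unless the check is disabled.
final class MultivalueLink<E> {
    private final List<E> values = new ArrayList<>();

    void unsafeAdd(E element) {
        // contains() is a linear scan: this is the kind of cost the profiler pointed at.
        if (DuplicateCheck.isEnabled() && values.contains(element)) {
            return;
        }
        values.add(element);
    }

    int size() {
        return values.size();
    }
}

class DuplicateCheckDemo {
    public static void main(String[] args) {
        MultivalueLink<String> link = new MultivalueLink<>();
        link.unsafeAdd("a");
        link.unsafeAdd("a"); // ignored: the check is on by default
        DuplicateCheck.disabledDuring(() -> {
            link.unsafeAdd("a"); // added: the caller promised there are no duplicates
            return null;
        });
        System.out.println(link.size()); // prints 2
    }
}
```

The last call shows the price of the optimization: when the check is disabled, it is up to the caller to guarantee that no duplicate is produced.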

Now let’s try to import the same JSON file with the optimization enabled:

Image of a profiling

Image of a profiling 2

We can see that the import time went from 351 seconds to 113 seconds, which is roughly three times faster!

We can also notice that there is no longer a single bottleneck in our parsing. This means that it will be harder to optimize this task further (even if some people still have some ideas on how to do that).

This optimization has been made for the import of JSON but it can be used in other contexts. For example, in the Moose Python importer, the implementation is sure to never produce a duplicate. Thus, we could use the same trick this way:

FamixPythonImporter >> import
    FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [ super import ]

Testing your algo on a java project

When developing algorithms on top of the Moose platform, we can easily hit a wall during testing.

To do functional (and sometimes unit) testing, we need to work on a Moose model. Most of the time we are getting this model in two ways:

  • We produce a model and save the .json to recreate this model in the tests
  • We create a model by hand

But those 2 solutions have drawbacks:

  • A saved JSON file will not follow the evolutions of Famix, so the model produced will not be representative of the latest version of Famix
  • Creating a model by hand runs the risk that this model will not be representative of what we would manipulate in reality. For example, we might not think about setting the stubs or the source anchors

In order to avoid those drawbacks, I will describe my way of managing such testing cases in this article. To do so, I will explain how I set up the tests of a project that builds call graphs of Java projects.

The idea I had for testing call graphs is to implement real Java projects in a resources folder in the git repository of the project. Then, we can parse them when launching the tests and manipulate the produced model. This ensures that we always have a model up to date with the latest version of Famix. If tests break, this means that our Famix model evolved and that our project does not work anymore for this language.

[Diagram: create Java project, parse the project, import the model, run tests on the model]

The first step to build tests is to write some example java code.

I will start with a minimal example:

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

I’ll save this file in the git repository of my project under Famix-CallGraph/resources/sources/example1/Main.java.

Now that we have the source code, we need a way to access it in our project.

In order to access our resources, we will use GitBridge.

You can install it by executing:

Metacello new
    githubUser: 'jecisc' project: 'GitBridge' commitish: 'v1.x.x' path: 'src';
    baseline: 'GitBridge';
    load

But we should add it to our baseline:

BaselineOfFamixCallGraph >> #gitBridge: spec
    spec baseline: 'GitBridge' with: [ spec repository: 'github://jecisc/GitBridge:v1.x.x/src' ]

BaselineOfFamixCallGraph >> #baseline: spec
    <baseline>
    spec for: #common do: [
        "Dependencies"
        self gitBridge: spec.

        "Packages"
        spec
            package: 'Famix-CallGraph';
            package: 'Famix-CallGraph-Tests' with: [ spec requires: #( 'Famix-CallGraph' 'GitBridge' ) ]. "<== WE ADD GITBRIDGE HERE!"
    ].

    spec for: #NeedsFamix do: [
        self famix: spec.
        spec package: 'Famix-CallGraph' with: [ spec requires: #( Famix ) ] ]

Now that we have the dependency running, we can use this project. We will explain the minimal steps here but you can find the full documentation here.

The usage of GitBridge begins with the definition of our FamixCallGraphBridge:

GitBridge << #FamixCallGraphBridge
    slots: {};
    package: 'Famix-CallGraph-Tests'

Now that this class exists we can access our git folder using FamixCallGraphBridge current root.

Let’s add some syntactic sugar:

FamixCallGraphBridge class >> #resources
    ^ self root / 'resources'

FamixCallGraphBridge class >> #sources
    ^ self resources / 'sources'

We can now access our Java projects with FamixCallGraphBridge sources (the method is defined on the class side).

This step is almost done, but in order for our tests to work in a github action (for example), we need two little tweaks.

In our smalltalk.ston file, we need to register our project in Iceberg (because GitBridge uses Iceberg to access the root folder).

SmalltalkCISpec {
  #loading : [
    SCIMetacelloLoadSpec {
      #baseline : 'FamixCallGraph',
      #directory : 'src',
      #registerInIceberg : true "<== This line"
    }
  ]
}

Also, in our GitHub action, we need to make sure that the checkout action fetches enough history for GitBridge to run, and not the minimal amount (which is the default), by adding a fetch-depth: option.

steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: '0'

Now we need to be able to parse our project. For this, we will use a Java utility that is directly in Moose: FamixJavaFoldersImporter.

We can parse and receive a model doing:

model := (FamixJavaFoldersImporter importFolders: { FamixCallGraphBridge sources / 'example1' }) anyOne.

Now that we can access the model it is possible to implement our tests.

I start with an abstract class:

TestCase << #FamixAbstractJavaCallGraphBuilderTestCase
    slots: { #model . #graph };
    package: 'Famix-CallGraph-Tests'

Now I will create a TestCase that needs my Java model:

FamixAbstractJavaCallGraphBuilderTestCase << #FamixJavaCHAExample1Test
    slots: {};
    package: 'Famix-CallGraph-Tests'

And now I will create a setup importing the model and creating a call graph:

FamixAbstractJavaCallGraphBuilderTestCase >> #setUp
    super setUp.
    model := (FamixJavaFoldersImporter importFolders: { self javaSourcesFolder }) anyOne.
    graph := (FamixJavaCHABuilder entryPoints: self entryPoints) build

FamixJavaCHAExample1Test >> #javaSourcesFolder
    "Return the java folder containing the sources to parse for those tests"
    | folder |
    folder := FamixCallGraphBridge sources / 'example1'.
    folder ifAbsent: [ self error: 'Folder does not exist: ' , folder pathString ].
    ^ folder

And now you have your model available for the testing!

I use this technique to test multiple projects, such as parsers or call graph builders. In those projects, I do not modify my model, and the setup can take time. So I optimize this setup to build the model only once for all the tests of a test case, using a TestResource.

In order to do this, we can remove the slots we added to FamixAbstractJavaCallGraphBuilderTestCase and create a test resource that will hold them:

TestResource << #FamixAbstractJavaCallGraphBuilderTestResource
    slots: { #model . #graph };
    package: 'Famix-CallGraph-Tests'

Then we can move the setup to this class:

FamixAbstractJavaCallGraphBuilderTestResource >> #setUp
    super setUp.
    model := (FamixJavaFoldersImporter importFolders: { self javaSourcesFolder }) anyOne.
    graph := (FamixJavaCHABuilder entryPoints: self entryPoints) build

Personally I’m also adding a tearDown cleaning the vars because TestResources are singletons and I do not want to hold a model in memory all the time.

Then I’m creating my test resource for the example1 project.

FamixAbstractJavaCallGraphBuilderTestResource << #FamixJavaCHAExample1Resource
    slots: {};
    package: 'Famix-CallGraph-Tests'

FamixJavaCHAExample1Resource >> #javaSourcesFolder
    "Return the java folder containing the sources to parse for those tests"
    | folder |
    folder := FamixCallGraphBridge sources / 'example1'.
    folder ifAbsent: [ self error: 'Folder does not exist: ' , folder pathString ].
    ^ folder

And now we can declare that the TestCase will use this resource:

FamixJavaCHAExample1Test class >> #resources
    ^ { FamixJavaCHAExample1Resource }

The model then becomes accessible from the test case like this:

FamixJavaCHAExample1Test >> #model
    ^ self resources anyOne current model

Here are a few tricks I use to further simplify the setup of my test cases.

The first one is to detect the Java source folder automatically from the name of the test resource:

FamixAbstractJavaCallGraphBuilderTestResource >> #javaSourcesFolder
    ^ self class javaSourcesFolder

FamixAbstractJavaCallGraphBuilderTestResource class >> #javaSourcesFolder
    "Return the java folder containing the sources to parse for those tests"
    | folder |
    folder := FamixCallGraphBridge sources / ((self name withoutPrefix: 'FamixJavaCHA') withoutSuffix: 'Resource') uncapitalized.
    folder ifAbsent: [ self error: 'Folder does not exist: ' , folder pathString ].
    ^ folder

We can now remove this method from all subclasses! But make sure the name of your source folder matches the name of the test resource ;)
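If the naming rule is not obvious, here is the same derivation written as a small Java sketch. The class SourceFolderNaming is hypothetical and only mirrors the Pharo expression above: strip the FamixJavaCHA prefix, strip the Resource suffix, then lowercase the first letter.

```java
// Hypothetical helper mirroring the naming convention used by #javaSourcesFolder.
final class SourceFolderNaming {
    static String folderNameFor(String resourceClassName) {
        String name = resourceClassName;
        if (name.startsWith("FamixJavaCHA")) {
            name = name.substring("FamixJavaCHA".length());
        }
        if (name.endsWith("Resource")) {
            name = name.substring(0, name.length() - "Resource".length());
        }
        if (name.isEmpty()) {
            return name;
        }
        // Equivalent of #uncapitalized: only the first character is lowercased.
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(folderNameFor("FamixJavaCHAExample1Resource")); // prints example1
    }
}
```

So a resource named FamixJavaCHAExample1Resource expects its sources in the example1 folder.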

Automatic test resource detection and access

We can do the same with the detection of the test resource in the test case.

FamixAbstractJavaCallGraphBuilderTestCase class >> #resources
    ^ self environment
        at: ((self name withoutSuffix: 'Test') , 'Resource') asSymbol
        ifPresent: [ :class | { class } ]
        ifAbsent: [ { } ]

FamixAbstractJavaCallGraphBuilderTestCase class >> #sourceResource
    ^ self resources anyOne current

FamixAbstractJavaCallGraphBuilderTestCase >> #sourceResource
    "I return the instance of the test resource I'm using to build the sources of a java project"
    ^ self class sourceResource

FamixAbstractJavaCallGraphBuilderTestCase >> #model
    ^ self sourceResource model

Et voilà! Now, adding a ready-to-use test case for a new Java project only requires creating a test case:

FamixAbstractJavaCallGraphBuilderTestCase << #FamixJavaCHAExample2Test
    slots: {};
    package: 'Famix-CallGraph-Tests'

And the associated resource!

FamixAbstractJavaCallGraphBuilderTestResource << #FamixJavaCHAExample2Resource
    slots: {};
    package: 'Famix-CallGraph-Tests'

Nothing much.
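Of course, the example2 folder also needs some Java code. Here is a sketch of what such a project could contain. It is purely hypothetical (the Shape, Square, Circle and Example2Main names are invented for the illustration) and assumes that CHA stands for class hierarchy analysis, in which case a call through an interface is an interesting thing to test.

```java
// Hypothetical content for a Java file in resources/sources/example2.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

class Example2Main {
    // The declared type of `shape` is Shape, so a hierarchy-based call graph
    // would be expected to link this call to both Square.area() and Circle.area().
    static double describe(Shape shape) {
        return shape.area();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Square(2.0))); // prints 4.0
    }
}
```

A test on this project could then check that the graph contains an edge from describe to each implementation of area.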

Easily find the sources of the tested project

One last thing I do to simplify things is to implement a method to easily access the sources.

FamixJavaCHAExample1Test >> #openSources
    <script: 'self new openSources'>
    self resources anyOne javaSourcesFolder openInOSFileBrowser

It is possible to do the same thing for languages other than Java, though maybe not exactly in the same way as in this blog post for the section “Parse and import your model”. But this article is meant to be an inspiration!

I hope this helps improve the robustness of our projects :)

Generation of new FAST-Language metamodel using Pharo-Tree-Sitter project

If you’re here, you’re probably interested in creating a new FAST metamodel and expanding Moose to represent the AST (Abstract Syntax Tree) of an additional language. In this post, we explain to you how to generate a “First version” of a new FAST-Language metamodel using the project Pharo-Tree-Sitter. To be able to understand that, we assume you are already familiar with:

  • Tree-Sitter
  • Pharo-Tree-Sitter
  • FAST
  • Metamodel generators

  • Tree-Sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It is able to parse a large variety of programming languages such as Java, C++, C#, Python and many others.

  • Pharo-Tree-Sitter is a project developed in Pharo that integrates the original Tree-Sitter parsers and allows visualizing their results (such as ASTs) directly in Pharo. It relies on the FFI protocol, which requires the corresponding libraries depending on the OS (.dll, .so, or .dylib) to be present in Pharo’s VM folders. The project supports parsing several languages, and for some of them (like Python, TypeScript, and C), the library generation is automated. You can find more details in the repository’s README. This is the project that we will use to generate a new FAST-Language metamodel, so you need to download it into your Pharo image.

  • FAST means Famix AST. Unlike Famix, which represents applications at a high abstraction level, FAST uses a low-level representation: the AST. FAST defines a set of traits that can be used to create new metamodels compatible with Moose tools. When developing a new FAST-Language metamodel, you will rely on these FAST traits to structure your metamodel. However, this does not apply to the “First version” described in this post, but rather to the upgraded versions when you evolve and refine it.

  • Metamodel generator is a Pharo library used to create new metamodels such as FAST-Java, Famix-Java, or FAST-Fortran. The generation of any new version of a FAST-Language metamodel can only be achieved through the metamodel generator. As you will see in this post, Pharo-Tree-Sitter enables you to define a new metamodel generator. Once executed, it produces the corresponding FAST-Language metamodel. We will explain this process in more detail in the following sections.

Download Pharo-Tree-Sitter and get the corresponding libraries

First you need to create a Moose image and download Pharo-Tree-Sitter:

Metacello new
    baseline: 'TreeSitter';
    repository: 'github://Evref-BL/Pharo-Tree-Sitter:main/src';
    load.

Once downloaded, you need to make sure that Pharo-Tree-Sitter is able to parse the language that you intend to create the metamodel for. If it is not included, you need to follow the instructions in the readme file of this repository and add the new language. For this blog post we will assume that the language is already supported and we will continue with “Python” 🐍🐍🐍.

To continue, if this is the first time you are using Pharo-Tree-Sitter, you need to run the Python tests in class “TSParserPythonTest” of package “TreeSitter-Tests”. This triggers the download of the original tree-sitter and tree-sitter-python projects from GitHub, generates the corresponding libraries, and moves them to the VM folder matching the version of the image you created (for example Moose 12). If you create another image with another version, you need to run the tests again so that the libraries are moved to the corresponding folder. Now that you have the libraries, you can parse Python code and get an AST, but not yet a FAST-Python model. The next step explains how to get one.

Create the first version of the metamodel (FAST-Python in our example)

Don’t worry, there is not much to do: only a snippet of code needs to be written and executed. But first we have to explain how it works.

This package contains two main classes: “TSFASTBuilder” and “TSFASTImporter”. For our task we will rely on the first one. The second is used to make the transition between an AST generated by TreeSitter and a FAST-Language model.

“TSFASTBuilder” contains a set of methods responsible for generating a new metamodel generator:

  • #tsLanguage: is used to set an instance of TSLanguage, which is TSLanguage python in our case.
  • #createMetamodelGeneratorClass is responsible for creating a new package and a class inside. By default, the class name will be “FASTLanguageNameMetamodelGenerator”, which is “FASTPythonMetamodelGenerator” in our case, and the package name is “FAST-LanguageName-Model-Generator”. This method also calls another one, “typesToReify”, which gets all the symbols from the initial TreeSitter project (using an FFI call) and adds them as slots in the class definition. These symbols represent the nodes of the language in question, like “class” for Python.
  • #addPrefixMethodIn: adds #prefix method on the class side of the metamodel generator class. By default it is FASTLanguage.
  • #addPackageNameMethodIn: adds #packageName method on the class side of the metamodel generator class. By default it’s ‘FAST-Language-Model’.
  • #addSubmetamodelsMethodIn: adds #submetamodels method on the class side of the metamodel generator class, and by default it contains FASTMetamodelGenerator.
  • #addDefineClassIn: adds #defineClasses method. In this method slots are defined, starting with #entity, then all the symbols imported from TreeSitter.
  • #addDefineTraitsIn: adds #defineTraits method. By default FASTTEntity trait is created.
  • #addDefineHierarchyIn: adds #defineHierarchy method. By default only #entity relation is defined with FASTTEntity.
  • #addDefineRelationsIn: adds #defineRelations method. By default only #entity relations are defined with genericChildren and genericParent.

Voilà, now that you understand how it works, we will show you how to generate one for Python:

tsb := TSFASTBuilder new.
tsb languageName: 'Python'.
tsb tsLanguage: TSLanguage python.
tsb build.

This will generate the metamodel generator. Now that the generator is created you can use it to generate the metamodel:

FASTPythonMetamodelGenerator new generate.

Now you can access the packages and classes created: ‘FAST-Python-Model’ and ‘FAST-Python-Model-Generator’.

From now on you have to handle the metamodel manually. You have to add the missing traits (including FAST traits), the properties that should be imported from TreeSitter… You can rely on the importer to handle the parsing on the metamodel side. For example, you can create a package for tools with a #parse method doing this:

| parser tsLanguage importer |
Smalltalk image garbageCollect.
parser := TSParser new.
tsLanguage := TSLanguage python.
parser language: tsLanguage.
importer := TSFASTImporter new.
importer tsLanguage: tsLanguage.
importer languageName: 'Python'.
importer originString: string.
^ importer import: (parser parseString: string) rootNode "pay attention to #source: "

You can check FASTTypeScript for more details.

N.B.: We recommend parsing many Python examples (you can find a lot in the main TreeSitter-Python project) using the Pharo-Tree-Sitter project. Once parsed, you can inspect the properties of each node in Pharo using #collectFieldNameOfNamedChild. Then you can add them in #defineRelations of the metamodel.

That’s it for now!

Visualizing java dependencies between microservices

In July, I had to analyze the dependencies between microservices for Berger-Levrault. To do so, I chose to use the Moose tool.

Here is the simple but effective process I followed.

The backend I analyzed follows a common pattern. In the git repository, there is a folder api containing the microservices, and a folder lib with resources for each microservice. There is also an additional project called lib-common.

Thus, the microservice home is composed of a project named api-home and a project named lib-home.

  • src
    • api
      • api-home
        • src/
    • lib
      • lib-home
        • src/
      • lib-common/

We wanted to check that dependencies were correctly implemented in the project:

  • no api project should directly depend on another api (API calls are allowed, but not classic Java dependencies)
  • each api project can depend on its equivalent lib project
  • lib projects can depend on lib-common
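These three rules can be written down as a small predicate. The Java sketch below is only an illustration: the DependencyRules class is hypothetical, project names are assumed to follow the api-<name> / lib-<name> convention, and anything the rules do not explicitly allow (for instance an api depending directly on lib-common) is treated as forbidden, which is an assumption on my side.

```java
// Hypothetical helper; project names follow the api-<name> / lib-<name> convention.
final class DependencyRules {
    static boolean isAllowed(String from, String to) {
        if (from.equals(to)) {
            return true; // dependencies inside one project are always fine
        }
        if (from.startsWith("api-")) {
            // An api may only depend on its equivalent lib, e.g. api-home -> lib-home.
            return to.equals("lib-" + from.substring("api-".length()));
        }
        if (from.startsWith("lib-")) {
            // A lib may depend on lib-common (assumption: and on no other project).
            return to.equals("lib-common");
        }
        return false; // assumption: unknown projects are rejected
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("api-home", "lib-home"));    // true
        System.out.println(isAllowed("api-home", "api-billing")); // false
        System.out.println(isAllowed("lib-home", "lib-common"));  // true
    }
}
```

Here api-billing is an invented project name. In practice I did not write such a check but used a visualization.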

Let’s see how to perform this check with Moose.

To perform the analysis, I used Moose and followed these steps:

  1. I installed the latest version of Moose.
  2. I cloned the repository containing the backend to analyze.
  3. I installed the project dependencies with:
    mvn clean install
  4. I used VerveineJ to generate a model of the code. To avoid version issues, I used the Docker version of VerveineJ, which gave me a model.json file:
    docker run -v /path/to/my/project:/src -v /home/badetitou/.m2/repository:/dependency ghcr.io/evref-bl/verveinej:v3.3.1 -alllocals -anchor assoc -format json -o model.json
  5. I loaded the model into a Moose 12 image by dragging and dropping the model.json file into the running Moose image.

Moose provides ready-to-use visualizations to represent dependencies. In my case, I chose to use the Architectural map. This visualization presents the entities of the model (packages, classes, methods) as a tree and displays the associations between them (i.e., the dependencies).

I first asked this visualization to display all the classes. It works, but does not allow us to distinguish the different microservices.

Unhelpful architectural map

The main problem is that too much information is displayed and we cannot see the microservices. To fix this, I used Moose’s tag feature. A tag allows you to associate a color and a name to an entity.

So I tagged the classes of my system depending on their location in the repository.

To do this, in a Moose Playground, I used the following script (adapt it to your context 😉):

model allTaggedEntities do: [ :entity | entity removeTags ].
((model allWithSubTypesOf: FamixJavaType) reject: [ :type | type sourceAnchor isNil ]) do: [ :class |
    class sourceAnchor ifNotNil: [ :sa |
        (sa fileName beginsWith: './services/api-A') ifTrue: [ class tagWithName: 'A' ].
        (sa fileName beginsWith: './services/api-B') ifTrue: [ class tagWithName: 'B' ].
        (sa fileName beginsWith: './services/api-C') ifTrue: [ class tagWithName: 'C' ].
        (sa fileName beginsWith: './libraries/lib-A') ifTrue: [ class tagWithName: 'lib-A' ].
        (sa fileName beginsWith: './libraries/lib-common') ifTrue: [ class tagWithName: 'lib-common' ].
        (sa fileName beginsWith: './libraries/lib-B') ifTrue: [ class tagWithName: 'lib-B' ].
        (sa fileName beginsWith: './libraries/lib-C') ifTrue: [ class tagWithName: 'lib-C' ].
    ]
].
(model allWithSubTypesOf: FamixJavaType) reject: [ :type | type tags isEmpty ]

The result is not perfect yet because entities are not grouped by tag. To fix this, simply select the tag to add option in the architectural map settings.

Correct architectural map

You then get a clear visualization of the links between the microservice projects and the libraries they use. We see that no api is linked to an incorrect lib project. We also notice that microservice B is linked to lib-B as well as lib-common. Maybe this link to lib-common should be removed? But that’s another story…