Introduction
When there are long labels in a visualization the displayed elements can overlap which renders the visualization very difficult to read, or the elements have to be very spread out (to not overlap) and then the visualization does not fit in a normal screen or paper.
The Label Contractor project comes to solve this problem by offering several ways to reduce the length of labels (hence its name).
For example:
LbCContractor new
removeVowels;
reduce: 'MergedSuperClasses'.
will return ‘MrgdSprClsss’ by suppressing all vowels from the label.
In this blog post, I will explain how you can apply a reduction following different strategies and how you can combine them.
How to install the project
In order to install this project, on a Pharo 9.0/Moose Suite 9.0 image execute the following script in the Playground:
Metacello new
baseline: 'LabelContractor';
repository: 'github://moosetechnology/LabelContractor/src';
load
The full project including examples of the application of LabelContractor on visualizations and Spec2 can be obtained with:
Metacello new
baseline: 'LabelContractor';
repository: 'github://moosetechnology/LabelContractor/src';
load: 'full'.
Label Contractor Description
The idea was to build a tool that can reduce labels without losing too much information, and is to provide the user with a set of strategies, allowing him to apply them separately or in combination.
There are startegies for: removing some arbitrary substring from labels, removing all vowels, removing fully qualified path names, etc.
Implementation choices
The contraction of labels is based on two decisions:
- First, filenames are treated by default to remove the full pathname, therefore ‘/home/idtaleb/Label Contractor/images/src/LbCContractor.st’ will be truncated as ‘LbCContractor.st’. If a label is not a filename, this has no effect on it;
- Second, some strategies working on words assume the labels follow the CamelCase convention.
Currently these decisions are hardcoded in the contractor, but they will be implemented as normal strategies in the future.
There are 13 strategies that we are going to review now.
Remove Filename Extension
This strategy removes the extension of filenames. The extension is the part of the label after the last dot (‘.’)
LbCContractor new
removeFilenameExtension ;
reduce: 'LbCContractor.st'
will return ‘LbCContractor’.
Abbreviate Names
This strategy abbreviates the words in the label to their first capital letter. As explained before, the label is assumed to follow the CamelCase convention. Only the first three words can be abbreviated (if there are more than three words). On top of that, the last word is not abbreviated.
LbCContractor new
abbreviateNames;
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘CMSAndInheritedTraitsHierarchyTest’ (only the first tree words Cly, Merged, and Superclasses were abbreviated).
Remove Vowels
This strategy removes all vowels from the label. Notice that the first letter of a word is always kept whether it is a vowel or a consonant.
Note: In English, the letter ‘y’ is sometimes considered a vowel and sometimes a consonant. This strategy assumes that ‘y’ is a consonnant when it is followed by a vowel like in ‘layer’.
LbCContractor new
removeVowels;
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClMrgdSprclsssAndInhrtdTrtsHrrchTst’.
LbCContractor new
removeVowels;
reduce: 'layer'
will return ‘lyr’.
Susbtitute Substring
This strategy replaces a word by another one. If the word appears more than once, then all occurrences of the word will be replaced.
Example:
LbCContractor new
substitute: 'Superclasses' by: 'Sc';
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClyMergedScAndInheritedTraitsHierarchyTest’.
Size Reduction Strategies
There are three strategies based on specifically fixing a maximal size for the contracted label.
Remove Frequent Letters
This strategy removes the frequent letters until having the maximal size. The frequency of letters is hard coded from know frequency of letters in english texts. Letters are removed, one at a time, from the most frequent (in english) to the least frequent until the label is the maximum size. The startegy is not case sensitive, meaning that a ‘T’ is counted as a ‘t’.
LbCContractor new
removeFrequentLettersUpTo: 20;
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClyMgdpcldIhidiHichy’.
removing the letters (number of apparition in parentheses) ‘e’, ‘r’, ‘s’, ‘u’, ‘a’, ‘n’, and ‘t’.
Ellipsis
This strategy keeps the beginning and the end of the label and replace the middle by ellipsis represented as a ‘~’. The default size is eight, so it keeps the first four characters and the last four characters af the label and separates them with a tilde ‘~’. The default size can be changed.
LbCContractor new
ellipsis;
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClyM~Test’.
Pick First Characters
This strategy takes the first eight characters of a label. Again, the default size can be changed.
LbCContractor new
pickFirstCharacters;
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClyMerge’ (the first eight letters are kept)
Remove Substrings
This is another group of three strategies that remove some given substring from a label.
Notice that by default the startegies are not case sensitive.
Remove Any Substrings
This strategy accepts one or a collection of substring to be removed, and it removes all the occurrences of these substrings in the label.
An example with only one substring to remove:
LbCContractor new
removeSubstring: 'Merged';
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClySuperclassesAndInheritedTraitsHierarchyTest’.
An other example with a collection of substrings:
LbCContractor new
removeSubstrings: #('cly' 'merged' 'and' 'test');
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘SuperclassesInheritedTraitsHierarchy’.
Remove Prefix
The same idea, this strategy removes the prefix of the label if it matches the given prefix: A collection of prefixes can be given if the same contractor is applied to several labels (with different prefixes).
LbCContractor new
removePrefix: 'ClyMerge';
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘dSuperclassesAndInheritedTraitsHierarchyTest’.
Remove Suffix
This strategy is similar to the last one, except that it removes the suffix substrings.
Remove Words At
This is a group of three strategies which is very similar to the Remove Substrings group, except that it removes words in the label (assuming a CamelCase convention). The words to remove are specified by their indexes.
Remove Any Words At
This strategy removes words of the label, that are specified by their indexes. Like Remove Any Substrings, you can give an index or a collection of indexes of the words to remove.
LbCContractor new
removeWordAt: 2;
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClySuperclassesAndInheritedTraitsHierarchyTest’ (the second word, ‘Merged’ was removed).
Remove First Word
This strategy removes automatically the first word of the label, whatever it is.
Remove Last Word
This strategy removes automatically the last word of the label, whatever it is.
Strategies Combination
Finally, there are two ways to combine the strategies together, in the both cases the user must provides the strategies:
- The user provides the strategies in the order to apply them:
LbCContractor new ellipsisUpTo: 20; removeVowels; removeSubstrings: #('Merged' 'Test'); reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClMrgdS~rrchTst’ by applying first ‘ellipsisUpTo:’, then ‘removeVowels’, and then ‘removeSubstrings:’. Note that the last one was actually not applied because the other two had already changed the label, and the ellipsis is shorter than expected because ‘removeVowels’ came after.
- Combining following predefined priorities:
To avoid unreasonable result (as in the previous example), the strategies have built-in priorities that can be applied with ‘usingPriorities’.
The same example but with priorities:
LbCContractor new
usingPriorities;
ellipsisUpTo: 20;
removeVowels;
removeSubstrings: #('Merged' 'Test');
reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClSprclsss~dTrtsHrrch’
The result is different, because the substrings were removed before applying removeVowels strategy which was itself applied before ‘ellipsisUpTo:’.
The priority system is defined as follows (the color green means that the strategy has the highest priority):
Conclusion
In this post, we have seen how to compact labels in a visualization using the LabelContractor. The goal is to improve the readability of a visualization while retaining as much information as possible.
Note that LabelContractor is not just for visualizations, but you can use it whenever you want.