Label Contractor for shortening labels
Introduction
When a visualization contains long labels, the displayed elements can overlap, which makes the visualization very difficult to read. The alternative is to spread the elements out so that they do not overlap, but then the visualization no longer fits on a normal screen or sheet of paper.
The Label Contractor project addresses this problem by offering several ways to reduce the length of labels (hence its name).
For example:
LbCContractor new removeVowels; reduce: 'MergedSuperClasses'.
will return ‘MrgdSprClsss’ by suppressing all vowels from the label.
In this blog post, I will explain how you can apply a reduction following different strategies and how you can combine them.
How to install the project
To install this project, execute the following script in the Playground of a Pharo 9.0/Moose Suite 9.0 image:
Metacello new baseline: 'LabelContractor'; repository: 'github://moosetechnology/LabelContractor/src'; load
The full project, including examples of applying LabelContractor to visualizations and Spec2, can be obtained with:
Metacello new baseline: 'LabelContractor'; repository: 'github://moosetechnology/LabelContractor/src'; load: 'full'.
Label Contractor Description
The idea was to build a tool that can shorten labels without losing too much information. To that end, it provides the user with a set of strategies that can be applied separately or in combination.
There are strategies for removing some arbitrary substring from labels, removing all vowels, removing fully qualified path names, etc.
Implementation choices
The contraction of labels is based on two decisions:
- First, filenames are treated by default to remove the full pathname, so ‘/home/idtaleb/Label Contractor/images/src/LbCContractor.st’ is truncated to ‘LbCContractor.st’. If a label is not a filename, this has no effect on it;
- Second, some strategies working on words assume the labels follow the CamelCase convention.
Currently these decisions are hardcoded in the contractor, but they will be implemented as normal strategies in the future.
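To make these two decisions concrete, here is a minimal sketch of what they amount to. It is written in Python only to spell out the behaviour; it is not the Pharo implementation, and the names `strip_path` and `split_camel_case` are hypothetical helpers, not part of LabelContractor. I also assume that a word starts at each uppercase letter.

```python
import re

def strip_path(label):
    # Keep only what follows the last '/'; a label without '/' is unchanged.
    return label.rsplit("/", 1)[-1]

def split_camel_case(label):
    # Assumption: a new word starts at every uppercase letter.
    # Every character is kept, so joining the words gives back the label.
    return re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", label)

print(strip_path("/home/idtaleb/Label Contractor/images/src/LbCContractor.st"))
# -> LbCContractor.st
print(split_camel_case("ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"))
# -> ['Cly', 'Merged', 'Superclasses', 'And', 'Inherited', 'Traits', 'Hierarchy', 'Test']
```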
There are 13 strategies that we are going to review now.
Remove Filename Extension
This strategy removes the extension of filenames. The extension is the part of the label after the last dot (‘.’).
LbCContractor new removeFilenameExtension; reduce: 'LbCContractor.st'
will return ‘LbCContractor’.
Abbreviate Names
This strategy abbreviates the words in the label to their first capital letter. As explained before, the label is assumed to follow the CamelCase convention. Only the first three words are abbreviated (if there are more than three words). In addition, the last word is never abbreviated.
LbCContractor new abbreviateNames; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘CMSAndInheritedTraitsHierarchyTest’ (only the first three words, Cly, Merged, and Superclasses, were abbreviated).
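The rule can be sketched in a few lines. This is an illustration in Python rather than the library's code: `abbreviate_names` and `max_abbreviated` are names I made up, and the behaviour for labels of three words or fewer (abbreviate everything except the last word) is my reading of the description above.

```python
import re

def split_camel_case(label):
    # Assumption: a new word starts at every uppercase letter.
    return re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", label)

def abbreviate_names(label, max_abbreviated=3):
    words = split_camel_case(label)
    # Abbreviate at most `max_abbreviated` words and never the last one.
    count = max(0, min(max_abbreviated, len(words) - 1))
    initials = [word[0] for word in words[:count]]
    return "".join(initials + words[count:])

print(abbreviate_names("ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"))
# -> CMSAndInheritedTraitsHierarchyTest
```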
Remove Vowels
This strategy removes all vowels from the label. Notice that the first letter of a word is always kept, whether it is a vowel or a consonant.
Note: In English, the letter ‘y’ is sometimes considered a vowel and sometimes a consonant. This strategy assumes that ‘y’ is a consonant when it is followed by a vowel, as in ‘layer’.
LbCContractor new removeVowels; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClMrgdSprclsssAndInhrtdTrtsHrrchTst’.
LbCContractor new removeVowels; reduce: 'layer'
will return ‘lyr’.
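The two rules (keep the first letter of each word, treat ‘y’ as a consonant only before a vowel) can be checked against the examples above with a small sketch. It is written in Python for illustration and is not the Pharo implementation; `remove_vowels` and `split_camel_case` are hypothetical names, and I assume words are delimited by uppercase letters.

```python
import re

VOWELS = set("aeiou")

def split_camel_case(label):
    # Assumption: a new word starts at every uppercase letter.
    return re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", label)

def remove_vowels(label):
    result = []
    for word in split_camel_case(label):
        kept = [word[0]]  # the first letter of a word is always kept
        for i in range(1, len(word)):
            ch = word[i].lower()
            following = word[i + 1].lower() if i + 1 < len(word) else ""
            if ch in VOWELS:
                continue
            # 'y' counts as a vowel unless a vowel follows it
            if ch == "y" and following not in VOWELS:
                continue
            kept.append(word[i])
        result.append("".join(kept))
    return "".join(result)

print(remove_vowels("ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"))
# -> ClMrgdSprclsssAndInhrtdTrtsHrrchTst
print(remove_vowels("layer"))
# -> lyr
```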
Substitute Substring
This strategy replaces a word with another one. If the word appears more than once, all its occurrences are replaced.
Example:
LbCContractor new substitute: 'Superclasses' by: 'Sc'; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClyMergedScAndInheritedTraitsHierarchyTest’.
Size Reduction Strategies
There are three strategies based on fixing a maximal size for the contracted label.
Remove Frequent Letters
This strategy removes frequent letters until the label reaches the maximal size. The frequency of letters is hard-coded from the known frequency of letters in English texts. Letters are removed, one at a time, from the most frequent (in English) to the least frequent, until the label is down to the maximal size. The strategy is not case sensitive, meaning that a ‘T’ is counted as a ‘t’.
LbCContractor new removeFrequentLettersUpTo: 20; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClyMgdpcldIhidiHichy’, removing the letters (number of occurrences in parentheses) ‘e’ (8), ‘r’ (6), ‘s’ (6), ‘u’ (1), ‘a’ (4), ‘n’ (2), and ‘t’ (5).
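Here is a rough sketch of the loop, in Python and not taken from the library. The hard-coded frequency table of LabelContractor is not shown in this post, so the `order` parameter is hypothetical: in the example I simply pass the letters listed above. I also assume that all occurrences of a letter are dropped before the size is checked again.

```python
def remove_frequent_letters(label, max_size, order):
    # `order` lists letters from most to least frequent (hypothetical parameter).
    for letter in order:
        if len(label) <= max_size:
            break
        # Case-insensitive: 'T' is removed together with 't'.
        label = "".join(ch for ch in label if ch.lower() != letter)
    return label

label = "ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"
print(remove_frequent_letters(label, 20, "ersuant"))
# -> ClyMgdpcldIhidiHichy
```

The label starts with 52 characters and the seven letters account for 32 of them, which is how the result lands exactly on 20.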
Ellipsis
This strategy keeps the beginning and the end of the label and replaces the middle with an ellipsis, represented as a tilde ‘~’.
The default size is eight, so it keeps the first four characters and the last four characters of the label and separates them with the tilde.
The default size can be changed.
LbCContractor new ellipsis; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClyM~Test’.
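In other words, the size is split evenly between the two ends of the label. A minimal Python sketch of that behaviour (not the Pharo code; the function name and the choice to leave short labels untouched are my assumptions):

```python
def ellipsis(label, size=8):
    # Assumption: a label that already fits is returned unchanged.
    if len(label) <= size:
        return label
    half = max(1, size // 2)
    # Keep `half` characters at each end, joined by a tilde.
    return label[:half] + "~" + label[-half:]

label = "ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"
print(ellipsis(label))      # -> ClyM~Test
print(ellipsis(label, 20))  # -> ClyMergedS~rarchyTest
```

Note that the tilde is added on top of the requested size, so the result is one character longer than `size`, as in ‘ClyM~Test’.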
Pick First Characters
This strategy takes the first eight characters of a label. Again, the default size can be changed.
LbCContractor new pickFirstCharacters; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClyMerge’ (the first eight characters are kept).
Remove Substrings
This is another group of three strategies, which remove some given substring from a label.
Notice that by default these strategies are not case sensitive.
Remove Any Substrings
This strategy accepts one substring or a collection of substrings, and removes all occurrences of these substrings from the label.
An example with only one substring to remove:
LbCContractor new removeSubstring: 'Merged'; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClySuperclassesAndInheritedTraitsHierarchyTest’.
Another example, with a collection of substrings:
LbCContractor new removeSubstrings: #('cly' 'merged' 'and' 'test'); reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘SuperclassesInheritedTraitsHierarchy’.
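The case-insensitive matching is what lets ‘cly’ remove ‘Cly’ in the example above. A small Python sketch of the same behaviour (an illustration only; `remove_substrings` here is my own function, not the Pharo method):

```python
import re

def remove_substrings(label, substrings):
    for sub in substrings:
        # re.escape makes the substring literal; IGNORECASE gives the default behaviour
        label = re.sub(re.escape(sub), "", label, flags=re.IGNORECASE)
    return label

label = "ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"
print(remove_substrings(label, ["cly", "merged", "and", "test"]))
# -> SuperclassesInheritedTraitsHierarchy
```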
Remove Prefix
The idea is the same: this strategy removes the prefix of the label if it matches the given prefix. A collection of prefixes can be given if the same contractor is applied to several labels (with different prefixes).
LbCContractor new removePrefix: 'ClyMerge'; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘dSuperclassesAndInheritedTraitsHierarchyTest’.
Remove Suffix
This strategy is similar to the previous one, except that it removes a suffix of the label.
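Prefix and suffix removal mirror each other, which a short sketch makes visible. This is Python written for illustration, not the library's code: both function names are hypothetical, I assume only the first matching candidate is removed, and the comparison ignores case as described for this group.

```python
def remove_prefix(label, prefixes):
    for prefix in prefixes:
        # Case-insensitive match at the start of the label
        if prefix and label.lower().startswith(prefix.lower()):
            return label[len(prefix):]
    return label

def remove_suffix(label, suffixes):
    for suffix in suffixes:
        # Case-insensitive match at the end of the label
        if suffix and label.lower().endswith(suffix.lower()):
            return label[:-len(suffix)]
    return label

label = "ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"
print(remove_prefix(label, ["ClyMerge"]))
# -> dSuperclassesAndInheritedTraitsHierarchyTest
print(remove_suffix(label, ["Test"]))
# -> ClyMergedSuperclassesAndInheritedTraitsHierarchy
```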
Remove Words At
This is a group of three strategies, very similar to the Remove Substrings group, except that they remove words from the label (assuming the CamelCase convention). The words to remove are specified by their indexes.
Remove Any Words At
This strategy removes the words of the label that are specified by their indexes. As with Remove Any Substrings, you can give one index or a collection of indexes of the words to remove.
LbCContractor new removeWordAt: 2; reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'
will return ‘ClySuperclassesAndInheritedTraitsHierarchyTest’ (the second word, ‘Merged’ was removed).
Remove First Word
This strategy automatically removes the first word of the label, whatever it is.
Remove Last Word
This strategy automatically removes the last word of the label, whatever it is.
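All three strategies of this group come down to splitting the label into words and dropping some of them by position. The sketch below is in Python and is only an illustration: the function names are hypothetical, words are assumed to start at uppercase letters, and indexes start at 1 as in the example above.

```python
import re

def split_camel_case(label):
    # Assumption: a new word starts at every uppercase letter.
    return re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", label)

def remove_words_at(label, indexes):
    drop = set(indexes)
    words = split_camel_case(label)
    # Indexes are 1-based: index 2 is the second word.
    return "".join(w for i, w in enumerate(words, start=1) if i not in drop)

def remove_first_word(label):
    return remove_words_at(label, [1])

def remove_last_word(label):
    return remove_words_at(label, [len(split_camel_case(label))])

label = "ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"
print(remove_words_at(label, [2]))
# -> ClySuperclassesAndInheritedTraitsHierarchyTest
print(remove_first_word(label))
# -> MergedSuperclassesAndInheritedTraitsHierarchyTest
print(remove_last_word(label))
# -> ClyMergedSuperclassesAndInheritedTraitsHierarchy
```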
Strategies Combination
Finally, there are two ways to combine strategies. In both cases the user must provide the strategies:
- The user provides the strategies in the order to apply them:
LbCContractor new ellipsisUpTo: 20; removeVowels; removeSubstrings: #('Merged' 'Test'); reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClMrgdS~rrchTst’ by applying first ‘ellipsisUpTo:’, then ‘removeVowels’, and then ‘removeSubstrings:’. Note that the last one had no effect because the other two had already changed the label (‘Merged’ and ‘Test’ no longer appear in it), and the result is shorter than the requested size because ‘removeVowels’ came after the ellipsis.
- Combining the strategies following predefined priorities:
To avoid unreasonable results (as in the previous example), the strategies have built-in priorities that can be applied with ‘usingPriorities’.
The same example but with priorities:
LbCContractor new usingPriorities; ellipsisUpTo: 20; removeVowels; removeSubstrings: #('Merged' 'Test'); reduce: 'ClyMergedSuperclassesAndInheritedTraitsHierarchyTest'.
will return ‘ClSprclsss~dTrtsHrrch’.
The result is different because the substrings were removed before applying the ‘removeVowels’ strategy, which was itself applied before ‘ellipsisUpTo:’.
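The difference between the two modes can be reproduced with a small pipeline sketch. It is written in Python and is not how LabelContractor is implemented: all function names are hypothetical and the priority numbers (1 is applied first) are made up. The only thing taken from this post is the relative order: substrings, then vowels, then ellipsis.

```python
import re

VOWELS = set("aeiou")

def split_camel_case(label):
    return re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", label)

def remove_vowels(label):
    out = []
    for word in split_camel_case(label):
        kept = [word[0]]  # first letter of each word is kept
        for i in range(1, len(word)):
            ch = word[i].lower()
            nxt = word[i + 1].lower() if i + 1 < len(word) else ""
            if ch in VOWELS or (ch == "y" and nxt not in VOWELS):
                continue
            kept.append(word[i])
        out.append("".join(kept))
    return "".join(out)

def ellipsis_up_to(size):
    def apply(label):
        if len(label) <= size:
            return label
        half = max(1, size // 2)
        return label[:half] + "~" + label[-half:]
    return apply

def remove_substrings(substrings):
    def apply(label):
        for sub in substrings:
            label = re.sub(re.escape(sub), "", label, flags=re.IGNORECASE)
        return label
    return apply

# (priority, strategy) pairs in the order the user wrote them.
steps = [
    (3, ellipsis_up_to(20)),
    (2, remove_vowels),
    (1, remove_substrings(["Merged", "Test"])),
]

def reduce_label(label, steps, using_priorities=False):
    if using_priorities:
        steps = sorted(steps, key=lambda step: step[0])  # stable sort
    for _, strategy in steps:
        label = strategy(label)
    return label

label = "ClyMergedSuperclassesAndInheritedTraitsHierarchyTest"
print(reduce_label(label, steps))                         # -> ClMrgdS~rrchTst
print(reduce_label(label, steps, using_priorities=True))  # -> ClSprclsss~dTrtsHrrch
```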
The priority system is defined as follows (the color green means that the strategy has the highest priority):
Conclusion
In this post, we have seen how to compact labels in a visualization using the LabelContractor. The goal is to improve the readability of a visualization while retaining as much information as possible.
Note that LabelContractor is not limited to visualizations: you can use it wherever labels need to be shortened.