OPENRNDR & Processing - Word frequencies

Word frequencies and using collections

Hi! :slight_smile: I found one of the examples of Processing which analyzes two books (Dracula and Frankenstein), gets the word frequencies, then draws the words moving at different speeds and font sizes depending on those frequencies. It only shows words which appear more than 5 times in one of the books while not appearing at all in the other book.

I like how this can be expressed in a concise way in Kotlin.

Note this is not a literal port of the Processing program to OPENRNDR, but more a reinterpretation with a similar output and simple code.

Processing / Java

The code is split into two files. I suggest opening them in new browser windows to see them side by side with the Kotlin code.

OPENRNDR / Kotlin

Imports
import org.openrndr.application
import org.openrndr.color.ColorRGBa
import org.openrndr.draw.loadFont
import org.openrndr.extra.noise.uniform
import org.openrndr.math.Vector2
import org.openrndr.math.map
import org.openrndr.shape.Rectangle
import java.io.File
fun main() = application {
    // Configure the window before `program`, so that `width` and `height`
    // already have these values when they are used below
    configure {
        width = 640
        height = 360
    }

    program {
        val fonts = List(10) {
            loadFont("data/fonts/SourceCodePro-Regular.ttf", 4.0 + 5 * it)
        }

        // Make spawnArea tall so it's not too crowded with words
        val spawnArea = Rectangle(-50.0, 0.0, width * 1.0, height * 3.0)

        // Render area is larger than the window so appearing objects don't pop at the top
        // but start rendering above the edge and then slide in
        val renderArea = drawer.bounds.offsetEdges(30.0)

        class Word(val word: String, count: Int, val color: ColorRGBa) {
            // Pick initial random position inside spawnArea
            private var position = Vector2.uniform(spawnArea)

            // Calculate speed based on word count
            private val speed = Vector2(
                0.0, count.toDouble().map(5.0, 25.0, 0.1, 5.0, true)
            )

            // Pick font (size) based on word count
            private val font = fonts[count.toDouble().map(
                5.0, 25.0, 0.0, fonts.size - 1.0, true
            ).toInt()]

            fun display() {
                // Move Word down
                position += speed

                // If too far down, bring back up
                if (position.y > spawnArea.height) {
                    position -= Vector2(0.0, spawnArea.height)
                }

                // If `position` inside `renderArea`, draw it
                if (renderArea.contains(position)) {
                    drawer.fill = color
                    drawer.fontMap = font
                    drawer.text(word, position)
                }
            }
        }

        // Create a `Map<String, Int>` with words and their counts,
        // keeping only words that appear at least 5 times
        val freqsDracula = File("data/texts/dracula.txt")
            .readText().lowercase().split(Regex("\\W+"))
            .groupingBy { it }.eachCount().filter { it.value >= 5 }

        val freqsFranken = File("data/texts/frankenstein.txt")
            .readText().lowercase().split(Regex("\\W+"))
            .groupingBy { it }.eachCount().filter { it.value >= 5 }

        // Make sure no words appear in both Maps. Unique words only.
        val uniqueDracula = freqsDracula - freqsFranken.keys
        val uniqueFranken = freqsFranken - freqsDracula.keys

        // Finally create a List<Word> to be displayed
        val words = uniqueDracula.map { (word, count) ->
            Word(word, count, ColorRGBa.WHITE)
        } + uniqueFranken.map { (word, count) ->
            Word(word, count, ColorRGBa.BLACK)
        }

        extend {
            drawer.clear(ColorRGBa.GRAY)
            words.forEach { it.display() }
        }
    }
}

Notice how I changed the approach to make the Word class simpler. It now has just one method, display().

In the original example the Word class was used to keep various counts: how many times it appeared in book A, in book B, and in both together, with the goal of later filtering those words that appear 5 times or more only in one of the books but not in the other.

Let’s take a look at how the Kotlin version works by jumping to the line that starts with val freqsDracula =. It reads like this: take a txt file, read its content, make it lower case, split it into words on runs of non-word characters (the regex \W+), which gives a list with many repeated words, group by word, count the words in each group, and finally discard words with fewer than 5 repetitions.
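To try that chain without the book files, here is a minimal sketch that runs the same steps on a short string. The helper name wordFrequencies, its minCount parameter and the sample sentence are my own inventions for illustration. The extra isNotEmpty filter is also an addition: split can produce an empty token when the text starts or ends with punctuation.

```kotlin
// Hypothetical helper (not part of the program above): counts words in a
// string using the same lowercase / split / groupingBy / eachCount chain.
fun wordFrequencies(text: String, minCount: Int = 1): Map<String, Int> =
    text.lowercase()
        .split(Regex("\\W+"))        // split on runs of non-word characters
        .filter { it.isNotEmpty() }  // drop empty tokens at the edges
        .groupingBy { it }           // group identical words
        .eachCount()                 // word -> number of occurrences
        .filter { it.value >= minCount }

fun main() {
    val sample = "The bat, the castle and THE count. A bat!"
    println(wordFrequencies(sample))
    // {the=3, bat=2, castle=1, and=1, count=1, a=1}
    println(wordFrequencies(sample, minCount = 2))
    // {the=3, bat=2}
}
```

In the real program the threshold is 5 and the input comes from File(...).readText().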

At this point we have one Map object per book linking String (the word) to Int (its count). Now, following the original example, I want to remove words that are present in the other book, which I do just by subtracting the other book’s keys (words). Note that both maps were already filtered to counts of 5 or more, so strictly speaking this only removes words that are also frequent in the other book: the result is the words appearing 5 or more times in one book and fewer than 5 times in the other. In 4 lines of code.
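The subtraction uses the standard library’s minus operator for maps, which returns a copy of the map without the given keys. Here is a small sketch with made-up counts. The helper name uniqueTo and the sample words are hypothetical and only there to show the behavior.

```kotlin
// Hypothetical helper: keeps the entries of `own` whose key is absent from `other`.
fun <V> uniqueTo(own: Map<String, V>, other: Map<String, V>): Map<String, V> =
    own - other.keys

fun main() {
    // Made-up counts, just for illustration
    val bookA = mapOf("vampire" to 12, "night" to 30, "castle" to 7)
    val bookB = mapOf("creature" to 9, "night" to 18)

    println(uniqueTo(bookA, bookB)) // {vampire=12, castle=7}
    println(uniqueTo(bookB, bookA)) // {creature=9}
}
```

The original maps are not modified, so both subtractions can be done one after the other from the same two inputs.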

As a last step I transform the two maps of unique words into one list of Word objects, which I called words. This way I don’t need to execute qualify() on each word to figure out if it is worth displaying: every item in words is displayable and has its speed, color and font size pre-calculated. The original version calculates those three properties each time move or display is called.

Manipulating collections this way is one of my favorite aspects of Kotlin :slight_smile:
I hope you find it as concise and readable as I do.

Text size and text centering

One difference I want to point out is that in OPENRNDR one does not specify the text size when drawing text; the size is set when loading the font. Therefore I created a list of fonts with different sizes. An alternative would have been to call drawer.scale() to scale down the text and use just one font size.
Another difference regarding text is that there isn’t a direct equivalent to Processing’s textAlign, so the left edge of the spawnArea is at -50 to make sure the left margin of the window is not empty.
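To make the font list more concrete, this sketch prints the ten sizes and shows which one a given word count ends up with. It runs without OPENRNDR: mapClamped is a hypothetical stand-in I wrote for the clamped Double.map() call used in the Word class, and fontSizes and fontIndex are illustration names too.

```kotlin
// Hypothetical stand-in for the clamped Double.map() used in the program:
// linearly maps `value` from [inMin, inMax] to [outMin, outMax] and clamps
// the result to the output range.
fun mapClamped(
    value: Double, inMin: Double, inMax: Double, outMin: Double, outMax: Double
): Double {
    val t = ((value - inMin) / (inMax - inMin)).coerceIn(0.0, 1.0)
    return outMin + t * (outMax - outMin)
}

// Same size formula as in the program: ten sizes from 4.0 to 49.0
val fontSizes = List(10) { 4.0 + 5 * it }

// Index into fontSizes for a given word count
fun fontIndex(count: Int): Int =
    mapClamped(count.toDouble(), 5.0, 25.0, 0.0, fontSizes.size - 1.0).toInt()

fun main() {
    println(fontSizes)
    // [4.0, 9.0, 14.0, 19.0, 24.0, 29.0, 34.0, 39.0, 44.0, 49.0]
    for (count in listOf(5, 15, 25, 300)) {
        println("count $count -> font size ${fontSizes[fontIndex(count)]}")
    }
    // count 5 -> font size 4.0
    // count 15 -> font size 24.0
    // count 25 -> font size 49.0
    // count 300 -> font size 49.0
}
```

Because of the clamping, every word with 25 or more occurrences uses the largest font.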

About IDEs

The Kotlin code on this page may be a bit harder to grasp than when inside a good IDE because the IDE shows type hints and tooltips describing what is under the mouse:

[Screenshot: thick lines under type hints, thin orange line shows a tooltip for freqsDracula]
