Principal Data Science Techniques for Senior Scala Developers

Vakindu Philliam
4 min read · May 24, 2021

If you’re new to data processing with the Scala programming language, these three Scala programs will give you a better understanding of how to build data processing pipelines in the language. Understanding data immutability, frequency mapping, and how to pass data through a function pipeline is a crucial part of data science application development.

This tutorial assumes the reader has a working understanding of the syntax of the Scala programming language.

The three techniques we explore are:

1. Data Function Pipeline: How to analyze and manipulate data with a series of sequential functions.

2. Frequency Occurrence Mapping: How to identify frequently occurring data.

3. Data Immutability Principle: How to fetch data from a resource without tainting the state of the data.

Data Function Pipeline:

This technique explores how to manipulate and process data on the fly as it is passed from one endpoint to another in a Scala program. A series of sequential functions is defined to manipulate the data as it travels between the endpoints.
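The core idea can be sketched in a few lines using Scala’s built-in `andThen` combinator, which chains two functions so the output of one becomes the input of the next. This is a minimal standalone sketch of sequential function application, separate from the full makePipeline script that follows:

```scala
object PipelineSketch {
  def main(args: Array[String]): Unit = {
    // Two small stages; each is an ordinary Int => Int function.
    val triple: Int => Int = _ * 3
    val addOne: Int => Int = _ + 1

    // andThen composes left to right: pipeline(x) == addOne(triple(x)).
    val pipeline: Int => Int = triple andThen addOne

    println(pipeline(3)) // prints 10
  }
}
```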

The Script: DataPipeline.scala

Open your favorite code editor, copy and paste the Scala code below into a new Scala file, then run it.

The Code:

/**

Data Function Pipeline Processing:

The data processing pipeline is built by the makePipeline function.

How it works:

The function accepts a variable number of functions, and it returns a new function that accepts one parameter arg.

The returned function should call the first function in the pipeline with the parameter arg, then call the second function with the result of the first.

It should continue calling each function in order, following the same pattern, and finally return the value from the last function.

For example, calling the function returned by makePipeline((x: Int) => x * 3, (x: Int) => x + 1, (x: Int) => x / 2) with 3 should return 5.

*/

object Pipeline {

  // Applies one stage of the pipeline to the current value.
  def compute[T](input: T, fn: T => T) = fn(input)

  // Returns a single function that threads `arg` through every stage in order.
  def makePipeline[T](functions: (T => T)*): T => T =
    (arg: T) => functions.foldLeft(arg)((acc, fn) => compute(acc, fn))

  def main(args: Array[String]) = {
    println(makePipeline((x: Int) => x * 3, (x: Int) => x + 1, (x: Int) => x / 2)(3)) // prints 5
  }
}
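As an aside, makePipeline can also be expressed by folding the stages together with `andThen` instead of folding over the argument. This is an alternative sketch of the same technique, not the script above; `identity` handles the empty-pipeline case:

```scala
object PipelineCompose {
  // Folds all stages into one composed function; with no stages,
  // the pipeline is simply the identity function.
  def makePipeline[T](functions: (T => T)*): T => T =
    functions.foldLeft(identity[T] _)(_ andThen _)

  def main(args: Array[String]): Unit = {
    val p = makePipeline[Int](_ * 3, _ + 1, _ / 2)
    println(p(3)) // prints 5
  }
}
```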

Frequency Occurrence Mapping:

This Scala technique explores how to identify frequently occurring elements in an array and map their occurrences in descending order of frequency. It is especially useful when analyzing data for frequency patterns.

The Script: FrequencyMapping.scala

Copy and paste the Scala code below into a new Scala file, then run it.

The Code:

/**

Order by Descending Frequency:

A Scala Program to map out an array of elements in descending frequency

(from the most frequently occurring element to the least frequently occurring element).

*/

import scala.collection.mutable.LinkedHashMap

object Frequency {

  def main(args: Array[String]): Unit = {
    println(descendingSort(wordFrequency(Array("one", "two", "three", "two", "one", "two",
      "seven", "two", "ten", "THREE", "FOUR", "FIVE", "thrEE"))))
  }

  // Counts case-insensitive occurrences of each word.
  def wordFrequency(words: Array[String]): Map[String, Int] = {
    var wordcounts = Map.empty[String, Int]
    words.foreach { w =>
      val word = w.toLowerCase()
      wordcounts += (word -> (wordcounts.getOrElse(word, 0) + 1))
    }
    wordcounts
  }

  // Rebuilds the map with keys ordered by descending frequency.
  // LinkedHashMap preserves insertion order, so the sort survives.
  def descendingSort(unsorted: Map[String, Int]) = {
    val sortedMap = new LinkedHashMap[String, Int]()
    val sortedKeys = unsorted.keys.toList.sortWith((a, b) => unsorted(a) > unsorted(b))
    sortedKeys.foreach(key => sortedMap += (key -> unsorted(key)))
    sortedMap
  }
}
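The same count-then-sort pattern can also be written with the standard library’s `groupBy` and `sortBy`. This is a supplementary sketch (the object name FrequencySketch is my own, not part of the script above):

```scala
object FrequencySketch {
  // Counts case-insensitive occurrences, then orders the (word, count)
  // pairs by descending count via a negated sort key.
  def frequencies(words: Array[String]): List[(String, Int)] =
    words.toList
      .map(_.toLowerCase)
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.length) }
      .toList
      .sortBy { case (_, n) => -n }

  def main(args: Array[String]): Unit = {
    println(frequencies(Array("one", "two", "three", "two", "one", "two")))
  }
}
```

Ties between equally frequent words come out in an unspecified order here; the LinkedHashMap version above has the same property, since Map iteration order is not guaranteed.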

Data Immutability Principle:

This technique explores how to prevent pollution of resource data through immutability. To do this, we’re going to use Scala vectors. Vectors are immutable, so the original elements stored in them remain unchanged during data analysis.
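A quick illustration of that guarantee: every “modifying” operation on a Vector returns a new vector and leaves the original untouched. This minimal standalone sketch is separate from the inventory script that follows:

```scala
object VectorImmutability {
  def main(args: Array[String]): Unit = {
    val items = Vector("lumber", "stone", "magic potion")

    // :+ returns a NEW vector with the element appended; `items` is untouched.
    val withLumber = items :+ "lumber"

    // filterNot also returns a new vector rather than mutating in place.
    val withoutStone = withLumber.filterNot(_ == "stone")

    println(items)        // prints Vector(lumber, stone, magic potion)
    println(withoutStone) // prints Vector(lumber, magic potion, lumber)
  }
}
```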

The Script: ImmutableData.scala

Copy and paste the Scala code below into a new Scala file, then run it.

The Code:

/**

Immutability of Vectors:

The aim is to demonstrate how to work with immutable data.

How it works:

When the program is loaded, the PlayerInventory class is initialized with basic elements in the "items" variable.

The aim here is to demonstrate how to add or remove items without altering the original "items" list; since vectors are immutable, their original state CANNOT change.

It should also be possible to add and drop items from the inventory, with duplicate items added and removed separately.

For example, if “lumber” was added to the inventory and “stone” was removed, getItems() should return a Vector containing “lumber”, “magic potion”, and “lumber”, in any order.

REMEMBER: The aim of this technique is to ensure the original elements in “items” remain unchanged.

*/

import scala.collection.mutable.ListBuffer

class PlayerInventory {

  // The original inventory; a Vector, so it can never be mutated in place.
  private var items: Vector[String] = Vector("lumber", "stone", "magic potion")

  // Additions and removals are tracked separately, leaving `items` untouched.
  var itemAdded = new ListBuffer[String]
  var itemDropped = new ListBuffer[String]

  // Builds the current inventory from the original items plus the change logs.
  def getItems(): Vector[String] = {
    var newList = (items ++ itemAdded.toList).toList
    itemDropped.foreach { arg =>
      // patch removes one element at the index of the dropped item.
      val i = newList.indexOf(arg)
      newList = newList.patch(i, Nil, 1)
    }
    newList.toVector
  }

  def addToInventory(item: String): Unit = itemAdded += item

  def dropFromInventory(item: String): Unit = itemDropped.append(item)
}

object PlayerInventory {

  def main(args: Array[String]) = {
    val p = new PlayerInventory
    p.addToInventory("lumber")
    p.addToInventory("door")
    p.addToInventory("wood")
    p.addToInventory("window")
    p.addToInventory("brakes")
    p.addToInventory("window")
    p.dropFromInventory("magic potion")
    p.dropFromInventory("brakes")
    println(p.getItems()) // lumber, stone, lumber, door, wood, window, window
  }
}

Conclusion:

For the complete repository of Scala Developer Techniques, Coding exercises, and challenges (Junior Level, Mid-Level and Senior Level), Go to: http://Github.com/VakinduPhilliam/Scala_Dev_Knight/Code_Challenges

Find Me:

Github: http://Github.com/VakinduPhilliam

Twitter: http://Twitter.com/VakinduPhilliam
