How costly are your `Option`s?
Code that is optimised and performant (for the majority of use cases) doesn't have to be ugly and unreadable; there are plenty of performance gains to be made simply by following good design and clean coding principles.
Scala's `Option` is a very idiomatic way of representing optional values. I'm not against the use of `Option`; however, more often than I'd like, I've come across it being misused. For example, when defining generic types:
    case class Asset(
      name: String,
      shareQuantity: Option[Int],
      sharePrice: Option[BigDecimal],
      bondYield: Option[BigDecimal],
      cashValue: Option[Amount]
    )

which should've been modelled as specialised types:

    sealed abstract class Asset(name: String)
    case class Stock(name: String, shareQuantity: Int, sharePrice: BigDecimal) extends Asset(name)
    case class Bond(name: String, bondYield: BigDecimal) extends Asset(name)
    case class Cash(name: String, cashValue: Amount) extends Asset(name)
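To illustrate the readability side of this, here is a minimal sketch of consuming the specialised hierarchy with a pattern match. The `Amount` definition and the `describe` helper are assumptions made up for this example; they are not part of the original code.

```scala
// Hypothetical stand-in for the `Amount` type, which the post does not define.
final case class Amount(value: BigDecimal)

sealed abstract class Asset(name: String)
case class Stock(name: String, shareQuantity: Int, sharePrice: BigDecimal) extends Asset(name)
case class Bond(name: String, bondYield: BigDecimal) extends Asset(name)
case class Cash(name: String, cashValue: Amount) extends Asset(name)

object AssetDescriptions {
  // Illustrative helper: each case only sees the fields that exist for it,
  // and because Asset is sealed the compiler can warn about a missing case.
  def describe(asset: Asset): String = asset match {
    case Stock(name, quantity, price) => s"$name: $quantity shares worth ${price * quantity}"
    case Bond(name, bondYield)        => s"$name: bond yielding $bondYield"
    case Cash(name, Amount(value))    => s"$name: cash of $value"
  }
}
```

With the `Option`-based `Asset`, the same function would have to check which combination of fields happens to be populated and decide what to do when the combination makes no sense.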
Apart from being hard to read and reason about, such types exert a great deal of heap pressure when their objects have to be created over and over again, especially if millions of instances end up on the heap.
To test this out, we've got a snippet which uses a generic case class modelled as:
    case class ID(
      passportNumber: Option[Int] = None,
      citizenCardNumber: Option[Int] = None,
      drivingLicenceNumber: Option[Int] = None
    )
A 128MB heap with the Epsilon GC (a no-op collector that allocates memory but never reclaims it) soon fills up, and the program exits after allocating roughly 1.37 million instances of ID, taking up roughly 53.44MB, as shown by these heap dump reports for three separate runs:
[Heap dump reports: Generic Run #1, Generic Run #2, Generic Run #3]
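Part of the extra cost comes from the wrappers themselves: every populated field points at its own `Some` object (which in turn holds a boxed integer, since `Option` is generic), while unset fields all share the single `None` object. A small sketch that makes this visible through reference equality; the `OptionAllocation` object and its value names are made up for illustration:

```scala
case class ID(
  passportNumber: Option[Int] = None,
  citizenCardNumber: Option[Int] = None,
  drivingLicenceNumber: Option[Int] = None
)

object OptionAllocation {
  val first  = ID(passportNumber = Some(100001))
  val second = ID(passportNumber = Some(100001))

  // Structural equality: both IDs carry the same passport number.
  val equalByValue: Boolean = first == second

  // Reference inequality: each Some(...) is a separate object on the heap.
  val distinctWrappers: Boolean = first.passportNumber ne second.passportNumber

  // None is a singleton, so all unset fields point at the same object.
  val sharedNone: Boolean = first.citizenCardNumber eq second.drivingLicenceNumber
}
```

So an `ID` with one populated field costs the `ID` object itself plus a `Some` plus a boxed value, whereas a type with a plain `Int` field needs only one object.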
Now, if we clean the types up a bit to have specialised instances:
    sealed trait ID
    case class Passport(passportNumber: Int) extends ID
    case class CitizenCard(citizenCardNumber: Int) extends ID
    case class DrivingLicence(drivingLicenceNumber: Int) extends ID
With the same set-up as before, we can see a huge improvement in the number of instances we can allocate. We're now able to allocate around 2.74 million instances (across all three specialised types) in just about 42.81MB of heap, as can be seen in the reports below:
[Heap dump reports: Specialised Run #1, Specialised Run #2, Specialised Run #3]
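For anyone wanting to try something similar without cloning the repository, here is a rough sketch of an allocation helper for the specialised types. It is not the code from the repository; `AllocationSketch` and `allocate` are hypothetical names, and running it with flags such as `-Xmx128m -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC` is only an assumption about how a comparable set-up could be configured.

```scala
sealed trait ID
case class Passport(passportNumber: Int) extends ID
case class CitizenCard(citizenCardNumber: Int) extends ID
case class DrivingLicence(drivingLicenceNumber: Int) extends ID

object AllocationSketch {
  // Hypothetical helper: builds `count` IDs, cycling through the three
  // specialised types, and returns them in an array so that every
  // instance stays reachable (and therefore stays on the heap).
  def allocate(count: Int): Array[ID] =
    Array.tabulate[ID](count) { i =>
      i % 3 match {
        case 0 => Passport(i)
        case 1 => CitizenCard(i)
        case _ => DrivingLicence(i)
      }
    }
}
```

Calling `AllocationSketch.allocate` with a growing `count` and taking a heap dump gives a way to compare instance counts and sizes against the generic version.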
Looking at the memory used per object instance (using the figures from the heap dumps), the generic type works out to about:
53,444.6KB / 1,368,181.66 objects = 39.06 Bytes per object
While with specialised types, we can reduce this by more than half:
42,811.5KB / 2,739,917 objects = 15.62 Bytes per object
That is a saving of about 23.44 bytes per instance for a trivial example like this. It might not seem like a lot, but across a million instances it translates to roughly 23.44MB less memory to allocate and garbage-collect.
Furthermore, depending on the number of `Option`s, there might be more noticeable gains.
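The per-object arithmetic can be reproduced with a short sketch. The figures are the ones quoted from the heap dumps; `PerObjectCost` and `bytesPerObject` are names invented for this example. The calculation above treats 1KB as 1,000 bytes. If the heap dump tool reports binary kilobytes (1,024 bytes) instead, which is an assumption worth checking, the same figures come out very close to 40 and 16 bytes per object, which would fit the JVM's default 8-byte object alignment.

```scala
object PerObjectCost {
  // Figures quoted from the heap dump reports.
  val genericKb        = 53444.6
  val genericCount     = 1368181.66 // fractional, presumably an average over the three runs
  val specialisedKb    = 42811.5
  val specialisedCount = 2739917.0

  // bytesPerKb is 1000.0 for decimal kilobytes or 1024.0 for binary ones.
  def bytesPerObject(kb: Double, count: Double, bytesPerKb: Double): Double =
    kb * bytesPerKb / count

  def main(args: Array[String]): Unit =
    for (bytesPerKb <- Seq(1000.0, 1024.0)) {
      val generic     = bytesPerObject(genericKb, genericCount, bytesPerKb)
      val specialised = bytesPerObject(specialisedKb, specialisedCount, bytesPerKb)
      println(f"1KB = $bytesPerKb%.0f bytes: generic $generic%.2f, specialised $specialised%.2f, saving ${generic - specialised}%.2f")
    }
}
```

Either way, the specialised types need well under half the memory per instance.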
Finally, if you'd like to test this out for yourself, please feel free to grab a copy of the code from here:
github.com/ausmarton/scala-performance