Thursday, 24 February 2011

Scala collections: Filtering each n-th element - Part 2

If you read my previous post you know that I'm trying to find the best implementation for filtering each n-th element from a collection. Since then I found two more possible implementations:
  1. Loop - Reverse
    This implementation is faster than other loop-based implementations. It benefits from replacing append :+ with concatenation ::
    def filterUsingLoopReversed[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      var out = List.empty[A]
      val it = in.iterator
      for (i <- 0 to in.size - 1) {
        if (i % step == offset) out = it.next :: out
        else it.next
      }
      out.reverse
    }
    
  2. Range - Map
    This is simply the best implementation I have. It's not only concise but also very fast. The drawback is that it requires the input collection to be an instance of Seq
    def filterUsingMap[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      val seq = in.toSeq
      Range(offset, in.size, step).map(seq(_))
    }
    
    Edit: Peter suggested that the perfomance of this solution depends on the implementation of the input collection. To mitigate this I've decided to convert input collection to IndexedSeq which is supposed to guarantee near constant-time element access. The performance is a bit worse because of the extra conversion time, though it's still the fastest solution I found
    def filterUsingMap[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      val seq = in.toIndexedSeq
      Range(offset, in.size, step).map(seq(_))
    }
    

Benchmark results:
Average execution time of 100 executions

Range-Map (light green) is comparable to Loop-Buffered (dark green) when filtering small-mid collection (<10000 elements). And there is no better choice than Range-Map to filter big collections (>10000 elements).

Saturday, 19 February 2011

Scala collections: Filtering each n-th element

Recently I had to process huge log files. I found it to be a great opportunity to check my Scala language skills. One of the problems I had to figure out was filtering every n-th element from iterable collection. This is supposed to be a trivial task, but well, after thinking for a while I had five competitive implementations ready. Which one should I pick? Perhaps the most concise one. To make the choice easier I've decided to measure how performant they are.
Filtering implementations:
  1. Filter using groups This is my favorite one, the cleanest implementation.
    def filterUsingGroups[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] =
      in.drop(offset).grouped(step).map(_.head).toIterable
    How it works:
    in = (1,2,3,4,5,6,7,8,9,10), step=3, offset=2
    drop(2) => (3,4,5,6,7,8,9,10)
    grouped(3) => ((3,4,5),(6,7,8),(9,10))
    map(_.head) => (3,6,9)
    
  2. Filter using index
    def filterUsingIndex[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] =
      in.zipWithIndex.filter(_._2 % step == offset).unzip._1
    
    How it works:
    in = (1,2,3,4,5,6,7,8,9,10), step=3, offset=2
    zipWithIndex => ((1,0),(2,1),....,(10,9))
    filter(_._2 % 3 == 2) => ((3,2), (6,5), (9,8))
    unzip._1 => (3,6,9)
    
  3. Filter using recursion
    def filterUsingRecursion[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      if (in.size >= step) {
        val (l, r) = in.splitAt(step)
        List(l.drop(offset).head) ++ filterUsingRecursion(r, step, offset)
      } else List.empty[A]
    }
    
    How it works:
    in = (1,2,3,4,5,6,7,8,9,10), step=3, offset=2
    splitAt(3) => l=(1,2,3); r=(4,5,6,7,8,9,10)
    l.drop(2).head == 3 => List(3) ++ filterUsingRecursion((4,5,6,7,8,9,10), 3, 2)
    
  4. Filter using loop (unbuffered)
    def filterUsingLoop[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      var out = List.empty[A]
      val it = in.iterator
      for (i <- 0 to in.size - 1) {
        if (i % step == offset) out = out :+ it.next
        else it.next
      }
      out
    }
    
  5. Filter using loop (buffered)
    def filterUsingLoopWithBuffer[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      val out = collection.mutable.ListBuffer.empty[A]
      val it = in.iterator
      for (i <- 0 to in.size - 1) {
        if (i % step == offset) out += it.next
        else it.next
      }
      out
    }
    
  6. Filter using tail recursion (added later)
    def filterUsingTailRecursion[A](in: Iterable[A], step: Int, offset: Int): Iterable[A] = {
      def filter(acc: ListBuffer[A], in: Iterable[A]): ListBuffer[A] = {
        in.splitAt(step) match {
          case (l, r) if (l.size >= offset) => filter(acc += l.drop(offset).head, r)
          case _ => acc
        }
      }
      filter(ListBuffer.empty[A], in)
    }
    

In my benchmark I've called each method 10 times in a row and measured average execution time (using parameters step=3, offset=2). When benchmarking I call foreach on the result of each execution of the method under test. I do this to materialize returned collection if it would be backed by an iterator.

def benchmark[A](fn: => Iterable[A]): Double = {
  fn // warm-up
  val times = 10
  val start = System.currentTimeMillis
  for (i <- 1 to times) fn.foreach(_ => ())
  val stop = System.currentTimeMillis
  (stop - start).toDouble / times
}
The chart below shows the results I've collected. Conclusions:
  • Buffered loop is the fastest implementation
  • Index is also quite good
  • Recursion is working pretty well with small collections but fails to process huge collections (java.lang.StackOverflowError)
  • Unbuffered loop fails to process huge collections because of intesive data copying
  • Groups give worst performance
  • Tail recursion works very well with huge collections
Sadly the implementation using groups is not the most performant although it's clean and concise. Anyway it's good enough when processing less than 1000 elements. That makes it quite good candidate for initial implementation.

Thursday, 15 July 2010

Tech Tip: How to use fast scala compiler (fsc) from IntelliJ IDEA

Using fsc from IDEA is supposed to be trivial task. IDEA provides dedicated fsc "Run Configuration" out-of-the-box. Unfortunately in reality things are getting more complicated. Scala build in IDEA is hanging more or less often making it not usable.

It took me several hours to figure out what is going on there. I was trying with dozen of switches without success. Finally I've found this simple answer and here it is.

When compiler server is starting it opens random port to listen for commands from compiler clients. Obviously the clients must know that port number. To communicate that server is creating file named as the port number. The file is created under JVM temporary directory ${java.io.tmpdir}/scala-devel/ ... (/tmp/scala-devel/{username}/scalac-compile-server-port/ at my linux box). Problems begin if there are some old unused files left in that directory. So the solution is easy - delete ${java.io.tmpdir}/scala-devel/ before starting compiler server.

I've created simple bash script to do that. I simply use it instead of starting fsc from IDEA.

Monday, 7 December 2009

"Analysis Patterns 2 - Work in Progress"

I was about to buy Analysis Patterns book by Martin Fowler but recently I found on his web page that he's working on version two of this book, thus I've decided to wait a bit. He's also kindly publishing his work so waiting won't hurt so much.

Why I won't use Qi4j (yet)

Several months ago I've published short series about Qi4j basics. I was impressed not only by the long list of interesting features, but also by the overall concept of the application architecture that Qi4j offers. Unfortunately I've recognized it's very difficult for me to write an application using Qi4j quickly and here is why. 1. Mixins. When Mixins are very useful feature the way they are implemented in Qi4j doesn't work for me. I had to write a lot of nested classes and was loosing my focus because of this. I'm wondering what was theirs motivation to create this framework in Java not in Scala which has traits already built-in. 2. Lack of good relational DB support. Another obstacle is that it's strongly focused on the object oriented data stores. Well, that's not bad actually, probably in the future we'll use them mostly. But today when at work I've to deal with relational databases this is serious disadvantage. Hiding these relational databases behind anti-corruption layers would work but it also makes usage of Qi4j's UnitOfWork impossible. 3. Testability. Mocking interfaces with properties modeled with Property<T> needs endless effort. (Maybe I didn't find the right tool to do this) 4. Poor documentation. Why the Spring framework is so popular? Because it's very well documented and its codebase is fairly readable - that's for sure one of the reasons. Nothing more to add here. Anyway it's not as bad as it might look like. I appreciate Qi4j's developers for their innovating approach to software development and I'm tracking theirs mailing list carefully as there is a lot of interesting discussions I can learn from. Qi4j is definitely worth to give it a try also to feel the taste of BDD for a while. It's amazing in how many different ways it's possible to write an application in Java.

Friday, 23 October 2009

Google collections

Google collections is very interesting, modern library which supplements standard library java.util.collections. It focuses on code simplicity and safety (immutability). And it brings some flavours of functional programming into Java land what I hope will help the developers evolve into more functional languages like Scala smoothly. Here is a nice 10 minutes long introduction.

Wednesday, 30 September 2009

Qi4j - First mixin

Qi4j LogoThis is my third post about Qi4j basics. In this post I'll show you how to implement behavior and wire it together with the code I already wrote for the previous post. As you probably remember Qi4j manages composites. Composite is defined as a composition of fragments. Fragment is a piece of code responsible for concrete functionality. Each fragment can be categorized either as a mixin or modifier. Mixin's role is to provide an implementation for composite's methods. At the moment Rational composite has only properties. So let's add some behavior methods. (Mixins are not limited only to provide behavior implementation, in fact properties are managed by predefined mixin too) Step 1. Extend Rational interface.
import org.qi4j.api.property.Property;
import org.qi4j.api.value.ValueComposite;

public interface Rational extends ValueComposite {

    Property<Integer> nominator();

    Property<Integer> denominator();

    Rational add(Rational toAdd);

    Rational mul(Rational toMultiply);
}
The highlighted lines show new behavior methods. If you try to run the application you will see the exception with the following message:
No implementation found for method public abstract com.blogspot.javasnippet.qi4j.Rational com.blogspot.javasnippet.qi4j.Rational.add(com.blogspot.javasnippet.qi4j.Rational) in com.blogspot.javasnippet.qi4j.Rational
That's because Qi4j doesn't see any implementation for these new methods. Step 2. Implement mixin.
public interface Rational extends ValueComposite {

    Property<Integer> nominator();

    Property<Integer> denominator();

    Rational add(Rational toAdd);

    Rational mul(Rational toMultiply);

    abstract class RationalMixin implements Rational {

        @Override
        public Rational add(Rational toAdd) {
            return null;
        }

        @Override
        public Rational mul(Rational toMultiply) {
            return null;
        }
    }
}
Things worth to be noted:
  • RationalMixin class is defined inside Rational interface - although it's not mandatory and is done here for simplicity there are cases (mainly for modifiers) where it makes sense to put implementation directly into declaring interface.
  • RationalMixin class is abstract - as I want to implement only behavior methods I have to make this class abstract. This way it's possible to even split single interface implementation into set of mixins. Qi4j proxies and reflection does the rest.
Qi4j still is not aware of this stub implementation and crashes by throwing this same exception as before. Step 3. Register mixin. To register mixin I have to annotate interface Rational with @Mixins annotation like below.
@Mixins(Rational.RationalMixin.class)
public interface Rational extends ValueComposite {
...
}
Now the application is running fine - Qi4j sees implementation for all declared methods. Step 4. Accessing composite state. UPDATE: It's no longer necessary to inject state to the mixin. See p.4.1 Rational extends ValueComposite which in turn extends Composite interface. If you look at source code of Composite interface then you will see it declares usage of PropertyMixin - that's the one which is responsible for properties state management. So our Rational is composed out of two fragments: PropertyMixin and RationalMixin. To implement methods inside the RationalMixin we need to access the state of this Rational composite instance. Qi4j solves this by dependency injection. See highlighted lines below.
    abstract class RationalMixin implements Rational {

        @This
        private Rational state;

        @Override
        public Rational add(Rational toAdd) {
            return null;
        }

        @Override
        public Rational mul(Rational toMultiply) {
            return null;
        }
    }
According to the JavaDoc, annotation @This "denotes the injection of a reference to the same Composite as the fragment is a part of". Then the method add() can be implemented as:
        public Rational add(Rational toAdd) {
            int n1 = state.nominator().get();
            int d1 = state.denominator().get();
            int n2 = toAdd.nominator().get();
            int d2 = toAdd.denominator().get();

            int d = d1 * d2;
            int n = n1 * d2 + n2 * d1;
            return null;
        }
Step 4.1. Accessing composite state without using @This. I wasn't aware of this feature when I was writing this post. Here is an update. Qi4j routes calls to the abstract methods (in other words - the methods which are not implemented by the mixin) to the composite. So the implementation of RationalMixin can be rewritten as follows:
    abstract class RationalMixin implements Rational {

        @Override
        public Rational add(Rational toAdd) {
            int n1 = this.nominator().get();
            int d1 = this.denominator().get();
            int n2 = toAdd.nominator().get();
            int d2 = toAdd.denominator().get();

            int d = d1 * d2;
            int n = n1 * d2 + n2 * d1;
            return null;
        }

        @Override
        public Rational mul(Rational toMultiply) {
            return null;
        }
    }
Step 5. Accessing module resources. Now it's time to create new Rational instance inside RationalMixin. To do this we need reference to the ValueBuilderFactory. This is solved by the dependency injection as well. Qi4j defines annotation @Structure which "denotes the injection of a resource specific for the module which the injected object/fragment is instantiated in".
    abstract class RationalMixin implements Rational {

        @Structure
        private ValueBuilderFactory builderFactory;

        @Override
        public Rational add(Rational toAdd) {
            int n1 = this.nominator().get();
            int d1 = this.denominator().get();
            int n2 = toAdd.nominator().get();
            int d2 = toAdd.denominator().get();

            int d = d1 * d2;
            int n = n1 * d2 + n2 * d1;

            ValueBuilder<Rational> builder = builderFactory.newValueBuilder(Rational.class);
            builder.prototype().nominator().set(n);
            builder.prototype().denominator().set(d);
            return builder.newInstance();
        }

        @Override
        public Rational mul(Rational toMultiply) {
            return null;
        }
    }
The implementation of method add() is now finished. We can check if it works. Here is new body of method main() which computes 2/3 + 3/4 and prints the result to the console.
    public static void main(final String[] args) throws Exception {
        SingletonAssembler assembler = new SingletonAssembler() {

            @Override
            public void assemble(ModuleAssembly module) throws AssemblyException {
                module.addValues(Rational.class);
            }
        };

        ApplicationSPI application = assembler.application();

        Module module = assembler.module();
        ValueBuilder<Rational> rb;
        rb = module.valueBuilderFactory().newValueBuilder(Rational.class);
        rb.prototype().nominator().set(2);
        rb.prototype().denominator().set(3);
        Rational r1 = rb.newInstance();

        rb = module.valueBuilderFactory().newValueBuilder(Rational.class);
        rb.prototype().nominator().set(3);
        rb.prototype().denominator().set(4);
        Rational r2 = rb.newInstance();

        Rational r3 = r1.add(r2);

        System.out.println(r3.nominator().get() + "/" + r3.denominator().get());
    }
You should see:
17/12
Summary This is last post of my short series about Qi4j basics. Here is short list of things worth to be remembered:
  • Qi4j is composite oriented and manages composites rather than objects.
  • Qi4j aims to help us in easier DDD adoption.
  • Qi4j uses dependency injection.
Qi4j offers much more features than I know about (I'm still learning). If you want to learn more go to its website. I encourage you to give it a try.