Skip to content

Remove synchronized methods in favour of ConcurrentHashMap using#76

Closed
dmitry-grytsovets wants to merge 6 commits intomsgpack:masterfrom
dmitry-grytsovets:master
Closed

Remove synchronized methods in favour of ConcurrentHashMap using#76
dmitry-grytsovets wants to merge 6 commits intomsgpack:masterfrom
dmitry-grytsovets:master

Conversation

@dmitry-grytsovets
Copy link
Copy Markdown

No description provided.

@oza
Copy link
Copy Markdown
Member

oza commented Apr 27, 2014

Thank you for the contribution, @dgreenru! IIUC, TemplateRegistry#buildAndRegister cannot be synchronized-free because it includes check-then-act behavior like this:

    if (cache.containsKey(targetClass)) {
        oldTmpl = cache.get(targetClass);
    }

About Introspector, we should not change it because there is no compatibility between WeakHashMap and ConcurrentHashMap - reference strength is different. Please let me know if I get wrong. Again, thanks for the contribution!

@oza oza closed this Apr 27, 2014
@oza
Copy link
Copy Markdown
Member

oza commented Apr 27, 2014

Please feel free to reopen this issue if you have any opinion or comments.

@dmitry-grytsovets
Copy link
Copy Markdown
Author

if (cache.containsKey(targetClass)) { oldTmpl = cache.get(targetClass); }
I don't understand why you couldn't use this code with ConcurrentHashMap. If you would avoid concurrent template creation you should set synchronization in point where it makes. If you read java memory model you should know what synchronized block will do. I think that better lock for this case is StampedLock.

And this code should looks like

Template tmpl = cache.get(targetClass);
if(tmpl != null) { oldTmpl = tmpl;}

cause you shouldn't do key find twice.

And about WeakHashMap. I can't imagine application where message converter the main memory consumer.

@oza
Copy link
Copy Markdown
Member

oza commented Apr 28, 2014

I think that better lock for this case is StampedLock.

I know that synchronized blocks are very heavy operation in Java memory model. Therefore, I agree with you generally - lock granuality should be smaller. However I'd like to crarify some points in this case. I'm still not certain whether TemplateRegistry can be really bottleneck. The registration runs only at initialization time. Or is register() method called again and again from multiple threads in your use case? One alternative is to have thread-local MessegePack object. Is it not enough for you?

And this code should looks like

Template tmpl = cache.get(targetClass);
if(tmpl != null) { oldTmpl = tmpl;}

cause you shouldn't do key find twice.

+1. We can refactor the code.

About WeakHashMap, I'll check for more detail in a few days.

@dmitry-grytsovets
Copy link
Copy Markdown
Author

I think that high load app can use thousands of threads so ThreadLocal is not a good way to store MessagePack object. Also synchronized block will perform cpu cache synchronization if any changes was made by another cpu ( happens before and so on).

Message conversion is a simple operation so it should be lightweight as possible. A think that
code which lookup for templates should looks like the code below

ConcurrentHashMap templates;
...
Template template = templates.get(type);
if(template == null) {
    // create the lightweight object which will do init and put self in map
    //wrapper will change self to initilized object when init will be completed
    Template wrapper = new TemplateWrapper(templates, type);
    //putIfAbsent will put object only if it not exists
    template = (template = templates.putIfAbsent(type, wrapper)) == null?wrapper:template;
}

And wrapper should works like this

TemplateWrapper extends Template {
    Type type;
    ConcurrentHashMap templates;
    volitile Template template;

    public void doSomething() {
          if(template != null) {
              template.doSomething();          
          } else {
              init(); 
              template.doSomething();   
          }
    }
    private synchronized void init() {
        if(template != null) {
            //create template in local variable cause other thread can see not fully initialized instance
            Template instance = new TemplateImpl(type);
            templates.put(type, instance);
            template = instance;
        }
    }
}

@dmitry-grytsovets
Copy link
Copy Markdown
Author

The bottleneck is

 public synchronized Template lookup(Type targetType)

cause it calls N times over each read\write operation

@oza oza reopened this Apr 29, 2014
@dmitry-grytsovets
Copy link
Copy Markdown
Author

I've changed my pull request to minor modification which only removes synchronized from lookup.
Thanks for your time.

@komamitsu
Copy link
Copy Markdown
Member

I'm interested in the performance improvement of this change. I've not finished reviewing your commits yet, though.

@dgreenru Can you show me the result of the benchmark if you have?

@dmitry-grytsovets
Copy link
Copy Markdown
Author

I haven't any benchmark.
I use msgpack for https://github.com/dgreenru/tarantool-java.

Synchronized method will always produce locks in multithread env.
For example if N threads simply read stream and using map to get template by type they will wait for concurrent threads read.

You can look at https://github.com/msgpack/msgpack-java/pull/76/files cause I reverted my first commit and create small fix which should remove synchronization only from frequently called method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without synchronized keyword, a thread can call register() or unregister() while another thread is executing this method. Did you consider those cases already?

As you know, this method calls register() and cache.get() at some points. So we need to take account of such as the case that thread#A calls unregister() while thread#B is executing in the middle of lookupInterfaceTypes().

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also changed HashMap to ConcurrentHashMap which is fully thread safe. So you can remove all synchronized methods over the maps if you need :) You can look into the ConcurrentHashMap source code where you will find locked write and optimistic read with locked fallback :)

@komamitsu
Copy link
Copy Markdown
Member

Well, I don't worry about the atomicity of ConcurrentHashMap itself, but of lookup()...

BTW, I think it might be not profitable to merge this change if it doesn't improve the performance significantly. So can you benchmark the performance of this change in your use case?

Of course, I hope your fix improves the performance because I use this library heavily. Thanks.

@dmitry-grytsovets
Copy link
Copy Markdown
Author

I've made performance test

package org.msgpack;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;


public class Test {

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService exec = Executors.newFixedThreadPool(cpus);
        final MessagePack mpack = new MessagePack();
        final List message = Arrays.asList(1,2,3,4,5);
        try {
            mpack.read(mpack.write(message));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }

        final Semaphore in = new Semaphore(0);
        final Semaphore out = new Semaphore(0);

        Runnable runnable = new Runnable() {
            @Override
            public void run() {
                try {
                    in.acquire();
                    for (int i = 0; i < 1000000; i++) {
                        mpack.read(mpack.write(message));
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                } finally {
                    out.release();
                }
            }
        };

        for (int i = 0; i < cpus; i++) {
            exec.execute(runnable);
        }
        in.release(cpus);
        long start = System.currentTimeMillis();
        out.acquire(cpus);
        System.out.println("Cpus: "+Runtime.getRuntime().availableProcessors()+" Total time "+(System.currentTimeMillis() - start));
        exec.shutdown();


    }
}

syncronized version: Cpus: 8 Total time 15704
not synchronized version: Cpus: 8 Total time 5279

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move put to the end of template build cause without synchronization other threads can see not initialized version

@komamitsu
Copy link
Copy Markdown
Member

@dgreenru Thanks! I'll look at it later.

@komamitsu
Copy link
Copy Markdown
Member

@dgreenru I tried to test serialization and deserialization of custom class in your branch and changed your benchmark code like this:

public class Test {
+    @Message
+    static class Foo {}

    public static void main(String[] args) throws InterruptedException {
        :
+        final Foo foo = new Foo();
        Runnable runnable = new Runnable() {
            @Override
            public void run() {
                try {
                    in.acquire();
                    for (int i = 0; i < 1000000; i++) {
                        mpack.read(mpack.write(message));
+                         mpack.read(mpack.write(foo));
                    }

But I ran into the following exception:

Exception in thread "pool-1-thread-5" java.lang.StackOverflowError
    at org.msgpack.template.TemplateReference.validateActualTemplate(TemplateReference.java:46)
    at org.msgpack.template.TemplateReference.write(TemplateReference.java:69)
    at org.msgpack.template.TemplateReference.write(TemplateReference.java:70)
    at org.msgpack.template.TemplateReference.write(TemplateReference.java:70)
    at org.msgpack.template.TemplateReference.write(TemplateReference.java:70)
             :

Also, I executed this test code with msgpack/msgpack-java:origin/master and it worked well.

Can you check it?

@dmitry-grytsovets
Copy link
Copy Markdown
Author

I've executed your test without any problem.
Please ensure that your have latest version from my repository. Do you have

private final CountDownLatch ready = new CountDownLatch(1);

field in TemplateReference?

In my repository https://github.com/dgreenru/msgpack-java/blob/master/src/main/java/org/msgpack/template/TemplateReference.java

@dmitry-grytsovets
Copy link
Copy Markdown
Author

Let me explain the performance test result.

Set TemplateRegistry.lookup call execution time to 1 and number of cpu to 8.

If other serialization and reflection operations execution time more than 8 you will have the same execution time for synchronized and not sychronized version cause syncronization only perfom uniform lookup distribution.

Simple example:

cpu = 2
lookup exec time = 1 name = a
other ops time = 2 name = b

synchronized version
thread 1 = [ a b b a b b ]
thread 2 = [ a a b b a b b ]

as you can see thread 2 simple got 1 step penalty

not synchronized version

thread 1 = [ a b b a b b ]
thread 2 = [ a b b a b b ]

I use array and integers in my performance test to reduce other operations execution time and have more obvious result.

@komamitsu
Copy link
Copy Markdown
Member

Please ensure that your have latest version from my repository

I used the latest revision:

commit 5db19c16ca3391dcb6e9e5f29fbacee1483d5175
Author: dmitriy.grytsovets <dmitriy.grytsovets@odnoklassniki.ru>
Date:   Thu May 1 01:08:06 2014 +0400

    Change wait notify to countdown latch

One question. Why did you add 5db19c1 in your changes though you just only removed synchronized keyword from TemplateRegistry#lookup()?

@dmitry-grytsovets
Copy link
Copy Markdown
Author

As you can see in TemplateRegistry#buildAndRegister

cache.put(targetClass, ref);

made before actual template is ready. That was done to resolve circular dependencies. So other threads should wait for actual template when trying to use reference.

I've added you as collabrator for my forked version can you push your test code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants