Resolving a Nacos Issue: An NPE Caused by Concurrency

ISSUE

Spring Boot application startup is terminated #21

Error analysis

Inheritance hierarchy of DeferredApplicationEventPublisher

import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationEvent;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.ApplicationListener;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.event.ContextRefreshedEvent;

public class DeferredApplicationEventPublisher implements ApplicationEventPublisher, ApplicationListener<ContextRefreshedEvent> {
...
}

Dependency diagram of DeferredApplicationEventPublisher


Now let's analyze why the NPE actually occurs

First, look at addListener in EventPublishingConfigService

@Override
public void addListener(String dataId, String group, Listener listener) throws NacosException {
    Listener listenerAdapter = new DelegatingEventPublishingListener(configService, dataId, group, applicationEventPublisher, executor, listener);
    configService.addListener(dataId, group, listenerAdapter);
    publishEvent(new NacosConfigListenerRegisteredEvent(configService, dataId, group, listener, true));
}

Next, look at the inheritance hierarchy of DelegatingEventPublishingListener

import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;
import org.springframework.context.ApplicationEventPublisher;

import java.util.concurrent.Executor;

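final class DelegatingEventPublishingListener implements Listener {

    // Fields assigned by the constructor below.
    private final ConfigService configService;
    private final String dataId;
    private final String groupId;
    private final ApplicationEventPublisher applicationEventPublisher;
    private final Executor executor;
    private final Listener delegate;

    DelegatingEventPublishingListener(ConfigService configService, String dataId, String groupId, ApplicationEventPublisher applicationEventPublisher, Executor executor, Listener delegate) {
        this.configService = configService;
        this.dataId = dataId;
        this.groupId = groupId;
        this.applicationEventPublisher = applicationEventPublisher;
        this.executor = executor;
        this.delegate = delegate;
    }

    // The Listener methods are omitted from this excerpt.
}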

As you can see, when a DelegatingEventPublishingListener object is created, it is handed a thread pool (an Executor) and an ApplicationEventPublisher (which is in fact the DeferredApplicationEventPublisher).

Then look at what the CacheData.safeNotifyListener() method does

private void safeNotifyListener(final String dataId, final String group, final String content, final String md5, final ManagerListenerWrap listenerWrap) {
    final Listener listener = listenerWrap.listener;
    Runnable job = new Runnable() {
        public void run() {
            ClassLoader myClassLoader = Thread.currentThread().getContextClassLoader();
            ClassLoader appClassLoader = listener.getClass().getClassLoader();
            try {
                if (listener instanceof AbstractSharedListener) {
                    AbstractSharedListener adapter = (AbstractSharedListener)listener;
                    adapter.fillContext(dataId, group);
                    LOGGER.info("[{}] [notify-context] dataId={}, group={}, md5={}", name, dataId, group, md5);
                }
                // Before running the callback, set the thread's context classloader to that of the specific webapp,
                // so that SPI calls made inside the callback do not fail or resolve to the wrong implementation
                // (this only matters when multiple applications are deployed together).
                Thread.currentThread().setContextClassLoader(appClassLoader);

                ConfigResponse cr = new ConfigResponse();
                cr.setDataId(dataId);
                cr.setGroup(group);
                cr.setContent(content);
                configFilterChainManager.doFilter(null, cr);
                String contentTmp = cr.getContent();
                listener.receiveConfigInfo(contentTmp);
                listenerWrap.lastCallMd5 = md5;
                LOGGER.info("[{}] [notify-ok] dataId={}, group={}, md5={}, listener={} ", name, dataId, group, md5,
                    listener);
            } catch (NacosException de) {
                LOGGER.error("[{}] [notify-error] dataId={}, group={}, md5={}, listener={} errCode={} errMsg={}", name,
                    dataId, group, md5, listener, de.getErrCode(), de.getErrMsg());
            } catch (Throwable t) {
                LOGGER.error("[{}] [notify-error] dataId={}, group={}, md5={}, listener={} tx={}", name, dataId, group,
                    md5, listener, t.getCause());
            } finally {
                Thread.currentThread().setContextClassLoader(myClassLoader);
            }
        }
    };

    final long startNotify = System.currentTimeMillis();
    try {
        if (null != listener.getExecutor()) {
            listener.getExecutor().execute(job);
        } else {
            job.run();
        }
    }
    ...
}

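Here we can see that safeNotifyListener is what delivers the change event to the Listeners. It also contains one extremely important line, which is the reason the LinkedList inside DeferredApplicationEventPublisher ends up being used concurrently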

listener.getExecutor().execute(job);

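Remember that the DelegatingEventPublishingListener was given an Executor parameter when it was created? Here the Listener uses that Executor to hand the job above to the thread pool for scheduling, so the callbacks run on pool threads, and DeferredApplicationEventPublisher may therefore be used concurrently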
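To make the effect of that branch concrete, here is a minimal sketch that models it without any Nacos or Spring dependency. SimpleListener, notifyListener and callbackThreads are hypothetical names invented for this illustration; only the "use the listener's executor if it has one, otherwise run inline" decision is taken from the code above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatchSketch {

    /** Hypothetical stand-in for a listener: only the two methods that matter here. */
    interface SimpleListener {
        Executor getExecutor();                 // may return null
        void receiveConfigInfo(String content);
    }

    /** Same decision as in safeNotifyListener: prefer the listener's own executor. */
    static void notifyListener(SimpleListener listener, String content) {
        Runnable job = () -> listener.receiveConfigInfo(content);
        if (listener.getExecutor() != null) {
            listener.getExecutor().execute(job); // callback runs on a pool thread
        } else {
            job.run();                           // callback runs on the caller's thread
        }
    }

    /** Sends `count` notifications and returns the names of the threads that ran the callback. */
    static List<String> callbackThreads(ExecutorService pool, int count) {
        List<String> threads = new CopyOnWriteArrayList<>();
        SimpleListener listener = new SimpleListener() {
            @Override
            public Executor getExecutor() {
                return pool;
            }

            @Override
            public void receiveConfigInfo(String content) {
                threads.add(Thread.currentThread().getName());
            }
        };
        for (int i = 0; i < count; i++) {
            notifyListener(listener, "content-" + i);
        }
        if (pool != null) {
            // Wait for the queued callbacks to finish before reading the result.
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("no executor:   " + callbackThreads(null, 3));
        System.out.println("with executor: " + callbackThreads(Executors.newFixedThreadPool(3), 3));
    }
}
```

Without an executor every callback runs on the calling thread. With a pool the callbacks run on worker threads, so anything the callback touches (in this issue, the event publisher) has to tolerate concurrent access.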

Reproducing the error

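import java.util.Iterator;
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

public class DeferrNPE {

    // LinkedList is not thread-safe; it is shared by all threads on purpose.
    private static LinkedList<String> list = new LinkedList<>();

    // latch: lets main wait until every worker has added its element.
    // start: holds the workers back until all three are ready, so their add() calls overlap.
    private static CountDownLatch latch = new CountDownLatch(3);
    private static CountDownLatch start = new CountDownLatch(3);

    private static class MyListener implements Runnable {

        @Override
        public void run() {
            start.countDown();
            try {
                start.await();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // Unsynchronized add() racing with the other workers: this can leave the
            // list's internal links and size inconsistent.
            list.add(String.valueOf(System.currentTimeMillis()));
            latch.countDown();
        }
    }

    public static void main(String[] args) {
        MyListener l1 = new MyListener();
        MyListener l2 = new MyListener();
        MyListener l3 = new MyListener();
        new Thread(l1).start();
        new Thread(l2).start();
        new Thread(l3).start();
        try {
            latch.await();
            // Iterating the possibly corrupted list is where the failure may surface.
            // Being a race, it does not show up on every run.
            Iterator<String> iterator = list.iterator();
            while (iterator.hasNext()) {
                System.out.println(iterator.next());
                iterator.remove();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

}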

Final fix

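Because a non-thread-safe collection is being used in a concurrent scenario, the only option is to change the container used by the upper layer, nacos-spring-context, replacing the original non-thread-safe LinkedList with the thread-safe ConcurrentLinkedQueue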
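The idea behind the fix can be sketched roughly as follows. This is not the actual nacos-spring-context code: DeferredPublisher, publish, replay and publishConcurrently are hypothetical names, events are modeled as plain strings, and the sketch assumes the publisher simply buffers events and replays them later from a single thread. What it does show is that a ConcurrentLinkedQueue accepts offers from many threads at once without losing or corrupting entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

public class DeferredQueueSketch {

    /** Hypothetical, Spring-free model of a publisher that buffers events for later replay. */
    static class DeferredPublisher {
        // Thread-safe buffer: concurrent offer() calls cannot corrupt it.
        private final Queue<String> deferred = new ConcurrentLinkedQueue<>();

        /** May be called from any number of threads at the same time. */
        void publish(String event) {
            deferred.offer(event);
        }

        /** Drains the buffer; models the single-threaded replay step. */
        List<String> replay() {
            List<String> delivered = new ArrayList<>();
            String event;
            while ((event = deferred.poll()) != null) {
                delivered.add(event);
            }
            return delivered;
        }
    }

    /** Publishes from `threads` threads at once and returns how many events were replayed. */
    static int publishConcurrently(int threads, int eventsPerThread) {
        DeferredPublisher publisher = new DeferredPublisher();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                try {
                    start.await(); // release all workers together to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < eventsPerThread; i++) {
                    publisher.publish("event-" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return publisher.replay().size();
    }

    public static void main(String[] args) {
        System.out.println(publishConcurrently(8, 10_000));
    }
}
```

With the concurrent queue the replayed count always equals threads × eventsPerThread. If the buffer in this sketch were a LinkedList instead, the same run could lose entries or fail outright, which is the behavior the reproduction above provokes.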