Did JedisPool cause yet another outage?

Incident summary

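Our company uses JedisPool to access Redis. When a Redis instance becomes unreachable, it is put on a blacklist. A background thread periodically scans the blacklist and restores any instance that is reachable again. Each check creates a brand-new JedisPool and tests reachability by calling jedisPool.getResource().close(). Because the check is periodic, every round creates another JedisPool, and those pools are configured with minIdle set to 1. That is the hidden hazard.

If Redis stays unreachable for a long time, a large number of JedisPool instances pile up. Each JedisPool has a periodic background evictor thread: it destroys connections that have been idle too long, and then creates new connections so that the pool still holds at least minIdle of them. So once Redis recovers, every one of those pools starts creating connections, and the total reaches Redis's maximum connection limit. Connections used by normal requests then get ERR max number of clients reached back from the server, and an exception is thrown.

Note that although the client receives an error, from the client's point of view the connection was established: the client sent its request to the server, and the ERR max number of clients reached error arrived when it read the reply. On the server side, Redis simply closes any connection that pushes it over the maximum connection limit.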
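To get a feel for the scale, here is a rough back-of-the-envelope sketch. The class name, the 12-hour outage and the 5-second probe interval are made-up numbers for illustration, not values from the incident; the only inputs taken from the description above are "one new pool per probe" and minIdle = 1.

```java
/** Rough estimate of how many connections the leaked pools end up holding. */
final class LeakedPoolEstimate {

    /** One probe per interval, and each probe leaves exactly one pool behind. */
    static long leakedPools(long outageSeconds, long probeIntervalSeconds) {
        return outageSeconds / probeIntervalSeconds;
    }

    /** After recovery, each leaked pool's evictor keeps minIdle connections open. */
    static long connectionsAfterRecovery(long leakedPools, int minIdle) {
        return leakedPools * minIdle;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 12-hour outage, probed every 5 seconds.
        long pools = leakedPools(12 * 3600, 5);
        long connections = connectionsAfterRecovery(pools, 1);
        // prints "8640 leaked pools -> 8640 idle connections"
        System.out.println(pools + " leaked pools -> " + connections + " idle connections");
    }
}
```

That figure is per blacklisted instance and per client process. With several client processes it would go past a maxclients setting of 10000 (the Redis default) without any real traffic at all.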

Caused by: redis.clients.jedis.exceptions.JedisDataException: ERR max number of clients reached
        at redis.clients.jedis.Protocol.processError(Protocol.java:130)
        at redis.clients.jedis.Protocol.process(Protocol.java:164)
        at redis.clients.jedis.Protocol.read(Protocol.java:218)
        at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:341)
        at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:277)
        at redis.clients.jedis.BinaryJedis.mget(BinaryJedis.java:606)

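One open question:
Why does the log also contain requests that failed on write? Shouldn't the connections that were established normally still be able to write data? The connections that hit the "max number of clients" error have already been reclaimed, so the client cannot be using them again. Could the server have some logic that cleans up connections?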

Caused by: java.net.SocketException: Connection reset by peer (Write failed)
        at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
        at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
        at redis.clients.util.RedisOutputStream.flushBuffer(RedisOutputStream.java:52)
        at redis.clients.util.RedisOutputStream.flush(RedisOutputStream.java:216)
        at redis.clients.jedis.Connection.flush(Connection.java:332)
        ... 30 more


The evictor thread

How the evictor thread is created


/**
 * Create a new <code>GenericObjectPool</code> using a specific
 * configuration.
 *
 * @param factory   The object factory to be used to create object instances
 *                  used by this pool
 * @param config    The configuration to use for this pool instance. The
 *                  configuration is used by value. Subsequent changes to
 *                  the configuration object will not be reflected in the
 *                  pool.
 */
public GenericObjectPool(PooledObjectFactory<T> factory,
        GenericObjectPoolConfig config) {
    // Remember the earlier JMX problem?
    super(config, ONAME_BASE, config.getJmxNamePrefix());

    if (factory == null) {
        jmxUnregister(); // tidy up
        throw new IllegalArgumentException("factory may not be null");
    }
    this.factory = factory;

    idleObjects = new LinkedBlockingDeque<PooledObject<T>>(config.getFairness());

    setConfig(config);
    // The evictor thread is started here
    startEvictor(getTimeBetweenEvictionRunsMillis());
}

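As the code shows, the evictor thread is created and started in the constructor. In other words, every new JedisPool comes with its own evictor running periodically.
Recall that this same constructor also registers the pool with JMX, which led to a separate problem: new JedisPool can be very slow.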
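The effect of "one evictor per constructor call" can be reproduced with a toy pool built only on the JDK. This is a sketch, not JedisPool: ToyPool, leakyProbe and closingProbe are hypothetical names, and it assumes that closing a pool is what stops its evictor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stand-in for a pool whose constructor schedules a periodic evictor. */
final class ToyPool implements AutoCloseable {
    // Shared daemon timer thread, so the JVM can still exit.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "toy-evictor");
                t.setDaemon(true);
                return t;
            });

    /** Number of evictor tasks that are scheduled and not yet cancelled. */
    static final AtomicInteger LIVE_EVICTORS = new AtomicInteger();

    private final ScheduledFuture<?> evictor;

    ToyPool(long periodMillis) {
        // As in the constructor above: the evictor starts as soon as the pool exists.
        evictor = TIMER.scheduleWithFixedDelay(() -> { }, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
        LIVE_EVICTORS.incrementAndGet();
    }

    @Override
    public void close() {
        // cancel() returns true only the first time, so close() is idempotent.
        if (evictor.cancel(false)) {
            LIVE_EVICTORS.decrementAndGet();
        }
    }

    /** The probe from the incident: a pool is created and never closed. */
    static void leakyProbe() {
        new ToyPool(30_000);
    }

    /** Same probe, but the pool is closed when the check is done. */
    static void closingProbe() {
        try (ToyPool pool = new ToyPool(30_000)) {
            // real code would do the reachability check here
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) leakyProbe();
        System.out.println("after 100 leaky probes:   " + LIVE_EVICTORS.get());
        for (int i = 0; i < 100; i++) closingProbe();
        System.out.println("after 100 closing probes: " + LIVE_EVICTORS.get());
    }
}
```

In this toy the count stays at 100 after the second loop: the closing probes add nothing, while the leaked evictors never go away on their own.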

How the evictor thread is implemented


/**
 * <p>Starts the evictor with the given delay. If there is an evictor
 * running when this method is called, it is stopped and replaced with a
 * new evictor with the specified delay.</p>
 *
 * <p>This method needs to be final, since it is called from a constructor.
 * See POOL-195.</p>
 *
 * @param delay time in milliseconds before start and between eviction runs
 */
final void startEvictor(long delay) {
    synchronized (evictionLock) {
        if (null != evictor) {
            EvictionTimer.cancel(evictor);
            evictor = null;
            evictionIterator = null;
        }
        if (delay > 0) {
            evictor = new Evictor();
            EvictionTimer.schedule(evictor, delay, delay);
        }
    }
}

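The Javadoc is clear, and it boils down to two points:

  • If an evictor task already exists, it is cancelled.
    • When the goal is only to cancel, delay is typically -1: the existing task is stopped and no new one is scheduled.
    • Think about it: has your code ever cancelled one? If not, what goes wrong?
  • If delay is positive, a new evictor task is scheduled to run at that period.
    • With JedisPoolConfig, the period defaults to 30s.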
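The two points above can be mirrored in a few lines of plain JDK code. This is a toy model of the cancel-then-maybe-reschedule logic only, not the commons-pool implementation; EvictorSwitch and isRunning are hypothetical names, and java.util.Timer is used simply because it keeps the sketch short.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Toy model of startEvictor: cancel any existing task, reschedule only if delay > 0. */
final class EvictorSwitch {
    private static final Timer TIMER = new Timer("toy-eviction-timer", true); // daemon thread

    private final Object lock = new Object();
    private TimerTask evictor; // null while no evictor is scheduled

    void startEvictor(long delayMillis) {
        synchronized (lock) {
            // Point 1: an existing evictor is always cancelled first.
            if (evictor != null) {
                evictor.cancel();
                evictor = null;
            }
            // Point 2: only a positive delay schedules a new one.
            if (delayMillis > 0) {
                evictor = new TimerTask() {
                    @Override
                    public void run() {
                        // an eviction pass would go here
                    }
                };
                TIMER.schedule(evictor, delayMillis, delayMillis);
            }
        }
    }

    boolean isRunning() {
        synchronized (lock) {
            return evictor != null;
        }
    }

    public static void main(String[] args) {
        EvictorSwitch pool = new EvictorSwitch();
        pool.startEvictor(30_000);
        System.out.println(pool.isRunning()); // true
        pool.startEvictor(-1);
        System.out.println(pool.isRunning()); // false
    }
}
```

The takeaway is that nothing stops the task unless someone calls the method again with a non-positive delay. A pool that is simply dropped keeps its evictor scheduled.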

About the eviction period


public static final long DEFAULT_TIME_BETWEEN_EVICTION_RUNS_MILLIS = -1L;

private volatile long timeBetweenEvictionRunsMillis =
            BaseObjectPoolConfig.DEFAULT_TIME_BETWEEN_EVICTION_RUNS_MILLIS;

/**
 * Returns the number of milliseconds to sleep between runs of the idle
 * object evictor thread. When non-positive, no idle object evictor thread
 * will be run.
 *
 * @return number of milliseconds to sleep between evictor runs
 *
 * @see #setTimeBetweenEvictionRunsMillis
 */
public final long getTimeBetweenEvictionRunsMillis() {
    return timeBetweenEvictionRunsMillis;
}

The Javadoc here is clear as well: when the value is non-positive (negative or 0), no idle-object evictor thread is run.

As shown above, the default is -1, which means the evictor is off. JedisPoolConfig, however, supplies its own defaults for JedisPool:

public class JedisPoolConfig extends GenericObjectPoolConfig {
  public JedisPoolConfig() {
    // defaults to make your life with connection pool easier :)
    setTestWhileIdle(true);
    setMinEvictableIdleTimeMillis(60000);
    setTimeBetweenEvictionRunsMillis(30000);
    setNumTestsPerEvictionRun(-1);
  }
}

The comment says these defaults make "your life with connection pool" easier. Whose life is that, the connection pool's or the coder's?
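To see why these defaults combined with minIdle = 1 keep one connection alive per pool forever, here is a simplified model of a single evictor pass as described in the incident summary: destroy connections that have been idle too long, then refill up to minIdle. EvictionPassModel and its members are hypothetical names. The model ignores testWhileIdle validation, borrowed connections and numTestsPerEvictionRun, so treat it as a sketch of the idea rather than the real algorithm.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one evictor pass: evict stale idle connections, then refill to minIdle. */
final class EvictionPassModel {
    static final long MIN_EVICTABLE_IDLE_MILLIS = 60_000; // value set by JedisPoolConfig
    static final long PERIOD_MILLIS = 30_000;             // value set by JedisPoolConfig

    /** For each idle connection, the time at which it became idle. */
    private final Deque<Long> idleSince = new ArrayDeque<>();
    private final int minIdle;
    int connectionsCreated;

    EvictionPassModel(int minIdle) {
        this.minIdle = minIdle;
    }

    void runOnce(long nowMillis, boolean serverReachable) {
        // Step 1: destroy connections that have been idle for too long.
        idleSince.removeIf(since -> nowMillis - since > MIN_EVICTABLE_IDLE_MILLIS);
        // Step 2: refill up to minIdle, which only works if the server answers.
        while (serverReachable && idleSince.size() < minIdle) {
            idleSince.add(nowMillis);
            connectionsCreated++;
        }
    }

    int idleCount() {
        return idleSince.size();
    }

    public static void main(String[] args) {
        EvictionPassModel pool = new EvictionPassModel(1);
        // Ten passes; the server is unreachable during the first five.
        for (int pass = 1; pass <= 10; pass++) {
            pool.runOnce(pass * PERIOD_MILLIS, pass > 5);
        }
        System.out.println("idle=" + pool.idleCount() + " created=" + pool.connectionsCreated);
    }
}
```

While the server is down the pool holds nothing. From the first pass after recovery it holds one connection, and it replaces that connection every time it ages out. Multiply that by the number of leaked pools and you get the connection spike described in the incident summary.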
