We run a Java web application on Tomcat in Microsoft Azure that stores and reads data in MongoDB, which is hosted on Amazon EC2. There is a lot of traffic between EC2 and Azure: hundreds of threads perform operations every hour. Our MongoDB configuration (we use the Java driver) was:
autoConnectRetry = true
connectTimeout = 10000
connectionsPerHost = 100
threadsAllowedToBlockForConnectionMultiplier = 50
socketKeepAlive = true
socketTimeout = 60000
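To make the socket-related settings above more concrete, here is a minimal sketch of what they correspond to on a plain java.net.Socket. It uses only the JDK, not the MongoDB driver. The class and method names are hypothetical, and the mapping of driver options to socket calls is an assumption about how a driver would typically apply them.

```java
import java.net.Socket;
import java.net.SocketException;

// Hypothetical illustration class; not part of the MongoDB Java driver.
class KeepAliveSocketSketch {
    // Values mirror the settings listed above.
    static final int CONNECT_TIMEOUT_MS = 10000; // connectTimeout = 10000
    static final int SOCKET_TIMEOUT_MS = 60000;  // socketTimeout = 60000

    // Returns an unconnected socket configured like the settings above.
    static Socket newConfiguredSocket() throws SocketException {
        Socket socket = new Socket();
        // socketKeepAlive = true: enables SO_KEEPALIVE, so the OS may send
        // keepalive probes when the connection has been idle for a while.
        socket.setKeepAlive(true);
        // socketTimeout = 60000: a blocking read fails after 60 s without data.
        socket.setSoTimeout(SOCKET_TIMEOUT_MS);
        return socket;
    }

    public static void main(String[] args) throws Exception {
        Socket socket = newConfiguredSocket();
        System.out.println("keepAlive=" + socket.getKeepAlive()
                + " soTimeoutMs=" + socket.getSoTimeout());
        // Connecting with the connect timeout would look like this:
        // socket.connect(new java.net.InetSocketAddress(host, port), CONNECT_TIMEOUT_MS);
        socket.close();
    }
}
```

Note that enabling SO_KEEPALIVE only switches probing on. When the first probe is sent is decided by the operating system, not by the Java code.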
From time to time we got the following exception on our server:
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.bson.io.Bits.readFully(Bits.java:46)
    at org.bson.io.Bits.readFully(Bits.java:33)
    at org.bson.io.Bits.readFully(Bits.java:28)
    at com.mongodb.Response.<init>(Response.java:40)
    at com.mongodb.DBPort.go(DBPort.java:124)
    at com.mongodb.DBPort.call(DBPort.java:74)
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:286)
The reason is that Azure has a global firewall that kills pooled connections once they have been idle for more than about 3 minutes. We did not find any documentation on this; we determined it empirically.
This can be fixed in two ways:
- In the Java driver, we could set a maximum idle time for connections and close them ourselves. This has already been requested for the MongoDB Java driver (https://jira.mongodb.org/browse/JAVA-710) but is not implemented yet.
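- On the Linux server, we can configure 'tcp_keepalive_time' so that the system probes idle connections that have keepalive enabled (as with socketKeepAlive = true). This keeps healthy connections from looking idle to the firewall and lets the system tear down dead ones. We use a 60-second period: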
sudo bash -c 'echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time'
Restart HTTPD after this and the problem is gone. Of course, you need to understand that with this setup no connection should stay silent for more than 60 seconds. If you need connections that stay silent for longer, it is better to move off Azure.
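The first option, closing idle connections ourselves, can be sketched independently of the driver. This is a minimal illustration of the idea, not the driver's implementation and not what JAVA-710 proposes in detail. The class name IdleEvictingPool, its methods, and the 120-second limit are all hypothetical. The limit is an assumed value chosen to stay below the roughly 3-minute cutoff we observed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a pool that refuses to hand out connections
// that have been idle longer than maxIdleMs.
class IdleEvictingPool<C> {
    private static final class Entry<C> {
        final C conn;
        final long idleSinceMs;
        Entry(C conn, long idleSinceMs) {
            this.conn = conn;
            this.idleSinceMs = idleSinceMs;
        }
    }

    private final Deque<Entry<C>> idle = new ArrayDeque<>();
    private final long maxIdleMs;

    IdleEvictingPool(long maxIdleMs) {
        this.maxIdleMs = maxIdleMs;
    }

    // Returns a connection to the pool and records when it became idle.
    synchronized void release(C conn, long nowMs) {
        idle.addFirst(new Entry<>(conn, nowMs));
    }

    // Hands out the most recently released connection if it is still fresh.
    // Stale entries are dropped; a real pool would also close them here.
    // Returns null when no fresh connection is left, so the caller opens a new one.
    synchronized C acquire(long nowMs) {
        Entry<C> entry;
        while ((entry = idle.pollFirst()) != null) {
            if (nowMs - entry.idleSinceMs < maxIdleMs) {
                return entry.conn;
            }
        }
        return null;
    }

    synchronized int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // Assumed limit of 120 s, below the observed ~3 min firewall cutoff.
        IdleEvictingPool<String> pool = new IdleEvictingPool<>(120_000L);
        long now = System.currentTimeMillis();
        pool.release("connection-1", now);
        System.out.println(pool.acquire(now + 1_000L));
    }
}
```

Time is passed in explicitly so that the eviction rule is easy to test. In a real pool the caller would pass System.currentTimeMillis() and open a new connection whenever acquire returns null.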