Server crashes after closesocket
I have a multithreaded application that periodically polls a few hundred devices. Each thread serves one device, and that device's socket and other descriptors are encapsulated in an individual object, so no descriptors are shared. Occasionally the application crashes after closesocket(fSock), when I try to set the descriptor fSock to 0.
I assume I should not set fSock = 0 if closesocket(fSock) returns SOCKET_ERROR. Or is there some other reason?
My code:
bool _EthDev::Connect()
{
    int sockErr, ret, i, j;
    int szOut = sizeof(sockaddr_in);
    // create socket
    if ((fSock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        sockErr = GetLastError();
        Log("Invalid socket err %d", sockErr);
        fSock = 0;
        return false;
    }
    // set fast closing socket (by RST)
    linger sLinger;
    sLinger.l_onoff = 1;
    sLinger.l_linger = 0;
    if (sockErr = setsockopt(fSock, SOL_SOCKET, SO_LINGER, (const char FAR*)&sLinger, sizeof(linger)))
    {
        sockErr = WSAGetLastError();
        Log("Setsockopt err %d", sockErr);
        closesocket(fSock);
        fSock = 0; // here crashes
        return false;
    }
    // connect to device
    fSockaddr.sin_port = htons((u_short)(baseport));
    if (connect(fSock, (struct sockaddr*)&fSockaddr, szOut))
    {
        closesocket(fSock);
        fSock = 0;
        return false;
    }
    ...
    return true;
}
Solved
"I have a multithreaded application, ... [it] occasionally crashes"
A multithreaded application that occasionally crashes is a classic symptom of a race condition. To prevent the crashes, I think you need to figure out what the race condition in your code is and fix it.
"I assume I should not set fSock = 0 if closesocket(fSock) returns SOCKET_ERROR. Or is there some other reason?"
I doubt the problem is actually related to closesocket() or to setting fSock to 0. Keep in mind that sockets are really just integers, and setting an integer to 0 isn't likely to cause a crash on its own. What could cause a crash is a write to invalid memory, and fSock = 0 does write to the memory location where the member variable fSock is (or was) located.
Therefore, a more likely hypothesis is that the _EthDev object was deleted by thread B while thread A was still in the middle of calling Connect() on it. This would most likely happen while the connect() call was executing, because a blocking connect() call can take a relatively long time to return. If another thread rudely deleted the _EthDev object during the connect() call, then as soon as connect() returned, the next line of code to write to the location where the (now deleted) _EthDev object used to be would be the "fSock = 0;" line, and that could cause a crash.
I suggest you review your code that deletes _EthDev objects. If it isn't careful to first shut down any thread(s) using those objects (and also to wait for the threads to exit!) before deleting them, you should rewrite it so that it does so reliably. Deleting an object while another thread might still be using it is asking for trouble.
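To make the "shut down the thread and wait for it to exit before deleting" advice concrete, here is a minimal sketch in portable C++. It is not taken from the question's code: Device is a hypothetical stand-in for _EthDev, DeviceWorker and its members are names invented for illustration, and the real polling and socket calls are replaced by a counter. The idea is that one owner holds both the object and the thread that uses it, and the owner's destructor joins the thread before the object's memory goes away.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for _EthDev: each poll writes to a member
// variable, just as "fSock = 0" writes to a member of _EthDev.
class Device {
public:
    void Poll() { ++fPollCount; }
    int PollCount() const { return fPollCount; }
private:
    int fPollCount = 0;
};

// Hypothetical owner of both the device and the thread that uses it.
// The device cannot be destroyed while the thread is still running,
// because the destructor waits for the thread first.
class DeviceWorker {
public:
    DeviceWorker() : fThread([this] { Run(); }) {}

    // Stop() runs before any member is destroyed, so fDev is still
    // alive for as long as the worker thread can touch it.
    ~DeviceWorker() { Stop(); }

    // Ask the thread to finish, then wait until it has really exited.
    void Stop() {
        fStop.store(true);
        if (fThread.joinable()) fThread.join();
    }

    // Only safe to call after Stop(), once the thread no longer writes.
    int PollCount() const { return fDev.PollCount(); }

private:
    void Run() {
        while (!fStop.load()) {
            fDev.Poll();
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    Device fDev;
    std::atomic<bool> fStop{false};
    std::thread fThread;  // declared last, so fDev and fStop exist before it starts
};
```

With this layout, code that wants to remove a device destroys the DeviceWorker rather than the Device itself, and that destruction blocks until the polling thread has returned. In a real program a blocking connect() would delay Stop() until it returns or times out, so the sketch trades a slower shutdown for never writing to freed memory.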