Server crashes after closesocket

I have a multithreaded application that periodically polls a few hundred devices. Each thread serves one device, and that device's socket and other descriptors are encapsulated in its own object, so no descriptors are shared. Occasionally the application crashes after closesocket(fSock), when I try to set the descriptor fSock to 0.

I assume I should not set fSock = 0 if closesocket(fSock) returns SOCKET_ERROR. Or is there some other reason?

My code:

bool _EthDev::Connect()
{
    int sockErr, ret, i, j;
    int szOut = sizeof(sockaddr_in);

    // create socket
    if ((fSock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        sockErr = WSAGetLastError();
        Log("Invalid socket err %d", sockErr);
        fSock = 0;
        return false;
    }

    // set fast closing socket (by RST)
    linger sLinger;
    sLinger.l_onoff = 1;
    sLinger.l_linger = 0;
    if (setsockopt(fSock, SOL_SOCKET, SO_LINGER, (const char FAR*)&sLinger, sizeof(linger)) == SOCKET_ERROR)
    {
        sockErr = WSAGetLastError();
        Log("Setsockopt err %d", sockErr);
        closesocket(fSock);
        fSock = 0;          // here crashes
        return false;
    }

    // connect to device
    fSockaddr.sin_port = htons((u_short)(baseport));
    if (connect(fSock, (struct sockaddr*)&fSockaddr, szOut))
    {
        closesocket(fSock);
        fSock = 0;
        return false;
    }

    ...

    return true;
}

Solved

I have a multithreaded application, ... [it] occasionally crashes

Occasional crashes in a multithreaded application are a classic symptom of a race condition. To prevent the crashes, you need to find the race condition in your code and fix it.

I assume I should not set fSock = 0 if closesocket(fSock) returns SOCKET_ERROR. Or is there some other reason?

I doubt the problem is actually related to closesocket() or to setting fSock to 0 as such. Keep in mind that a socket handle is really just an integer, and setting an integer to 0 isn't likely to cause a crash on its own. What could cause a crash is a write to invalid memory, and fSock = 0 does write to the memory location where the member variable fSock is (or was) located.
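To make that point concrete, here is a minimal sketch of what the assignment does. EthDevSketch, ResetSocket() and SocketAddress() are hypothetical names made up for illustration, not the poster's real class, and the socket is modeled as a plain unsigned integer so that the snippet compiles without Winsock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the poster's class; only the socket member is modeled.
struct EthDevSketch {
    // Assumption: an unsigned integer wide enough for a handle, standing in for SOCKET.
    std::uintptr_t fSock = 42;

    void ResetSocket() {
        // Inside a member function this is shorthand for: this->fSock = 0;
        // The store goes to memory inside the object that "this" points to.
        fSock = 0;
    }

    // Returns the address that the assignment above writes to.
    const void* SocketAddress() const { return &this->fSock; }
};
```

The value being stored is harmless. The store is only valid while the object that "this" points to is still alive, so if the object has already been deleted, the same line writes into freed memory and the behavior is undefined.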

Therefore, a more likely hypothesis is that the _EthDev object was deleted by thread B while thread A was still in the middle of calling Connect() on it. This would most likely happen while the connect() call was executing, because a blocking connect() call can take a relatively long time to return. If another thread deleted the _EthDev object during the connect() call, then as soon as connect() returned, the next line that writes to the location where the (now deleted) _EthDev object used to be would be the "fSock = 0;" line, and that could cause a crash.

I suggest you review the code that deletes _EthDev objects. If it doesn't first shut down any thread(s) using those objects (and wait for those threads to exit!) before deleting them, rewrite it so that it does so reliably. Deleting an object while another thread might still be using it is asking for trouble.
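One way to get that ordering is to let a single owner hold both the device object and the thread that uses it, and to join the thread before the device is destroyed. The following is only a sketch under assumptions: DeviceSketch and DeviceWorker are hypothetical names, PollOnce() stands in for one Connect()/poll cycle, and std::thread is used in place of whatever threading API the real application has.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for _EthDev; PollOnce() models one Connect()/poll cycle.
class DeviceSketch {
public:
    void PollOnce() {
        // Models a blocking call such as connect().
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++polls_;  // write to a member: only valid while this object is alive
    }
    int Polls() const { return polls_.load(); }

private:
    std::atomic<int> polls_{0};
};

// Owns the device and the thread that uses it, so the thread cannot outlive the device.
class DeviceWorker {
public:
    DeviceWorker() : thread_([this] { Run(); }) {}

    ~DeviceWorker() {
        Stop();
        assert(!thread_.joinable());  // the worker has exited before device_ is destroyed
    }

    // Asks the worker to stop and waits until it has actually exited.
    void Stop() {
        stop_.store(true);
        if (thread_.joinable()) thread_.join();
    }

    int Polls() const { return device_.Polls(); }

private:
    void Run() {
        do {
            device_.PollOnce();
        } while (!stop_.load());
    }

    DeviceSketch device_;
    std::atomic<bool> stop_{false};
    std::thread thread_;  // declared last, so device_ and stop_ exist before it starts
};
```

With this layout, destroying a DeviceWorker always waits for its thread first. Note that the stop flag cannot interrupt a blocking call that is already in progress, so join() returns only after the current poll cycle finishes, which for a real connect() may take a while.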

