Unix Socket Programming



Client Server Architecture

In the client-server architecture, a machine (referred to as the client) makes a request to connect to another machine (called the server) to obtain some service. The services running on the server listen on known ports (application identifiers), and the client needs to know the address of the server machine and this port in order to connect. The server, on the other hand, does not need to know the address or the port of the client when the connection is initiated. The first packet that the client sends as a request carries this information about the client, and the server uses it to send its replies. The client is the active party that makes the first move to establish the connection, whereas the server passively waits for such requests from clients.

Illustration of Client Server Model

What is a Socket?

In Unix, whenever there is a need for inter-process communication within the same machine, we use mechanisms like signals or pipes (named or unnamed). Similarly, when we want two applications, possibly running on different machines, to communicate, we need sockets. A socket is treated as another entry in the Unix open file table, so the generic I/O system calls of Unix (such as read, write and close) can also be used on a socket. The server and client applications use various system calls to connect, all of which are built on the basic construct called a socket. A socket is one end of the communication channel between two applications.

Steps followed by client to establish the connection:
  1. Create a socket
  2. Connect the socket to the address of the server
  3. Send/Receive data
  4. Close the socket
Steps followed by server to establish the connection:
  1. Create a socket
  2. Bind the socket to the port number known to all clients
  3. Listen for the connection request
  4. Accept connection request
  5. Send/Receive data

Basic data structures used in Socket programming

Socket Descriptor: A simple file descriptor in Unix.
        int
Socket Address: This construct holds the generic socket address information.
    struct sockaddr {
        unsigned short    sa_family;    // address family, AF_xxx or PF_xxx
        char              sa_data[14];  // 14 bytes of protocol address
    }; 

AF stands for Address Family and PF stands for Protocol Family. In most modern implementations only the AF is being used. The various kinds of AF are as follows:
              Name                   Purpose                 
       AF_UNIX, AF_LOCAL      Local communication              
       AF_INET                IPv4 Internet protocols        
       AF_INET6               IPv6 Internet protocols
       AF_IPX                 IPX - Novell protocols
       AF_NETLINK             Kernel user interface device    
       AF_X25                 ITU-T X.25 / ISO-8208 protocol 
       AF_AX25                Amateur radio AX.25 protocol
       AF_ATMPVC              Access to raw ATM PVCs
       AF_APPLETALK           Appletalk                      
       AF_PACKET              Low level packet interface     
In all the sample programs given below, we will be using AF_INET.
struct sockaddr_in: This construct holds the address family, port number and Internet address of an IPv4 socket, padded so that it has the same size as struct sockaddr.
    struct sockaddr_in {
        short int          sin_family;  // Address family
        unsigned short int sin_port;    // Port number
        struct in_addr     sin_addr;    // Internet address
        unsigned char      sin_zero[8]; // Same size as struct sockaddr
    }; 


Some systems (like x86) are Little Endian, i.e. the least significant byte is stored at the lowest address, whereas in Big Endian systems the most significant byte is stored at the lowest address. Consider a Little Endian system that wants to communicate with a Big Endian one: if there is no standard for data representation, the data sent by one machine is misinterpreted by the other. So a standard has been defined for data representation on the network (called Network Byte Order), which is Big Endian. The library functions that convert a short (16-bit) or long (32-bit) value from Host Byte Order to Network Byte Order and vice versa are:
  • htons() -- "Host to Network Short"
  • htonl() -- "Host to Network Long"
  • ntohs() -- "Network to Host Short"
  • ntohl() -- "Network to Host Long"
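
To see what these conversions do, we can look at the bytes in memory after a call. The following is a minimal sketch; the helper names network_bytes_of_short and network_bytes_of_long are made up for this illustration and are not part of any library.

```c
#include <arpa/inet.h>   /* htons, htonl, ntohs, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of htons(value) into out.
 * In network byte order the most significant byte comes first. */
void network_bytes_of_short(uint16_t value, unsigned char out[2])
{
    uint16_t net = htons(value);
    memcpy(out, &net, sizeof net);
}

/* Hypothetical helper: the same for a 32-bit value. */
void network_bytes_of_long(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value);
    memcpy(out, &net, sizeof net);
}
```

Whatever the byte order of the host, network_bytes_of_short(0x1234, out) leaves 0x12 in out[0] and 0x34 in out[1]. On a Big Endian host the conversion functions simply return their argument unchanged.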

IP addresses

Assuming that we are dealing with IPv4 addresses, the address is a 32-bit integer. Remembering a 32-bit number is not convenient for humans, so the address is written as four integers separated by dots, where each integer represents 8 bits. The representation is a.b.c.d, where a is the most significant byte. The library function that converts this representation into a binary address in Network Byte Order is:
int inet_aton(const char *cp, struct in_addr *inp);
inet_aton() converts the Internet host address cp from the standard numbers-and-dots notation into binary data and stores it in the structure that inp points to. inet_aton returns nonzero if the address is valid, zero if not.
For example, if we want to initialize the sockaddr_in construct by the IP address and desired port number, it is done as follows:
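        #include <string.h>      /* memset */
        #include <arpa/inet.h>   /* htons, inet_aton, struct sockaddr_in */

        struct sockaddr_in sockaddr;
        memset(&sockaddr, 0, sizeof(sockaddr));   /* clears sin_zero as well */
        sockaddr.sin_family = AF_INET;
        sockaddr.sin_port = htons(21);            /* port in Network Byte Order */
        inet_aton("172.26.117.168", &(sockaddr.sin_addr));  /* returns 0 if invalid */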
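
Since inet_aton reports an invalid address through its return value, it is worth checking it. The sketch below wraps the initialization in a function; the name make_ipv4_addr is an assumption made for this example only.

```c
#define _DEFAULT_SOURCE   /* glibc: makes inet_aton visible under strict -std= modes */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill *out for a dotted IPv4 address and a port.
 * Returns 0 on success, -1 if the address string is not valid. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);             /* also clears sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);             /* Network Byte Order */
    if (inet_aton(ip, &out->sin_addr) == 0)  /* zero means invalid address */
        return -1;
    return 0;
}
```

A call such as make_ipv4_addr("172.26.117.168", 21, &addr) then gives the same result as the fragment above, while a malformed string is rejected instead of being silently ignored.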
    

Socket System Call

A socket is created using the system call:
int socket(int domain, int type, int protocol);
This system call returns a Socket Descriptor (like a file descriptor), which is an integer value. Details about the arguments:
  1. Domain: It specifies the communication domain. It takes one of the predefined address family (or protocol family) values described earlier in this lecture.
  2. Type: It specifies the semantics of communication, or the type of service that is desired. It takes the following values:
    • SOCK_STREAM : Stream Socket
    • SOCK_DGRAM : Datagram Socket
    • SOCK_RAW : Raw Socket
    • SOCK_SEQPACKET : Sequenced Packet Socket
    • SOCK_RDM : Reliably Delivered Message Packet
  3. Protocol: This parameter identifies the protocol the socket is supposed to use. Some values are as follows:
    • IPPROTO_TCP : For TCP (SOCK_STREAM)
    • IPPROTO_UDP : For UDP (SOCK_DGRAM)
    For AF_INET there is only one usual protocol for each of these two socket types, so it does not matter if we do not name a protocol at all. For simplicity, we can put "0" (zero) in the protocol field and let the system choose the default.
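
As a small illustration of these arguments, the sketch below creates one stream socket with the default protocol and one datagram socket with the protocol named explicitly. The function names are chosen for this example only.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_UDP */
#include <stdio.h>        /* perror */

/* Create an IPv4 stream socket. Returns the descriptor, or -1 on error. */
int open_tcp_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* 0: default protocol (TCP) */
    if (fd < 0)
        perror("socket");
    return fd;
}

/* Create an IPv4 datagram socket, naming the protocol explicitly. */
int open_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0)
        perror("socket");
    return fd;
}
```

Both calls return an ordinary small integer descriptor, which should eventually be released with close.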

Bind System Call

The system call bind associates an address to a socket descriptor created by socket.
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
The second parameter myaddr is a pointer to the address to be assigned to the socket, and addrlen is the size of that address structure. The parameter is declared as the general address structure so that the bind system call can be used by both Unix domain and Internet domain sockets.
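
A minimal sketch of a bind call on an Internet domain socket is given below. It binds to the loopback address and, when the port is 0, lets the operating system pick one, which is then read back with getsockname. The helper name bind_loopback is an assumption of this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind fd to 127.0.0.1 on the given port
 * (0 = let the OS choose). Returns the port actually bound, in host
 * byte order, or -1 on error. */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* bind takes the general struct sockaddr pointer, hence the cast */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    /* ask the OS which address and port the socket ended up with */
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A real server would pass its well-known port instead of 0. A second bind on the same socket fails, because a socket can be bound only once.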

Other System Calls and their Functions

LISTEN : Announce willingness to accept connections; give queue size.
ACCEPT : Block the caller until a connection attempt arrives.
CONNECT : Actively attempt to establish a connection.
SEND : Send some data over the connection.
RECEIVE : Receive some data from the connection.
CLOSE : Release the connection.

Client-Server Communication Overview

The analogy given below is often very useful in understanding many such networking concepts. The analogy is of a number of people in a room communicating with each other by talking. In a typical scenario, if A has to talk to B, he calls out B's name, and B responds only if B is listening. If B responds, one can say that a connection has been established. From then on, as long as both of them wish to communicate, they can carry on their conversation. The Client-Server architecture generally employed in networks is very similar in concept. Each machine can act as a client or a server.
Server: It is normally defined as a program which provides some services to client programs. However, we will take a closer look at the concept of a "service" below. The most important feature of a server is that it is a passive entity, one that listens for requests from clients.
Client: It is the active entity of the architecture, one that generates the request to connect to a particular port number on a particular server.
Communication takes the form of the client process sending a message over the network to the server process. The client process then waits for a reply message. When the server process gets the request, it performs the requested work and sends back a reply. The server that the client will try to connect to should be up and running before the client is executed. In most cases, the server runs continuously as a daemon.
There is a general misconception that a server necessarily provides some service and is therefore called a server. For example, an e-mail client provides as much service as a mail server does. Actually, the term service is not very well defined, so it is better not to rely on it at all. In fact, servers can be programmed to do practically anything that a normal application can do. In brief, a server is just an entity that listens/waits for requests.
To send a request, the client needs to know the address of the server as well as the port number, both of which have to be supplied to establish a connection. One option is to let the server choose a random port number, which is then somehow conveyed to the client. Subsequently the client uses this port number to send requests. This method has severe limitations, as the information has to be communicated offline, the network connection not yet being established. A better option is to ensure that the server always runs on the same port number and that the client already knows which port provides which service. Such a standardization already exists. The port numbers 0-1023 are reserved for the use of the superuser only. The list of services and their ports can be found in the file /etc/services.

Connection Oriented vs Connectionless Communication

Connection Oriented Communication

Analogous to the telephone network. The sender requests a communication (dials the number), the receiver gets an indication (the phone rings), the receiver accepts the connection (picks up the phone) and the sender receives the acknowledgment (the ringing stops). The connection is established through a dedicated link provided for the communication. This type of communication is characterized by a high level of reliability in terms of the number and the sequence of bytes.

Connectionless Communication

Analogous to the postal service. Packets (letters) are sent one at a time to a particular destination. For greater reliability, the receiver may send an acknowledgement (a receipt for registered letters). Based on these two types of communication, two kinds of sockets are used:
  • stream sockets: used for connection-oriented communication, when reliability is desired.
  • datagram sockets: used for connectionless communication, when reliability matters less than the cost of providing it. For example, streaming audio/video is often sent over such sockets so as to reduce network traffic.
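
To make the connectionless case concrete, here is a minimal sketch that sends one datagram between two sockets on the same machine over the loopback address. The function name udp_loopback_roundtrip, the port chosen by the operating system and the 2 second receive timeout are all assumptions of this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg from one UDP socket to another on
 * 127.0.0.1 and copy what arrives into buf.
 * Returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    struct timeval tv = { 2, 0 };        /* give up after 2 s: datagrams may be lost */
    socklen_t len = sizeof addr;
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto done;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);            /* demo only: OS picks a free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* No connect, listen or accept: every datagram carries its destination. */
    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    n = recvfrom(rx, buf, buflen, 0, NULL, NULL);

done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

Note that nothing here guarantees delivery. If reliability is needed on top of datagram sockets, the application has to add its own acknowledgements, just like the receipt for a registered letter.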

Sequence of System Calls for Connection Oriented communication

The typical set of system calls on both the machines in a connection-oriented setup is shown in Figure below.
The sequence of system calls that have to be made in order to setup a connection is given below.
  1. The socket system call is used to obtain a socket descriptor on both the client and the server. These two calls need not be synchronized or related in the time at which they are made. The synopsis is given below:
     
    #include<sys/types.h>
    #include<sys/socket.h>
    int socket(int domain, int type, int protocol);
     
  2. Both the client and the server 'bind' to a particular port on their machines using the bind system call. This function has to be called only after a socket has been created and has to be passed the socket descriptor returned by the socket call. Again this binding on both the machines need not be in any particular order. Moreover the binding procedure on the client is entirely optional. The bind system call requires the address family, the port number and the IP address. The address family is known to be AF_INET, the IP address of the client is already known to the operating system. All that remains is the port number. Of course the programmer can specify which port to bind to, but this is not necessary. The binding can be done on a random port as well and still everything would work fine. The way to make this happen is not to call bind at all. Alternatively bind can be called with the port number set to 0. This tells the operating system to assign a random port number to this socket. This way whenever the program tries to connect to a remote machine through this socket, the operating system binds this socket to a random local port. This procedure as mentioned above is not applicable to a server, which has to listen at a standard predetermined port.
     
  3. Next, listen has to be called on the server. The synopsis of the listen call is given below.
     
    #include<sys/socket.h>
    int listen(int skfd, int backlog);
    skfd is the socket descriptor of the socket on which the machine should start listening.
    backlog is the maximum length of the queue for accepting requests.

    The listen system call signifies that the server is willing to accept connections and thereby start communicating.
    What actually happens is that in the TCP suite certain messages are exchanged and certain initializations have to be performed. Some finite amount of time is required to set up the resources and allocate memory for the data structures that will be needed. If another request arrives at the same port during this time, it has to wait in a queue. This queue cannot be arbitrarily large: after it reaches a particular size limit, no more requests are accepted by the operating system. This size limit is precisely the backlog argument of the listen call and is something that the programmer can set. Today's processors are quick at these computations and memory allocations, so under normal circumstances the length of the queue rarely exceeds 2 or 3. Thus a backlog value of 2-3 would be fine, though the value typically used is around 5. Note that this limit is unrelated to the number of "parallel" connections. Connections that have already been accepted are not counted against backlog, so we may have 100 parallel connections running at a time with backlog = 5.
     
  4. The connect function is then called on the client with three arguments, namely the socket descriptor, the remote server address and the length of the address data structure. The synopsis of the function is as follows:
    #include<sys/socket.h>
    #include<netinet/in.h> /* only for AF_INET , or the INET Domain */
    int connect(int skfd, const struct sockaddr *addr, socklen_t addrlen);
    This function initiates a connection on a socket.
    skfd is the same old socket descriptor.
    addr is again the same kind of structure as used in the bind system call. More often than not, we will be creating a structure of the type sockaddr_in instead of sockaddr and filling it with appropriate data. Just while sending the pointer to that structure to the connect or even the bind system call, we cast it into a pointer to a sockaddr structure. The reason for doing all this is that the sockaddr_in is more convenient to use in case of INET domain applications. addr basically contains the port number and IP address of the server which the local machine wants to connect to. This call normally blocks until either the connection is established or is rejected.
    addrlen is the length of the socket address structure, the pointer to which is the second argument.
     
  5. The request generated by this connect call is processed by the operating system of the remote server and placed in a buffer, waiting to be handed over to the application, which will be calling the accept function. The accept call is the mechanism by which the networking program on the server receives the requests that have been accepted by the operating system. The synopsis of the accept system call is given below.
    #include<sys/socket.h>
    int accept(int skfd, struct sockaddr *addr, socklen_t *addrlen);
    skfd is the socket descriptor of the socket on which the machine had performed a listen call and now desires to accept a request on that socket.
    addr is the address structure that will be filled in by the operating system with the port number and IP address of the client which has made this request. This sockaddr pointer can be type-cast to a sockaddr_in pointer for subsequent operations on it.
    addrlen is a pointer to the length of the socket address structure that addr points to. The caller sets it to the size of that structure before the call, and on return it holds the size of the address actually stored.

    The accept function extracts a connection from the buffer of pending connections in the system, creates a new socket with the same properties as skfd, and returns a new file descriptor for that socket.

    In fact, such an architecture has been criticized to the extent that the applications do not have a say on what connections the operating system should accept. The system accepts all requests irrespective of which IP, port number they are coming from and which application they are for. All such packets are processed and sent to the respective applications, and it is then that the application can decide what to do with that request.
    The accept call is a blocking system call. In case there are requests present in the system buffer, they will be returned and in case there aren't any, the call simply blocks until one arrives.
    This new socket is made on the same port that is listening to new connections. It might sound a bit weird, but it is perfectly valid and the new connection made is indeed a unique connection. Formally the definition of a connection is
    connection: defined as a 4-tuple : (Local IP, Local port, Foreign IP, Foreign port)
    For any two connections, at least one of these four values has to differ. Therefore multiple connections on one port of the server are in fact different connections.
     
  6. Finally when both connect and accept return the connection has been established.
     
  7. The socket descriptors that are with the server and the client can now be used identically as a normal I/O descriptor. Both the read and the write calls can be performed on this socket descriptor. The close call can be performed on this descriptor to close the connection. Man pages on any UNIX type system will furnish further details about these generic I/O calls.
     
  8. Variants of read and write also exist, which were specifically designed for networking applications. These are recv and send.
    #include<sys/socket.h>
    ssize_t recv(int skfd, void *buf, size_t buflen, int flags);
    ssize_t send(int skfd, const void *buf, size_t buflen, int flags);
    Except for the flags argument, the arguments are identical to those of the read and write calls. Possible values for the flags are:

    Used for       Macro for the flag    Comment
    recv           MSG_PEEK              look at the message in the buffer but do not consider it read
    send           MSG_DONTROUTE         send the message only if the destination is on the same network, i.e. directly connected to the local machine
    recv & send    MSG_OOB               used for transferring data out of sequence, when some bytes in a stream might be more important than others
  9. To close a particular connection the shutdown call can also be used to achieve greater flexibility.
    #include<sys/socket.h>
    int shutdown(int skfd, int how);
    skfd is the socket descriptor of the socket which needs to be closed.
    how can be one of the following:

    Macro        Value    Effect
    SHUT_RD      0        stop all read operations on this socket, but continue writing
    SHUT_WR      1        stop all write operations on this socket, but keep receiving data
    SHUT_RDWR    2        stop both reading and writing; unlike close, the descriptor itself is not released
    A port can be reused only if it has been closed completely
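
The whole sequence above can be sketched in a single process that plays both roles over the loopback address. This works without threads only because the operating system completes the connection on the listening socket before accept is called. The function name tcp_loopback_demo and the use of port 0 (a real server would bind to its well-known port) are assumptions of this example, and the single recv calls are enough only because the message is tiny; real code loops until all data has arrived.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of echoed bytes copied into reply,
 * or -1 on error. *same_port is set to 1 if the socket returned by accept
 * uses the same local port as the listening socket. */
ssize_t tcp_loopback_demo(const char *msg, char *reply, size_t replylen, int *same_port)
{
    struct sockaddr_in srv, peer, local;
    socklen_t len = sizeof srv;
    char buf[256];
    ssize_t n = -1, got;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);   /* step 1, server side */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);   /* step 1, client side */
    int afd = -1;

    if (lfd < 0 || cfd < 0) goto done;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(0);                     /* demo only: OS picks the port */
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) < 0) goto done;    /* step 2 */
    if (getsockname(lfd, (struct sockaddr *)&srv, &len) < 0) goto done;
    if (listen(lfd, 5) < 0) goto done;                                     /* step 3 */

    if (connect(cfd, (struct sockaddr *)&srv, sizeof srv) < 0) goto done; /* step 4 */

    len = sizeof peer;
    afd = accept(lfd, (struct sockaddr *)&peer, &len);                     /* step 5 */
    if (afd < 0) goto done;
    len = sizeof local;
    if (getsockname(afd, (struct sockaddr *)&local, &len) < 0) goto done;
    *same_port = (local.sin_port == srv.sin_port);

    if (send(cfd, msg, strlen(msg), 0) < 0) goto done;   /* step 8: client sends */
    shutdown(cfd, SHUT_WR);                      /* step 9: client will write no more */
    got = recv(afd, buf, sizeof buf, 0);         /* server reads the request */
    if (got < 0) goto done;
    if (send(afd, buf, (size_t)got, 0) < 0) goto done;   /* server echoes it back */
    n = recv(cfd, reply, replylen, 0);           /* reading still works after SHUT_WR */

done:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```

The value stored in same_port illustrates the 4-tuple remark of step 5: the accepted socket shares the local port of the listening socket and is distinguished only by the foreign IP address and port.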

Multiple Sockets

Suppose we have a process which has to handle multiple sockets. We cannot simply read from one of them when a request comes in, because the read will block while waiting for data on that particular socket. In the meantime a request may arrive on any other socket. To handle this input/output multiplexing we could use different techniques:
  1. Busy waiting: In this approach we make all operations on the sockets non-blocking and handle them together by polling. For example, we could call read() on each socket in turn in a loop. The disadvantage is that we waste a lot of CPU cycles. To make the system calls non-blocking we use: fcntl(s, F_SETFL, FNDELAY); (FNDELAY is the older BSD name; the POSIX flag for this purpose is O_NONBLOCK.)
  2. Asynchronous I/O: Here we ask the Operating System to notify us whenever I/O becomes possible on some socket. The Operating System sends a signal whenever there is some I/O. When we receive a signal, we have to check all the sockets and then wait until the next signal comes. But there are two problems: first, signals are expensive to catch, and second, we would not know if input arrives on one socket while we are doing I/O on another. For asynchronous I/O we have a different set of commands (here we give the ones for BSD-style UNIX variants): signal(SIGIO, io_handler); fcntl(s, F_SETOWN, getpid()); fcntl(s, F_SETFL, FASYNC);
  3. Separate process for each I/O: We could fork 10 different child processes for 10 different sockets. Each child process is fairly lightweight, and the processes need some communication between them. Each process can now use blocking system calls on its own socket. However, this wastes a lot of memory, data structures and other resources.
  4. select() system call: We can use the select system call to instruct the Operating System to wait for any one of multiple events to occur and to wake up the process only when one of these events occurs. This way we know which socket the I/O request has come from.
    int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
    void FD_CLR(int fd, fd_set *fdset);
    int FD_ISSET(int fd, fd_set *fdset);
    void FD_SET(int fd, fd_set *fdset);
    void FD_ZERO(fd_set *fdset);
    The select() function indicates which of the specified file descriptors is ready for reading, ready for writing, or has an error condition pending. If the specified condition is false for all of the specified file descriptors, select() blocks up to the specified timeout interval, until the specified condition is true for at least one of the specified file descriptors. The nfds argument specifies the range of file descriptors to be tested. The select() function tests file descriptors in the range of 0 to nfds-1. readfds, writefds and errorfds arguments point to an object of type fd_set. readfds specifies the file descriptors to be checked for being ready to read. writefds specifies the file descriptors to be checked for being ready to write, errorfds specifies the file descriptors to be checked for error conditions pending.
    On successful completion, the objects pointed to by the readfds, writefds, and errorfds arguments are modified to indicate which file descriptors are ready for reading, ready for writing, or have an error condition pending, respectively. For each file descriptor less than nfds, the corresponding bit will be set on successful completion if it was set on input and the associated condition is true for that file descriptor. The timeout is an upper bound on the amount of time elapsed before select returns. It may be zero, causing select to return immediately. If the timeout is a null pointer, select() blocks until an event causes one of the masks to be returned with a valid (non-zero) value. If the time limit expires before any event occurs that would cause one of the masks to be set to a non-zero value, select() completes successfully and returns 0.

Reserved Ports

Port numbers from 1-1023 are reserved for the superuser and the rest of the ports starting from 1024 are for other users. But we have a finer division also which is as follows :
  • 1 to 511 - these are assigned to the processes run by the superuser
  • 512 to 1023 - they are used when we want to assign ports to some important user or process but want to show that this is a reserved superuser port
  • 1024 to 5000 - they are system assigned random ports
  • 5000 to FFFF - they are used to assign a port to user processes or sockets used by users

Some Topics in TCP

Acknowledgement Ambiguity Problem

There can be an ambiguous situation during the retransmission time-out for a packet. Such a case is described below.
The events are:
1. A packet named 'X' is sent at time 't1' for the first time.
2. A timeout occurs for 'X' and no acknowledgement is received.
3. So 'X' is retransmitted at time 't2'.
4. The acknowledgment arrives at time 't'.
In this case, we cannot be sure whether the acknowledgement received at 't' is for the packet 'X' sent at time 't1' or the one sent at 't2'. What should our RTT sample value be?

If we take 't'-'t1' as the RTT sample:
t2-t1 (the timeout period) is typically much greater than the average RTT. This implies that if we take t-t1, it will tend to increase the average RTT. If this happens for a large number of packets, the RTT estimate will increase significantly. As the RTT estimate increases, the timeout value increases and the damage caused by the above sequence of events increases. So this may cause the timeout to become very large unnecessarily.

If we take 't'-'t2' as the RTT sample:
Consider the case in which the acknowledgement for a packet takes a lot more time to arrive than the timeout value. That is, the acknowledgement that arrives is for the packet which was sent at time 't1'.
In that case t-t2 is likely to be smaller than the true RTT and hence will tend to decrease the RTT estimate when included as a sample. A string of such events would cause the timeout to decrease. As the timeout decreases, the above sequence of events becomes more probable, leading to a further decrease in the timeout. So if we consider t-t2, the timeout may become very small.

Since both cases may lead to problems, one possible solution is to discard the sample for which the timeout occurs. But this cannot be done on its own! If the packet got lost, the network is most probably congested. The previous estimate of the RTT is now meaningless, as it was for an uncongested network and the characteristics of the network have changed. Also, new samples cannot be obtained because of the above ambiguity.
So we simply adopt the policy of doubling the value of the RTO on packet loss. When the congestion in the network subsides, we can start sampling afresh or go back to the state before the congestion occurred.
Note that this is a temporary increase in the value of RTO, as we have not increased the value of RTT. So, the present RTT value will help us to go back to the previous situation when it becomes normal.
This is called the Acknowledgment Ambiguity Problem.
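
The policy described above (it is usually known as Karn's algorithm) can be sketched as follows. The text does not say how the RTT estimate is computed from the samples, so the smoothing weights 7/8 and 1/8 and the rule RTO = 2 x estimated RTT are assumptions made only for this illustration, as are all the names.

```c
/* Hypothetical state of the estimator; times are in milliseconds. */
struct rtt_state {
    double srtt;   /* smoothed RTT estimate */
    double rto;    /* current retransmission timeout */
};

void rtt_init(struct rtt_state *s, double first_sample_ms)
{
    s->srtt = first_sample_ms;
    s->rto = 2.0 * s->srtt;          /* assumed rule: RTO = 2 * RTT */
}

/* Called when an acknowledgment arrives. A sample for a segment that was
 * retransmitted is ambiguous, so it leaves the estimate untouched. */
void rtt_on_ack(struct rtt_state *s, double sample_ms, int was_retransmitted)
{
    if (was_retransmitted)
        return;
    s->srtt = 0.875 * s->srtt + 0.125 * sample_ms;   /* assumed weights */
    s->rto = 2.0 * s->srtt;          /* RTO goes back to a value derived from RTT */
}

/* Called when the retransmission timer expires: double the RTO only. */
void rtt_on_timeout(struct rtt_state *s)
{
    s->rto *= 2.0;                   /* srtt is deliberately not changed */
}
```

Because rtt_on_timeout changes only rto, the first unambiguous sample after the congestion brings the timeout back to a value derived from the unchanged RTT estimate, which is the temporary increase mentioned above.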

Fast Retransmit

TCP may generate an immediate acknowledgment (a duplicate ACK) when an out-of-order segment is received. This duplicate ACK should not be delayed. The purpose of this duplicate ACK is to let the other end know that a segment was received out of order, and to tell it what sequence number is expected. Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to be received. It is assumed that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP then performs a retransmission of what appears to be the missing segment, without waiting for a retransmission timer to expire. This algorithm is one of the algorithms used by TCP for the purpose of Congestion/Flow Control. Let us consider a sender that has to send packets with sequence numbers from 1 to 10 (please note that in TCP it is the bytes that are given sequence numbers; packets are numbered here only for the sake of explanation). Suppose packet 4 gets lost.
How will the sender know that it is lost?
The sender must have received the cumulative acknowledgment for packet 3. Meanwhile, the retransmission timer for packet 4 is still running. On the receiver side, packet 5 is received. As it is an out-of-sequence packet, the duplicate acknowledgment (thanks to its cumulative nature) is not delayed and is sent immediately. So, now the sender sees a duplicate ACK. But can it be sure that packet 4 was lost?
Well, not yet. Various situations, like duplication of packet 3 or of the ACK itself, a delay of packet 4, or receipt of an out-of-sequence packet, might have resulted in a duplicate ACK.
So, what does it do? It waits for more duplicate ACKs. As explained above, one or two duplicate ACKs may just mean that segments were reordered, whereas three or more duplicate ACKs in a row are a strong indication that a segment has been lost. TCP then retransmits what appears to be the missing segment, without waiting for the retransmission timer to expire. This is called Fast Retransmit.
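
The counting of duplicate ACKs can be sketched as below. As in the example above, packets rather than bytes are numbered, and the structure and function names are made up for this illustration.

```c
#define DUP_ACK_THRESHOLD 3   /* three duplicate ACKs in a row, as in the text */

/* Hypothetical sender-side state. */
struct ack_tracker {
    unsigned last_ack;   /* highest cumulative ACK seen so far */
    int dup_count;       /* duplicates of last_ack seen in a row */
};

/* Process one incoming cumulative ACK. Returns 1 if the sender should
 * retransmit the packet following last_ack right away, 0 otherwise. */
int on_ack(struct ack_tracker *t, unsigned ack)
{
    if (ack > t->last_ack) {         /* new data acknowledged */
        t->last_ack = ack;
        t->dup_count = 0;
        return 0;
    }
    if (ack == t->last_ack) {        /* duplicate ACK */
        t->dup_count++;
        return t->dup_count == DUP_ACK_THRESHOLD;   /* trigger only once */
    }
    return 0;                        /* older ACK, ignore */
}
```

With packet 4 lost, the ACK for packet 3 arrives once normally and then again for each of packets 5, 6 and 7. The third of these duplicates makes on_ack return 1, and packet 4 is retransmitted without waiting for the timer.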

Flow Control

Both the sender and the receiver can specify the number of packets they are ready to send/receive. To implement this, the receiver advertises a Receive Window Size. Thus with every acknowledgment, the receiver sends the number of packets that it is willing to accept. Note that the size of the window depends on the available space in the buffer on the receiver side. Thus, as the application keeps consuming the data, window size is incremented.
On the sender side, the acknowledgment and the receiver's window size can be used to calculate the sequence number up to which the sender is allowed to transmit. For example, if the acknowledgment is for packet 3 and the window size is 7, the sender knows that the recipient has received data up to packet 3 and that it can send packets with sequence numbers up to 7+3=10.
The problem with the above scheme is that it is too fast. Suppose, in the above example, the sender sends 7 packets together and the network is congested, so some packets may be lost. The timer on the sender side goes off and it again sends 7 packets together, thus increasing the congestion further. It only escalates the magnitude of the problem.
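
The window arithmetic used in the example above (acknowledgment for packet 3, window size 7) can be written down as a small sketch. Packets rather than bytes are counted, as in the text, and the function names are assumptions of this example.

```c
/* Highest packet number the sender may transmit, given the last cumulative
 * acknowledgment and the window advertised by the receiver. */
unsigned send_limit(unsigned last_acked, unsigned window)
{
    return last_acked + window;
}

/* Number of further packets that may be sent now, when next_to_send is the
 * lowest packet number that has not been transmitted yet. */
unsigned can_send(unsigned last_acked, unsigned window, unsigned next_to_send)
{
    unsigned limit = send_limit(last_acked, window);
    return next_to_send > limit ? 0 : limit - next_to_send + 1;
}
```

With last_acked = 3 and window = 7 the limit is 10, so a sender that has not yet sent packet 4 may send 7 packets at once. This is exactly the burst that causes trouble on a congested network.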

Image References

  • http://publib.boulder.ibm.com/iseries/v5r1/ic2924/info/rzab6/rxab6502.gif
