Understanding Sockets in Unix, NT, and Java
In software development these days, networks are all-important. For example, Sun Microsystems uses the slogan, "the network is the computer." IBM promotes a network-based business model called "e-business." It’s taken for granted that programs talk to each other across the network. But just how do they do it? Surprisingly, a technology that is more than 15 years old still provides the foundation for most of today’s connectivity at the application level. This technology is based on an idea called sockets.
To understand how programs communicate, it’s helpful to understand sockets technology. This paper explains the basic concepts of sockets and provides source code for three sample programs that illustrate fundamental sockets principles. Each program runs in a different environment (Unix, NT, and Java) to illustrate the cross-platform capability of sockets.
Although the general trend in software development is toward higher levels of abstraction that hide nitty-gritty functional details, knowing the basic principles of sockets-based communication is important for anyone who develops software. Sockets technology is a building block for computer communications.
Basic sockets concepts
Sockets belong to a group of software mechanisms that enable interprocess communications (IPC), which simply means that concurrently executing processes can exchange data. Some IPC mechanisms only support data exchange between processes running on the same machine, while other forms of IPC connect processes running on geographically dispersed machines. In System V Unix, message queues, semaphores, and shared memory are examples of IPC mechanisms. In Windows NT, named pipes, mailslots, and memory mapped files are part of the IPC family. Sockets are one of the most common IPC mechanisms; they support both local and remote communications and are available on Unix and NT.
Sockets technology was developed in the early 1980s at the University of California at Berkeley and was first widely distributed with Release 4.2 of BSD (Berkeley Software Distribution) Unix. Because of the connection with BSD, sockets are sometimes referred to as "Berkeley sockets." Sockets support was gradually adopted by other versions of Unix, and eventually by other families of operating systems, including both Intel-based systems and mainframes. Sockets were added to the Microsoft Windows world in the form of the Winsock API, and to the Java world as Socket and ServerSocket objects in the java.net package.
The simplest way to understand a socket is to think of it as a byte stream between computers. In the original Unix implementation, sockets were treated like files. For example, the standard Unix system calls read(), write(), and close(), which are used for processing files, can also be used to process sockets. Only open() does not carry over: a socket is created with its own call, socket().
When you create a file with open ( ) , Unix sets up internal data structures to manage the file, and returns an integer, referred to as the file descriptor, by which you can manipulate the file. In similar fashion, when you create a socket with the socket ( ) system call, the system sets up data structures to manage the socket and returns an integer, referred to as the socket descriptor, that gives you access to the socket. However, the socket ( ) call, unlike open ( ) , does not take a name argument.
Another important concept is support for multiple protocols. Sockets were designed to support different kinds of communications protocols and data streams. For example, the protocol family argument in the socket ( ) call allows you to specify different protocols. You can specify Internet protocols, but you can also specify the Xerox Network Systems (XNS) protocols and other protocols.
Type of connection is another important concept in sockets programming. When you create a socket you decide whether data transfer will be connection-oriented or connectionless. Connection-oriented data transfer implies the existence of a session, or dialogue, between computers. Much like a dialogue between human beings, a computer communications session has an existence separate from the data messages exchanged within it. For example, a human dialogue may begin when both parties say "hello," and continue until both parties say "good-bye" even though the conversation includes periods of silence.
A human example of connection-oriented communication is a telephone call. The information transfer, or exchange of messages, takes place in the context of a session that persists until one person hangs up.
In contrast, connectionless data transfer refers to the transmission of data messages without benefit of dialogue or session. A human example of connectionless data transfer is mailing a letter at the US Post Office. The information transfer consists only of the message itself; there is no session established between the person sending the letter and the recipient. In computer terms this form of communication consists of launching messages onto the network without establishing a session with the target system.
A simple example
What does a typical sockets application look like? The following table describes a simple connection-oriented client-server sockets application.
|Client Action||Client System Call||Server Action||Server System Call||Description|
|Create socket descriptor||socket( )||Create socket descriptor||socket( )||The socket( ) call creates a socket descriptor, which is similar to a file descriptor. The protocol family must be chosen when socket( ) is issued.|
|n/a||n/a||Associate network address with socket||bind( )||The server socket must be network-addressable. A typical use of bind( ) is to associate the server socket with a specific Internet Protocol (IP) address.|
|n/a||n/a||Wait for incoming connection request||listen( ) and accept( )||The listen( ) call notifies the operating system that the server process is ready to receive connection requests. The accept( ) call suspends the server process until a connection request arrives.|
|Contact server||connect( )||n/a||n/a||The connect( ) call establishes a network connection with the server process.|
|Transmit and receive data||write( ) read( )||Transmit and receive data||write( ) read( )||This is similar to file I/O. The socket connection is duplex.|
|Terminate connection||close( )||Terminate connection||close( )||This is analogous to closing a file.|
This example illustrates the fundamental paradigm of sockets communication: a server system creates a socket, establishes network addressability, and waits for messages. A client system, cognizant of the server’s address, sends a connection request. If a connection is established, data flows between client and server.
The basic steps in a typical sockets session can be summarized as follows:
|Server:||Create socket, establish network addressability, wait for connection request|
|Client:||Create socket, send connection request to server|
|Server and Client:||Establish connection|
|Server and Client:||Transmit and receive data|
|Server and Client:||Close connection.|
Sample sockets application
Let’s look at three illustrative programs that constitute a simple client-server application based on sockets technology. This sample application is drawn from the Unix environment and exploits the common Unix command man, which retrieves documentation about other Unix commands from the online Unix manual and displays it in the form of man pages. For example, the command man sync displays documentation describing the purpose, syntax, and usage of the sync command. The purpose of the sample application is to make man pages available to users on non-Unix systems.
Obviously the man command is intended to be run on Unix systems, because it displays documentation about Unix commands. What if a programmer working in a multi-platform environment wishes to retrieve man output while working on a Windows NT machine? Although this user requirement is artificial, the solution illustrates the capabilities of a client-server application based on sockets.
The sample application makes use of the client-server model of software design, which distributes the computing load across two or more machines. In the sample application, the server process runs on a Unix machine. The server process spends most of its time idle, waiting for a request from a client machine. The server opens a socket and makes it available on the network, providing addressability by hostname (bookworld) or by IP address (22.214.171.124). The server then goes into a quiesced listening mode. In this application the server is not entirely passive, because after every three idle minutes it notifies the user and prompts for approval to continue.
The sample application contains two client programs (one written in C++, one written in Java) that run on Windows NT machines. When a client is launched, it opens a socket and attempts to connect to the server process. On the local machine it prompts the user for a Unix command, which it transmits through the socket to the server. The server process constructs a man command string containing the client data, runs the command, and captures the output. The server transmits the output back to the requesting client, which displays it on the screen to the user. Thus, by way of sockets, the NT user has access to man page information while using a non-Unix machine. In this client-server configuration, all interprocess communication takes place by means of sockets.
Server program (Unix)
The server program creates a continuously-running process that listens on a known port for requests from client machines. The port is known in the sense that all client programs written for this application know the specific network address of the server. When a client request arrives, the server invokes the man command and transmits the output back to the client. The server then resumes listening.
The server program was developed in the C language on AIX 4.2 with the vi editor and the IBM xlC compiler. The server program is invoked from the command line.
The following header files are required for this program:
General-purpose data definitions
Socket data definitions
Additional socket data definitions
Internet data definitions
Required for signal ( ) system call
Required for setjmp ( ) system call
Input/output data definitions
Depending on the Unix implementation, the following header files may also be required: arpa/inet.h, arpa/nameser.h, resolv.h, sys/un.h, sys/uio.h.
The symbolic constant MAX_IDLE specifies the number of minutes the server process is allowed to be idle before issuing a warning message to the user.
Creating a Socket
The server socket is created with the socket ( ) call, which takes the following arguments:
The first argument, socket domain, selects the family of communication protocols that will be used to control the data flowing through the socket. AF_INET is a symbolic constant representing the Internet family of protocols. If the value of this argument is AF_UNIX, the socket will operate in the "Unix domain." This means it will communicate with other processes on the same Unix system only, and will not support communication across the network. Note that there is an equivalent set of symbolic constants beginning with PF (Protocol Family) rather than AF (Address Family).
The symbolic constant SOCK_STREAM provides a value for socket type, which indicates whether communication through the socket will be connection-oriented or connectionless. SOCK_STREAM signifies that the communication will be connection-oriented, whereas SOCK_DGRAM signifies that communication will consist of the connectionless transmission of data packets called datagrams.
The protocol argument allows the programmer to specify a specific protocol within the protocol family. For example, the symbolic constant IPPROTO_TCP specifies the Transmission Control Protocol (TCP). Typically this argument is set to zero, allowing the system to select a protocol.
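The three arguments described above can be seen together in a minimal sketch, assuming a POSIX system. The helper name create_stream_socket is hypothetical and is not taken from the sample server program.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the sample program): create a
   connection-oriented socket in the Internet domain.
   Returns the socket descriptor, or -1 on failure. */
int create_stream_socket(void)
{
    /* domain   = AF_INET      (Internet family of protocols)
       type     = SOCK_STREAM  (connection-oriented)
       protocol = 0            (let the system select the protocol) */
    int sd = socket(AF_INET, SOCK_STREAM, 0);

    if (sd < 0)
        perror("socket");
    return sd;
}
```

Like a file descriptor, the returned value is a small non-negative integer that is passed to every later call on the socket.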
The next step is to initialize the socket address structure. Three important data fields are:
- Address family
AF_INET specifies the Internet family of protocols.
- Network address
INADDR_ANY indicates that the server will accept messages using any Internet address available on the machine (for machines with multiple network connections, this field allows you to single out a specific IP address).
- Port number
This integer identifies the server process to the network. The IP address identifies the Unix host on the Internet; the port number identifies a specific process running on the Unix host. Port numbers below 1024 are reserved for well-known services (for example, the telnet protocol has permanently reserved port number 23). The sample program arbitrarily uses port number 10001.
The htonl ( ) and htons ( ) functions convert long and short integer values from local host byte ordering to network byte ordering.
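A minimal sketch of this initialization, assuming the standard struct sockaddr_in from netinet/in.h. The function name init_server_address is hypothetical; the port is passed in rather than hard-coded.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill in an Internet socket address structure
   for a server that accepts connections on any local interface. */
void init_server_address(struct sockaddr_in *sas, unsigned short port)
{
    memset(sas, 0, sizeof *sas);              /* clear unused fields */
    sas->sin_family = AF_INET;                /* address family */
    sas->sin_addr.s_addr = htonl(INADDR_ANY); /* network address, network byte order */
    sas->sin_port = htons(port);              /* port number, network byte order */
}
```

Calling init_server_address(&sas1, 10001) would produce the values described in the list above. On a little-endian host htons() swaps the two bytes of the port number; on a big-endian host it leaves them unchanged.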
Once the socket address structure has been properly initialized, the bind ( ) call associates the socket address structure with the socket descriptor, making the socket network-addressable. Any messages arriving at the specified Internet host machine and marked with the specified port number are delivered to the specified socket.
At this point the server process is almost ready to receive connection requests from the network, but it must first invoke listen() and accept(). Invoking listen() is a preliminary step that informs the operating system that the server is ready to accept incoming connections on the socket. The listen() call also allows a backlog limit to be specified. The usual value is 5, which indicates that the socket will allow roughly 5 incoming connection requests to be queued up if they arrive faster than the server process can accept them.
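The bind() and listen() steps can be sketched together as follows. This is an illustration under stated assumptions rather than the sample program's code: the helper name make_listener and its parameters are hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a socket, make it network-addressable,
   and mark it as ready for connection requests.
   Returns the listening descriptor, or -1 on failure. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in sas;
    int sd = socket(AF_INET, SOCK_STREAM, 0);

    if (sd < 0)
        return -1;

    memset(&sas, 0, sizeof sas);
    sas.sin_family = AF_INET;
    sas.sin_addr.s_addr = htonl(INADDR_ANY);
    sas.sin_port = htons(port);

    /* bind() attaches the address to the descriptor;
       listen() sets the limit on queued connection requests. */
    if (bind(sd, (struct sockaddr *)&sas, sizeof sas) < 0 ||
        listen(sd, backlog) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}
```

With the values used in the text, the call would be make_listener(10001, 5). Passing port 0 asks the system to pick any free port, which is convenient for experiments.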
Once the server has notified the operating system that it is ready to listen, the server process goes into waiting mode by issuing the accept() system call. The accept() call blocks until a connection request arrives.
The sample program uses signal and setjmp logic to display a user message at fixed intervals determined by MAX_IDLE. This is not required for sockets programming but was added as a user convenience: it at least assures the user that the server process is still alive.
Connecting to a Client
When a connection request arrives from the network, the accept() system call awakens from its blocked state and performs two key services: (1) it completes a data connection with the requesting client, and (2) it creates an entirely new socket to support communications with that client. The socket address structure argument to the accept() call, sas2, is a new, uninitialized socket address structure that the call fills in with the client’s address, and the return value from the accept() call is a new socket descriptor, sd2. Because all data exchange with the client takes place through sd2, the original socket, sd1, remains free to receive connection requests from other client machines.
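A minimal sketch of the accept() step, reusing the names sd1, sd2, and sas2 from the text. The wrapper function accept_client and the message it prints are hypothetical additions for illustration.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: block until a client connects to the listening
   socket sd1. On success the client's address is stored in *sas2 and
   the new, connected descriptor (sd2 in the text) is returned. */
int accept_client(int sd1, struct sockaddr_in *sas2)
{
    socklen_t len = sizeof *sas2;   /* in: buffer size, out: address size */
    int sd2 = accept(sd1, (struct sockaddr *)sas2, &len);

    if (sd2 < 0) {
        perror("accept");
        return -1;
    }
    printf("connection from %s, port %d\n",
           inet_ntoa(sas2->sin_addr), (int)ntohs(sas2->sin_port));
    return sd2;
}
```

The descriptor sd1 is untouched by the call, so the same listening socket can be passed to accept_client() again for the next client.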
Automatic creation of a second socket is necessary to support concurrent servers, which support many clients simultaneously. In contrast, the sample server program is an iterative server, which processes client requests serially. This iterative server is useful for tutorial purposes, but production servers typically follow the concurrent model. In the Unix world, concurrent servers traditionally use the fork ( ) system call to spawn an independent process to handle each client request. Each child process only lives long enough to process its request and transmit appropriate data back to the requesting client, while the parent process lives indefinitely. On non-Unix systems threads, rather than separate processes, are typically used to handle client requests. For simplicity, the sample server does not include any multiprocessing or multithreading capability, although a real server probably would.
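The traditional fork-per-client pattern can be sketched as follows. This is not part of the sample server, which is iterative; the names serve_one_forking and echo_handler are hypothetical, and the echo logic merely stands in for real request processing.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>

/* Hypothetical per-client handler: echo bytes back until the client
   closes its end of the connection. */
void echo_handler(int sd2)
{
    char buf[256];
    ssize_t n;

    while ((n = read(sd2, buf, sizeof buf)) > 0)
        if (write(sd2, buf, (size_t)n) != n)
            break;
}

/* Accept one client on the listening socket sd1 and hand the new
   socket to a child process. Returns the child's pid, or -1 on error.
   A concurrent server would call this function in a loop. */
pid_t serve_one_forking(int sd1, void (*handler)(int))
{
    int sd2 = accept(sd1, NULL, NULL);
    pid_t pid;

    if (sd2 < 0)
        return -1;

    pid = fork();
    if (pid == 0) {          /* child: serve this client, then exit */
        close(sd1);          /* the child does not listen */
        handler(sd2);
        close(sd2);
        _exit(0);
    }
    close(sd2);              /* parent: the child owns the connection */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;                    /* reap children that have already finished */
    return pid;              /* -1 here means fork() failed */
}
```

The parent closes its copy of sd2 immediately so that the connection ends when the child finishes, and it returns at once to wait for the next client.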
Exchanging Data with the Client
The server process attempts to read bytes from the sd2 socket using the recv ( ) call. The hex value ff was chosen to indicate end-of-transmission, so the recv ( ) routine tests for this value as well as a negative return code. In addition to the recv ( ) socket system call for receiving data over a socket, there are variants such as recvmsg ( ) and recvfrom ( ) . The regular read ( ) system call can be used to read data over a socket, but its semantics are slightly different from when it is used to read data from a file.
An important lesson to learn about socket input and output is that I/O operations are relatively low-level and put responsibility on the programmer for the technical details of the data flow. For example, it is the programmer’s responsibility to ensure that input and output routines can support different processing speeds between servers and clients, fluctuations in data transfer across the network, differences in buffer sizes between servers and clients, and so on. The combination of sockets technology and the TCP protocols, used pervasively in communications programming, provides a reliable data transfer system with a consistent programming interface at both end points. However, it does not automatically take care of all the details. For example, the programmer must choose a method for detecting end-of-transmission.
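A byte-at-a-time receive loop of the kind described above might look like the following sketch. The hex ff marker comes from the sample application; the function name recv_request, the EOT macro, and the buffer handling are assumptions made for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>

#define EOT 0xff   /* end-of-transmission marker chosen by the application */

/* Hypothetical helper: read one request from sd into buf, one byte at
   a time, until the EOT marker arrives. The result is NUL-terminated.
   Returns the request length, or -1 if the peer disconnected, an error
   occurred, or cap is zero. */
ssize_t recv_request(int sd, char *buf, size_t cap)
{
    size_t len = 0;
    unsigned char byte;

    if (cap == 0)
        return -1;

    for (;;) {
        ssize_t n = recv(sd, &byte, 1, 0);

        if (n <= 0)
            return -1;           /* 0 = peer closed, negative = error */
        if (byte == EOT)
            break;               /* complete request received */
        if (len + 1 < cap)
            buf[len++] = (char)byte;   /* bytes that do not fit are dropped */
    }
    buf[len] = '\0';
    return (ssize_t)len;
}
```

Reading single bytes is simple but costs one system call per byte; it is reasonable here only because requests are short command names.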
Once the end of the input data stream has been reached, as indicated by hex ff, the server processes the request. The request data is added to a command string containing the man command. The command string is passed to the operating system by means of the system ( ) call. Whatever output is generated by the man command is written to the temporary file tfile, which is then opened for reading, and the resulting data is written to the socket with send ( ) . When all strings contained in tfile have been written to sd2, the hex value ff is sent to signify end-of-transmission. The send ( ) operation is slowed by adding sleep ( ) to give the client process enough time to process each string of data, preventing buffer overflow. When recv ( ) finally detects hex ff or zero data, the server process interprets this as a disconnect from the current client and issues the accept ( ) call again to resume waiting for a new client. This cycle continues until the user terminates the server process.
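The transmit side can be sketched as follows. Note that this sketch does not reproduce the sample server's sleep() pacing; it shows a different, commonly used technique, which is to loop until send() has accepted every byte, since a single call may transmit fewer bytes than requested. The names send_all and send_response are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send exactly len bytes, retrying after short
   writes. Returns 0 on success, -1 on error. */
int send_all(int sd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = send(sd, p, len, 0);

        if (n <= 0)
            return -1;
        p += n;                  /* skip the bytes already sent */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send a block of response text followed by the
   hex ff end-of-transmission marker used by the sample application. */
int send_response(int sd, const char *text, size_t len)
{
    const unsigned char eot = 0xff;

    if (send_all(sd, text, len) < 0)
        return -1;
    return send_all(sd, &eot, 1);
}
```

On a blocking TCP socket, send() waits when the receiver falls behind, so the loop adapts to a slow client without fixed delays.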
Client program (NT)
The C++ client program allows the user to send requests to the Unix man page server and see the responses on an NT machine. The client prompts for a Unix command string, such as "reboot," and formats a request message terminated by hex ff. When the client program starts up it goes through the necessary preliminary steps of creating a socket and connecting it to the man page server. By the time the prompt appears on the screen, the client socket is ready to transfer data over its connection with the server socket.
The C++ client program was developed on Windows NT 4.0 with Microsoft Visual C++ 5.0, and is executed within the Visual C++ environment as a command-line application.
Required for Microsoft Foundation Class library
Required for basic C++ input/output classes
Required for winsock DLL.
Creating a Socket
Like the server program, the client program uses the socket() call to create a socket descriptor. The socket descriptor sd1 and the socket address structure sas1 are declared in the same manner as in the server program, which illustrates the similarity between the Windows implementation of sockets and the Unix implementation. The basic concepts of sockets communication are applicable across platforms. Microsoft documentation states that Windows sockets are based on the Unix sockets implementation in the BSD 4.3 release, and support both BSD-style socket routines plus extensions specific to Windows.
The chief difference in the Windows environment is the requirement for the winsock Dynamic Link Library (DLL). The corresponding header file, winsock.h, must be included in the source file, and the DLL must be available in the run-time environment. Note that winsock is the name of the original 16-bit Windows library, while wsock32 is the 32-bit version.
The winsock DLL must be initialized with a special function, WSAStartup(). The WSAStartup() call takes two arguments, a winsock version number and a pointer to a structure that stores winsock startup data. A zero return code indicates successful winsock startup. MAKEWORD is a C macro that puts the version number into the format required by this function.
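The startup step can be sketched in C as shown here. The wrapper names net_startup and net_cleanup are hypothetical, the requested version 1.1 is an assumption (the text does not say which version the sample client requests), and the non-Windows branch exists only so the same source compiles on Unix, where no initialization is needed.

```c
#ifdef _WIN32
#include <winsock.h>    /* link with wsock32.lib */
#endif

/* Hypothetical wrapper: prepare the sockets library for use.
   Returns 0 on success. */
int net_startup(void)
{
#ifdef _WIN32
    WSADATA wsa_data;   /* receives winsock startup data */

    /* MAKEWORD(1, 1) packs version 1.1 into the required format;
       a zero return code indicates successful startup. */
    return WSAStartup(MAKEWORD(1, 1), &wsa_data);
#else
    return 0;           /* Unix sockets need no library initialization */
#endif
}

/* Hypothetical wrapper: release the sockets library at exit. */
void net_cleanup(void)
{
#ifdef _WIN32
    WSACleanup();
#endif
}
```

A client would call net_startup() once before its first socket() call and net_cleanup() once before exiting.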
The arguments to the socket() call in Windows are the same as they are in Unix: AF_INET, SOCK_STREAM, and 0. These arguments request the creation of a socket that will use the Internet family of protocols for connection-oriented communication with the choice of protocol left up to the system. Unlike the server program, the client does not require multiple sockets. There is only one socket descriptor, sd1, which supports all communication with the server.
The initialization of the socket address structure for the client program is slightly different from the server program. As in the server program, the address family is set to AF_INET and the port number is set to 10001. However, the network addressing is different. Addressing in the server program is specified as INADDR_ANY, by which the server process tells the system it will accept messages from any network adapter on the machine. Addressing in the client program is specified as 126.96.36.199, which is the IP address of the server machine. Any outgoing messages sent through this socket will be addressed to this destination. inet_addr() is a support function that converts the dotted decimal representation of an IP address into the required internal format.
Connecting to the Server
When the client program issues the connect() call, the TCP/IP transport layer attempts to locate the target host at IP address 188.8.131.52 and establish a connection. Since the sample server program has already been started on the correct host and has already issued an accept() call, the connection can be set up immediately. Note that the client program does not have to specify its own IP address; the server program acquires this information from the connection request packet.
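The client-side address setup and connect() call can be sketched with the BSD-style calls that Unix and Windows share. The version below is written for a POSIX system; the helper name connect_to is hypothetical, and the address and port are parameters rather than the sample program's values.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: connect to a server identified by a
   dotted-decimal IP address and a port number.
   Returns a connected socket descriptor, or -1 on failure. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in sas1;
    int sd1;

    memset(&sas1, 0, sizeof sas1);
    sas1.sin_family = AF_INET;
    sas1.sin_port = htons(port);
    sas1.sin_addr.s_addr = inet_addr(ip);    /* dotted decimal -> internal format */
    if (sas1.sin_addr.s_addr == INADDR_NONE)
        return -1;                           /* not a valid address string */

    sd1 = socket(AF_INET, SOCK_STREAM, 0);
    if (sd1 < 0)
        return -1;

    /* The client never fills in its own address; the system supplies it. */
    if (connect(sd1, (struct sockaddr *)&sas1, sizeof sas1) < 0) {
        close(sd1);
        return -1;
    }
    return sd1;
}
```

For a server running on the same machine, connect_to("127.0.0.1", 10001) would address the loopback interface.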
Exchanging Data with the Server
Once the server connection is functioning, the man page client program has access to the man page database on the remote Unix system. The client prompts the user for a man page request, which it transmits to the server with the send() call. Hex ff is appended to indicate end-of-transmission. After sending the request, the client program waits for a response with a recv() call. Note that the client receives up to 256 bytes of data for each recv() call rather than receiving bytes one at a time, like the server program. The client program parses the incoming data stream and displays an output line whenever it encounters a newline, formfeed, or carriage return character. When it finds a byte with the hex value ff, the client program recognizes the end of the server transmission, loops back, and prompts the user for another man page request. This continues indefinitely until the user enters "bye," which terminates the client program.
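The receive-and-parse loop described above can be sketched in C as follows. The 256-byte chunk size, the line delimiters, and the hex ff marker come from the text; the function name recv_response, the 512-byte line buffer, and the choice to drop characters from over-long lines are assumptions.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: receive a response in chunks of up to 256
   bytes, write each completed line to out, and stop at the hex ff
   marker. Returns the number of lines written, or -1 if the
   connection fails before the marker arrives. */
int recv_response(int sd, FILE *out)
{
    unsigned char chunk[256];
    char line[512];
    size_t len = 0;
    int lines = 0;

    for (;;) {
        ssize_t n = recv(sd, chunk, sizeof chunk, 0);

        if (n <= 0)
            return -1;
        for (ssize_t i = 0; i < n; i++) {
            unsigned char c = chunk[i];

            if (c == 0xff) {                 /* end of transmission */
                if (len > 0) {               /* flush a final partial line */
                    line[len] = '\0';
                    fprintf(out, "%s\n", line);
                    lines++;
                }
                return lines;
            }
            if (c == '\n' || c == '\f' || c == '\r') {
                line[len] = '\0';            /* delimiter ends the line */
                fprintf(out, "%s\n", line);
                lines++;
                len = 0;
            } else if (len < sizeof line - 1) {
                line[len++] = (char)c;       /* over-long lines are truncated */
            }
        }
    }
}
```

Because a line may be split across two recv() calls, the partial line is kept in the line buffer between chunks rather than being printed at the end of each chunk.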
Client program (Java)
The Java client program is functionally equivalent to the C++ client. Its purpose is to illustrate the availability of sockets technology on the Java platform. The Java client presents a graphical user interface comprising two windows, although the client is still text-based because of the nature of the application. The first window prompts for a Unix command string, and the second window presents the resulting man page information.
The Java client program was developed on Windows NT 4.0 with IBM VisualAge for Java 1.0 and is executed within the VisualAge environment as a graphical application.
Provides classes for network programming
Provides input/output classes
Provides classes for graphical user interface.
The Java client consists of two classes, ManPage and ManPageWindow. The ManPage class starts the application by instantiating itself in main(), and the ManPage constructor then instantiates a ManPageWindow object and initializes it as a main window. When the user enters a command string into the main window, the client instantiates another ManPageWindow object and initializes it as an output window.
Creating a Socket
Unlike the C++ client program and the server program, the Java client does not use the standard socket() call to create a socket descriptor. The Java language provides support for sockets through two socket classes which encapsulate the standard sockets functionality. Although the syntax differs, the underlying model is the same as in the Unix implementation. Again, the basic concepts of sockets communication are applicable across platforms.
The java.net.Socket class provides functionality for a client-side socket. The corresponding class java.net.ServerSocket provides functionality for a server-side socket. When you instantiate a client Socket object, you specify the name of the remote host machine and the port number to which you wish the socket to be connected. The constructor creates the socket and also attempts to connect it to the requested host, simplifying the job of the application developer. In the sample program, the server hostname is specified as bookworld. The full Domain Name System name for this machine is bookworld.raleigh.ibm.com. Following the Unix approach in which sockets are treated conceptually and syntactically almost as if they were files, the Java Socket class provides getInputStream() and getOutputStream() methods to support reading and writing data through a socket. What about the three arguments required for the socket() call: AF_INET, SOCK_STREAM, and 0? It is not necessary to specify these arguments when instantiating a socket object in Java. Since these are the most common values, they are provided as defaults.
Connecting to the Server
When a ManPageWindow object is initialized as a main window, it not only instantiates window components such as TextField, Panel, and Button, but it also invokes socket methods to establish connectivity with the server. A Socket object, socket1, is created with the new operator. The constructor method implicitly issues the equivalent of a connect() call, specifying the server host name bookworld and the port number, 10001. Unless an error is encountered, the Socket constructor returns a Socket object already connected to the server process on the target machine.
Next the initialize routine, initMainWIndow(), creates stream objects associated with the connected socket. dataInputStream1 and dataOutputStream1 are conceptually like traditional files and take advantage of the fact that sockets were designed to function much like files. These stream objects provide read and write methods which the Java client later uses to transfer data through socket1.
Exchanging Data with the Server
When a text string is entered into the TextField object in the main window, the action() method invokes processRequest(). This method writes the input string to the socket by way of the output stream object, appending hex ff. The input string data from the Java client arrives in the server program over the sd2 socket and the server program generates appropriate man page output. The Java client receives response data from the server through the input stream object associated with socket1. Note that by routing socket data movement through stream objects the Java environment effectively encapsulates the standard sockets functionality. This simplifies the work of the developer since the input/output idiosyncrasies of sockets are hidden behind the standard input/output functionality of the Java stream objects.
After the Java client transmits a request string to the server, the processRequest() method begins a read loop on the socket using the readUnsignedByte() method. It is necessary to read the input data stream in this primitive manner, one byte at a time, because the Java environment represents text in two-byte Unicode characters, and the data stream transmitted from the server is not encoded in Unicode. readUnsignedByte() returns the input data as an integer, which is tested for hex ff, and then is cast to a Unicode character. If the character is not a newline, formfeed, or carriage return it is stored in the input array; otherwise the input array is passed to printOutput() as a String object and appended to the output text in the output window.
If this paper has whetted your appetite for more study of sockets, there are many aspects of sockets usage that you can profitably investigate. Here are some examples:
Socket Domains
The socket domain parameter signifies the communications environment (the type of network and address space) through which connected machines will communicate. For example, the Internet domain, indicated by AF_INET, consists of machines connected to an intranet/Internet and using the Internet Protocol addressing scheme. The original version of sockets also supported a "domain" that consisted of a single Unix machine. This domain is specified with the AF_UNIX symbolic constant. Generally in software development Unix domain sockets are treated like other types of sockets, but addressing in this domain consists of specifying file names rather than network addresses. Since this communications domain is defined specifically for the Unix environment, implementations of sockets on non-Unix systems may not always support this type of domain.
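Addressing by file name can be sketched as follows, assuming a Unix system that provides sys/un.h. The helper name make_unix_listener is hypothetical, and the backlog of 5 simply follows the value used earlier in the text.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: create a listening socket in the Unix domain.
   The address is a file name rather than an IP address and port.
   Returns the listening descriptor, or -1 on failure. */
int make_unix_listener(const char *path)
{
    struct sockaddr_un addr;
    int sd;

    if (strlen(path) >= sizeof addr.sun_path)
        return -1;                       /* name too long for the structure */

    sd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);         /* length was checked above */

    unlink(path);                        /* remove a stale socket file, if any */
    if (bind(sd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(sd, 5) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}
```

Apart from the address structure, the remaining calls (accept(), connect(), send(), recv()) are used exactly as in the Internet domain.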
Programming with Datagrams
The socket type parameter is typically set to SOCK_STREAM or SOCK_DGRAM, although other values are supported. SOCK_STREAM indicates that sockets communication will be connection-oriented, and this is the value used in the sample application. In this arrangement data transmission takes place in the context of a session in which both machines are logically connected. Data delivery is considered reliable, and messages arrive in the order in which they were sent. In the Internet domain, the Transmission Control Protocol (TCP) is used for messages of this type.
The value SOCK_DGRAM indicates that sockets communication will be connectionless. Sockets do not need to be connected in order to transmit data, which is transmitted in discrete packets called datagrams. Data transport is not considered reliable, and packets may arrive out of order or not at all. The User Datagram Protocol (UDP) is used for messages of this type. Despite the lack of reliability, connectionless protocols like UDP are useful in certain situations. For example, UDP communication can be faster than TCP because it does not incur the overhead of setting up a session and maintaining connection state.
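A minimal datagram sketch, assuming a POSIX system. The helper names make_udp_socket and send_datagram are hypothetical; the receiving socket is bound to the loopback address so that the example stays on one machine.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a datagram socket bound to the loopback
   address on the given port (0 lets the system choose a free port).
   Returns the descriptor, or -1 on failure. */
int make_udp_socket(unsigned short port)
{
    struct sockaddr_in sas;
    int sd = socket(AF_INET, SOCK_DGRAM, 0);

    if (sd < 0)
        return -1;

    memset(&sas, 0, sizeof sas);
    sas.sin_family = AF_INET;
    sas.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sas.sin_port = htons(port);

    if (bind(sd, (struct sockaddr *)&sas, sizeof sas) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* Hypothetical helper: send one datagram to the given address. There
   is no listen(), accept(), or connect(); each message carries its
   own destination. Returns the number of bytes sent, or -1. */
ssize_t send_datagram(int sd, const struct sockaddr_in *to,
                      const void *msg, size_t len)
{
    return sendto(sd, msg, len, 0, (const struct sockaddr *)to, sizeof *to);
}
```

The receiver retrieves each message with recvfrom(), which can also report the address of the sender. Message boundaries are preserved: one sendto() corresponds to one recvfrom().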
Concurrent Servers
The server program in the sample application is an iterative server because it processes client requests one at a time. Real servers, however, tend to be concurrent servers. A concurrent server leverages multiprocessing and/or multithreading capability provided by the platform to serve multiple client requests in parallel. The precedent was established in the Unix world, where the fork() system call was available to spawn child processes. In modern Unix implementations Posix thread support is available in the pthreads library, and concurrent servers can choose between processes or threads to handle multiple concurrent requests. Non-Unix platforms such as Windows NT and OS/2 also have threads capability.
When a server supports concurrency, the relationship among the sockets is sometimes referred to as master-slave. The master socket is the one which is bound to the server’s network address: this socket’s task is to listen for requests, but not to service them. When a client request is accepted by the master socket, the accept() call creates a new socket that is dedicated to supporting the client that launched the request. This "slave" socket handles all data transfer between the server and its assigned client, and can be discarded when the client request has been satisfied. The master socket continues to accept client requests, and the server continues to create new processes or threads, up to the system’s capacity. Concurrent servers are challenging to develop and require competency not only in sockets technology but also in managing multiple processes or threads. For high-demand servers with a high client-to-server ratio, concurrent servers are clearly preferable to iterative servers.
Variations in Sockets Support
Platforms which added sockets support based on the original BSD Unix version are not always completely consistent. Basic calls such as socket ( ) , bind ( ) , and connect ( ) are likely to be identical to the Berkeley version in syntax and semantics, but other calls can reflect variations attributable to platform differences. Following are some examples:
- Microsoft Windows 3.x
In the Windows environment, we have already observed the WSAStartup ( ) call that is required to load and initialize the winsock code. Because Windows 3.x is a single-threaded operating system, sockets calls that block in the original BSD version would freeze the Windows system. Some existing calls were modified, and some new calls added, to enable sockets functionality in a single-threaded environment.
- IBM AIX
Variations in socket support can affect not only which function calls are provided but also the range of argument values available for sockets calls. For example, the most common socket domains are AF_INET (Internet domain) and AF_UNIX (Unix domain). AIX 4.2, however, adds the AF_NDD (Network Device Driver) domain. The NDD domain supports transmission of data packets at a lower layer in the protocol stack than is normally available, for example, addressing packets directly to a Medium Access Control (MAC) address.
- IBM OS/400
In Unix an executing process can pass a file descriptor or socket descriptor to another process. The IBM OS/400 operating system originally provided no equivalent way to pass a socket descriptor from one job to another, so the sockets support on this platform added the functions givedescriptor( ) and takedescriptor( ) for this purpose.