Sorry to have kept you waiting. I have been very busy recently. For starters, I am trying to complete parts of my creative writing exercise at govz.wordpress.com. Anyway, a number of people from other countries (USA, Romania, Korea, China, India) have been hitting this blog entry for quite some time now. When I checked this blog’s history, a rather interesting set of questions, or search terms, came up. Here are a few of them:
- “multi-thread file descriptor open close”
- “fork / clone file descriptor”
- “file descriptor leak debug”
- “creating a file descriptor”
- “how to reinitialize file descriptor c++”
In a way, I feel happy because, though this is an unsavory topic for most, it turns out it is also important to someone else, somewhere in the world. And by fate, they have somehow found their way to my blog and my mundane musings. I just want to repeat this again, though: I am not, and never will be, a supreme-level engineer (the likes of Dan Saks, Bjarne Stroustrup, Dennis Ritchie, Jon B. Postel, Paul Vixie, Steve W., etc.); I am still very much a work in progress. Should you suspect a flaw in anything I say, please do leave a message. After all, learning is fun when more heads come together.
Also, please keep in mind that the following discussion is
- limited to user-land programming (meaning not within your operating system or kernel).
- limited to systems whose kernels have management mechanisms for all types of resources (i.e., memory, descriptors, etc.).
- limited to systems which free all used resources on process termination.
- limited to UNIX or UNIX-like systems. (Sorry, Windows developers.)
Ok?
Here we go: “Beam us up, Scotty.” (Sorry, for a moment there, I missed Kirk, Spock, Data, T’Pol and Janeway of Star Trek.)
I. File Descriptor Fundamentals
Before we finally close out the sample code of the previous blog post, we need to understand certain fundamentals about file descriptors.
A. In the beginning, there was and is the trinity.
Did you know that when you start an ordinary program, it already has three file descriptors open (they are normally inherited from whatever started it, usually your shell)? These descriptors sit underneath the streams known to us as stdin (standard input), stdout (standard output), and stderr (standard error). In an ordinary process, these three streams are on file descriptors 0, 1, and 2, respectively.
So in a simple application, the number of descriptors left for your own use is the per-process limit minus three.
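A tiny sketch to check this on your own machine, assuming a POSIX system: fileno() returns the descriptor underneath a FILE* stream, and <unistd.h> names the three numbers STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. The helper name std_streams_on_0_1_2() is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the three standard streams sit on descriptors 0, 1 and 2.
   fileno() maps a FILE* stream to the descriptor underneath it. */
int std_streams_on_0_1_2(void)
{
    return fileno(stdin) == STDIN_FILENO      /* 0 */
        && fileno(stdout) == STDOUT_FILENO    /* 1 */
        && fileno(stderr) == STDERR_FILENO;   /* 2 */
}
```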
Can you close them? Yes, certainly! “fclose(stdout)” closes standard output. (We use fclose() because stdout is a stream, i.e. a FILE* value; closing the stream also closes descriptor 1 underneath it.) It is a perfectly valid call.
However, once stdout is closed, “printf” or “cout” become practically useless, and since calling stdio functions on stdout after fclose() is undefined behavior in C, doing so MAY even crash your application in some instances.
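To see that fclose(stdout) really takes descriptor 1 with it, here is a minimal sketch, assuming a POSIX system. It does the closing in a forked child (so your own terminal output survives) and then calls write() on descriptor 1 directly, which fails with EBADF, rather than calling printf() on the closed stream, which would be undefined behavior. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if, after fclose(stdout), descriptor 1 is really closed
   (write() on it fails with EBADF), 0 if not, -1 on error. */
int stdout_gone_after_fclose(void)
{
    fflush(stdout);            /* don't let the child inherit pending output */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        fclose(stdout);        /* closes the stream AND descriptor 1 under it */
        /* From here on, never pass stdout to printf()/fputs(): using a
           closed FILE* is undefined behavior. A raw write() is safe. */
        ssize_t r = write(STDOUT_FILENO, "x", 1);
        _exit(r == -1 && errno == EBADF ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```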
For daemon developers, however, if you are working in a BSD environment, you can let the system take care of these three file descriptors through the “daemon()” function. Calling “daemon(0, 0)”, that is, with the second argument “noclose” set to zero, makes daemon() redirect the three standard descriptors to “/dev/null”, so all data sent to stdout is safely discarded by the null device. (A non-zero “noclose” leaves the three descriptors untouched.) glibc on Linux provides daemon() as well, but it is not part of POSIX, so portable code often daemonizes by hand, which is quite a tedious task that involves “forking” and a lot of other steps that eventually re-direct or re-open the three basic streams to “/dev/null”.
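If you go the manual route, the redirection part usually boils down to opening /dev/null and copying it onto descriptors 0, 1 and 2 with dup2(). Below is a minimal sketch of just that step (not the whole fork/setsid/chdir routine), assuming a POSIX system; both function names are hypothetical, and the demo runs the redirection in a child so your terminal keeps working. Where daemon() exists, daemon(0, 0) does this step for you.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Point descriptors 0, 1 and 2 at /dev/null, the way a hand-rolled daemon
   typically does. Returns 0 on success, -1 on failure. */
int redirect_std_to_devnull(void)
{
    fflush(stdout);                        /* push out anything still buffered */
    int fd = open("/dev/null", O_RDWR);
    if (fd == -1)
        return -1;
    for (int target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
        /* dup2() closes target if it is open, then makes it a copy of fd */
        if (fd != target && dup2(fd, target) == -1) {
            if (fd > STDERR_FILENO)
                close(fd);
            return -1;
        }
    }
    if (fd > STDERR_FILENO)                /* keep it if it landed on 0-2 */
        close(fd);
    return 0;
}

/* Runs the redirection in a child so the caller's terminal is untouched.
   Returns 1 if the child's descriptor 1 ends up on /dev/null, else 0. */
int demo_devnull_redirect(void)
{
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct stat out, null;
        int ok = redirect_std_to_devnull() == 0
              && fstat(STDOUT_FILENO, &out) == 0
              && stat("/dev/null", &null) == 0
              && out.st_dev == null.st_dev
              && out.st_ino == null.st_ino;   /* same file as /dev/null */
        _exit(ok ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```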
“Redirection” in this context means that your output or input is redirected to another device (or a file, for that matter) instead of your laptop or PC screen. This is why, in any UNIX or UNIX-like system, executing “ls -a > temp.txt” saves the output of the “ls -a” command to the file “temp.txt”. Anyway, I just wanted to share some basic ideas on IO redirection.
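Under the hood, a shell handles “ls -a > temp.txt” roughly like this: it forks, opens temp.txt in the child, moves it onto descriptor 1 with dup2(), and only then executes ls, which simply writes to descriptor 1 without knowing it is a file. The sketch below assumes a POSIX system; run_with_stdout_to() is a hypothetical helper, not a standard function, and real shells do a lot more (error messages, appending with >>, and so on).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Roughly what a shell does for "cmd args > path": run argv in a child
   whose descriptor 1 has been replaced by the file. Returns the child's
   exit status, or -1 on error. */
int run_with_stdout_to(const char *path, char *const argv[])
{
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1 || dup2(fd, STDOUT_FILENO) == -1)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);             /* descriptor 1 now carries the file */
        execvp(argv[0], argv);     /* returns only on failure */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, char *args[] = {"ls", "-a", NULL}; followed by run_with_stdout_to("temp.txt", args); mimics the shell command above.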
B. File Descriptor Acquisition Behavior
As far as my experience can take me, functions that create file descriptors include (but are not limited to) open(), creat(), fopen(), freopen(), socket(), socketpair(), accept(), dup(), dup2(), and fcntl() (with F_DUPFD). Be warned, though, that details and availability can differ between Linux, other UNIX systems and the BSD flavors. There may also be other calls that produce descriptors; my list only includes the ones I am aware of.
Once any of these calls succeeds, a new file descriptor is created and, naturally, the number of descriptors still available to the process (and to the system as a whole) drops by one (by two for socketpair()).
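Here is a small sketch that acquires descriptors through a few of the calls listed above, assuming a POSIX system. The helper name acquire_some_fds() is hypothetical. Note that socketpair() hands back two descriptors in one call, and that fcntl() with F_DUPFD returns the lowest free number at or above the one you ask for (10 here, an arbitrary choice).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Acquire descriptors through several of the calls listed above, store
   them in out[], and return how many were obtained (5 on full success)
   or -1 on error. The caller owns them and must close() them. */
int acquire_some_fds(int out[5])
{
    int n = 0;
    int fd = open("/dev/null", O_RDONLY);                /* open(): one */
    if (fd == -1)
        return -1;
    out[n++] = fd;

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {  /* two at once */
        out[n++] = sv[0];
        out[n++] = sv[1];
    }

    int copy = dup(fd);                   /* dup(): lowest free number */
    if (copy != -1)
        out[n++] = copy;

    int high = fcntl(fd, F_DUPFD, 10);    /* lowest free number >= 10 */
    if (high != -1)
        out[n++] = high;
    return n;
}
```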
In almost all of the platforms I have worked with, when we ask the system for a file descriptor, it returns the lowest-numbered descriptor not currently in use (POSIX requires this for open() and dup(), and in practice the other calls behave the same way). So under normal conditions, where the three standard streams are not closed, the first file descriptor you get will usually be 3. If, on the other hand, you have closed all three (as a daemon might), the first one returned will be descriptor 0.
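The lowest-number rule is easy to observe: open two descriptors, close the first, and the next allocation gets the freed number back. A minimal sketch, assuming a single-threaded POSIX process (another thread opening files at the same moment could grab the number first); the function name is made up:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a freed descriptor number is handed out again by the next
   allocation (the lowest-available rule), 0 if not, -1 on error. */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);   /* lowest free number */
    if (a == -1)
        return -1;
    int b = open("/dev/null", O_RDONLY);   /* next lowest, so b > a */
    if (b == -1) {
        close(a);
        return -1;
    }
    close(a);                 /* a becomes the lowest free number again */
    int c = dup(b);           /* dup() must return the lowest free number */
    close(b);
    if (c == -1)
        return -1;
    close(c);
    return c == a;
}
```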
C. Forks, Clones and Virtual forks
For those who have not yet had the experience of “fork”ing, “vfork”ing, and “clone”ing, it is a privilege to welcome you to the world of process creation/duplication, at least for C and C++. Be mindful again that these calls may not all be supported on your platform: clone() is Linux-specific, and vfork() was removed from POSIX in its 2008 edition. Also note that threads and processes are entirely different things in UNIX and UNIX-like systems. Anyway, fork() and other system calls of its kind are called by one process to create a new, or child, process. It’s purely asexual, though.
There are rules and norms covering this “process forking” mechanism, but I won’t discuss them in detail here.
There is only one thing I want to share, though.
If a parent process creates a child process via fork() (or any of the above calls), the parent’s file descriptors are inherited by the child.
This is by design, but personally, I sometimes see it as an “inherited leak”. In a simple diagram, here is what happens when a child process is created:
              ( P1 )
                x -- start
                |
  [fd1 = open("myfile.txt") = 3]
                |
       [fd2 = socket() = 4]
                |
       [fd3 = accept() = 5]
                |      [child process shares descriptors 0-5]
     [create child :: fork()] - - - - - - - -> ( child P2 )
                |                                    |
                |                 [p2fd1 = open("myfile3.txt") = 6]
                |                                    |
                |                                    |
                x                                    x
In the diagram above, after fork() is executed, process P2 starts and opens another file. In this case the file descriptor returned by P2’s open() call is 6, not 3, because 3, 4 and 5 are already occupied by the inherited copies. Also, if the same file, like “myfile.txt”, is opened again in the child process, that second open() may fail or conflict with the inherited descriptor, depending on how the parent opened (or locked) “myfile.txt”. But anyway, my point is simple: child processes inherit their parent’s file descriptor table. (Actually, this inheritance is useful in shell command execution, as it provides a way of “piping” data from the shell to another process.)
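A minimal sketch of this inheritance, assuming a POSIX system: the parent opens /dev/null, forks, and the child writes through that descriptor without ever opening anything itself. Worth knowing: the parent’s and child’s copies refer to the same open file description, so for a regular file they also share the file offset. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child created by fork() can write through a descriptor
   the parent opened before forking, 0 if not, -1 on error. */
int child_inherits_fd(void)
{
    int fd = open("/dev/null", O_WRONLY);   /* like fd1 in the diagram */
    if (fd == -1)
        return -1;
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fd is valid here even though the child never opened it */
        ssize_t n = write(fd, "x", 1);
        /* A new open() cannot reuse fd's number, since the inherited copy
           occupies it, just like P2's open() returning 6 above */
        int fresh = open("/dev/null", O_RDONLY);
        _exit(n == 1 && fresh != -1 && fresh != fd ? 0 : 1);
    }
    close(fd);                              /* closes only the parent's copy */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```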
But if you have no plans whatsoever to use any of the parent’s file descriptors, you can use “close()” in the child process to close specific descriptors. Or better yet, where it is available (the BSDs, Solaris, and glibc 2.34 or later on Linux), you can use “closefrom()” to close every descriptor from a given number upward, instead of closing them one by one in a loop.
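Where closefrom() is missing, a common fallback is a plain loop up to the per-process limit reported by sysconf(_SC_OPEN_MAX). The sketch below assumes a POSIX system; close_all_from() is a hypothetical stand-in, not the real closefrom(), and the 1024 fallback is an assumption. The loop is also why closefrom() is nicer: when the limit is very large, closing every number one at a time gets slow.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical portable stand-in for closefrom(): close every descriptor
   numbered lowfd or higher, up to the per-process limit. */
static void close_all_from(int lowfd)
{
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0)
        max = 1024;       /* assumption: fallback when no limit is reported */
    for (long fd = lowfd; fd < max; fd++)
        close((int)fd);   /* EBADF on numbers that were never open is harmless */
}

/* Returns 1 if an inherited descriptor is gone in the child after
   close_all_from(3), 0 if it survived, -1 on error. */
int child_drops_inherited_fds(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        close_all_from(STDERR_FILENO + 1);   /* keep 0, 1 and 2 */
        /* F_GETFD fails with EBADF once the descriptor is closed */
        _exit(fcntl(fd, F_GETFD) == -1 ? 0 : 1);
    }
    close(fd);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```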
D. Process Transformation
Did you know that you can transform processes? If you did not, I welcome you yet again. More often than not, a “fork”-er, someone who forks his or her process, has one ultimate desire: to execute a different command or program. Converting one process into another program, or process transformation, is done by calling one of the exec family of functions, such as “execv()” or “execl()”. Take a look at the diagram below, and try it out if you have a UNIX/UNIX-like system with you.
     ( P1 - system() )                        ( P2 - execv() )
       x -- start                               x -- start
       |                                        |
       |                          [char *temp[3] = {"ls", "-a", NULL};]
       |                                        |
[printf("transforming.\n");]          [printf("transforming.\n");]
       |                                        |
  [system("ls -a");]                   [execv("/bin/ls", temp);]
       |                                        |
[printf("ls finished.\n");]           [printf("execv finished.\n");]
       |                                        |
       x end of program                         x end of program
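I assure you 100% (provided execv() does not fail) that you will never see the message “execv finished.” on your screen, because P2 has been completely transformed into ls. With system(), on the other hand, “ls -a” is executed in a separate child process and the parent then continues, so you see the “ls finished.” message right after the ls output. (Keep in mind that execv() does not search PATH, so it needs a full path such as /bin/ls, while execvp() does the search for you; that the first element of the argument vector is conventionally the program name itself, e.g. {"ls", "-a", NULL}; and that output still sitting in the stdio buffer, such as a printf() without a trailing \n, is lost when the process image is replaced.)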
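For a runnable take on P2 that keeps your own program alive, here is a minimal sketch, assuming a POSIX system: fork a child and transform only the child with execvp(), the PATH-searching sibling of execv(). If the exec succeeds, nothing after it in the child ever runs; the perror()/_exit(127) lines play the part of “execv finished.” and are reached only on failure. The helper name is hypothetical; system() does something similar, except that it runs the command through /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, then transform the child into argv[0] with execvp(). Returns the
   transformed program's exit status, or -1 on error. */
int run_transformed(char *const argv[])
{
    fflush(stdout);             /* exec discards unflushed stdio buffers */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* on success this call never returns */
        /* Only reached if execvp() failed: the "execv finished." case */
        perror("execvp");
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, char *args[] = {"ls", "-a", NULL}; followed by run_transformed(args); lists the current directory and returns ls’s exit status.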
But as this blog is not about process transformation, I will just point out that the file descriptor table survives process transformation: by default, the file descriptors opened before the execv() call still exist in the transformed process. (This is the default behavior on most systems, provided the files were open()-ed with default flags.)
This can be avoided by setting the close-on-exec flag (FD_CLOEXEC) on the descriptor with “fcntl()” (F_GETFD to read the current flags, F_SETFD to write them back with FD_CLOEXEC added) before you call execv(). The flag is per descriptor: on a successful execv() or execl(), the system closes only the descriptors that have close-on-exec set and leaves the rest open. On systems that support it, passing O_CLOEXEC to open() sets the flag at the moment the descriptor is created.
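Here is a sketch to see both behaviors, assuming a POSIX system with /bin/sh: it places /dev/null on descriptor 9 (an arbitrary number, assumed to be otherwise unused), optionally sets FD_CLOEXEC with fcntl(), and then execs a shell that tries to write to descriptor 9. The function name and the choice of descriptor 9 are made up for this demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_FD 9   /* assumption: descriptor 9 is otherwise unused */

/* Put /dev/null on descriptor 9, optionally mark it close-on-exec, then
   exec a shell that writes to descriptor 9. Returns 1 if the new program
   could still use it, 0 if the exec closed it, -1 on error. */
int fd_survives_exec(int cloexec)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd == -1)
        return -1;
    if (fd != DEMO_FD) {
        /* the copy made by dup2() starts with FD_CLOEXEC cleared */
        if (dup2(fd, DEMO_FD) == -1) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    if (cloexec) {
        int flags = fcntl(DEMO_FD, F_GETFD);
        if (flags == -1 || fcntl(DEMO_FD, F_SETFD, flags | FD_CLOEXEC) == -1) {
            close(DEMO_FD);
            return -1;
        }
    }
    fflush(stdout);
    pid_t pid = fork();          /* fork() keeps the flag; exec acts on it */
    if (pid == -1) {
        close(DEMO_FD);
        return -1;
    }
    if (pid == 0) {
        /* The shell's redirection succeeds only if 9 is still open */
        execl("/bin/sh", "sh", "-c", "echo hello >&9", (char *)NULL);
        _exit(127);
    }
    close(DEMO_FD);
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 127)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Calling fd_survives_exec(0) returns 1 (the descriptor survived the exec), while fd_survives_exec(1) returns 0; the shell will also complain about a bad file descriptor on stderr in the second case.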
E. Maxima – Maximum Descriptor Count
(I suddenly remembered my minima-maxima derivative mathematics, hahaha… collectively known as extrema.) Anyway, getdtablesize() is supported on almost all UNIX-based systems (on Linux it comes from glibc, although it is not part of POSIX). If your program needs to know the maximum number of file descriptors a process can have open at any time, you can use the getdtablesize() function.
Remember, though, that if getdtablesize() returns 32, the lowest descriptor you might get is 0 and the highest descriptor value you will get is 31.
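A quick way to see the 0 to N - 1 range for yourself, as a sketch assuming Linux/glibc or a BSD: ask fcntl(F_DUPFD) for the highest legal number, then for one past it, which is refused with EINVAL. getdtablesize() is not in POSIX; portable code would use sysconf(_SC_OPEN_MAX) or getrlimit(RLIMIT_NOFILE) instead. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE   /* glibc hides getdtablesize() without this */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* With a table size of N, valid descriptor numbers run from 0 to N - 1.
   Returns 1 if number N - 1 can be allocated and N is refused, else 0. */
int table_bounds_hold(void)
{
    int size = getdtablesize();
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return 0;
    int top = fcntl(fd, F_DUPFD, size - 1);   /* the highest legal number */
    int over = fcntl(fd, F_DUPFD, size);      /* one past the end */
    int over_errno = errno;                   /* save before close() */
    if (top != -1)
        close(top);
    close(fd);
    return top == size - 1 && over == -1 && over_errno == EINVAL;
}
```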
II. File Descriptor Theory Conclusion
By now, I think the information above is enough for all of us to understand the holistic-overviewish nature of file descriptors. This primer may not be complete, but I am guessing it is enough to:
- Spark your curiosity
- Create a semi-complete mental image of file / fd usage in programming
- And hopefully, in some parts, make you aware of some of the different things you can do in C/C++, like process creation and process transformation.
Within the next few days, I will publish my next blog post, discussing my own file descriptor leak debugging techniques.
For the many C developers out there, I wish you well. In a way, Java seems to be taking over right now. But don’t worry, I think C and C++ will ALWAYS be around.
Good Day.
Up Next: File Descriptor Leak Debugging