Archive for the 'Programming' Category

21
Oct
09

File Descriptor Leaks

Sorry to have kept you waiting. I have been very busy recently. For starters, I am trying to complete parts of my creative writing exercise at govz.wordpress.com. Anyway, a number of people from other countries (USA, Romania, Korea, China, India) have been hitting this blog entry for quite some time now. When I checked this blog’s history, a rather interesting set of search queries turned up. Here are a few of them:

- “multi-thread file descriptor open close”
- “fork / clone file descriptor”
- “file descriptor leak debug”
- “creating a file descriptor”
- “how to reinitialize file descriptor c++”

In a way, I feel happy because, though this is an unsavory topic for most, it turns out to be important to someone else, somewhere in the world. And by fate, they have somehow found my blog and my mundane musings. I just want to repeat this though: I am not and will never be a supreme-level engineer (the likes of Dan Saks, Bjarne Stroustrup, Dennis Ritchie, Jon B. Postel, Paul Vixie, Steve W. etc…); I am still a tremendous work in progress. Should you find a flaw in my data, please do leave a message. After all, learning is fun when more heads come together. :)

Also please take into consideration that the following discussion is

  1. limited to user-land level programming (meaning not within your Operating System or kernel)
  2. limited to systems with kernels which have management mechanisms for all types of resources (i.e. memory, descriptors, etc…).
  3. limited to systems which free all used resources on process termination.
  4. limited to UNIX or UNIX-like systems. (Sorry, windows developers. :( )

Ok? :D Here we go, “beam us up, scotty”. (Sorry, for a moment there, I missed, Kirk , Spock, Data, T-Pol and Janeway of Star Trek).


I. File Descriptor Fundamentals

Before we finally close the sample code of the previous blog, it is imperative that we are capable of understanding certain fundamentals about file descriptors.

A. In the beginning, there was and is the trinity.

Did you know that, when you start an ordinary program, three file descriptors are already open for you? These file descriptors sit underneath the streams known to us as stdin (standard input), stdout (standard output), and stderr (standard error). In an ordinary process, these three streams automatically obtain file descriptors 0, 1, and 2.

So in a simple application, the maximum usable number of descriptors per process is equal to the process limit minus three.

Can you close them? Yes, certainly! “fclose(stdout)” closes the standard output. (We use fclose() because stdout is a stream, a FILE* data type.) It is a perfectly valid call. :) However, once stdout is closed, “printf” or “cout” will practically be useless, and in some instances, calling stdout-related functions after stdout has been closed MAY crash the application.

For daemon developers however, if you are working in a BSD environment, you can detach these three file descriptors simply by passing zero as the “noclose” parameter of the “daemon()” call. With “daemon(nochdir, 0)” the system points the three streams’ descriptors to “/dev/null”, so that all stdout-sent data is safely discarded by the null device (a non-zero “noclose” leaves the three streams untouched). glibc on Linux provides “daemon()” as well; daemon-izing by hand, on the other hand, is quite a tedious task that involves “forking” and a lot of other stuff that eventually re-directs or re-opens the three basic streams to “/dev/null”.

“Redirection” in this context means that your output or input is redirected to another device (or a file for that matter) other than your terminal. This is the reason why, on any UNIX or UNIX-like system, executing “ls -a > temp.txt” saves the result of the “ls -a” command to the file “temp.txt”. Anyway, I just wanted to share the basic idea of IO redirection.

B. File Descriptor Acquisition Behavior

As far as my experience can take me, calls that generate file descriptors include (but are not limited to): open(), creat(), fopen(), freopen(), socket(), socketpair(), accept(), dup(), dup2(), and fcntl() (with F_DUPFD). Be warned though, that some calls may not be supported on every Linux, UNIX, or BSD flavor. There may also be other calls that result in descriptor generation; my list only includes the ones I am aware of.

Now, as I have said, once any of these calls succeeds, a new file descriptor is created and naturally the number of descriptors still available to the process (and to the system as a whole) is lessened by 1.

On almost all of the platforms I have worked with, once we ask the system for a file descriptor, the system returns the lowest free descriptor number. So under normal conditions, where the three standard streams are not closed, the first file descriptor you acquire will be 3. If instead you have closed all three (as some daemons do), the return value will be file descriptor “zero”.

C. Forks, Clones and Virtual forks

For those who have not yet had the experience of “fork”ing, “vfork”ing, and “clone”ing, it is a privilege to welcome you to the world of process creation/duplication. Well, at the very least for C and C++. Be mindful again that the above calls may not be supported on your platform, and that threads and processes are entirely different things in UNIX and UNIX-like systems. Anyway, fork and other system calls of its kind are called by one process to create a new, or child, process. It’s purely asexual though. :) There are rules and norms covering this “process forking” mechanism but I won’t discuss them in detail here. :) There is only one thing I want to share though.

If a parent process, creates a child process via fork (or any of the above commands), the parent’s file descriptors are inherited by the child.

This is by design. But personally, I sometimes see this as an “inherited leak”. In a simplistic diagram, I would like to show you what happens after a child process is created:


                 ( P1 )
                    x -- start
                    |
      [fd1 = open("myfile.txt") = 3]
                    |
           [fd2 = socket() = 4]
                    |
           [fd3 = accept() = 5]
                    |                [child process shares descriptors 0-5]
        [create child :: fork()] - - - - - - - ->   ( child P2 )
                    |                                    |
                    |		         [p2fd1 = open("myfile3.txt") = 6]
                    |                                    |
                    |                                    |
                    x                                    x

In the diagram above, after fork() is executed, Process 2 starts and opens another file. In this case the resulting file descriptor for P2’s open() call is 6 and not 3. Also, if the same file, say “myfile.txt”, is opened again in the child process, then depending on the flags used and any locks held on it, the call might fail. But anyway, my point is simple: child processes inherit a copy of their parent’s file descriptor table. (Actually this inheritance is useful in shell-command execution, as it provides a method for “piping” data from the shell to another process.)

But if you have no plans whatsoever to use any of the parent’s file descriptors, you can use the “close()” call in the child process to close specific file descriptors. Or better yet, where it is available (the BSDs, Solaris, and newer glibc versions), you can use the “closefrom()” call to avoid having to loop over the opened descriptors.

D. Process Transformation

Did you know that you can transform processes? If you did not know that, I welcome you yet again. More often than not, a “fork”-er, an individual who forks his process, has one ultimate desire: to execute a different command or program. Converting one process into another program, or process transformation, is done by calling one of the “exec” family of calls, such as “execv()” or “execl()”. Take a look at the diagram below. Try it out if you have a UNIX/UNIX-like system with you.


    ( P1 - system() )                       ( P2 - execv())
           x -- start                              x -- start
           |                                       |
           |                      [char *temp[3] = {"ls","-a",NULL};]
           |                                       |
[printf("transforming.");]               [printf("transforming");]
           |                                       |
  [system("ls -a");]                   [execv("/bin/ls", temp);]
           |                                       |
 [printf("ls finished.");]           [printf("execv finished.");]
           |                                       |
           x end of program                        x end of program

I assure you 100% (provided execv has no error), you will never see the message “execv finished.” on your screen, meaning P2 has been completely transformed. With the system() call, however, “ls” is executed and then the calling process continues, so you see the “ls finished.” message right after “ls -a” completes.

But as this blog is not about process transformation, I would just like to state that the file descriptor table persists across an execv() call. Simply put, by default, the file descriptors opened before the execv() call still exist in the transformed process. (This is the default behavior on most systems, provided the files were open()-ed with default file-flag settings.)

This however can be avoided by setting the necessary flag via the “fcntl()” call. Using “fcntl()”, set the opened file descriptor’s close-on-exec flag (FD_CLOEXEC) before you call execv(). With the close-on-exec flag enabled on a descriptor, the system automatically closes that descriptor whenever an execv() or execl() succeeds; descriptors without the flag stay open.

E. Maxima – Maximum Descriptor Count
(I suddenly remembered my minima-maxima derivative mathematics hahaha … collectively known as extrema.). Anyways, for almost all UNIX based systems, getdtablesize() is supported. If your program needs to know the maximum number of file descriptors a process can open at any time, then you can use the getdtablesize() function.

Remember though that if getdtablesize() is 32, it follows that the lowest possible descriptor you might get is zero and the highest descriptor value you will get is 31.


II. File Descriptor Theory Conclusion

By now, I think the information above is enough for all of us to understand the holistic-overviewish-nature of file descriptors. This primer blog may not be complete but I am guessing this is enough to :

  • Spark your curiosity
  • Create a semi-complete mental image of file / fd usage in programming
  • And hopefully in some parts I have made you aware of some of the different things you can do in C/C++ like process generation and process transformation.

Within the next few days, I will upload my next blog, discussing my own file descriptor leak debugging techniques.

For the many C-developers out there, I wish you well. In a way, right now, Java seems to be taking over. But don’t worry, I think C and  C++ will ALWAYS be around. :) Good Day.

Up Next : File Descriptor Leak Debugging


05
Aug
09

File Descriptors

Of the four leaks I listed in the previous blog, I want to discuss file descriptor leaks first (whether there will be a second, I don’t know yet). I would like to share with you (or keep as a note for myself) an experience of mine with this type of leak. For the most part, I have written this blog so as not to forget what I learned from the experience: the difficulty of file descriptor leak debugging, the techniques I employed to find and fix the leak, and the tools I used, plus some overall knowledge I think I have acquired about UNIX / Unix-like systems.


Files, what art thou?

Files are the simplest form of data storage. Of course, I think even those who don’t develop software are kinda familiar with this fact. But (and it is a big BUT!) files are also a convenient and rudimentary method of signaling and/or synchronization. Developers like myself refer to this signaling mechanism as IPC, or Inter Process Communication. For example, imagine two processes, Process1 (P1) and Process2 (P2). Let us say that before Process1 starts sending requests, Process2 has to be ready.

P1 and P2’s synchronization mechanism can be simplistically described by the following:

    ( P1 )                          ( P2 )
      x -- start                      x -- start
      |                               |
 [initialize]                    [initialize]
      |                               |
      |                      [resource preparation]
      |                               |
      |                              [b] --- announce readiness.
     [a]  wait 'til P2 ready          |
      |                               |
      |                      [wait for requests]
      |                               |
      x -- start sending              x
           requests to  P2

True, with a simple libc::kill() call (yep, “kill” does not mean “terminate” all the time, my young friends) P2’s existence can be checked, but its state can never be determined (not unless P2 is synch-safe and has a complex signal handler in place). So in the above example, a file can be used for signaling/synchronization. P1 will routinely check for the existence of a certain file via libc::stat() or libc::access(), while P2 will be the one to create the file. So if we substitute [a] with “loop and sleep until the file /ipcs/process2_ready.dat exists” and [b] with “try to create /ipcs/process2_ready.dat until successful”, it makes a whole lot of sense, right? Of course the flow has to be refined more, but the gist is more or less like that. (Doubting Thomas: files are persistent, aren’t they? At the next start-up, a false positive will occur at P1 since the file is still present. Answer: that, my friend, is a trick I have to teach later.)


Files are everything.

In Windows, this paradigm might not ring true. But you see, UNIX or UNIX-like operating systems (FreeBSD, OpenBSD, Linux, etc…) have one simple yet fundamental precept: everything is a file. (For Unix-like OSes however, it is more like “almost everything is a file.”) It follows too that any UNIX-based developer has to be wary of each opened file descriptor in his process. File descriptors in the UNIX context are not just about “files” per se. Like what all those “better” guys tell you on the internet, in Unix/Unix-like systems, once you open a device, you acquire a file descriptor. When you libc::fork() a process or libc::clone() it, and use libc::pipe() to talk to it, you use two file descriptors. For some, this tidbit of knowledge may not be really important, especially when multi-process-multi-thread operations are unnecessary. But for those who create “daemons” or “timing critical multi-process-multi-thread applications” for a living, this piece of trivia might come in very useful.


What then is a file descriptor?

A file descriptor in all its simplicity is a numeric handle/value which represents a file you have opened. To put it in easy terms, a file descriptor is like the Operating System’s (OS) translation of a very long file name. Let us take for example the following file, which has the name:
–> “C:\\directory1\directory2\iamaveryveryveryverydveryveryverylongfilename.txt”

In the OS layer however, once the above file is opened, it is merely the process’ file descriptor : 3. Now wait a minute, why “3” and not zero or one? Later I will explain :D . For the moment take it as it is. :)

The obnoxiously long file name is converted simply as file number : (3).  In C or C++ it can be done via the following line :

fd = open("myfile.txt", O_CREAT | O_WRONLY, 0644);  (C/C++)

Theoretically, libc::fopen() also does the same thing, but the result of that function is not really a file descriptor. libc::fopen() has a return type of FILE *, but there is a way to retrieve its underlying file descriptor: use libc::fileno(). More on this later (when I can get back at it …. like 10 years from now.)


So What’s with file opening?

Though syntactically file descriptor generation looks very simple, there are quite a few intricacies within the kernel / OS operation that go along with it. The OS actually does a few things during an “open()” call. Though I am unsure of the exact sequence, in most Unix or Unix-like systems some of open()’s internal steps are the following:

1. OS checks the system limit of file descriptors and checks if it is full or not.
2. OS checks the process’ file descriptor table if it is full or not.
3. OS retrieves the lowest free index in the process’ file descriptor table.
4. OS saves the data to that table entry (i.e. which file it refers to, file position, etc…)
5. OS returns the file descriptor to the calling function.


Teaching By Example

“That’s easy, for every file you open, you close it.” The rule is correct. However, the mechanism by which you open and close a file matters more than anything else. Let us analyze the following code. In what case do you think a file descriptor leak will happen? For this exercise, let us assume that the libc::close() call never fails.

/*********************************************************
 *    Created by      : Gauvin L. Repuspolo
 *    What is this    : File Descriptor Leak Example
 *    My Birthday     : 1977/02/21 [just goofing around]
 *********************************************************/
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define SAMPLE_LAUGH   "hahahaha"
#define SAMPLE_CRY     "huhuhuhu"

int generic_fd = 0;

/****************************************************
 * name       : write_something
 * parameters : (input) is_laughing
 *              --> <= 0 - write SAMPLE_CRY
 *              --> >  0 - write SAMPLE_LAUGH
 * return     : on success returns 0
 *            : on failure returns -1
 ****************************************************/
int write_something(int is_laughing)
{
   int write_result =  -1;
   /* let us use a temporary pointer */
   const char *temp_ptr = (is_laughing > 0) ? SAMPLE_LAUGH : SAMPLE_CRY;

   /* O_CREAT needs a third argument: the permissions of the new file */
   generic_fd = open("my_file.txt", O_CREAT | O_RDWR | O_APPEND, 0644);
   if (generic_fd < 0) {
      return write_result;
   }
   /* since this is only an example, let's compare directly  */
   if (write(generic_fd, temp_ptr, strlen(temp_ptr)) == (ssize_t)strlen(temp_ptr)) {
      write_result = 0;
   }
   close(generic_fd);
   return write_result;
}

int main()
{
     /* let us assume that the  close() call never fails.
     create a sample content of main() that uses write_something()
     and then in the end generate a file descriptor leak.
     good luck .... */
     return 0;
}

Up next … fixing file descriptor leaks …


15
Jul
09

OS Resource Leaks

Anyone who works on an embedded platform or even in PC applications, should probably by now understand full well the implications of a resource leak.  Before we start delving into this matter however it is imperative that we have a full grasp of what types of resources the Operating System (OS) provides to a “user-land” application. For starters let me give you four resources which could possibly be leaked during execution of your program:

  • Memory leak - The most celebrated of all the leaks. Simplistically, the failure to inform the OS that you are finished using a memory area, thus making the OS reserve that memory area for as long as the CPU is not reset or a “memory map” release is not automatically executed. This applies to malloc(), calloc(), realloc(), opendir(), mmap() and others of the like.
  • Thread leak - An OS only allows a certain number of threads to run at the same time for a particular process. Ending unnecessary threads (pthread_exit(), not libc::exit(), which would end the whole process) is a MUST, for they share your process’ precious time quantum with the other threads. And with special reference to thread-based implementations of server applications (like FTP, HTTP, LDAP, Discovery etc…), you may end up unable to serve further requests should you fail to properly account for your threads.
  • Process leak - Like thread leaks, unnecessary processes should either be “killed” or “SIGTERM”ed, and finished children should be reaped with wait(). Don’t tell me you love rogue processes and good-for-nothing zombies? Failure to properly end a process may result in the OS executing unnecessary context switches for processes that don’t really matter. Like threads, when your “server” or “daemon” application is serving multiple requests via “fork()”, “vfork()” or “clone()”, you might end up unable to serve further requests from your clients if unnecessary processes are left to run idly.
  • File Descriptor / Handle leak - The operating system also limits the number of files that an application can open simultaneously. So if you have got a leak on this one, you are looking forward to an unnecessary debugging adventure. For starters, some open source libraries automatically assert() when they fail to get hold of a valid file descriptor.


Why are leaks dangerous?

A leak in the software realm is not like a leaky pipe that eventually floods up your room if left unfixed over long periods of time. (I left our faucet once fully open overnight and the next day the house was really a mess! The water distribution service bill was almost as messy.) A resource leak in software is pretty much the opposite though. Once there are leaks in certain areas of a process, be it memory or something else, that same process or other newly started programs, will eventually find themselves failing in certain system calls which would have succeeded in an otherwise zero-leak running environment.

Summarizing: leaks result in only one thing, the eventual depletion of the resources of an otherwise perfectly “enough” system. “Enough” because theoretically, except for some cases, what an OS provides is enough for everything it was designed to do, and that includes the facilitation of user-land processes.


What are the consequences of leaks?

The thing I hate most about leaks is that they have the ability to affect other processes. And when they do, the culprit is almost impossible to detect. Think of a room full of people where somebody suddenly silently farts. Everyone suffers, but when and where it happened nobody really knows. (hahahahah! PEACE!)

They too are extremely difficult to debug and it takes an enormous amount of time to pinpoint their exact location. Fixing them is not really the problem in most cases, but finding out where and when they happen is the most difficult of all.  Enumerating some effects :

  1. System fails to allocate memory. (libc :: malloc(), calloc(), etc…)
  2. System fails to start a new thread. (POSIX : pthread_create())
  3. System fails to start a new process. (libc :: fork(), vfork())
  4. System fails to open new files. (libc :: open(), fopen(), dup(), etc…)
  5. System fails on system calls. (libc :: socket(), pipe(), etc….)
  6. Applications start to run really really slow.
  7. Applications suddenly crash due to low level assert().
  8. You enter the debugging twilight zone with a ticket to the universal competition for patience. Just kidding.

Probably in simple programs, you don’t have to worry too much. But if you are developing an all-too-powerful daemon which has to exist while the system is online (or as you morph from intern engineer level to non-assistant level), you have got to be paranoid about leaks. “Resource Leaks = MESS”, remember that.


Isn’t my OS the sky-is-the-limit version?

No, dudes and dudettes! No OS is “mugen”, meaning “infinite” in Japanese. Let me cite some examples just for fun (though completely unrelated and utterly useless). Did you know that:

  • Server-based encrypted data is most likely to be valid for only 5 minutes?
  • The real limit of your system time, the sequel to the Y2K bug, is “2/7/2036” (when NTP timestamps roll over), with the UNIX version of the Y2K bug following in 2038?
  • The limit of a USB cable is around 3 to 5 meters depending on the speed you use?
  • A NETBIOS system browser has to routinely list the domain every 12 minutes?

Now going back to the topic, OS resources are pretty much the same. How they came up with the limits is, I guess, an arbitrary science. Limits are most probably based on a careful balance of available memory/actual resources versus the rough average of “extreme usage” and “ordinary usage”. For the moment, I do not question them because, for me, they are “enough”. Plus, the fact that so many scientific minds have no major complaints about them shows that they are, in a way, “enough”.

Resources like file descriptors, on MOST operating systems, have process-wide and system-wide limits. Process-wide means that for each process you create, there is a limited number of file descriptors it can have open simultaneously. Likewise, for a multi-threaded process, the thread count limit is the maximum number of threads the process can run simultaneously. The system-wide limit, however, caps the total count of a particular resource in use, regardless of which process holds it.

Let us take for example some older versions of Linux which can open up to a maximum of 256 files per process, and roughly 1K system-wide. For as long as the 1K system-wide limit is not breached, any process can have 256 files open simultaneously at any given time. But should there be 4 other processes, each holding 250 files open at the same time (1000 in total), then the 5th process cannot use its full 256 file limit anymore; only about 24 descriptors are left for it.

Anyway, if you happen to be running on top of a UNIX platform, you might want to try the “ulimit -a” shell command in your terminal, to see certain limits of your operating system.


Can these limits be changed?

Yes. However, it is important to note here that changing a per-process limit, be it a hard or soft limit, might require some special procedure, like recompiling your kernel. For the most part though, calls like setrlimit(), ulimit() or sysctl() (or their shell equivalents via libc::system()) can be called within the program to modify certain soft and hard limits. Note also that setting a particular hard limit to an unreasonable value and then allowing a process to run all the way up to it might cause the system to break down eventually. Besides, raising the limit for a particular resource will never be a solution for a resource leak!

Next Up …  file descriptor leaks …


01
Jun
09

Embedded Software Paradigm (1)

One day in an undisclosed NASA facility…..

NASA Boss        : “Can somebody please go to MARS”
NASA Engineer    : “What for?”
NASA Boss        : “Upgrade the code for the MARS Rover?” (fictional)

For the general populace, software is just software, a set of instructions which dictates a computing device’s behavior. However, for those of us who are initiated, “firmware” is a bit different from mainstream PC application development. Although they are fundamentally the same, I think that the firmware development mentality is a bit different. I hope I did not lose you there. And please take note, I never said firmware is more difficult.

“A BIT BUT SIGNIFICANTLY DIFFERENT”

PC-Applications and firmware are differentiated by only one aspect. They are differentiated, if not stating the obvious, only by the environment/device on which they should run on. Or more technically what we call as the “target”. It follows too that firmware, and its development, is subject to the nuances and frailties of an embedded  device. Please refer to previous blog’s [1][2]and [3]. And these “nuances” have a  direct and significant effect on the development mentality and process. This blog aims to discuss tangibly and simply, certain paradigm adjustments of each “embedded device limitation”.

A. Embedded devices, generally, are “lesser” than the PC.
The term “lesser” here pertains to a multitude of aspects. For the moment let us limit it to computing power, and functional/program memory.

1. Effect of having Lesser computing power.
“Computing power” is really a very difficult subject. If you want to be technical about it, and if you don’t mind a little nosebleed, you can refer to “The Computer Engineering Handbook” by Vojin G. Oklobdzija. For the meantime, think of “computing power” as the computer’s “chi” or life force. The more of it you have, the more it can pump up the computing process. The less of it you have, the slower the system.

Almost a decade ago, we had a problem with a micro-controller. We were to ensure that at any point in time, our design should be able to save whatever temporary data there was inside the buffers into an EEPROM (Electrically Erasable Programmable Read-Only Memory). Under normal operating conditions we really didn’t have a problem. The problem, however, occurred when the device lost power from its battery source. Our design had to be good enough to save the “cached” data.

Our solution then was twofold: (a) implement a fast Interrupt Service Routine (ISR) which executed the “save-data” mechanism, and (b) make the save-data mechanism beat the decay rate of the system’s power supply.

Implementing the ISR was not the problem. The problem really was the execution speed of  our “store-data” function. Simple analysis showed that the speed of such a “transaction” relied mainly on :
1. the speed of the cpu or its computing power. (“how fast can it execute one command”)
2. the amount of time for one “write” transaction in the EEPROM. (“write-cycle”)
3. the amount of cached-data to be saved.

Anyway, I hope you catch my drift. Lesser computing power in the above example was really the bane of what we had in mind. If only our micro-controller could execute billions of instructions in one second, then I think we would have had fewer factors to worry about. But then again, a better controller would have skyrocketed the cost of materials.

2. Embedded devices have lesser functional memory

Memory here is categorical and does not refer to the number of images inside your mobile phone or digital SLR, nor the number of emails your Blackberry Smart Phone can keep. “Memory” here refers to the number of features / capabilities a computing device has at its disposal.

To emphasize further, let us say, that the brain remembers all the different skill sets you have. Like riding a bike, treading in water, writing a haiku, so on and so forth. The PC generally speaking, has enough brain space to store all these skill sets (plus more) and thereby allowing it to do a lot of things. Embedded devices on the other hand can only muster three or four. Some can store only one functionality.

A few years back, my boss wanted me to implement one special mechanism in our machine. For confidential purposes let us call it the “Lightning In A Bottle (LIAB)” mechanism. However LIAB’s original implementation was on top of the Java platform. For those unfamiliar with JAVA, simply think of it as the all useful VELCRO strap. Wherein if the opposite material is fibrous or “loop-full” enough, it will surely adhere to the trusty Velcro “hooks”.

Java is like that. A powerful programming base, developed with the “build / code once, run anywhere” principle in mind. Once you create a program in JAVA, you are almost ensured that it will run anywhere that supports JAVA. Whether it be MAC or INTEL or ZAURUS or what have you.

Anyways, during that time, my boss had two proposals for me :
1. Implement a simplified Java Virtual machine (create a special velcro strap)
2. Create my own version of the LIAB mechanism.
Due to the amount of risk option 1 entailed, I told my boss that option 2 was the best.

Bottom line, barring the very small technical disparity in my example, an embedded device has so limited a functional memory that theoretically, “the number of ways to catch a mouse” is very limited. Needless to say, reinventing the wheel is not a “rare reality”.

*technical disparity – the differentiation was based on JAVA. But theoretically (“purist-tically”) this should be microprocessor  to microprocessor, or architecture versus architecture.


… Paradigm changes because Embedded Devices are usage-specific …

31
Oct
08

When the processor Flops (Floating Point Operations)

One time, a very tired and exhausted friend and former student needed to hurry to a meeting and asked for some support in completing a piece of code. The code asked of him was simple: it was supposed to convert values ranging from 0.01 to 0.99 into their integral (whole number) form. Therefore, 0.01 becomes 1 and 0.99 becomes 100….oops just kidding 0.99 is 99.  Simplistically this can be done by multiplying the said values by 100.

The question however dawned on me then, why then would anyone need to convert real numbers into their integral form?

To better analyze, let us assume that these three values are to be added :
Value 1 : 0.01     Value 2 : 0.2      Value 3 : 0.03

Coding this in C, we will have something like :
———————————————
#include <stdio.h>

int main(void)
{
   double val[3] = {1e-2,2e-1,3e-2};
   double total_val = 0;
   int cntr = 0;

   for (cntr=0; cntr<3; cntr++) {
      total_val += val[cntr];
   }

   printf("total is : %e \n",total_val);
   printf("total is : %f \n",(float)total_val);
   return 0;
}

Again let me warn you, my quest here is not to dissuade anybody from using real numbers. Or strictly “C-speaking”, from using the double or float data types. My goal is to learn something here and share what little I have learned. Since I was not from the very start a computer scientist nor a systems architect, I take great strides in learning as much as I can to augment whatever learnings I have and become better in my profession.

Going back to the conversation at hand, what we don’t see in the above C-code is the logic of floating point manipulations. First off, before I continue sharing, it is funny that after so many years in programming, it is only now that I took time to understand what floating points really are. I just treated them before like ordinary data types, just like integer and char. However, like a shocking horror movie, and much to my surprise, it was not “just” a data type.

Simplistically, floating points are how computers represent real numbers. They are called as such because the decimal point (or radix point) can be found anywhere within the number.  Thus, it is said that the radix point “floats”. For example :

a. 1.23456 -> radix point is in between 1 and 2
b. 12.3456 -> radix point is in between 2 and 3

Again, why then would anyone need to convert real numbers into their integral form?

A. Learning 1 : Floating points are not innate to computing machines; therefore, the machine has to find some extraneous way to preserve the data.

Computers do not operate on real numbers. Most operate on a binary system, the beloved “1” and “0”. In my very rough understanding of it, for a computer to express -134.56 it has to save or represent it in an integral manner.  For example :

a. -134.56  can be expressed as : -13456  x (10 exponent -2)

  • According to the mathematical experts : A real number can be expressed as : s x b exponent e
  • Where “s” is the signed number (the significand), in this case “-13456”
  • Where “b” is the base, in this case “10”
  • Where “e” is the exponent, in this case “-2”
  • Therefore, to save a float, computers have to preserve s and e. The base b is fixed by the format (base 2 in IEEE 754), so it does not have to be stored.
  • And if a float in an architecture is 32-bit data, as in IEEE 754 single precision, 1 bit is set aside for the sign, 8 bits for “e” and the remaining 23 bits for the digits of “s”.

Therefore, in my conclusion, just by declaring and initializing a double or a float, I am led to believe that some form of procedural or operating cost is automatically incurred when the input is analyzed (by the compiler for constants, by the CPU or a library at run time) and the corresponding values for “s” and “e” are saved on a bit level.

B. Learning 2 : Floating point operations require more processing power from the PC or microcomputer.

Floating point arithmetic is very costly for resource-limited computers like embedded systems, especially those without dedicated floating point hardware. For example, if we add 0.01 and 0.2, a simplified flow of how the computer processes “0.01 + 0.2” is as follows :

1.  Check if the exponents are the same for both operands.  Represent both numbers 0.01 and 0.2 in their respective S x (B ^ e) form :

  • 1 x 10^-2
  • 2 x 10^-1

2.  If not of same exponents, get the lowest exponent value

  • 1 x 10^-2 -> lowest is -2

3.  Operate / Shift so as to make both operands use the lowest exponent value.

  • 2 x 10^-1 = 20 x 10^-2

4.  Now that they have the same exponents, add the values.

  • 20 x 10^-2 + 1 x 10^-2
  • (20 + 1) x 10^-2
  • 21 x 10^-2 = 0.21

I don’t think I have to expound any further. The above example shows how heavy it is to manipulate real numbers, or float / double data types. (Wait till you do floating point multiplication !!! :D ) As the number of floating point operations increases, the overhead acquired from real number usage increases too.

Though I will not discuss it any further, one more problem with floating points is their limited range and precision. The “proving” part of this problem, I leave to those who read this blog.

However, though it is very difficult to compute floats, real numbers undeniably exist in the real world. The computer or processor has to comply with this. More so, there are so many benefits that come along with speedy and accurate computations. And due to these benefits, some add FPUs (Floating Point Units) to their computers. It is also now no wonder to me why today’s supercomputers are rated in Flops -> Floating point operations per second. TRIVIA : As of this writing, IBM’s Roadrunner holds the record for being first to sustain 1 petaflops (1 quadrillion floating point operations/second).

In closing, I finally realized that there was some sense to the instruction given to my friend. I may be wrong in my conjecture, but I believe that the instruction came from an embedded engineer. I also remembered a piece of advice my previous embedded supervisor (in another company) gave me: “Avoid Floating.”

Again as I have said, I am not dissuading anybody from using floating points.

However, this learning has undoubtedly granted me the following realizations :

1. Use floating points sparingly when creating embedded applications.
2. In optimizing and speeding up solutions, check and minimize the number of float computations.
3. Should I need a real number, I would be inclined to use double.
4. I will always ask for a double cheese burger in McDonalds (hahahahah :D just kidding …)

I hope you enjoyed this really really geeky blog from me. Let us enjoy learning together. :D

Resources :

1. http://en.wikipedia.org/wiki/IEEE_754
2. http://bytes.com/forum/thread161561.html
3. http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/arith.flpt.html
4. http://en.wikipedia.org/wiki/Flops


07
Oct
08

The Multi-Dimensional Software Engineer

I am an Electronics and Communications Engineering graduate. When I was 10 or 12, I got fascinated with computers and learned how to code in BASIC (Beginner’s All-purpose Symbolic Instruction Code). My father introduced me to transistors and electronics by the age of 13 or 14. Nothing impressive compared to the prodigies I read about in the news, but good enough, I guess, to spark the fire within, so to speak.

I am currently deployed in Japan, working as an outsourced engineer. Be that as it may, I take pride in my job and the skills acquired as an outsourced Japanese speaking / writing engineer. (hahahah my nihongo is really embarrassing, but my Japanese friends say it is enough for now.)

For the past 7 or 8 years, I have been into firmware programming. I have met some of the most brilliant engineers both in the Philippines and Japan. I have evolved from the “if-then” of BASIC to the “int main()” single thread of non-RTOS “C/C++” to the more complex multi-process multi-thread world of RTOS. I have coded several firmware specifically geared towards networking and some stuff with the Operating System/Kernel. In my current level of learning, I have come to the following conclusion :

“A software engineer is multi-dimensional. It is fascinating, dumbfounding and very complex!!!”

I want to be good at what I do. I try to study as much though I may not live long enough to study everything. :D In my long list of things to study, are the following :

  • Basic and Advanced UML – I know UML but am not that great at it, I want to be good at this one.
  • Extreme Programming – Need to learn so that I can execute at will.
  • CMMI – (Capability Maturity Model Integration) Industry standard on development approaches
  • Java(SE) – hahahah my good friend Abraham suggests this. And I think it is time for an upgrade on my part.

Before you proceed though, I want to caution you. What you might read here may seem impossible to you as of the moment. But I hope you fear not the many challenges that you will face in your journey towards whatever visions you may have. Focus more on the joy of obtaining the skills through hard-earned study and effort. My Physics teacher once taught me that “There is no room for learning when there is fear.” So fear not because it is difficult, but hope that you may grow. Never compare yourself with others but compare yourself to yourself the day before yesterday. :) So what if we may not learn it all; at the very least we have done due diligence over what is asked of us as professional Software developers.

And should you be the type not motivated by optimism, Fear THIS : “The day will come when you will be measured. And the people hanging in the balance might be your wife and kids or probably a lifestyle you have grown accustomed to. Should you fail on that day, or should you be found wanting in the amount of dignity you put into your profession, I assure you the backlash will be so great that you will never ever grow out of it. An indelible scar to your sense of professionalism.” The lack of preparation is the kiss of death, as one of the TV sitcoms (Will and Grace???) once said. Did I scare you enough? :) hahahah so be OPTIMISTIC. :D

Do we need to be perfect? Hell NO! I too make tons of mistakes, some really really embarrassing. Basic coding misses and vague design specifications are but some of the many mistakes I make on a daily basis. But be that as it may, our will to do things right (never repeating our mistakes) and the conscious effort to learn so that we can do things right, are more important than anything else. And that shines beyond our mistakes.

With that said, as far as I have read and learned personally, I write my learnings here. My learnings I share with those who deem themselves of lower skill level than I am and also for those better than me so that I can correct my viewpoint/s. I share this so that those who aspire for more can use all or parts of it, for their growth.

In my opinion, a software engineer’s dimension is as follows :

  1. Test Engineer – As essential as the design, the code and almost everything else.
  2. Programmer – Of syntax and discipline.
  3. Developer – Of designing and planning micro, mini and large scale modules.
  4. Administration Quality and Process Engineer – Of hastening the work flow process and at the same time increasing quality levels
  5. Architect – Of designing micro, mini and super scale architectures on which all other modules will base on
  6. Inventor – Of creating wonderful new technologies.
  7. Technocrat – Being “THE MAN”, Defending the business interests with superior technology

1. Test Engineer – Any engineer should be a good tester. My point here is simple, testing is an imperative step within the software development life cycle. The ability to create test routines should be fundamental to any engineer. And such tests should cover normal branches and to an EXTREME extent hypothetical cases which may have very low probability of happening. The better the testing mentality of the engineer, the higher the possibility that the code will be robust. Let me set an example :

/********** Code version 1 ***************/
int32 SafeStringLen(char *my_string)
{
    return strlen(my_string); /* where string ... */
}

Any coder who thinks about test cases even before coding should realize that “my_string” has many possible values. And one of those values is NULL. Thus, with the above code, if “my_string” is NULL, strlen will crash into oblivion. My point is that the above code, to be able to achieve its best form, must be subjected to some form of mental agitation, a rigid test within the cranial fluids or mental-what-nots of the programmer.

One simple rule : EXECUTE WITH A TEST IN MIND!

WARNING : Testing rigidly does not mean we test redundantly and stupidly. For example, in the above code, you don’t have to test all possibilities. As my normal rule goes, there are four basic points to test.

  • Lower than allowed  – values BELOW the range
  • More than allowed  – values BEYOND the possible range
  • Normal cases – valid values / test cases
  • Exceptions  – hypothetical cases

So if I were to test the above code SafeStringLen():

  1. Lower than allowed – Not Applicable in this case
  2. More than allowed – In heap I will allocate 2MByte and then fill it up with “1″, with last character set to the NULL Terminator. And then test.
  3. Normal cases – (a) my_string = “abcde”  (b) my_string = “ab” (1 or 2  normal flow tests will be fine)
  4. Exceptions / Special cases :
    1. my_string = NULL
    2. Zero-len string : my_string[0] = '\0'
    3. my_string = “abc” and then place the cross-compiled code in a big endian system

Remember that tests usually consume a lot of time. And time costs money. That is why any software engineer should be good at testing, to minimize cost while maximizing the robustness of the code.

2.     Programmer – As far as I am concerned, coding should be a software engineer’s passion. Coding’s focus is more on the programming language one is handling. For example, if the design asks us to implement dynamically created data, a coder should be intelligent enough to choose, from the tools at his or her disposal, which one is best for a particular language. For Example :

/* C Language create dynamic data */
#include <stdio.h>
#include <stdlib.h>   /* calloc, free (standard header, replaces malloc.h) */
#include <stdint.h>   /* int32_t */
struct MyStruct {
  int32_t a;
  int32_t b;
};
int main(void)
{
  struct MyStruct *sample = NULL;
  sample = (struct MyStruct *)calloc(1,sizeof(struct MyStruct));
  if (!sample) {
      printf("calloc has failed!\n");
  }
  else {
     printf("calloc has succeeded!\n");
  }
  free(sample);   /* free(NULL) is a safe no-op */
  return 0;
}

6 Important points :
1. In the initialization of the pointer, “= NULL” is used. “(NULL)” won’t work with pure C compilers.
2. And as a coder, I prefer calloc over malloc.
3. The tester in me knows calloc or malloc can fail; that is why I check it.
4. In C compilers  “/* */” is the generic form of commenting.
5. main() should never be void; standard C uses int as the return type.
6. My current environment treats zero as a successful operation, so I return 0.

/* C++ Language create dynamic data */
#include <stdio.h>
#include <new>
struct MyStruct {
  int a;
  int b;
};
int main()
{
  struct MyStruct *sample(NULL);
  sample = new(std::nothrow)(struct MyStruct);
  if (!sample) {
      printf("new has failed!\n");
  }
  else {
     printf("new has succeeded!\n");
  }
  delete sample;   // deleting a null pointer is a safe no-op
  return 0;
}

5 CRITICAL points :
1. sample(NULL) will work this time.
2. new is used to create dynamic data, without the use of type casting.
3. Because of std::nothrow, new returns a null pointer on failure instead of throwing, so the test for failure is a plain pointer check. (Without std::nothrow, new throws std::bad_alloc and try-catch is used instead.)
4. For commenting, “//” can be used.
5. “std::”, this is in reference to the std namespace of C++

NOTE : I am not really a C++ developer. And the nothrow part is really new to me. I just learned it today. So from now on I will use it. :) . As for the old codes, I feel guilty about them.

Anyways, this level for me is the hardest to measure. But the bottom line here is that a good coder judges well what commands, implementations and sequences are to be used. He watches out for fork() and execv() calls, which can mess up processes (and raise zombies), and for other system calls/commands with debilitating results (if not done properly). A good coder can easily see through problems within the code, and can also find other ways to generate the same result with different sets of commands, as in the example above. It is imperative for a software engineer to obtain the discipline of a good coder before he or she progresses to the next level.

In my opinion, the following candidates have good chances in becoming good coders :

1. People with good command of the English language. (Specifications of syntax are almost always in English.)
2. People with good mathematical skills. (Mathematical solutions are architecture independent.)
3. Regardless of 1 and 2, people who aspire to be good coders, who study by reading, reading, reading, coding, and coding. :D

Note : Recently I took a C/C++ exam. I got a score of 35 / 50. Not bad, but that means I have a great amount of learning ahead of me. The test I took online was an amalgam of C and C++. Next time I will take the paid exam. Also, next year I intend to take Java certification, as I was irked by the sad state of affairs of one of the projects in my organization. And for an upgrade on my part.

Next Time : The Continuation … (well once I have time … )


03
Aug
08

On Stack :: Starzan and Cheetae Question

Rommel-kun: char Starzan[strlen(Cheetae)];
Rommel-kun: bossing do you remember our conversation regarding the above code?
Rommel-kun: What is the weakness of the above code?
GovZ Repuspolo: ok game .. in your opinion what is its weakness …
Rommel-kun: hmm …it is possible to fail if the string is empty…
Rommel-kun: zero…
Rommel-kun: ahhh
Rommel-kun: wait i will verify with msdn ….
GovZ Repuspolo: hahahahah
Rommel-kun: there it is!
Rommel-kun: strlen will return -1 if string contains invalid char
Rommel-kun: haha
Rommel-kun: is that the one?
GovZ Repuspolo: hahahah partially correct
Rommel-kun: partially? hahahaha it is like a new puzzle
GovZ Repuspolo: ok let us try to share the answer on your question …
GovZ Repuspolo: i will answer via my blog site …
Rommel-kun: thanks
GovZ Repuspolo: no problem

For everyone to understand, Alwyn, Rommel and I had dinner one time and I told them how I hated malloc, calloc and realloc. And as far as I can, I try to stay away from them, for reasons I will disclose in further blogs. But for now, please accept the fact that I hate the three commands.

In my avoidance of malloc() or any of its forms, I ended up using the following kind of technique :
==============================
/* C++ int32 = 32bit integer int8 = char */
int32 vic_sotto(0);
vic_sotto = function_that_returns_int32(2,5);
int8 jimmy_santos[vic_sotto + 1];
==============================
Assuming the above code is valid according to the coding guidelines, there is an innate flaw to the code. But first let me get back to rommel’s responses …

1. Rommel-kun: wait i will verify with msdn ….

-> 20 Points for attitude : ANY DEVELOPER SHOULD ALWAYS CHECK HIS OR HER PLATFORM'S AUTHORITATIVE MANUAL … hmmm though this may be overkill, it is imperative that we check what we do not know. Especially Windows implementations :) (I have so many stories about Windows interoperability… hahahah I can write a document about it. nahhh … not my style :) )

2. Rommel-kun: strlen will return -1 if string contains invalid char

-> 10 points for attitude : Be patient until one finds an authoritative answer for a simple or complex question. In my experience though, most of the times the answer will elude you depending on the question’s level of difficulty.

However, as I have said, the answer is partially correct. (Strictly speaking, strlen() returns an unsigned size_t and never -1; a negative size can only come from a signed value such as the int32 in my technique above.) In an embedded platform, and/or in Windows for that matter, one should always remember that any form of used resource has a limit. In this case, the STACK IS NOT INFINITE, NOTHING IS INFINITE. So aside from having a negative sized allocation, you may end up with something like :

==============================
/* C++ int32 = 32bit integer int8 = char */
int32 vic_sotto(0);
vic_sotto = function_that_returns_int32(2000000,50000000);
/* vic_sotto = 2,000,000,000 */
int8 jimmy_santos[vic_sotto + 1];
==============================

Now what do you think will happen? This is what I call stack failure (stack overflow). You see, for each process (or thread or task) we create, the OS allocates a defined size for its stack. (As for the idea of a stack, check wiki for it :) ) This is where your local variables are stored temporarily. Once the function exits (or the thread/process/task for that matter), it releases the used memory.

The stack is the best place to get memory from, and the fastest. But in my experience it can be really really small in terms of size. For example, in one of the non-RTOS systems I have worked on, the stack size was only 8 KB. In one of my RTOS adventures, I came across a 2 KB stack. So whether you are in Windows or not, or you have a hell of a freeway for a stack … DON'T ABUSE YOUR RESOURCES… it's much like life too, never abuse your resources in life. :)

So to correct the above code, assuming we still want to do it the same way :
==============================
/* C++ int32 = 32bit integer int8 = char */
int32 vic_sotto(0);
vic_sotto = function_that_returns_int32(2000000,50000000);
/* guard the damn thing with a macro JOEY_DE_LEON = 20 */
/* compare without "+ 1" so the check itself cannot overflow */
if ((vic_sotto<0) || (vic_sotto >= JOEY_DE_LEON)) {
   return -1; /* function failure */
}
int8 jimmy_santos[vic_sotto + 1];
==============================

Remember :
1. When you can, if it is ok, use STACK.
2. The STACK is NEVER INFINITE.
3. Apply a guard appropriately.
4. hehehe, for political reasons, follow the damned coding guidelines!!! :)
5. Only the stupid and arrogant think they are perfectly perfect. :)

10
Jul
08

Gian and me : Data types And Data sizes

NOTE : Gian is a Java developer for the company I work for. She likes playing the guitar and Japanese songs. Thanks Gian for granting permission to post this.

Gian: In C how many bits are there in a long data type?
GovZ: I think it is 32 ….
GovZ: Wait, I am checking on the rules of c / c++
GovZ: it can be 32 to 64 bits ..
Gian: Is it platform dependent?
GovZ: yup …
GovZ: factors affecting data type include platform and i think compiler …
GovZ: there are certain platforms with small sized registers
Gian: Wow…
GovZ: nahhh, i dont know everything that is why i study a lot …
GovZ: like you do :)
GovZ: Ok I will post this on my blog …
GovZ: Gian and me : Data types And Data sizes …
Gian: Really? …
GovZ: Yup …
GovZ: In c … one has to understand the value of sizeof() …
GovZ: the command’s usefulness ….
GovZ: So if you do not know the size … Just call sizeof(int)
GovZ: If you want the platform’s integer size.
GovZ: Hey I am going to make a tuna sandwich … want one?
Gian: Can you send it via ym? the sandwich?
Gian: haha ….
Gian: I will not be able to sleep easy tonight …
GovZ: hahaha it’s ok …
GovZ: save some learnings for tomorrow kid ….
GovZ: To question fundamental questions ….
GovZ: Means maturity in our profession …..
GovZ: That is why we should be tormented souls …
GovZ: Because it is only in ignorance that we can achieve bliss …
GovZ: We should chose not to be ignorant …
GovZ: at least while we are young …
Gian: Hey GTG now :)
Gian: I may not be able to sleep if …
Gian: I continue to think about this …
GovZ: nyty
Gian: ok
Gian: gud nyt
Gian has signed out ….



