Friday, September 27, 2013

Initrd and Initramfs

The Linux® initial RAM disk (initrd) is a temporary root file system that is mounted during system boot to support the two-state boot process. The initrd contains various executables and drivers that permit the real root file system to be mounted, after which the initrd RAM disk is unmounted and its memory freed. In many embedded Linux systems, the initrd is the final root file system. This article explores the initial RAM disk for Linux 2.6, including its creation and use in the Linux kernel.
 What's an initial RAM disk?
 The initial RAM disk (initrd) is an initial root file system that is mounted prior to when the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of the two-stage boot process to load the modules to make the real file systems available and get at the real root file system.
The initrd contains a minimal set of directories and executables to achieve this, such as the insmod tool to install kernel modules into the kernel.
In the case of desktop or server Linux systems, the initrd is a transient file system. Its lifetime is short, only serving as a bridge to the real root file system. In embedded systems with no mutable storage, the initrd is the permanent root file system. This article explores both of these contexts.
Anatomy of the initrd

The initrd image contains the necessary executables and system files to support the second-stage boot of a Linux system.
Depending on which version of Linux you're running, the method for creating the initial RAM disk can vary. Prior to Fedora Core 3, the initrd is constructed using the loop device. The loop device is a device driver that allows you to mount a file as a block device and then interpret the file system it represents. The loop device may not be present in your kernel, but you can enable it through the kernel's configuration tool (make menuconfig) by selecting Device Drivers > Block Devices > Loopback Device Support. You can inspect the loop device as follows (your initrd file name will vary):

Listing 1. Inspecting the initrd (prior to FC3)

 # mkdir temp ; cd temp
# cp /boot/initrd.img.gz .
# gunzip initrd.img.gz
# mount -t ext -o loop initrd.img /mnt/initrd
# ls -la /mnt/initrd
 You can now inspect the /mnt/initrd subdirectory for the contents of the initrd. Note that even if your initrd image file does not end with the .gz suffix, it's a compressed file, and you can add the .gz suffix to gunzip it.
 Beginning with Fedora Core 3, the default initrd image is a compressed cpio archive file. Instead of mounting the file as a compressed image using the loop device, you can use a cpio archive. To inspect the contents of a cpio archive, use the following commands:
 
Listing 2. Inspecting the initrd (FC3 and later)
 # mkdir temp ; cd temp
# cp /boot/initrd-2.6.14.2.img initrd-2.6.14.2.img.gz
# gunzip initrd-2.6.14.2.img.gz
# cpio -i --make-directories < initrd-2.6.14.2.img
The result is a small root file system, as shown in Listing 3. The small, but necessary, set of applications are present in the ./bin directory, including nash (not a shell, a script interpreter), insmod for loading kernel modules, and lvm (logical volume manager tools).
Listing 3. Default Linux initrd directory structure
 # ls -la
#
drwxr-xr-x  10 root root    4096 May 7 02:48 .
drwxr-x---  15 root root    4096 May 7 00:54 ..
drwxr-xr-x  2  root root    4096 May 7 02:48 bin
drwxr-xr-x  2  root root    4096 May 7 02:48 dev
drwxr-xr-x  4  root root    4096 May 7 02:48 etc
-rwxr-xr-x  1  root root     812 May 7 02:48 init
-rw-r--r--  1  root root 1723392 May 7 02:45 initrd-2.6.14.2.img
drwxr-xr-x  2  root root    4096 May 7 02:48 lib
drwxr-xr-x  2  root root    4096 May 7 02:48 loopfs
drwxr-xr-x  2  root root    4096 May 7 02:48 proc
lrwxrwxrwx  1  root root       3 May 7 02:48 sbin -> bin
drwxr-xr-x  2  root root    4096 May 7 02:48 sys
drwxr-xr-x  2  root root    4096 May 7 02:48 sysroot
#
Of interest in Listing 3 is the init file at the root. This file, like the traditional Linux boot process, is invoked when the initrd image is decompressed into the RAM disk. We'll explore this later in the article.
Tools for creating an initrd
 Let's now go back to the beginning to formally understand how the initrd image is constructed in the first place. For a traditional Linux system, the initrd image is created during the Linux build process. Numerous tools, such as mkinitrd, can be used to automatically build an initrd with the necessary libraries and modules for bridging to the real root file system. The mkinitrd utility is actually a shell script, so you can see exactly how it achieves its result. There's also the YAIRD (Yet Another Mkinitrd) utility, which permits customization of every aspect of the initrd construction.
Manually building a custom initial RAM disk

Creation

To create an (initially empty) initrd use the following steps:
Note: Change RDSIZE to your required filesystem size. E.g. with RDSIZE =8192 you will get a 8MB ramdisk. BLKSIZE=1024
host > dd if=/dev/zero of=<filename> bs=<BLKSIZE> count=<RDSIZE>
host > mke2fs -vm0 <filename> < RDSIZE >
host > tune2fs -c 0 <filename>
host > dd if=<filename> b=<BLKSIZE> count=< RDSIZE > | gzip -v9 > ramdisk.gz
(or)
host > dd if=/dev/zero of=<filename> bs=<BLKSIZE> count=<RDSIZE>
host > mke2fs -F -m 0 -b <BLKSIZE> <filename> < RDSIZE >| gzip -v9 > ramdisk. 
Now, we have a (empty) gzipped ramdisk image with (extracted) size of < RDSIZE >.

Filling

To fill empty ramdisk created above with all files needed for ramdisk, mount the image and fill it. Content would be e.g. BusyBox and/or other applications and/or libraries.
host > mkdir mnt
host > gunzip ramdisk.gz
host > mount -o loop ramdisk mnt/
host > ... copy stuff you want to have in ramdisk to mnt...
host > umount mnt
host > gzip -v9 ramdisk
The resulting ramdisk.gz is now ready for usage. Note its size is smaller than <count> cause of compression.
Note: Don't forget to create/copy some basic /dev/xxx nodes to ramdisk.
Note: If BusyBox or applications in ramdisk are linked dynamically, don't forget to copy dynamic libraries (*.so) to ramdisk (to correct directory) as well.
Creating a utility to generate ramdisk
 Because there is no hard drive in many embedded systems based on Linux, the initrd also serves as the permanent root file system. Listing 4 shows how to create an initrd image. I'm using a standard Linux desktop so you can follow along without an embedded target. Other than cross-compilation, the concepts (as they apply to initrd construction) are the same for an embedded target.
Listing 4. Utility (mkird) to create a custom initrd
 
#!/bin/bash
 
 
# Housekeeping...
 
rm -f /tmp/ramdisk.img
 
rm -f /tmp/ramdisk.img.gz
 
 
# Ramdisk Constants
 
RDSIZE=4000
 
BLKSIZE=1024
 
 
# Create an empty ramdisk image
 
dd if=/dev/zero of=/tmp/ramdisk.img bs=$BLKSIZE count=$RDSIZE
 
 
# Make it an ext2 mountable file system
 
/sbin/mke2fs -F -m 0 -b $BLKSIZE /tmp/ramdisk.img $RDSIZE
 
 
# Mount it so that we can populate
 
mount /tmp/ramdisk.img /mnt/initrd -t ext2 -o loop=/dev/loop0
 
 
# Populate the filesystem (subdirectories)
 
mkdir /mnt/initrd/bin
 
mkdir /mnt/initrd/sys
 
mkdir /mnt/initrd/dev
 
mkdir /mnt/initrd/proc
 
 
# Grab busybox and create the symbolic links
 
pushd /mnt/initrd/bin
 
cp /usr/local/src/busybox-1.1.1/busybox .
 
ln -s busybox ash
 
ln -s busybox mount
 
ln -s busybox echo
 
ln -s busybox ls
 
ln -s busybox cat
 
ln -s busybox ps
 
ln -s busybox dmesg
 
ln -s busybox sysctl
 
popd
 
 
# Grab the necessary dev files
 
cp -a /dev/console /mnt/initrd/dev
 
cp -a /dev/ramdisk /mnt/initrd/dev
 
cp -a /dev/ram0 /mnt/initrd/dev
 
cp -a /dev/null /mnt/initrd/dev
 
cp -a /dev/tty1 /mnt/initrd/dev
 
cp -a /dev/tty2 /mnt/initrd/dev
 
 
# Equate sbin with bin
 
pushd /mnt/initrd
 
ln -s bin sbin
 
popd
 
 
# Create the init file
 
cat >> /mnt/initrd/linuxrc << EOF
 
#!/bin/ash
 
echo
 
echo "Simple initrd is active"
 
echo
 
mount -t proc /proc /proc
 
mount -t sysfs none /sys
 
/bin/ash --login
 
EOF
 
 
chmod +x /mnt/initrd/linuxrc
 
 
# Finish up...
 
umount /mnt/initrd
 
To create an initrd, begin by creating an empty file, using /dev/zero (a stream of zeroes) as input writing to the ramdisk.img file. The resulting file is 4MB in size (4000 1K blocks). Then use the mke2fs command to create an ext2 (second extended) file system using the empty file. Now that this file is an ext2 file system, mount the file to /mnt/initrd using the loop device. At the mount point, you now have a directory that represents an ext2 file system that you can populate for your initrd. Much of the rest of the script provides this functionality.
The next step is creating the necessary subdirectories that make up your root file system: /bin, /sys, /dev, and /proc. Only a handful are needed (for example, no libraries are present), but they contain quite a bit of functionality.
To make your root file system useful, use BusyBox. This utility is a single image that contains many individual utilities commonly found in Linux systems (such as ash, awk, sed, insmod, and so on). The advantage of BusyBox is that it packs many utilities into one while sharing their common elements, resulting in a much smaller image. This is ideal for embedded systems. Copy the BusyBox image from its source directory into your root in the /bin directory. A number of symbolic links are then created that all point to the BusyBox utility. BusyBox figures out which utility was invoked and performs that functionality. A small set of links are created in this directory to support your init script (with each command link pointing to BusyBox).
The next step is the creation of a small number of special device files. I copy these directly from my current /dev subdirectory, using the -a option (archive) to preserve their attributes.
The penultimate step is to generate the linuxrc file. After the kernel mounts the RAM disk, it searches for an init file to execute. If an init file is not found, the kernel invokes the linuxrc file as its startup script. You do the basic setup of the environment in this file, such as mounting the /proc file system. In addition to /proc, I also mount the /sys file system and emit a message to the console. Finally, I invoke ash (a Bourne Shell clone) so I can interact with the root file system. The linuxrc file is then made executable using chmod.
Finally, your root file system is complete. It's unmounted and then compressed using gzip.
Using  opensource utility to generate ramdisk
 genext2fs -b <RDSIZE> -d src -f dev.txt flashdisk.img
This example builds a filesystem from all the files in src , then device nodes are created based on the contents of the device file dev.txt.
An example device file follows:
drwx            /dev crw-    10,190  /dev/lcd brw-    1,0     /dev/ram0
This device list builds the /dev directory, a character device node /dev/lcd (major 10, minor 190) and a block device node /dev/ram0 (major 1, minor 0)

Kernel options

To make initrd work, you have to configure kernel properly:
#
# General setup
#
...
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
...
#
# UBI - Unsorted block images
#
...
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=1
CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024
...
Note: The ramdisk size e.g. 8192 above has to be configured for your individual setup.
Initramfs

To use initramfs a cpio archive is embedded directly into the kernel. I.e. you don't create an additional (ramdisk) image. Instead, the initial file system is directly incorporated into the kernel. With this, the kernel size increases by the file system size. It's like you embed above ramdisk directly into the kernel.

Creation

Cause initramfs is directly embedded in the the kernel, its creation is simpler. No dd & mount & gzip stuff like with ramdisk above. You simply have to fill a directory on your host with the target filesystem you like and then pass the path to this directory to the kernel build process.
Create target file system
host > mkdir target_fs
host > ... copy stuff you want to have in initramfs to target_fs...
Note: cpio system used for initramfs can't handle hard links. If you e.g. created your BusyBox using hard links, you will get a quite large initramfs cause each command is taken with its size and not as hard link. In cpio initramfs use symbolic/soft links instead.
Note: To be able to detect initramfs by kernel properly, the top level directory has to contain a program called init. This can be done by e.g. using a soft link from top level init to /bin/busybox
/init -> /bin/busybox
if you use BusyBox in your initramfs.

Kernel options

The only difference from creating an initrd is to give the kernel the path to the target file system you like to embed:
#
# General setup
#
...
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="<path_to>/target_fs>"...
#
# UBI - Unsorted block images
#
...
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=1
CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024
...

Thread synchronization:

Thread synchronization:
As we know, to run threads, we need to schedule them. In order to run them effectively, they need to be synchronized. Suppose one thread creates a brush and then creates several threads that share the brush and draw with it. The first thread must not destroy the brush until the other threads finish drawing. This requires a means of coordinating the sequence of actions in several threads.

One way is to create a global Boolean variable that one thread uses to signal another. The writing thread will set this parameter to TRUE and the reading thread might loop until it sees the flag change. This will definitely work, but the looping thread wastes a lot of processor time.

Instead, Win32 supports a set of synchronization objects such as mutexes, semaphores, events and critical sections. These are the system objects created by the object manager. All of them will work in a similar way. A thread that wants to perform some coordinated action waits for a response from one of these objects and proceeds only after receiving it. The scheduler removes the waiting objects from the dispatch queue so that they will not consume processor time

It is important to keep in mind that:
• A mutex object works like a narrow gate for one thread to pass at a time.
• A semaphore object work like a multi-lane gate that a limited number of threads can pass through together.
• An event object broadcasts a public signal for any listening thread to hear.
• A critical section object works like a mutex but only within a single process.

Mutexes, semaphores and events can coordinate threads in different processes, but critical sections are only visible to threads in a single process.

Mutexex: These are very much like critical sections except that they can be used to synchronize data across multiple processes. To do this, a thread in each process must have its own process-relative handle to a single mutex object.

Semaphores: These objects are used for resource counting. They offer a thread the ability to query the number of resources available; if one or more resources are available, the count of available resources is decremented. Thus semaphores perform the test and set operations automatically, that is, when you request a resource from a semaphore, the operating system checks whether the resource is available and decrements the count of the available resources without letting another thread interfere. Only after the resource count has been decremented does the system allow another thread to request a resource.

For example, let us say that a computer has three serial ports. No more than three threads can use the serial ports at any given time; each port can be assigned to one thread. This situation provides a perfect opportunity to use a semaphore. To monitor serial port usage, you can create a semaphore with a count of three - one for each port. A semaphore is signaled when its resource count is greater than zero and is non-signaled when the count is zero.
Because several threads can affect a semaphore's resource count, a semaphore, unlike a critical section or mutex, is not considered to be owned by a thread. This means that it is possible to have one thread wait for the semaphore object and another thread release the object.

Events: Even objects are the most primitive form of synchronization objects and they are quite different from mutexes and semaphores. Mutexes and semaphores are usually used to control access to data, but events are used to signal that some operation has been completed.

There are two different types of event objects - manual reset events and auto reset events. A manual reset event is used to signal several threads simultaneously to say that an operation has finished, and an auto reset event is used to signal a single thread to say that an operation has been completed.

Events are most commonly used when one thread performs initialization work and, when it finishes, signals another thread to perform the remaining work. The initialization thread sets the event to the non-signaled state and begins to perform the initialization. Then, after the initialization has been completed, the thread sets the event to the signaled state. Waiting for the event, the worker thread wakes up and performs the rest of the work.

For example, a process might be running two threads. The first thread reads data from a file into a memory buffer. After the data has been read, the first thread signals the second thread that it can process the data. When the second thread finishes processing the data, it might need to signal the first thread again, so that the first thread can read the next block of data from the file.

Critical sections: A critical section is a small section of the code that required exclusive access to some shared data before the code can execute. Of all synchronization objects, critical sections are the simplest to use, but they can be used to synchronize threads only within a single process. Critical sections allow only one thread at a time to gain access to a region of data.

Mutex Vs semaphore
A semaphore can be a Mutex but a Mutex can never be semaphore. This simply means that a binary semaphore can be used as Mutex, but a Mutex can never exhibit the functionality of semaphore.Both semaphores and Mutex (at least the on latest kernel) are non-recursive in nature.No one owns semaphores, whereas Mutex are owned and the owner is held responsible for them. This is an important distinction from a debugging perspective.

In case the of Mutex, the thread that owns the Mutex is responsible for freeing it. However, in the case of semaphores, this condition is not required. Any other thread can signal to free the semaphore by using the sem_post() function.

A Mutex, by definition, is used to serialize access to a section of re-entrant code that cannot be executed concurrently by more than one thread. A semaphore, by definition, restricts the number of simultaneous users of a shared resource up to a maximum number

Another difference that would matter to developers is that semaphores are system-wide and remain in the form of files on the filesystem, unless otherwise cleaned up. Mutex are process-wide and get cleaned up automatically when a process exits.

The nature of semaphores makes it possible to use them in synchronizing related and unrelated process, as well as between threads. Mutex can be used only in synchronizing between threads and at most between related processes (the pthread implementation of the latest kernel comes with a feature that allows Mutex to be used between related process).

According to the kernel documentation, Mutex are lighter when compared to semaphores. What this means is that a program with semaphore usage has a higher memory footprint when compared to a program having Mutex.
From a usage perspective, Mutex has simpler semantics when compared to semaphores.

A mutex is like a lock. A thread can lock it, and then any subsequent attempt to lock it, by the same thread or any other, will cause the attempting thread to block until the mutex is unlocked. These are very handy for keeping data structures correct from all the threads' points of view. For example, imagine a very large linked list. If one thread deletes a node at the same time that another thread is trying to walk the list, it is possible for the walking thread to fall off the list, so to speak, if the node is deleted or changed. Using a mutex to "lock" the list keeps this from happening.
Computer Scientist people will tell you that Mutex stands for Mutual Exclusion.
In Java, Mutex-like behaviour is accomplished using the synchronized keyword.
Technically speaking, only the thread that locks a mutex can unlock it, but sometimes operating systems will allow any thread to unlock it. Doing this is, of course, a Bad Idea. If you need this kind of functionality, read on about the semaphore in the next paragraph.
Similar to the mutex is the semaphore. A semaphore is like a mutex that counts instead of locks. If it reaches zero, the next attempt to access the semaphore will block until someone else increases it. This is useful for resource management when there is more than one resource, or if two separate threads are using the same resource in coordination. Common terminology for using semaphores is "uping" and "downing", where upping increases the count and downing decreases and blocks on zero.
Java provides a Class called Semaphore which does the same thing, but uses acquire() and release() methods instead of uping and downing.
With a name as cool-sounding as semaphore, even Computer Scientists couldn't think up what this is short for.
Unlike mutexes, semaphores are designed to allow multiple threads to up and down them all at once. If you create a semaphore with a count of 1, it will act just like a mutex, with the ability to allow other threads to unlock it.
Computer Scientists like to refer to the pieces of code protected by mutexes and semaphores as Critical Sections. In general, it's a good idea to keep Critical Sections as short as possible to allow the application to be as paralle as possible.

Vim Editor

Vim Installation: Red Hat / CentOS / Fedora:
  • rpm -ivh vim-common-...rpm vim-minimal-...rpm vim-enhanced-...rpm vim-X11-...rpm
  • yum install vim-common vim-minimal vim-enhanced vim-X11
Ubuntu / Debian:
  • apt-get install vim vim-common vim-gnome vim-gui-common vim-runtime
Compiling Vim from source:
  • Download vim source from http://vim.org
  • tar xzf vim-7.0.tar.gz
  • cd vim70
  • ./configure --prefix=/opt --enable-cscope
  • make
  • make install

Basic "vi" features
One edits a file in vi by issuing the command: vi file-to-edit.txt
The vi editor has three modes, command mode, insert mode and command line mode.
  1. Command mode: letters or sequence of letters interactively command vi. Commands are case sensitive. The ESC key can end a command.
  2. Insert mode: Text is inserted. The ESC key ends insert mode and returns you to command mode. One can enter insert mode with the "i" (insert), "a" (insert after), "A" (insert at end of line), "o" (open new line after current line) or "O" (Open line above current line) commands.
  3. Command line mode: One enters this mode by typing ":" which puts the command line entry at the foot of the screen.
Partial list of interactive commands:
Cursor movement:
Keystrokes Action
h/j/k/l Move cursor left/down/up/right
-/+ Move cursor down/up in first column
ctrl-d Scroll down one half of a page
ctrl-u Scroll up one half of a page
ctrl-f Scroll forward one page
ctrl-b Scroll back one page
M (shift-h) Move cursor to middle of page
H Move cursor to top of page
L Move cursor to bottom of page
w
5w
Move cursor a word at a time
Move cursor ahead 5 words
b
B
5b
Move cursor back a word at a time
Move cursor back a word at a time
Move cursor back 5 words
e
5e
Move cursor to end of word
Move cursor ahead to the end of the 5th word
0 (zero) Move cursor to beginning of line
$ Move cursor to end of line
G Move cursor to end of file
% Move cursor to the matching bracket.
Place cursor on {}[]() and type "%".
'. Move cursor to previously modified line.
'a Move cursor to line mark "a" generated by marking with keystroke "ma"
'A Move cursor to line mark "a" (global between buffers) generated by marking with keystroke "mA"
]' Move cursor to next lower case mark.
[' Move cursor to previous lower case mark.
Editing commands:
Keystrokes Action
i Insert at cursor
a Append after cursor
A Append at end of line
ESC Terminate insert mode
u Undo last change
U Undo all changes to entire line
o Open a new line
dd
3dd
Delete line
Delete 3 lines.
D Delete contents of line after cursor
dw
4dw
Delete word
Delete 4 words
cw Change word
x Delete character at cursor
r Replace character
R Overwrite characters from cursor onward
~ Change case of individual character
ctrl-a
ctrl-x
Increment number under the cursor.
Decrement number under the cursor.
/search_string{CR} Search for search_string
?search_string{CR} Search backwards (up in file) for search_string
/<search_string>{CR} Search for search_word
Ex: /<s>
Search for variable "s" but ignore declaration "string" or words containing "s". This will find "string s;", "s = fn(x);", "x = fn(s);", etc
n Find next occurrence of search_word
N Find previous occurrence of search_word
. repeat last command action.
Terminate session:
  • Use command: ZZ Save changes and quit.
  • Use command line: ":wq" Save (write) changes and quit.
  • Use command line: ":w" Save (write) changes without quitting.
  • Use command line: ":q!" Ignore changes and quit. No changes from last write will be saved.
  • Use command line: ":qa" Quit all files opened.


Advanced "vi" features

Interactive Commands:

  • Marking a line:
    Any line can be "Book Marked" for a quick cursor return.
    • Type the letter "m" and any other letter to identify the line.
    • This "marked" line can be referenced by the keystroke sequence "'" and the identifying letter. Example: "mt" will mark a line by the identifier "t".
      "'t" will return the cursor to this line at any time.
      A block of text may be referred to by its marked lines. i.e.'t,'b
  • vi line buffers:
    To capture lines into the buffer:
    • Single line: "yy" - yanks a single line (defined by current cursor position) into the buffer
    • Multiple lines: "y't" - yanks from current cursor position to the line marked "t"
    • Multiple lines: "3yy" - yank 3 lines. Current line and two lines below it.
    Copy from buffer to editing session:
    • "p" - place contents of buffer after current line defined by current cursor position.
  • vim: Shift a block of code left or right:
    • Enter into visual mode by typing the letter "v" at the top (or bottom) of the block of text to be shifted.
    • Move the cursor to the bottom (or top) of the block of text using "j", "k" or the arrow keys. Tip: Select from the first collumn of the top line and the last character of the line on the bottom line.
      Zero ("0") will move the cursor to the first character of a line and "$" will move the cursor to the last character of the line.
    • Type >> to shift the block to the right. Type << to shift the block to the left.
    Note: The number of characters shifted is controlled by the "shift width" setting. i.e. 4: ":set sw=4" This can be placed in your $HOME/.vimrc file.

Command Line:

  • command options:
    The vi command line interface is available by typing ":". Terminate with a carriage return. Example commands:
    • :help topic If the exact name is unknown, TAB completion will cycle through the various options given the first few letters. Ctrl-d will print the complete list of possibilites.
    • :set all - display all settings of your session.
    • :set ic - Change default to ignore case for text searches Default is changed from noignorecase to ignorecase. (ic is a short form otherwise type set ignorecase)
    • Common options to set:
      Full "set" Command Short form Description
      autoindent/noautoindent ai/noai {CR} returns to indent of previous line
      autowrite/noautowrite aw/noaw See tags
      errorbells/noerrorbells eb/noeb Silence error beep
      flash/noflash fl/nofl Screen flashes upon error (for deaf people or when noerrorbells is set)
      tabstop=8 ts Tab key displays 8 spaces
      ignorecase/noignorecase ic/noic Case sensitive searches
      number/nonumber nu/nonu Display line numbers
      showmatch/noshowmatch no abbreviations Cursor shows matching ")" and "}"
      showmode/noshowmode no abbreviations Editor mode is displayed on bottom of screen
      taglength tl Default=0. Set significant characters
      closepunct='".,;)]}   % key shows matching symbol.
      Also see showmatch
      linelimit=1048560   Maximum file size to edit
      wrapscan/nowrapscan ws/nows Breaks line if too long
      wrapmargin=0/nowrapmargin wm/nowm Define right margin for line wrapping.
      list/nolist   Display all Tabs/Ends of lines.
      bg=dark
      bg=light
        VIM: choose color scheme for "dark" or "light" console background.
  • Executing Unix commands in vi:
    Any UNIX command can be executed from the vi command line by typing an "!" before the UNIX command. Examples:
    • ":!pwd" - shows your current working directory.
    • ":r !date" - reads the results from the date command into a new line following the cursor.
    • ":r !ls -1" - Place after the cursor, the current directory listing displayed as a single column.
  • Line numbers:
    Lines may be referenced by their line numbers. The last line in the file can be referenced by the "$" sign. The entire file may be referenced by the block "1,$" or "%"
    The current line is referred to as "."
    A block of text may be referred to by its marked lines. i.e. 5,38 or 't,'b
  • Find/Replace:
    Example:
    • :%s/fff/rrrrr/ - For all lines in a file, find string "fff" and replace with string "rrrrr" for the first instance on a line.
    • :%s/fff/rrrrr/g - For all lines in a file, find string "fff" and replace with string "rrrrr" for each instance on a line.
    • :%s/fff/rrrrr/gc - For all lines in a file, find string "fff" and replace with string "rrrrr" for each instance on a line. Ask for confirmation
    • :%s/fff/rrrrr/gi - For all lines in a file, find string "fff" and replace with string "rrrrr" for each instance on a line. Case insensitive.
    • :'a,'bs/fff/rrrrr/gi - For all lines between line marked "a" (ma) and line marked "b" (mb), find string "fff" and replace with string "rrrrr" for each instance on a line. Case insensitive.
    • :%s/*$/ - For all lines in a file, delete blank spaces at end of line.
    • :%s/\(.*\):\(.*\)/\2:\1/g - For all lines in a file, move last field delimited by ":" to the first field. Swap fields if only two.
    For more info type:
    • :help substitute
    • :help pattern
    • :help gdefault
    • :help cmdline-ranges
  • Sorting:
    Example:
    • Mark a block of text at the top line and bottom line of the block. i.e. "mt" and "mb" on two separate lines.
    • :'t,'b !sort
  • Moving columns, manipulating fields and awk:
    :'t,. !awk '{print $3 " " $2 " " $1}' - This will reverse the order of the columns in the block.
    aaa bbb ccc              ccc bbb aaa
                  xxx yyy zzz   becomes->  zzz yyy xxx
                  111 222 333              333 222 111
  • Source Code Formatting: C++/Java
    • Use vim visual text selection to mark the lines to format (beautify):
      • eg. Whole file:
        • Go to first line in file: shift-v
        • Go to last line in file: shift-g
        • Select the key equals: =
      This will align all braces and indentations. For the equivalent in emacs see the YoLinux emacs tutorial.
  • Text Formatting:
  • Mark a block of text at the top line and bottom line of the block. i.e. "mt" and "mb" on two separate lines.
  • Example: ":'t,'b !nroff"
  • You can insert nroff commands i.e.:
      .ce 3 Center the next three lines
      .fi Fill text - left and right justify (default)
      .nf No Fill
      .ls 2 Double line spacing
      .sp Single line space
      .sv 1.0i Vertical space at top of page space
      .ns Turn off spacing mode
      .rs Restore spacing mode
      .ll 6.0i Line length = 6 inches
      .in 1.0i Indent one inch
      .ti 1.0i Temporarily one time only indent one inch
      .pl 8.0i Page length = 8 inches
      .bp Page break

    .fi
    .pl 2i
    .in 1.0i
    .ll 6.0i
    .ce
    Title to be centered
    .sp
    The following text bla bla bla bla bla bla bla bla bla bla 
    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla 
    bla bla bla bla bla bla bla bla bla bla bla bla bla bla 
    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla 
    bla bla bla bla bla
    Becomes:
    Title to be centered
    
              The following text bla bla bla bla bla bla bla bla
              bla  bla  bla  bla bla bla bla bla bla bla bla bla
              bla bla bla bla bla bla bla bla bla  bla  bla  bla
              bla  bla  bla  bla bla bla bla bla bla bla bla bla
              bla bla bla bla bla bla bla bla bla  bla  bla  bla
              bla bla bla bla
  • Spell Checking:
    • Mark a block of text by marking the top line and bottom line of the block. i.e. "mt" and "mb" on two separate lines.
    • :'t,'b !spell will cause the block to be replaced with misspelled words.
    • Press "u" to undo.
    • Proceed to correct words misspelled.
  • Macros:
    :map letter commands_strung_together :map - lists current key mappings
    Example - :map g n cwNEW_WORD{ctrl-v}{esc}i{ctrl-v}{CR}
    This example would find the next search occurrence, change the word and insert a line feed after the word. The macro is invoked by typing the letter "g".

    • Control/Escape/Carriage control characters must be prefixed with ctrl-V.
    • Choose a letter which is not used or important. (i.e. a poor choice would be "i" as this is used for insert)

  • Double spacing:
    • :%s/$/{ctrl-V}{CR}/g This command applies an extra carriage return at the end of all lines

  • Strip blanks at end of line:
    • :%s/{TAB}*$//

  • Strip DOS ctrl-M's:
    • :1,$ s/{ctrl-V}{ctrl-M}//
    Note: In order to enter a control character, one muust first enter ctrl-v. This is true throughout vi. For example, if searching for a control character (i.e. ctrl-m): /ctrl-v ctrl-M If generating a macro and you need to enter esc without exiting the vi command line the esc must be prefixed with a ctrl-v: ctrl-v esc.

  • Editing multiple files:
    • vi file1 file2 file3
    • :n Edit next file (file2)
    • :n Edit next file (file3)
    • :rew Rewind to the first file (file1)

  • Line folding: Many times one may encounter a file with folded lines or may wish to fold lines. The following image is of a file with folded lines where each "+" represents a set of lines not viewed but a marker line prefixed with a "+" is shown stating how many lines have been folded and out of view. Folding helps manage large files which are more easily managed when text lines are grouped into "folds".
    Example: vim /usr/share/vim/vim63/plugin/netrw.vim
    Keystrokes:
    Keystroke Description
    zR Unfold all folded lines in file.
    za Open/close (toggle) a folded group of lines.
    zA Open a closed fold or close an open fold recursively.
    zc Close a folded group of lines.
    zC Close all folded lines recursively.
    zd Delete a folded line.
    zD Delete all folded lines recursively.
    zE Eliminate all folded lines in file.
    zF Create "N" folded lines.

  • Hyper-Linking to include files:
    • Place cursor over the file name (i.e. #include "fileABC.h")
    • Enter the letter combination: gf (go to file)
    This will load file fileABC.h into vim. Use the following entry in your ~/.vimrc file to define file paths. Change path to something appropriate if necessary.
    "Recursively set the path of the project.
    set path=$PWD/**

  • Batch execution of vi from a command file: Command file to change HTML file to lower case and XHTML compiance:
    :1,$ s/<HTML>/<html>/g
    :1,$ s/</HTML>/</html>/g
    :1,$ s/<HEAD>/<head>/g
    :1,$ s/</HEAD>/</head>/g
    :1,$ s/<TITLE>/<title>/g
    :1,$ s/</TITLE>/</title>/g
    :1,$ s/<BODY/<body/g
    :1,$ s/</BODY/</body/g
    :1,$ s/<UL>/<ul>/g
    :1,$ s/</UL>/</ul>/g
    ...
    ..
    .
    :1,$ s/<A HREF/<a href/g
    :1,$ s/<A NAME/<a name/g
    :1,$ s/</A>/</a>/g
    :1,$ s/<P>/<p>/g
    :1,$ s/<B>/<b>/g
    :1,$ s/</B>/</b>/g
    :1,$ s/<I>/<i>/g
    :1,$ s/</I>/</i>/g
    :wq
    Execute: vi -e file-name.html < ViCommands-HtmlUpdate.txt [Potential Pitfall]: This mnust be performed while vim has none of the files open which are to be affected. If it does, vim will error due to conflicts with the vim swap file.


Tagging:
This functionality allows one to jump between files to locate subroutines.
  • ctags *.h *.c This creates a file names "tags".
Edit the file using vi.
  • Unix command line: vi -t   subroutine_name This will find the correct file to edit. OR
  • Vi command line: :tag subroutine_name This will jump from your current file to the file containing the subroutine. (short form :ta subroutine_name ) OR
  • By cursor position: ctrl-] Place cursor on the first character of the subroutine name and press ctrl-] This will jump to the file containing the subroutine. Note: The key combination ctrl-] is also the default telnet connection interrupt. To avoid this problem when using telnet block this telnet escape key by specifying NULL or a new escape key:
    • telnet -E file-name
    • telnet -e "" file-name
In all cases you will be entered into the correct file and the cursor will be positioned at the subroutine desired.
If it is not working properly look at the "tags" file created by ctags. Also the tag name (first column) may be abbreviated for convenience. One may shorten the significant characters using :set taglength=number
Tag Notes:
  • A project may have a tags file which can be added and referred to by: :set tags=tags\ /ad/src/project1.tags A "\" must separate the file names.
  • :set autowrite will automatically save changes when jumping from file to file, otherwise you need to use the :w command.
vim tagging notes: (These specific tag features not available in vi)
Tag Command Description
:tag start-of-tag-name_TAB Vim supports tag name completion. Start the typing the tag name and then type the TAB key and name completion will complete the tag name for you.
:tag /search-string Jump to a tag name found by a search.
ctrl-] The vim editor will jump into the tag to follow it to a new position in the file or to a new file.
ctrl-t The vim editor will allow the user to jump back a level.
(or :pop)
:tselect <function-name> When multiple entries exist in the tags file, such as a function declaration in a header file and a function definition (the function itself), the operator can choose by issuing this command. The user will be presented with all the references to the function and the user will be prompted to enter the number associated with the appropriate one.
:tnext When multiple answers are available you can go to the next answer.
:set ignorecase
(or :set ic)
The ignore case directive affects tagging.
:tags Show tag stack (history)
:4pop Jump to a particular position in the tag stack (history).
(jump to the 4th from bottom of tag stack (history).
The command ":pop" will move by default "1" backwards in the stack (history).)
or
:4tag
(jump to the 4th from top of tag stack)
:tnext Jump to next matching tag.
(Also short form :tn and jump two :2tnext)
:tprevious Jump to previous matching tag.
(Also short form :tp and jump two :2tp)
:tfirst Jump to first matching tag.
(Also short form :tf, :trewind, :tr)
:tlast Jump to last matching tag.
(Also short form :tl)
:set tags=./tags,./subdir/tags
Using multiple tag files (one in each directory).
Allows one to specify all tags files in directory tree: set tags=src/**/tags
Use Makefile to generate tags files as well as compile in each directory.
Links:


The ctags utility:
There are more than one version of ctags out there. The original Unix version, the GNU version and the version that comes with vim. This discussion is about the one that comes with vim. (default with Red Hat)
For use with C++:
  • ctags version 5.5.4:
    ctags *.cpp ../inc/*.h
  • ctags version 5.0.1:
    ctags --lang=c++ --c-types=+Ccdefgmnpstuvx *.cpp ../inc/*.h
To generate a tags file for all files in all subdirectories: ctags -R .
The ctags program which is written by the VIM team is called " Exuberant Ctags" and supports the most features in VIM.
Man page: ctags - Generate tag files for source code


Defaults file:
VIM: $HOME/.exrc
  • ~/.vimrc
  • ~/.gvimrc
  • ~/.vim/ (directory of vim config files.)
VI: $HOME/.exrc
Example:
set autoindent
         set wrapmargin=0
         map g hjlhjlhjlhlhjl
         "
         " S = save current vi buffer contents and run spell on it,
         "     putting list of misspelled words at the end of the vi buffer.
         map S G:w!^M:r!spell %^M
         colorscheme desert
         "Specify that a dark terminal background is being used.
         set bg=dark
Notes:
  • Look in /usr/share/vim/vim61/colors/ for available colorschemes. (I also like "colorscheme desert")
  • Alternate use of autoindent: set ai sw=3


Using vim and cscope:
Cscope was developed to cross reference C source code. It now can be used with C++ and Java and can interface with vim.
Using Cscope to cross reference souce code will create a database and allow you to traverse the source to find calls to a function, occurances of a function, variable, macros, class or object and their respective declarations. Cscope offers more complete navigation than ctags as it has more complete cross referencing.
Vim must be compiled with Cscope support. Red Hat Enterprise Linux 5 (or CentOS 5), includes vim 7.0 with cscope support. Earlier versions of Red Hat or Fedora RPM does not support Cscope and thus must be compiled.

Compiling Vim from source:

  • Download vim source from http://vim.org
  • tar xzf vim-7.0.tar.gz
  • cd vim70
  • ./configure --prefix=/opt --enable-cscope
  • make
  • make install

Using Cscope with vim:

The Cscope database (cscope.out) is generated the first time it is invoked. Subsequent use will update the database based on file changes. The database can be generated manually using the command i.e.: cscope -b *.cpp *.h or cscope -b -R .
Invoke Cscope from within vim from the vim command line. Type the following: :cscope find search-type search-string The short form of the command is ":cs f" where the "search-type" is:
Search Type Type short form Description
symbol s Find all references to a symbol
global g Find global definition
calls c Find calls of this function
called d Find functions that the specified function calls
text t Find specified text string
file f Open file
include i Find files that "#include" the specified file
Results of the Cscope query will be displayed at the bottom of the vim screen.
  • To jump to a result type the results number (+ enter)
  • Use tags commands to return after a jump to a result: ctrl-t To return to same spot as departure, use ctrl-o
  • To use "tags" navigation to search for words under the cursor (ctrl-\) instead of using the vim command line ":cscope" (and "ctrl-spaceBar" instead of ":scscope"), use the vim plugin, cscope_maps.vim [cache] When using this plugin, overlapping ctags navigation will not be available. This should not be a problem since cscope plugin navigation is the same but with superior indexing and cross referenceing.
    Place this plugin in your directory "$HOME/.vim/plugin"
    Plugin required for vim 5 and 6. This feature is compiled in with vim 7.0 on Red Hat Enterprise Linux 5 and CentOS 5 and newer Linux OS's. Attempts to use the plugin when not required will result in the following error:
    E568: duplicate cscope database not added
  • Cycle through results:
    • Next result: :tnext
    • Previous result: :tprevious
  • Create a split screen for Cscope results: :scscope find search-type search-string (Short form: :scs f search-type search-string)
  • Use command line argument ":cscope -R": Scan subdirectories recursively
  • Use Cscope ncurses based GUI without vim: cscope
    • ctrl-d: Exit Cscope GUI

Cscope command line arguments:

Argument Description
-R Scan subdirectories recursively
-b Build the cross-reference only.
-C Ignore letter case when searching.
-fFileName Specify Cscope database file name instead of default "cscope.out".
-Iinclude-directories Look in "include-directories" for any #include files whose names do not begin with "/".
-iFiles Scan specified files listed in "Files". File names are separated by linefeed. Cscope uses the default file name "cscope.files".
-k Kernel mode ignores /usr/include.
Typical: cscope -b -q -k
-q create inverted index database for quick search for large projects.
-sDirectoryName Use specified directory for source code. Ignored if specified by "-i".
-u Unconditionally build a new cross-reference file..
-v Verbose mode.
file1 file2 ... List files to cross reference on the command line.

Cscope environment variable:

Environment Variable Description
CSCOPE_EDITOR Editor to use: /usr/bin/vim
EDITOR Default: /usr/bin/vim
INCLUDEDIRS Colon-separated list of directories to search for #include files.
SOURCEDIRS Colon-separated list of directories to search for additional source files.
VPATH Colon-separated list of directories to search. If not set, cscope searches only in the current directory.


Manually Generating file cscope.files
File: $HOME/bin/gen_cscope or /opt/bin/gen_cscope
#!/bin/bash
find ./ -name "*.[ch]pp" -print > cscope.files
cscope -b -q -k
Generates cscope.files of ".cpp" and ".hpp" source files for a C++ project. Note that this generates CScope files in the current working directory. The CScope files are only usefull if you begin the vim session in the same directory. Thus if you have a heirarchy of directories, perform this in the top directory and reference the files to be edited on the command line with the relative path from the same directory in which the CScope files were generated.


Also see:


Vim plugins:
Vim default plugins:
Vim comes with some default plugins which can be found in:
  • Red Hat / CentOS / Fedora:
    • RHEL4: /usr/share/vim/vim70/autoload/
    • Fedora 3:/usr/share/vim/vim63/plugin/
  • Ubuntu / Debian:
    • Ubuntu 6.06: /usr/share/vim/vim64/plugin/
Additional custom plugins:
User added plugins are added to the user's local directory: ~/.vim/plugin/ or ~/.vimrc/plugin/



Default vim plugins:

File Explorer / List Files: explorer.vim

Help is available with the following command: :help file-explorer

Command Description
:Explore List files in your current directory
:Explore directory-name List files in specified directory
:Vexplore Split with a new vertical window and then list files in your current directory
:Sexplore Split with a new horizontal window and then list files in your current directory
The new window buffer created by ":Vexplore" and ":Sexplore" can be closed with ":bd" (buffer delete).


Additional custom plugins:

CScope: cscope_maps.vim

See cscope and vim description and use in this tutorial above.


Tabbed pages: minibufexpl.vim

This plugin allows you to open multiple text files and accessed by their tabs displayed at the top of the frame.
Keystroke Description
:bn New buffer
:bd Buffer delete
:b3 Go to buffer number 3
ctrl-w followed by "k" New buffer. Puts curson in upper tabbed portion of window. Navigate with arrow keys or "h"/"l".
:qa Quit vim out of all buffers
tab The "tab" key jumps between tabbed buffers.
Recommended ~/.vimrc file entry:
"Hide abandon buffers in order to not lose undo history.
set hid
This vim directive will allow undo history to remain when switching buffers.
The new window buffer tab created can be closed with ":bd" (buffer delete).
Links:



Alternate between the commensurate include and source file: a.vim

Most usefull when used with the vim plugin "minibufexpl.vim" Usefull for C/C++ programmers to switch between the source ".cpp" and commensurate ".hpp" or ".h" file and vice versa.

Command Description
:A switches to the header file corresponding to the current file being edited (or vise versa)
:AS splits and switches
:AV vertical splits and switches
:AT new tab and switches
:AN cycles through matches
:IH switches to file under cursor
:IHS splits and switches
:IHV vertical splits and switches
:IHT new tab and switches
:IHN cycles through matches
If you are editing fileX.c and you enter ":A" in vim, you will be switched to the file fileX.h
Links:


Vim tip:
Using a mousewheel with vim in an xterm. Place in file $HOME/.Xdefaults
XTerm*VT100.Translations: #override n 
: string("0x9b") string("[64~") n 
: string("0x9b") string("[65~")


Links:

Wednesday, September 18, 2013

Algorithms complexity

Searching

Algorithm Data Structure Time Complexity Space Complexity


Average Worst Worst
Depth First Search (DFS) Graph of |V| vertices and |E| edges - O(|E| + |V|) O(|V|)
Breadth First Search (BFS) Graph of |V| vertices and |E| edges - O(|E| + |V|) O(|V|)
Binary search Sorted array of n elements O(log(n)) O(log(n)) O(1)
Linear (Brute Force) Array O(n) O(n) O(1)
Shortest path by Dijkstra,
using a Min-heap as priority queue
Graph with |V| vertices and |E| edges O((|V| + |E|) log |V|) O((|V| + |E|) log |V|) O(|V|)
Shortest path by Dijkstra,
using an unsorted array as priority queue
Graph with |V| vertices and |E| edges O(|V|^2) O(|V|^2) O(|V|)
Shortest path by Bellman-Ford Graph with |V| vertices and |E| edges O(|V||E|) O(|V||E|) O(|V|)

Sorting

Algorithm Data Structure Time Complexity Worst Case Auxiliary Space Complexity


Best Average Worst Worst
Quicksort Array O(n log(n)) O(n log(n)) O(n^2) O(n)
Mergesort Array O(n log(n)) O(n log(n)) O(n log(n)) O(n)
Heapsort Array O(n log(n)) O(n log(n)) O(n log(n)) O(1)
Bubble Sort Array O(n) O(n^2) O(n^2) O(1)
Insertion Sort Array O(n) O(n^2) O(n^2) O(1)
Select Sort Array O(n^2) O(n^2) O(n^2) O(1)
Bucket Sort Array O(n+k) O(n+k) O(n^2) O(nk)
Radix Sort Array O(nk) O(nk) O(nk) O(n+k)

Data Structures

Data Structure Time Complexity Space Complexity

Average Worst Worst

Indexing Search Insertion Deletion Indexing Search Insertion Deletion
Basic Array O(1) O(n) - - O(1) O(n) - - O(n)
Dynamic Array O(1) O(n) O(n) O(n) O(1) O(n) O(n) O(n) O(n)
Singly-Linked List O(n) O(n) O(1) O(1) O(n) O(n) O(1) O(1) O(n)
Doubly-Linked List O(n) O(n) O(1) O(1) O(n) O(n) O(1) O(1) O(n)
Skip List O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(n) O(n) O(n) O(n) O(n log(n))
Hash Table - O(1) O(1) O(1) - O(n) O(n) O(n) O(n)
Binary Search Tree O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(n) O(n) O(n) O(n) O(n)
Cartresian Tree - O(log(n)) O(log(n)) O(log(n)) - O(n) O(n) O(n) O(n)
B-Tree O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(n)
Red-Black Tree O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(n)
Splay Tree - O(log(n)) O(log(n)) O(log(n)) - O(log(n)) O(log(n)) O(log(n)) O(n)
AVL Tree O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(n)

Heaps

Heaps Time Complexity

Heapify Find Max Extract Max Increase Key Insert Delete Merge
Linked List (sorted) - O(1) O(1) O(n) O(n) O(1) O(m+n)
Linked List (unsorted) - O(n) O(n) O(1) O(1) O(1) O(1)
Binary Heap O(n) O(1) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(m+n)
Binomial Heap - O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n))
Fibonacci Heap - O(1) O(log(n))* O(1)* O(1) O(log(n))* O(1)

Graphs

Node / Edge Management Storage Add Vertex Add Edge Remove Vertex Remove Edge Query
Adjacency list O(|V|+|E|) O(1) O(1) O(|V| + |E|) O(|E|) O(|V|)
Incidence list O(|V|+|E|) O(1) O(1) O(|E|) O(|E|) O(|E|)
Adjacency matrix O(|V|^2) O(|V|^2) O(1) O(|V|^2) O(1) O(1)
Incidence matrix O(|V| ⋅ |E|) O(|V| ⋅ |E|) O(|V| ⋅ |E|) O(|V| ⋅ |E|) O(|V| ⋅ |E|) O(|E|)
Source: http://bigocheatsheet.com/

Tuesday, September 17, 2013

BASIC C Socket Programming In Unix For Newbies

 BASIC C Socket Programming In Unix For Newbies
=================================================

CONTENTS
=======================================

1. Introduction
2. Different types of Internet Sockets
3. Structures
4. Conversions
5. IP Addresses
6. Important Functions

   6.1. socket()
   6.2. bind()
   6.3. connect()
   6.4. listen()
   6.5. accept()
   6.6. send()
   6.7. recv()
   6.8. sendto()
   6.9. recvfrom()
   6.10. close()
   6.11. shutdown()
   6.12. gethostname()

7. Some words about DNS

8. A Stream Server Example

9. A Stream Client Example

10. Last Words

11. Copyright



1. INTRODUCTION
=======================================

Are you trying to learn c socket programming? Or do you think that it's hard stuff?
Well, then you must read this basic tutorial to get basic ideas and concepts and to start
to work with sockets. Don't expect to be a "socket programming master" after reading this 
tutorial. You'll only be that if you practice and read a lot.



2. DIFFERENT TYPES OF INTERNET SOCKETS
=======================================

In the first place I must explain what a socket is. In a very simple way, a socket is a way
to talk to other computer. To be more precise, it's a way to talk to other computers using 
standard Unix file descriptors. In Unix, every I/O actions are done by writing or reading
to a file descriptor. A file descriptor is just an integer associated with an open file and it
can be a network connection, a terminal, or something else.

About the different types of internet sockets, there are many types but I'll just describe two 
of them - Stream Sockets (SOCK_STREAM) and Datagram Sockets (SOCK_DGRAM). 

"And what are the differences between this two types?" you may ask. Here's the answer:

        Stream Sockets - they're error free; if you send through the stream socket three
    items "A,B,C", they will arrive in the same order - "A,B,C" ;
    they use TCP ("Transmission Control Protocol") - this protocol
    assures the items' order.
      
      Datagram Sockets - they use UDP ("User Datagram Protocol"); they're connectionless
    because you don't need to have an open connection as in Stream 
    Sockets - you build a packet with the destination information and
    send it out.

A lot more could be explained here about this two kind of sockets, but I think this is enough
to get the basic concept of socket. Understanding what a socket is and this two types of 
internet sockets is a good start, but you need to learn how to "work" with them. You'll learn
it in the next sections.


3. STRUCTURES
=======================================

The purpose of this section is not to teach you structures but to tell you how are they
used in C socket programming. If you don't know what a structure is, my advice is to read 
a C Tutorial and learn it. For the moment, let's just say that a structure is a data type 
that is an aggregate, that is, it contains other data types, which are grouped together into
a single user-defined type.

Structures are used in socket programming to hold information about the address.


The first structure is struct sockaddr that holds socket information.



 struct sockaddr{
  unsigned short  sa_family;    /* address family */
  char         sa_data[14];  /* 14 bytes of protocol address */
 };



But, there's another structure (struct sockaddr_in) that help you to reference to the socket's 
elements.



 struct sockaddr_in {
  short int      sin_family;  /* Address family */
  unsigned short int   sin_port;   /* Port */
  struct in_addr      sin_addr;   /* Internet Address */
  unsigned char      sin_zero[8]; /* Same size as struct sockaddr */
 };



Note: sin_zero is set to all zeros with memset() or bzero() (See examples bellow).



The next structure is not very used but it is defined as an union.

As you can see in both examples bellow (Stream Client and Server Client) , when I declare for 
example "client" to be of type sockaddr_in then I do client.sin_addr = (...)

Here's the structure anyway:


 struct in_addr {
  unsigned long s_addr;
 };



Finally, I think it's better talk about struct hostent. In the Stream Client Example, you can
see that I use this structure. This structure is used to get remote host information. 

Here it is:


 struct hostent
 {
   char *h_name;                 /* Official name of host.  */
   char **h_aliases;             /* Alias list.  */
   int h_addrtype;               /* Host address type.  */
   int h_length;                 /* Length of address.  */
   char **h_addr_list;           /* List of addresses from name server.  */
 #define h_addr  h_addr_list[0]  /* Address, for backward compatibility.  */
 };

This structure is defined in header file netdb.h.


In the beginning, this structures will confuse you a lot, but after you start to write some
lines, and after seeing the examples, it will be easier for you understanding them. To see
how you can use them check the examples (section 8 and 9).



4. CONVERSIONS
=======================================

There are two types of byte ordering: most significant byte and least significant byte. 
This former is called "Network Byte Order" and some machines store their numbers internally 
in Network Byte Order.

There are two types you can convert: short and long. 
Imagine you want to convert a long from Host Byte Order to Network Byte Order. What would you
do? There's a function called htonl() that would convert it =) The following functions are
used to convert :

 htons() -> "Host to Network Short" 
 htonl() -> "Host to Network Long"
 ntohs() -> "Network to Host Short"
 ntohl() -> "Network to Host Long"

You must be thinking why do you need this. Well, when you finish reading this document, it will
all seems easier =) All you need is to read and a lot of practice =)

An important thing, is that sin_addr and sin_port (from struct sockaddr_in) must be in Network
Byte Order (you'll see in the examples the functions described here to convert and you'll start
to understand it).

5. IP ADRESSES
=======================================

In C, there are some functions that will help you manipulating IP addresses. We'll talk about
inet_addr() and inet_ntoa() functions.


 inet_addr() converts an IP address into an unsigned long. An example:
 
 (...)

 dest.sin_addr.s_addr = inet_addr("195.65.36.12"); 

 (...)

 /*Remember that this is if you've a struct dest of type sockaddr_in*/



 inet_ntoa() converts string IP addresses to long. An example:

 (...)

 char *IP;

 ip=inet_ntoa(dest.sin_addr);

 printf("Address is: %s\n",ip);

 (...)

Remember that inet_addr() returns the address in Network Byte Order - so you don't need to 
call htonl().


6. IMPORTANT FUNCTIONS
=======================================

In this section I'll put the function' syntax, the header files you must include to call it,
and little comments. Besides the functions mentioned in this document, there are more, but 
I decided to put only these ones here. Maybe I'll put them in a future version of this 
document =) To see examples of these functions, you can check the stream client and stream 
server source code (Sections 8 and 9)


  6.1. socket()
  =============
 
 #include <sys/types.h>
 #include <sys/socket.h>

 int socket(int domain,int type,int protocol);


 Let's see the arguments:

  domain   -> you can set "AF_INET" (set AF_INET to use ARPA internet protocols)
       or "AF_UNIX" if you want to create sockets for inside comunication.
       Those two are the most used, but don't think that there are just 
         those. There are more I just don't mention them. 
  type     -> here you put the kind of socket you want (Stream or Datagram)
         If you want Stream Socket the type must be SOCK_STREAM
         If you want Datagram Socket the type must be SOCK_DGRAM
  protocol -> you can just set protocol to 0


 socket() gives you a socket descriptor that you can use in later system calls or
 it gives you -1 on error (this is usefull for error checking routines).


  6.2. bind()
  ===========

 #include <sys/types.h>
 #include <sys/socket.h>

 int bind(int fd, struct sockaddr *my_addr,int addrlen);

 
 Let's see the arguments:

  fd  -> is the socket file descriptor returned by socket() call
  my_addr -> is a pointer to struct sockaddr
  addrlen -> set it to sizeof(struct sockaddr)

 bind() is used when you care about your local port (usually when you use listen() ) 
 and its function is to associate a socket with a port (on your machine). It returns 
 -1 on error.

 You can put your IP address and your port automatically:

  server.sin_port = 0;          /* bind() will choose a random port*/
  server.sin_addr.s_addr = INADDR_ANY;  /* puts server's IP automatically */


 An important aspect about ports and bind() is that all ports bellow 1024 are reserved.
 You can set a  port above 1024 and bellow 65535 (unless the ones being used by other
 programs).


  6.3. connect()
  ==============

 #include <sys/types.h>
 #include <sys/socket.h>

 int connect(int fd, struct sockaddr *serv_addr, int addrlen);

 Let's see the arguments:

  fd    -> is the socket file descriptor returned by socket() call
  serv_addr -> is a pointer to struct sockaddr that contains destination IP 
        address and port
  addrlen   -> set it to sizeof(struct sockaddr)

 connect() is used to connect to an IP address on a defined port. It returns -1 on 
 error.


  6.4. listen()
  =============

 #include <sys/types.h>
 #include <sys/socket.h>

 int listen(int fd,int backlog);

 Let's see the arguments:

  fd  -> is the socket file descriptor returned by socket() call
  backlog -> is the number of allowed connections

 listen() is used if you're waiting for incoming connections, this is, if you want
 someone to connect to your machine. Before calling listen(), you must call bind()
 or you'll be listening on a random port =) After calling listen() you must call
 accept() in order to accept incoming connection. Resuming, the sequence of system calls
 is:

  1. socket()
  2. bind()
  3. listen()
  4. accept() /* In the next section I'll explain how to use accept() */

 As all functions above described, listen() returns -1 on error.


  6.5. accept()
  =============

 #include <sys/socket.h>

 int accept(int fd, void *addr, int *addrlen);

 Let's see the arguments:

  fd  -> is the socket file descriptor returned by listen() call
  addr    -> is a pointer to struct sockaddr_in where you can determine which host 
      is calling you from which port
  addrlen -> set it to sizeof(struct sockaddr_in) before its address is passed 
      to accept()

 
 When someone is trying to connect to your computer, you must use accept() to get the
 connection. It's very simple to understand: you just get a connection if you accept =)
 

 Next, I'll give you a little example of accept() use because it's a little different 
 from other functions.

 (...)

   sin_size=sizeof(struct sockaddr_in);
   if ((fd2 = accept(fd,(struct sockaddr *)&client,&sin_size))==-1){ /* calls accept() */
     printf("accept() error\n");
     exit(-1);
   }

 (...)

 From now on, fd2 will be used for add send() and recv() calls.


  6.6. send()
  ===========

 
 int send(int fd,const void *msg,int len,int flags);

 Let's see the arguments:

  fd -> is the socket descriptor where you want to send data to
  msg    -> is a pointer to the data you want to send
  len    -> is the length of the data you want to send (in bytes)
  flags  -> set it to 0


 This function is used to send data over stream sockets or CONNECTED datagram sockets.
 If you want to send data over UNCONNECTED datagram sockets you must use sendto().
 
 send() returns the number of bytes sent out and it will return -1 on error.


  6.7. recv()
  ===========


 int recv(int fd, void *buf, int len, unsigned int flags);

 Let's see the arguments:

  fd  -> is the socket descriptor to read from
  buf -> is the buffer to read the information into
  len  -> is the maximum length of the buffer
  flags  -> set it to 0


 As I said above about send(), this function is used to send data over stream sockets or 
 CONNECTED datagram sockets. If you want to send data over UNCONNECTED datagram sockets
 you must use recvfrom().

 recv() returns the number of bytes read into the buffer and it'll return -1 on error.


  6.8. sendto()
  =============

 int sendto(int fd,const void *msg, int len, unsigned int flags,
     const struct sockaddr *to, int tolen);

 Let's see the arguments:

  fd  -> the same as send()
  msg     -> the same as send()
  len -> the same as send()
  flags -> the same as send()
  to -> is a pointer to struct sockaddr
  tolen -> set it to sizeof(struct sockaddr)

 As you can see, sendto() is just like send(). It has only two more arguments : "to" 
 and "tolen" =) 

 sendto() is used for UNCONNECTED datagram sockets and it returns the number of bytes 
 sent out and it will return -1 on error.


  6.9. recvfrom()
  ===============

 int recvfrom(int fd,void *buf, int len, unsigned int flags
       struct sockaddr *from, int *fromlen);

 Let's see the arguments:

  fd -> the same as recv()
  buf -> the same as recv()
  len -> the same as recv()
  flags -> the same as recv()
  from -> is a pointer to struct sockaddr
  fromlen -> is a pointer to a local int that should be initialised 
      to sizeof(struct sockaddr)

 recvfrom() returns the number of bytes received and it'll return -1 on error.


  6.10. close()
  =============

 close(fd);

 close() is used to close the connection on your socket descriptor. If you call close(),
 it won't be no more writes or reads and if someone tries to read/write will receive an 
 error.


  6.11. shutdown()
  ================

 int shutdown(int fd, int how);

 Let's see the arguments:

  fd -> is the socket file descriptor you want to shutdown
  how -> you put one of those numbers:
    
     0 -> receives disallowed
     1 -> sends disallowed
     2 -> sends and receives disallowed

 When how is set to 2, it's the same thing as close().

 shutdown() returns 0 on success and -1 on error.


  6.12. gethostname()
  ===================

 #include <unistd.h>

 int gethostname(char *hostname, size_t size);

 Let's see the arguments:

  hostname  -> is a pointer to an array that contains hostname
  size    -> length of the hostname array (in bytes)

 
 gethostname() is used to get the name of the local machine.



7. SOME WORDS ABOUT DNS
=======================================

I created this section, because I think you should know what DNS is. DNS stands for "Domain
Name Service" and basically, it's used to get IP addresses. For example, I need to know the
IP address of queima.ptlink.net and through DNS I'll get 212.13.37.13 . 

This is important, because functions we saw above (like bind() and connect()) work with IP
addresses.

To see you how you can get queima.ptlink.net IP address on c, I made a little example:


/* <---- SOURCE CODE STARTS HERE ----> */

#include <stdio.h>
#include <netdb.h>   /* This is the header file needed for gethostbyname() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>


int main(int argc, char *argv[])
{
  struct hostent *he;

  if (argc!=2){
     printf("Usage: %s <hostname>\n",argv[0]);
     exit(-1);
  }

  if ((he=gethostbyname(argv[1]))==NULL){
     printf("gethostbyname() error\n");
     exit(-1);
  }

  printf("Hostname : %s\n",he->h_name);  /* prints the hostname */
  printf("IP Address: %s\n",inet_ntoa(*((struct in_addr *)he->h_addr))); /* prints IP address */
}
/* <---- SOURCE CODE ENDS HERE ----> */



8. A STREAM SERVER EXAMPLE
=======================================

In this section, I'll show you a nice example of a stream server. The source code is all
commented so that you ain't no possible doubts =)

Let's start:

/* <---- SOURCE CODE STARTS HERE ----> */

#include <stdio.h>          /* These are the usual header files */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>


#define PORT 3550   /* Port that will be opened */ 
#define BACKLOG 2   /* Number of allowed connections */

main()
{
 
  int fd, fd2; /* file descriptors */

  struct sockaddr_in server; /* server's address information */
  struct sockaddr_in client; /* client's address information */

  int sin_size;


  if ((fd=socket(AF_INET, SOCK_STREAM, 0)) == -1 ){  /* calls socket() */
    printf("socket() error\n");
    exit(-1);
  }

  server.sin_family = AF_INET;         
  server.sin_port = htons(PORT);   /* Remember htons() from "Conversions" section? =) */
  server.sin_addr.s_addr = INADDR_ANY;  /* INADDR_ANY puts your IP address automatically */   
  bzero(&(server.sin_zero),8); /* zero the rest of the structure */

  
  if(bind(fd,(struct sockaddr*)&server,sizeof(struct sockaddr))==-1){ /* calls bind() */
      printf("bind() error\n");
      exit(-1);
  }     

  if(listen(fd,BACKLOG) == -1){  /* calls listen() */
      printf("listen() error\n");
      exit(-1);
  }

while(1){
  sin_size=sizeof(struct sockaddr_in);
  if ((fd2 = accept(fd,(struct sockaddr *)&client,&sin_size))==-1){ /* calls accept() */
    printf("accept() error\n");
    exit(-1);
  }
  
  printf("You got a connection from %s\n",inet_ntoa(client.sin_addr) ); /* prints client's IP */
  
  send(fd2,"Welcome to my server.\n",22,0); /* send to the client welcome message */
  
  close(fd2); /*  close fd2 */
}
}

/* <---- SOURCE CODE ENDS HERE ----> */



9. A STREAM CLIENT EXAMPLE
=======================================

/* <---- SOURCE CODE STARTS HERE ----> */

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>        /* netbd.h is needed for struct hostent =) */

#define PORT 3550   /* Open Port on Remote Host */
#define MAXDATASIZE 100   /* Max number of bytes of data */

int main(int argc, char *argv[])
{
  int fd, numbytes;   /* files descriptors */
  char buf[MAXDATASIZE];  /* buf will store received text */
  
  struct hostent *he;         /* structure that will get information about remote host */
  struct sockaddr_in server;  /* server's address information */

  if (argc !=2) {       /* this is used because our program will need one argument (IP) */
    printf("Usage: %s <IP Address>\n",argv[0]);
    exit(-1);
  }

  if ((he=gethostbyname(argv[1]))==NULL){ /* calls gethostbyname() */
    printf("gethostbyname() error\n");
    exit(-1);
  }

  if ((fd=socket(AF_INET, SOCK_STREAM, 0))==-1){  /* calls socket() */
    printf("socket() error\n");
    exit(-1);
  }

  server.sin_family = AF_INET;
  server.sin_port = htons(PORT); /* htons() is needed again */
  server.sin_addr = *((struct in_addr *)he->h_addr);  /*he->h_addr passes "*he"'s info to "h_addr" */
  bzero(&(server.sin_zero),8);

  if(connect(fd, (struct sockaddr *)&server,sizeof(struct sockaddr))==-1){ /* calls connect() */
    printf("connect() error\n");
    exit(-1);
  }

  if ((numbytes=recv(fd,buf,MAXDATASIZE,0)) == -1){  /* calls recv() */
    printf("recv() error\n");
    exit(-1);
  }

      buf[numbytes]='\0';

      printf("Server Message: %s\n",buf); /* it prints server's welcome message =) */

      close(fd);   /* close fd =) */
}