Being Expert Programmer – Coding Like an Artist


Today I re-read the article “On becoming an expert C programmer” by Chongo.

It helped me resolve to grow into a great programmer, day by day.

Some suggestions from him:
  • Persistence and determination are important keys to success
  • Write programs for other people.
  • Where possible, openly publish your code, perhaps on your web site.
  • Start small and work your way up … Some of my more popular code is actually small code fragments that are used by others over and over again.
  • Don’t be discouraged if nobody (or if only a few people) uses some of your early programs. What is important is to practice writing quality, well commented code … to maintain and fix that code.
  • In fact, write your code well (with comments, etc.) the first time … even if you are just writing a quick ‘hello world’ program. If a program is not worth writing well (perhaps because you think it is just a quick test code fragment …) then it is not worth writing in the first place.
  • Always, always, always write code as if it will last 30 years … some of your code can live that long.
  • Keep a collection of programs and samples.
  • Code like you are an artist … even if you feel your initial programs are not very good.
  • Interact with other programmers.
  • Find bugs in other people’s code.
  • Maintaining and fixing other people’s code is an EXTREMELY important skill to have.
  • Back up and protect your sample code directory.

Note-Emacs Quick Reference


Refer: http://sean.wenzel.net/docs/emacs/quick_reference/

Notation

It is important to understand the notation commonly used in Emacs documentation. Most Emacs commands consist of a modifier key in conjunction with one or more other keys.

The following are the common modifier keys:

CTRL – (C) – The CONTROL key.
META – (M) – Depending upon the terminal this is the ALT key. You may also use the ESC key to send META.
ESC – (ESC) – The ESCAPE key.
SHIFT – (S) – The SHIFT key.

In Emacs documentation it is common to use an abbreviated syntax when describing key sequences.

Instead of writing CTRL-x CTRL-c, you would write C-x C-c.

This represents holding down the CONTROL key and pressing the letter x, and then holding down the CONTROL key and pressing the letter c.

Another example: M-C-\ represents holding down the ALT and CONTROL keys and pressing the \ key. You could also press and release the ESCAPE key, then hold down the CONTROL key and press the \ key.

 

The following is a table of notation for other keyboard characters.

BACKSPC The BACKSPACE key.
SPC The SPACE bar.
ENTER The Enter key.
RET The Enter key.
TAB The TAB key.

 

Insert Mode?

This is a major place where Emacs differs from the Vi editor. In Vi you are either in Insert Mode or you are not, and to execute commands other than inserting text you need to leave Insert Mode.

Because its commands are prefixed with modifier keys such as CTRL, Emacs lets you run commands at any point in your session.

For example, if you are in Vi and are inserting text you would have to type the following key sequence to save your file and return to Insert Mode:

ESC : w ENTER i

In Emacs (which is essentially always in “Insert Mode”) you would type the following:

C-x C-s

Viewed from Insert Mode, Vi doesn’t really save any keystrokes over Emacs, as religious Vi advocates commonly claim. (It actually requires more in most cases; the real debate is whether you spend more time in Insert Mode or in Command Mode.)

 

What is really happening on the backend

Emacs is really a collection of Lisp routines programmed to operate on the text in its buffers.

These complex routines are accessed via key bindings on the front end. For example, when you type an s you are really telling Emacs to run the “self-insert-command” on the letter “s” which inserts the letter “s” at the current Point of Insertion (POI).

These backend commands can be accessed by typing M-x and then the command. For example, you open a file with the key binding C-x C-f. You can begin the same operation by typing M-x find-file.

Keyboard Shortcuts – (Key Bindings)

 

Key Binding Backend Function What it does
Quitting Things
C-x C-c M-x save-buffers-kill-emacs Quit Emacs. Exits out of Emacs. If you have open buffers that have been modified you are prompted to save the changes or discard them.
C-g M-x keyboard-quit If you are in a key request sequence that you can’t get out of, this returns you to the main buffer.
ESC ESC ESC M-x keyboard-escape-quit Does about the same thing as “C-g”
C-x k M-x kill-buffer Kills the current buffer. If the buffer is modified you are prompted to save it.
Files and directories
C-x C-f M-x find-file Open a file. This prompts for the name of a file. Once you have typed the name, Emacs opens the file in a buffer and displays it in the current window. If the file doesn’t exist, a blank buffer is created and Emacs allows you to begin typing (in this case the file will not exist on the system until you save it the first time). When you are typing the name of the file, you may use the TAB key for filename completion. You may also use M-p and M-n to go to the “previous” and “next” files that have been found using find-file.
C-x C-s M-x save-buffer Save a file. This saves the current buffer.
C-x d M-x dired (directory edit) This opens the directory specified (same way as find-file). It opens a buffer that lists the directory and allows you to perform operations on the files listed in that directory. There are many features (too many to list here) that I will go into later. The most basic is to put the cursor on the file you want and hit ENTER. This does a find-file on that file.
C-x C-w M-x write-file Save the current file as…
C-x i M-x insert-file Insert a file
Buffers
C-x C-b M-x list-buffers This lists all buffers that are open. Emacs will typically open a buffer for each file that you visit (if you visit a new file without closing an old one). This will give you a list of all of the currently open buffers and will allow you to choose one from the list to switch to. (You will notice there are a couple of buffers beginning with an *. These are buffers used by Emacs to store output. You may kill them if you like.) If you have a file opened multiple times, you will see it listed as “file” and “file <2>”. This list of buffers is called the buffer ring.
C-x b M-x switch-to-buffer Switch buffers. Does just that. The default buffer to switch to is the last one that you came from. (Typing C-x b ENTER repeatedly swaps between the current and the last buffer, like ALT-TAB in Windows.) You may type a TAB to complete the buffer name or TAB TAB to see a list of all of the buffer names you can type.
C-x k M-x kill-buffer Kills the current buffer. If the buffer is modified you are prompted to save it.
Windows and frames
C-x 0 M-x delete-window Kills the current window if more than one is visible. The buffer contained in the window is put back to the buffer ring.
C-x 1 M-x delete-other-windows If more than one window is visible in the current viewing frame, it removes all but the current window. Other buffers remain open but unviewed.
C-x 2 M-x split-window-vertically Splits the current viewing window into two windows placed one over the other.
C-x 3 M-x split-window-horizontally Splits the current viewing window into two windows placed side by side.
C-x o M-x other-window Switches between two windows.
Moving Around
C-p (up arrow) M-x previous-line Moves to the previous line.
C-n (down arrow) M-x next-line Moves to the next line.
C-f (right arrow) M-x forward-char Move forward one character
C-b (left arrow) M-x backward-char Move backward one character
M-f M-x forward-word Move forward one word.
M-b M-x backward-word Move backward one word.
C-a M-x beginning-of-line Move to the beginning of the line.
C-e M-x end-of-line Move to the end of the line.
M-a M-x backward-sentence Move backward one sentence.
M-e M-x forward-sentence Move forward one sentence.
C-v M-x scroll-up Page down one page.
M-v M-x scroll-down Page up one page.
M-> M-x end-of-buffer Moves to the end of the buffer.
M-< M-x beginning-of-buffer Moves to the beginning of the buffer.
C-x ] M-x forward-page In most modes does the same as M->
C-x [ M-x backward-page In most modes does the same as M-<
M-x goto-line Goto line. Does what it says. I couldn’t believe there was no default key binding for this in older GNU Emacs (XEmacs has one, and GNU Emacs 22 and later bind it to M-g g). I usually put the line (global-set-key "\M-g" 'goto-line) in my .emacs file, which tells Emacs to use M-g to run the command.
C-x o M-x other-window Switches between two windows.
C-x b M-x switch-to-buffer Switch buffers. Does just that. The default buffer to switch to is the last one that you came from. (Typing C-x b ENTER repeatedly swaps between the current and the last buffer, like ALT-TAB in Windows.) You may type a TAB to complete the buffer name or TAB TAB to see a list of all of the buffer names you can type.
C-SPC M-x set-mark-command Set Mark. Sets the mark to the current cursor location (point). Many commands are run on the region in between the point and mark.
C-x C-x M-x exchange-point-and-mark Exchanges the point (cursor location) and the mark.
Copying and Deleting stuff
C-_ M-x undo Undo. Undoes the last command. Can be done repeatedly.
C-d M-x delete-char Delete. Deletes the character under the point (after the cursor location).
BACKSPC M-x delete-backward-char Delete. Deletes the character before the point (before the cursor location).
M-d M-x kill-word Kills the word after the cursor location. Places on the kill ring.
M-BACKSPC M-x backward-kill-word Kills the word before the cursor location. Places on the kill ring.
C-k M-x kill-line Kill. Kills from the point to the end of the line. If repeated in a row, all of the kills go into the same kill ring entry. Doing this once does not kill the newline at the end of the line; repeating it kills the newline. (For example, to kill the entire current line you would type C-a to move to the beginning of the line, and C-k C-k to kill the line and the newline.)
C-y M-x yank Yank. Inserts the last entry from the kill ring. This can be done repeatedly.
M-y M-x yank-pop Cycle through the kill ring. Operates only if the last command was C-y
C-SPC M-x set-mark-command Set Mark. Sets the mark to the current cursor location (point). Many commands are run on the region in between the point and mark.
C-w M-x kill-region Kills the text between the mark and the point. Places on the kill ring. If the buffer is read only, places text on kill ring but warns about not being able to modify buffer.
M-w M-x kill-ring-save Places text between mark and point on the kill ring without killing text.
M-z M-x zap-to-char Kills the text from the point up to and including the next occurrence of the specified character.
Search and Replace
C-s M-x isearch-forward Searches forward for the typed in text. C-s repeatedly searches forward multiple times. C-r switches direction of search.
C-r M-x isearch-backward Searches backward for the typed in text. C-r repeatedly searches backward multiple times. C-s switches direction of search.
M-% M-x query-replace Requests two patterns. Searches for first pattern and replaces with second pattern. Prompts at each match. (y for “yes replace”, n for “no next”, and ! for “replace all”).
C-u C-s M-x isearch-forward-regexp Search using a regular expression (as you type it).
M-C-% M-x query-replace-regexp Query replaces using a regular expression as the match.
Fixing up the text
TAB M-x indent-relative-maybe Indent the text according to the program mode you are in.
M-C-\ M-x indent-region Runs the “TAB” on all lines in the region.
M-u M-x upcase-word Converts the text from the point to the end of the word to upper case.
M-l M-x downcase-word Converts the text from the point to the end of the word to lower case.
C-t M-x transpose-chars Swaps characters before and after point with one another.
M-t M-x transpose-words Swaps current and next word with one another.
Dired – Directory Editor
C-x d M-x dired (directory edit) This opens the directory specified (same way as find-file). It opens a buffer that lists the directory and allows you to perform operations on the files listed in that directory. There are many features (too many to list here) that I will go into later. The most basic is to put the cursor on the file you want and hit ENTER. This does a find-file on that file.
All commands in this section will only operate when you are in dired mode.
ENTER M-x dired-advertised-find-file Runs a C-x C-f (find-file) on the currently highlighted file. If the file is a directory, that directory will be opened in dired mode in a new buffer.
+ M-x dired-create-directory Prompts for a new directory name and creates it.
d M-x dired-flag-file-deletion Flags the current file for deletion.
~ M-x dired-flag-backup-files Flags all files ending with ~ for deletion (backup files)
x M-x dired-do-flagged-delete Delete all files marked for deletion (lines beginning with a D).
D M-x dired-do-delete Deletes the currently highlighted file(s)
C M-x dired-do-copy Copies the currently highlighted file(s). Prompts for the destination.
R M-x dired-do-rename Renames the currently highlighted file(s). Prompts for the destination.
% R M-x dired-do-rename-regexp Renames the currently highlighted file(s) using regular expressions. Prompts for a “from” regex and a “to” replacement. (example – from regex \(.+\)\.txt, to \1.text)
m M-x dired-mark Mark the current file (marks with a *)
u M-x dired-unmark Unmark the current file (marks with SPC – SPC is blank)
* c M-x dired-change-marks Changes marks from first prompted letter to second prompted letter (example – * c D SPC – this will clear the deletion flag)
! M-x dired-do-shell-command Execute a command on the files marked with *. If your command contains a *, the list of marked files is substituted in its place and the command is executed once. Otherwise, the command is run once for each marked file, with the file as the last argument.
g M-x revert-buffer Refreshes the current dired directory.
Help – Finding it
C-h t M-x help-with-tutorial Begins the Emacs built in Tutorial – highly recommended
C-h k M-x describe-key Allows you to type in key strokes and see the backend function that they are running.
C-h f M-x describe-function Allows you to type a function name and receive an explanation of what it does.
C-h v M-x describe-variable Allows you to type an Emacs variable name and see its contents.
C-h b M-x describe-bindings Show a table of all of the current key bindings.
C-h i M-x info This one is possibly the most important. It enters into the Emacs help browser which allows you to read in depth about just about any subject relating to Emacs. (Basic commands are “l” for last, “n” for next, and “p” for previous.) I’ll talk more about this later.
Text Registers
C-x r s M-x copy-to-register Prompts for register letter. Saves text between point and mark in the register.
C-x r i M-x insert-register Prompts for register letter. Inserts the text saved in that register at point.
Position Registers
C-x r SPC M-x point-to-register Prompts for register letter. Saves point in register.
C-x r j M-x jump-to-register Prompts for register letter. Jumps to the point saved in that register.
Bookmarks
C-x r m M-x bookmark-set Prompts for bookmark name (default is buffer name). Saves point in a bookmark. This information is saved even if you close Emacs.
C-x r b M-x bookmark-jump Prompts for bookmark name. Jumps to the point saved in that bookmark (even if you had closed your Emacs session).
C-x r l M-x bookmark-bmenu-list Lists all saved bookmarks. Typing “j” when on a bookmark jumps to the file. Dired commands are used to delete bookmarks (Type “d” when on file – then “x” to delete bookmark).
Rectangles
C-x r o M-x open-rectangle Opens rectangle of whitespace between point and mark.
C-x r t M-x string-rectangle Replaces the rectangle between point and mark with the text you type at the prompt. This is a major timesaver. Typing

C-x r t Other Text RET

would replace

<mark>Some Text
Some Text
Some Text<point>

with

<mark>Other Text
Other Text
Other Text<point>
C-x r k M-x kill-rectangle Kills text between point and mark and places in rectangle kill ring.
C-x r y M-x yank-rectangle Inserts last killed rectangle at point.

Note-Handy Websites For Viewing Source Code


Refer: http://www.lainoox.com/tag/glibc-source-code/

Through my travels on the World Wide Web, a.k.a. the intertubes, I have come across a few websites that are excellent for viewing and referencing source code. There are also a few great sites that I visit regularly when I cannot find a manual page on my system. Here is a list of websites I have found useful when looking up source code or looking for good examples of how to structure my code.

 

View Linux Kernel Source at lxr.linux.no

A Norwegian site hosts the last 100 or so kernel versions in their entirety in ‘readable’ form online. You select the kernel version in which you want to search for a function definition or structure and it spits out the code for you. Another great feature of this site is that essentially every function is a link to the structure definition, function, macro etcetera. You may also browse through the site traversing the directories.

Link: http://lxr.linux.no/linux/

Use Google Code Search

Google Code Search has been handy many a time. It lets you fill in specific fields if you know what you are looking for, including Package, Language, File, Class, Function, and the license the code is released under. There is also a plain search box if you last looked at the code with beer goggles on. The results show the code as plain text, so it is easy to view in your browser and to download.

Link: http://www.google.com/codesearch

Refer To GNU C Library

While this is not simply a code reference, it is the bible on GNU C and everything related to it. The whole glibc manual is available as a single web page, as a PDF, or as HTML split into one page per section. The format is somewhat irrelevant; what is important is the wealth of knowledge this site has to offer. It includes function definitions, explanations, and a large set of examples unrivaled by any other site I have come across.

Link: http://www.gnu.org/software/libc/manual/

Download Glibc And Look

If you are doing anything C related, sometimes you just have to look at the code in vi. The glibc library has many samples of excellent code on which to base your own. Some of it descends from code by the original C developers and dates back to the early Unix days. Who better to mimic than those guys!

Link: ftp://ftp.gnu.org/gnu/glibc/

PHP Functions And Examples At PHP.net

While I mostly write about C code and examples, I have written a fair amount of PHP in my day. I am actually a huge fan of PHP and love not having to deal with types on occasion. For an interpreted language it is fairly fast and with the syntax being quite similar to C I feel at home. I frequent php.net for its useful storage of function definitions and user added PHP examples. While some examples are useful others are not as useful, so parse at your own peril. Just use the search function to get started.

Link: http://www.php.net

Linux Man Pages At Die.net

When I am unable to find a manual page for a certain function, or for an obscure function in the kernel, I usually google it. For the most part die.net comes up as the first result, so I now head there almost exclusively when looking for a man page. A great part of this is that the “other suggested functions” are links, which saves me from opening another terminal tab to look those up.

Link: http://linux.die.net/man/

 

Note – C Programming Tips


Gcc

  1. gcc error – stray ‘\342’ in program : http://www.giannistsakiris.com/index.php/2008/04/17/gcc-error-stray-%E2%80%98342%E2%80%99-in-program/
    1. Reason: the code was copied from a PDF, which uses typographic (curly) quotation marks. These are non-ASCII characters whose UTF-8 encoding begins with the byte \342 (octal) that gcc complains about.
    2. Solution: replace all curly (non-neutral) double quotation marks with neutral ASCII ones.

  2. Compiling a C file from Emacs fails with: make: *** No targets specified and no makefile found.  Stop.
    1. Reason: the compile command defaults to make -k, but there is no makefile in the directory, and all I want is to compile a single C file, such as 1.1.c.
    2. Solution: change the command to cc -o 1.1 1.1.c. Related post: http://stackoverflow.com/questions/4623080/compile-c-problem-in-emacs-ubuntu

Note – C Programming Hints


Reference : http://www.ma.utexas.edu/documentation/seminar/Spring95/c-programming/c-programming.html#SEC2

C in Emacs

GNU Emacs has a special mode called c-mode for editing C source files. When you visit a .c or a .h file, Emacs automatically puts you in c-mode. One of the nice features of c-mode is automatic indentation. Type TAB at the beginning of a line to position the point (a.k.a. cursor) at the right indentation for that line. As usual, you can do C-h m to find out more about the current mode.

GNU Emacs also has a function for compiling source files. The Emacs function compile can help you compile C files. If you have errors in your C files, C-x ` will take you to the position in your C files where the first error occurred. You’ll also see a message describing the error. Subsequent C-x ` commands take you to the next errors. To learn more about compile, type C-h F compile.

Emacs can find function definitions. For example, say you are developing a program that consists of several .c files. You remember that you need to change a function called foo(). Where is foo()? Assuming that you have run an indexing program called etags on your program files, all you have to do is type M-. foo RET. Emacs will open the file containing the definition of foo(), and Emacs will put the point (a.k.a. cursor) at the beginning of the definition.

Finally, GNU Emacs has a mode for debugging programs. M-x gdb starts the GNU debugger gdb in a buffer. For more information on this mode, type C-h F gdb. See section Debugging.

See the section “Introduction” in Introduction to GNU Emacs for more information about the GNU Emacs editor.

Note – HTG Explains: The Linux Directory Structure Explained


Refer: http://www.howtogeek.com/117435/htg-explains-the-linux-directory-structure-explained/

  1. Filesystem Hierarchy Standard (FHS)
  2. / – The Root Directory : Everything on your Linux system is located under the / directory, known as the root directory. You can think of the / directory as being similar to the C:\ directory on Windows – but this isn’t strictly true, as Linux doesn’t have drive letters. While another partition would be located at D:\ on Windows, this other partition would appear in another folder under / on Linux.
  3. /bin – Essential User Binaries : The /bin directory contains the essential user binaries (programs) that must be present when the system is mounted in single-user mode. Applications such as Firefox are stored in /usr/bin, while important system programs and utilities such as the bash shell are located in /bin. The /usr directory may be stored on another partition – placing these files in the /bin directory ensures the system will have these important utilities even if no other file systems are mounted. The /sbin directory is similar – it contains essential system administration binaries.
  4. /boot – Static Boot Files : The /boot directory contains the files needed to boot the system – for example, the GRUB boot loader’s files and your Linux kernels are stored here. The boot loader’s configuration files aren’t located here, though – they’re in /etc with the other configuration files.
  5. /cdrom – Historical Mount Point for CD-ROMs : The /cdrom directory isn’t part of the FHS standard, but you’ll still find it on Ubuntu and other operating systems. It’s a temporary location for CD-ROMs inserted in the system. However, the standard location for temporary media is inside the /media directory.
  6. /dev – Device Files : Linux exposes devices as files, and the /dev directory contains a number of special files that represent devices. These are not actual files as we know them, but they appear as files – for example, /dev/sda represents the first SATA drive in the system. If you wanted to partition it, you could start a partition editor and tell it to edit /dev/sda. This directory also contains pseudo-devices, which are virtual devices that don’t actually correspond to hardware. For example, /dev/random produces random numbers. /dev/null is a special device that produces no output and automatically discards all input – when you pipe the output of a command to /dev/null, you discard it.
  7. /etc – Configuration Files : The /etc directory contains configuration files, which can generally be edited by hand in a text editor. Note that the /etc/ directory contains system-wide configuration files – user-specific configuration files are located in each user’s home directory.
  8. /home – Home Folders : The /home directory contains a home folder for each user. For example, if your user name is bob, you have a home folder located at /home/bob. This home folder contains the user’s data files and user-specific configuration files. Each user only has write access to their own home folder and must obtain elevated permissions (become the root user) to modify other files on the system.
  9. /lib – Essential Shared Libraries : The /lib directory contains libraries needed by the essential binaries in the /bin and /sbin folders. Libraries needed by the binaries in the /usr/bin folder are located in /usr/lib.
  10. /lost+found – Recovered Files : Each Linux file system has a lost+found directory. If the file system crashes, a file system check will be performed at next boot. Any corrupted files found will be placed in the lost+found directory, so you can attempt to recover as much data as possible.
  11. /media – Removable Media : The /media directory contains subdirectories where removable media devices inserted into the computer are mounted. For example, when you insert a CD into your Linux system, a directory will automatically be created inside the /media directory. You can access the contents of the CD inside this directory.
  12. /mnt – Temporary Mount Points : Historically speaking, the /mnt directory is where system administrators mounted temporary file systems while using them. For example, if you’re mounting a Windows partition to perform some file recovery operations, you might mount it at /mnt/windows. However, you can mount other file systems anywhere on the system.
  13. /opt – Optional Packages : The /opt directory contains subdirectories for optional software packages. It’s commonly used by proprietary software that doesn’t obey the standard file system hierarchy – for example, a proprietary program might dump its files in /opt/application when you install it.
  14. /proc – Kernel & Process Files : The /proc directory is similar to the /dev directory because it doesn’t contain standard files. It contains special files that represent system and process information.
  15. /root – Root Home Directory : The /root directory is the home directory of the root user. Instead of being located at /home/root, it’s located at /root. This is distinct from /, which is the system root directory.
  16. /run – Application State Files : The /run directory is fairly new, and gives applications a standard place to store transient files they require like sockets and process IDs. These files can’t be stored in /tmp because files in /tmp may be deleted.
  17. /sbin – System Administration Binaries : The /sbin directory is similar to the /bin directory. It contains essential binaries that are generally intended to be run by the root user for system administration.
  18. /selinux – SELinux Virtual File System : If your Linux distribution uses SELinux for security (Fedora and Red Hat, for example), the /selinux directory contains special files used by SELinux. It’s similar to /proc. Ubuntu doesn’t use SELinux, so the presence of this folder on Ubuntu appears to be a bug.
  19. /srv – Service Data : The /srv directory contains “data for services provided by the system.” If you were using the Apache HTTP server to serve a website, you’d likely store your website’s files in a directory inside the /srv directory.
  20. /tmp – Temporary Files : Applications store temporary files in the /tmp directory. These files are generally deleted whenever your system is restarted and may be deleted at any time by utilities such as tmpwatch.
  21. /usr – User Binaries & Read-Only Data : The /usr directory contains applications and files used by users, as opposed to applications and files used by the system. For example, non-essential applications are located inside the /usr/bin directory instead of the /bin directory and non-essential system administration binaries are located in the /usr/sbin directory instead of the /sbin directory. Libraries for each are located inside the /usr/lib directory. The /usr directory also contains other directories – for example, architecture-independent files like graphics are located in /usr/share. The /usr/local directory is where locally compiled applications install to by default – this prevents them from mucking up the rest of the system.
  22. /var – Variable Data Files : The /var directory is the writable counterpart to the /usr directory, which must be read-only in normal operation. Log files and everything else that would normally be written to /usr during normal operation are written to the /var directory. For example, you’ll find log files in /var/log.

Note – Documentation/SubmittingPatches in Kernel Source


How to Get Your Change Into the Linux Kernel
or
Care And Operation Of Your Linus Torvalds

For a person or company who wishes to submit a change to the Linux
kernel, the process can sometimes be daunting if you’re not familiar
with “the system.” This text is a collection of suggestions which
can greatly increase the chances of your change being accepted.

Read Documentation/SubmitChecklist for a list of items to check
before submitting code. If you are submitting a driver, also read
Documentation/SubmittingDrivers.

--------------------------------------------
SECTION 1 – CREATING AND SENDING YOUR CHANGE
--------------------------------------------

1) “diff -up”
-------------

Use “diff -up” or “diff -uprN” to create patches.

All changes to the Linux kernel occur in the form of patches, as
generated by diff(1). When creating your patch, make sure to create it
in “unified diff” format, as supplied by the ‘-u’ argument to diff(1).
Also, please use the ‘-p’ argument which shows which C function each
change is in – that makes the resultant diff a lot easier to read.
Patches should be based in the root kernel source directory,
not in any lower subdirectory.

To create a patch for a single file, it is often sufficient to do:

SRCTREE=linux-2.6
MYFILE=drivers/net/mydriver.c

cd $SRCTREE
cp $MYFILE $MYFILE.orig
vi $MYFILE # make your change
cd ..
# {.orig,} is bash brace expansion: the old file first, then the new one
diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch

To create a patch for multiple files, you should unpack a “vanilla”,
or unmodified kernel source tree, and generate a diff against your
own source tree. For example:

MYSRC=/devel/linux-2.6

tar xvfz linux-2.6.12.tar.gz
mv linux-2.6.12 linux-2.6.12-vanilla
diff -uprN -X linux-2.6.12-vanilla/Documentation/dontdiff \
linux-2.6.12-vanilla $MYSRC > /tmp/patch

“dontdiff” is a list of files which are generated by the kernel during
the build process, and should be ignored in any diff(1)-generated
patch. The “dontdiff” file is included in the kernel tree in
2.6.12 and later. For earlier kernel versions, you can get it
from <http://www.xenotime.net/linux/doc/dontdiff>.

Make sure your patch does not include any extra files which do not
belong in a patch submission. Make sure to review your patch -after-
generating it with diff(1), to ensure accuracy.

If your changes produce a lot of deltas, you may want to look into
splitting them into individual patches which modify things in
logical stages. This makes review easier for other kernel developers,
which is very important if you want your patch accepted.
There are a number of scripts which can aid in this:

Quilt:
http://savannah.nongnu.org/projects/quilt

Andrew Morton’s patch scripts:
http://userweb.kernel.org/~akpm/stuff/patch-scripts.tar.gz

Instead of these scripts, quilt is the recommended patch management
tool (see above).

2) Describe your changes.

Describe the technical detail of the change(s) your patch includes.

Be as specific as possible. The WORST descriptions possible include
things like “update driver X”, “bug fix for driver X”, or “this patch
includes updates for subsystem X. Please apply.”

The maintainer will thank you if you write your patch description in a
form which can be easily pulled into Linux’s source code management
system, git, as a “commit log”. See #15, below.

If your description starts to get long, that’s a sign that you probably
need to split up your patch. See #3, next.

When you submit or resubmit a patch or patch series, include the
complete patch description and justification for it. Don’t just
say that this is version N of the patch (series). Don’t expect the
patch merger to refer back to earlier patch versions or referenced
URLs to find the patch description and put that into the patch.
I.e., the patch (series) and its description should be self-contained.
This benefits both the patch merger(s) and reviewers. Some reviewers
probably didn’t even receive earlier versions of the patch.

If the patch fixes a logged bug entry, refer to that bug entry by
number and URL.

3) Separate your changes.

Separate each _logical change_ into its own patch file.

For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
or more patches. If your changes include an API update, and a new
driver which uses that new API, separate those into two patches.

On the other hand, if you make a single change to numerous files,
group those changes into a single patch. Thus a single logical change
is contained within a single patch.

If one patch depends on another patch in order for a change to be
complete, that is OK. Simply note “this patch depends on patch X”
in your patch description.

If you cannot condense your patch set into a smaller set of patches,
then post only 15 or so at a time and wait for review and integration.

4) Style check your changes.

Check your patch for basic style violations, details of which can be
found in Documentation/CodingStyle. Failure to do so simply wastes
the reviewers’ time and will get your patch rejected, probably
without even being read.

At a minimum you should check your patches with the patch style
checker prior to submission (scripts/checkpatch.pl). You should
be able to justify all violations that remain in your patch.

5) Select e-mail destination.

Look through the MAINTAINERS file and the source code, and determine
if your change applies to a specific subsystem of the kernel, with
an assigned maintainer. If so, e-mail that person.

If no maintainer is listed, or the maintainer does not respond, send
your patch to the primary Linux kernel developers’ mailing list,
linux-kernel@vger.kernel.org. Most kernel developers monitor this
e-mail list, and can comment on your changes.
Do not send more than 15 patches at once to the vger mailing lists!

Linus Torvalds is the final arbiter of all changes accepted into the
Linux kernel. His e-mail address is <torvalds@linux-foundation.org>.
He gets a lot of e-mail, so typically you should do your best to -avoid-
sending him e-mail.

Patches which are bug fixes, are “obvious” changes, or similarly
require little discussion should be sent or CC’d to Linus. Patches
which require discussion or do not have a clear advantage should
usually be sent first to linux-kernel. Only after the patch is
discussed should the patch then be submitted to Linus.

6) Select your CC (e-mail carbon copy) list.

Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org.

Other kernel developers besides Linus need to be aware of your change,
so that they may comment on it and offer code review and suggestions.
linux-kernel is the primary Linux kernel developer mailing list.
Other mailing lists are available for specific subsystems, such as
USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the
MAINTAINERS file for a mailing list that relates specifically to
your change.

Majordomo lists of VGER.KERNEL.ORG at:
<http://vger.kernel.org/vger-lists.html>

If changes affect userland-kernel interfaces, please send
the MAN-PAGES maintainer (as listed in the MAINTAINERS file)
a man-pages patch, or at least a notification of the change,
so that some information makes its way into the manual pages.

Even if the maintainer did not respond in step #5, make sure to ALWAYS
copy the maintainer when you change their code.

For small patches you may want to CC the Trivial Patch Monkey
trivial@kernel.org which collects “trivial” patches. Have a look
into the MAINTAINERS file for its current manager.
Trivial patches must qualify for one of the following rules:
* Spelling fixes in documentation
* Spelling fixes which could break grep(1)
* Warning fixes (cluttering with useless warnings is bad)
* Compilation fixes (only if they are actually correct)
* Runtime fixes (only if they actually fix things)
* Removing use of deprecated functions/macros (e.g. check_region)
* Contact detail and documentation fixes
* Non-portable code replaced by portable code (even in arch-specific,
  since people copy, as long as it’s trivial)
* Any fix by the author/maintainer of the file (i.e. patch monkey
  in re-transmission mode)

7) No MIME, no links, no compression, no attachments. Just plain text.

Linus and other kernel developers need to be able to read and comment
on the changes you are submitting. It is important for a kernel
developer to be able to “quote” your changes, using standard e-mail
tools, so that they may comment on specific portions of your code.

For this reason, all patches should be submitted by e-mail “inline”.
WARNING: Be wary of your editor’s word-wrap corrupting your patch,
if you choose to cut-n-paste your patch.

Do not attach the patch as a MIME attachment, compressed or not.
Many popular e-mail applications will not always transmit a MIME
attachment as plain text, making it impossible to comment on your
code. A MIME attachment also takes Linus a bit more time to process,
decreasing the likelihood of your MIME-attached change being accepted.

Exception: If your mailer is mangling patches then someone may ask
you to re-send them using MIME.

See Documentation/email-clients.txt for hints about configuring
your e-mail client so that it sends your patches untouched.

8) E-mail size.

When sending patches to Linus, always follow step #7.

Large changes are not appropriate for mailing lists, and some
maintainers. If your patch, uncompressed, exceeds 300 kB in size,
it is preferred that you store your patch on an Internet-accessible
server, and provide instead a URL (link) pointing to your patch.

9) Name your kernel version.

It is important to note, either in the subject line or in the patch
description, the kernel version to which this patch applies.

If the patch does not apply cleanly to the latest kernel version,
Linus will not apply it.

10) Don’t get discouraged. Re-submit.

After you have submitted your change, be patient and wait. If Linus
likes your change and applies it, it will appear in the next version
of the kernel that he releases.

However, if your change doesn’t appear in the next version of the
kernel, there could be any number of reasons. It’s YOUR job to
narrow down those reasons, correct what was wrong, and submit your
updated change.

It is quite common for Linus to “drop” your patch without comment.
That’s the nature of the system. If he drops your patch, it could be
due to
* Your patch did not apply cleanly to the latest kernel version.
* Your patch was not sufficiently discussed on linux-kernel.
* A style issue (see section 2).
* An e-mail formatting issue (re-read this section).
* A technical problem with your change.
* He gets tons of e-mail, and yours got lost in the shuffle.
* You are being annoying.

When in doubt, solicit comments on linux-kernel mailing list.

11) Include PATCH in the subject

Due to high e-mail traffic to Linus, and to linux-kernel, it is common
convention to prefix your subject line with [PATCH]. This lets Linus
and other kernel developers more easily distinguish patches from other
e-mail discussions.

12) Sign your work

To improve tracking of who did what, especially with patches that can
percolate to their final resting place in the kernel through several
layers of maintainers, we’ve introduced a “sign-off” procedure on
patches that are being emailed around.

The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch. The rules are pretty simple: if you
can certify the below:

Developer’s Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

then you just add a line saying

Signed-off-by: Random J Developer <random@developer.example.org>

using your real name (sorry, no pseudonyms or anonymous contributions.)

Some people also put extra tags at the end. They’ll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.

If you are a subsystem or branch maintainer, sometimes you need to slightly
modify patches you receive in order to merge them, because the code is not
exactly the same in your tree and the submitter’s. If you stick strictly to
rule (c), you should ask the submitter to rediff, but this is a totally
counter-productive waste of time and energy. Rule (b) allows you to adjust
the code, but then it is very impolite to change one submitter’s code and
make him endorse your bugs. To solve this problem, it is recommended that
you add a line between the last Signed-off-by header and yours, indicating
the nature of your changes. While there is nothing mandatory about this, it
seems like prepending the description with your mail and/or name, all
enclosed in square brackets, is noticeable enough to make it obvious that
you are responsible for last-minute changes. Example:

Signed-off-by: Random J Developer <random@developer.example.org>
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>

This practice is particularly helpful if you maintain a stable branch and
want at the same time to credit the author, track changes, merge the fix,
and protect the submitter from complaints. Note that under no circumstances
can you change the author’s identity (the From header), as it is the one
which appears in the changelog.

Special note to back-porters: It seems to be a common and useful practice
to insert an indication of the origin of a patch at the top of the commit
message (just after the subject line) to facilitate tracking. For instance,
here’s what we see in 2.6-stable:

Date: Tue May 13 19:10:30 2008 +0000

SCSI: libiscsi regression in 2.6.25: fix nop timer handling

commit 4cf1043593db6a337f10e006c23c69e5fc93e722 upstream

And here’s what appears in 2.4:

Date: Tue May 13 22:12:27 2008 +0200

wireless, airo: waitbusy() won’t delay

[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]

Whatever the format, this information provides valuable help to people
tracking your trees, and to people trying to trouble-shoot bugs in your
tree.

13) When to use Acked-by: and Cc:

The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch’s delivery path.

If a person was not directly involved in the preparation or handling of a
patch but wishes to signify and record their approval of it then they can
arrange to have an Acked-by: line added to the patch’s changelog.

Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch.

Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
has at least reviewed the patch and has indicated acceptance. Hence patch
mergers will sometimes manually convert an acker’s “yep, looks good to me”
into an Acked-by:.

Acked-by: does not necessarily indicate acknowledgement of the entire patch.
For example, if a patch affects multiple subsystems and has an Acked-by: from
one subsystem maintainer then this usually indicates acknowledgement of just
the part which affects that maintainer’s code. Judgement should be used here.
When in doubt people should refer to the original discussion in the mailing
list archives.

If a person has had the opportunity to comment on a patch, but has not
provided such comments, you may optionally add a “Cc:” tag to the patch.
This is the only tag which might be added without an explicit action by the
person it names. This tag documents that potentially interested parties
have been included in the discussion.

14) Using Reported-by:, Tested-by: and Reviewed-by:

If this patch fixes a problem reported by somebody else, consider adding a
Reported-by: tag to credit the reporter for their contribution. Please
note that this tag should not be added without the reporter’s permission,
especially if the problem was not reported in a public forum. That said,
if we diligently credit our bug reporters, they will, hopefully, be
inspired to help us again in the future.

A Tested-by: tag indicates that the patch has been successfully tested (in
some environment) by the person named. This tag informs maintainers that
some testing has been performed, provides a means to locate testers for
future patches, and ensures credit for the testers.

Reviewed-by:, instead, indicates that the patch has been reviewed and found
acceptable according to the Reviewer’s Statement:

Reviewer’s statement of oversight

By offering my Reviewed-by: tag, I state that:

(a) I have carried out a technical review of this patch to
evaluate its appropriateness and readiness for inclusion into
the mainline kernel.

(b) Any problems, concerns, or questions relating to the patch
have been communicated back to the submitter. I am satisfied
with the submitter’s response to my comments.

(c) While there may be things that could be improved with this
submission, I believe that it is, at this time, (1) a
worthwhile modification to the kernel, and (2) free of known
issues which would argue against its inclusion.

(d) While I have reviewed the patch and believe it to be sound, I
do not (unless explicitly stated elsewhere) make any
warranties or guarantees that it will achieve its stated
purpose or function properly in any given situation.

A Reviewed-by tag is a statement of opinion that the patch is an
appropriate modification of the kernel without any remaining serious
technical issues. Any interested reviewer (who has done the work) can
offer a Reviewed-by tag for a patch. This tag serves to give credit to
reviewers and to inform maintainers of the degree of review which has been
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
understand the subject area and to perform thorough reviews, will normally
increase the likelihood of your patch getting into the kernel.
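
As a rough illustration of how these tags sit in a changelog, the following C sketch classifies a single line by its tag prefix. The enum, table, and function names are hypothetical, matching is a plain case-sensitive prefix test, and real patch tools may parse tags differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of changelog tag lines. */
enum tag_kind {
	TAG_NONE,
	TAG_SIGNED_OFF_BY,
	TAG_ACKED_BY,
	TAG_REVIEWED_BY,
	TAG_TESTED_BY,
	TAG_REPORTED_BY,
	TAG_CC,
};

static const struct {
	const char *prefix;
	enum tag_kind kind;
} tag_table[] = {
	{ "Signed-off-by:", TAG_SIGNED_OFF_BY },
	{ "Acked-by:", TAG_ACKED_BY },
	{ "Reviewed-by:", TAG_REVIEWED_BY },
	{ "Tested-by:", TAG_TESTED_BY },
	{ "Reported-by:", TAG_REPORTED_BY },
	{ "Cc:", TAG_CC },
};

/* Return which tag, if any, the line starts with (case-sensitive). */
enum tag_kind classify_tag(const char *line)
{
	size_t i;

	assert(line != NULL);
	for (i = 0; i < sizeof(tag_table) / sizeof(tag_table[0]); i++) {
		size_t n = strlen(tag_table[i].prefix);

		if (strncmp(line, tag_table[i].prefix, n) == 0)
			return tag_table[i].kind;
	}
	return TAG_NONE;
}
```

The bracketed maintainer note from the sign-off example earlier is not a tag, so classify_tag() reports TAG_NONE for it.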

15) The canonical patch format

The canonical patch subject line is:

Subject: [PATCH 001/123] subsystem: summary phrase

The canonical patch message body contains the following:

– A “from” line specifying the patch author.

– An empty line.

– The body of the explanation, which will be copied to the
permanent changelog to describe this patch.

– The “Signed-off-by:” lines, described above, which will
also go in the changelog.

– A marker line containing simply “---”.

– Any additional comments not suitable for the changelog.

– The actual patch (diff output).

The Subject line format makes it very easy to sort the emails
alphabetically by subject line – pretty much any email reader will
support that – because the sequence number is zero-padded,
the numerical and alphabetic sort is the same.
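
The zero-padding argument can be checked with a small C sketch that sorts subject lines with a plain byte-wise strcmp(), as a mail reader’s alphabetical sort would. The function names and the fixed three-digit width are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: byte-wise, i.e. alphabetical, order of subjects. */
static int cmp_subject(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/* Sort an array of subject lines alphabetically, in place. */
void sort_subjects(const char **subjects, size_t n)
{
	qsort(subjects, n, sizeof(*subjects), cmp_subject);
}

/* Write "[PATCH nnn/ttt] summary", zero-padding both numbers to 3 digits. */
int format_subject(char *buf, size_t len, int nr, int total,
		   const char *summary)
{
	assert(nr >= 1 && nr <= total && total <= 999);
	return snprintf(buf, len, "[PATCH %03d/%03d] %s", nr, total, summary);
}
```

Without padding, “[PATCH 10/10]” sorts before “[PATCH 2/10]” because ‘1’ compares lower than ‘2’; with padding, the alphabetical and numerical orders agree.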

The “subsystem” in the email’s Subject should identify which
area or subsystem of the kernel is being patched.

The “summary phrase” in the email’s Subject should concisely
describe the patch which that email contains. The “summary
phrase” should not be a filename. Do not use the same “summary
phrase” for every patch in a whole patch series (where a “patch
series” is an ordered sequence of multiple, related patches).

Bear in mind that the “summary phrase” of your email becomes a
globally-unique identifier for that patch. It propagates all the way
into the git changelog. The “summary phrase” may later be used in
developer discussions which refer to the patch. People will want to
google for the “summary phrase” to read discussion regarding that
patch. It will also be the only thing that people may quickly see
when, two or three months later, they are going through perhaps
thousands of patches using tools such as “gitk” or “git log
--oneline”.

For these reasons, the “summary” must be no more than 70-75
characters, and it must describe both what the patch changes, as well
as why the patch might be necessary. It is challenging to be both
succinct and descriptive, but that is what a well-written summary
should do.

The “summary phrase” may be prefixed by tags enclosed in square
brackets: “Subject: [PATCH tag] <summary phrase>”. The tags are not
considered part of the summary phrase, but describe how the patch
should be treated. Common tags might include a version descriptor if
multiple versions of the patch have been sent out in response to
comments (i.e., “v1, v2, v3”), or “RFC” to indicate a request for
comments. If there are four patches in a patch series the individual
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
that developers understand the order in which the patches should be
applied and that they have reviewed or applied all of the patches in
the patch series.
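
A minimal C sketch of these rules skips any leading bracketed tags and checks the remaining text against a length limit. Counting the subsystem prefix as part of the measured text, and using 75 (the upper end of the 70-75 range) as the limit, are both assumptions; subject_summary() and summary_length_ok() are hypothetical names, and the “Subject: ” header is assumed to be stripped already.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SUMMARY_MAX 75	/* assumed: upper end of the 70-75 range */

/* Return the text after any leading "[...]" tags and their spaces. */
const char *subject_summary(const char *subject)
{
	const char *p = subject;

	assert(subject != NULL);
	while (*p == '[') {
		const char *end = strchr(p, ']');

		if (!end)
			break;	/* unbalanced tag: treat the rest as summary */
		p = end + 1;
		while (*p == ' ')
			p++;
	}
	return p;
}

/* True if the summary (tags excluded) fits within SUMMARY_MAX. */
bool summary_length_ok(const char *subject)
{
	return strlen(subject_summary(subject)) <= SUMMARY_MAX;
}
```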

A couple of example Subjects:

Subject: [patch 2/5] ext2: improve scalability of bitmap searching
Subject: [PATCHv2 001/207] x86: fix eflags tracking

The “from” line must be the very first line in the message body,
and has the form:

From: Original Author <author@example.com>

The “from” line specifies who will be credited as the author of the
patch in the permanent changelog. If the “from” line is missing,
then the “From:” line from the email header will be used to determine
the patch author in the changelog.

The explanation body will be committed to the permanent source
changelog, so it should make sense to a competent reader who has long
since forgotten the immediate details of the discussion that might
have led to this patch. Including symptoms of the failure which the
patch addresses (kernel log messages, oops messages, etc.) is
especially useful for people who might be searching the commit logs
looking for the applicable patch. If a patch fixes a compile failure,
it may not be necessary to include _all_ of the compile failures; just
enough that it is likely that someone searching for the patch can find
it. As in the “summary phrase”, it is important to be both succinct and
descriptive.

The “---” marker line serves the essential purpose of marking for patch
handling tools where the changelog message ends.
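
The idea of the marker can be sketched in C: everything before the first line consisting solely of “---” is changelog, and everything after it is not. This is a simplified illustration with a hypothetical function name; real tools such as git am have their own exact rules, which this does not attempt to reproduce.

```c
#include <assert.h>
#include <string.h>

/*
 * Return how many bytes of msg belong to the changelog: everything
 * before the first line that is exactly "---". With no marker, the
 * whole message counts as changelog.
 */
size_t changelog_len(const char *msg)
{
	const char *line = msg;

	assert(msg != NULL);
	while (*line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		if (len == 3 && strncmp(line, "---", 3) == 0)
			return (size_t)(line - msg);
		if (!nl)
			break;
		line = nl + 1;
	}
	return strlen(msg);
}
```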

One good use for the additional comments after the “---” marker is for
a diffstat, to show what files have changed, and the number of
inserted and deleted lines per file. A diffstat is especially useful
on bigger patches. Other comments relevant only to the moment or the
maintainer, not suitable for the permanent changelog, should also go
here. A good example of such comments might be “patch changelogs”
which describe what has changed between the v1 and v2 version of the
patch.

If you are going to include a diffstat after the “---” marker, please
use diffstat options “-p 1 -w 70” so that filenames are listed from
the top of the kernel source tree and don’t use too much horizontal
space (they easily fit in 80 columns, maybe with some indentation).

See more details on the proper patch format in the following
references.

16) Sending “git pull” requests (from Linus emails)

Please write the git repo address and branch name alone on the same line
so that I can’t even by mistake pull from the wrong branch, and so
that a triple-click just selects the whole thing.

So the proper format is something along the lines of:

“Please pull from

git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus

to get these changes:”

so that I don’t have to hunt-and-peck for the address and inevitably
get it wrong (actually, I’ve only gotten it wrong a few times, and
checking against the diffstat tells me when I get it wrong, but I’m
just a lot more comfortable when I don’t have to “look for” the right
thing to pull, and double-check that I have the right branch-name).

Please use “git diff -M --stat --summary” to generate the diffstat:
the -M enables rename detection, and the summary enables a summary of
new/deleted or renamed files.

With rename detection, the statistics are rather different […]
because git will notice that a fair number of the changes are renames.

-----------------------------------
SECTION 2 – HINTS, TIPS, AND TRICKS
-----------------------------------

This section lists many of the common “rules” associated with code
submitted to the kernel. There are always exceptions… but you must
have a really good reason for doing so. You could probably call this
section Linus Computer Science 101.

1) Read Documentation/CodingStyle

Nuff said. If your code deviates too much from this, it is likely
to be rejected without further review, and without comment.

One significant exception is when moving code from one file to
another — in this case you should not modify the moved code at all in
the same patch which moves it. This clearly delineates the act of
moving the code and your changes. This greatly aids review of the
actual differences and allows tools to better track the history of
the code itself.

Check your patches with the patch style checker prior to submission
(scripts/checkpatch.pl). The style checker should be viewed as
a guide, not as the final word. If your code looks better with
a violation then it’s probably best left alone.

The checker reports at three levels:
– ERROR: things that are very likely to be wrong
– WARNING: things requiring careful review
– CHECK: things requiring thought

You should be able to justify all violations that remain in your
patch.

2) #ifdefs are ugly

Code cluttered with ifdefs is difficult to read and maintain. Don’t do
it. Instead, put your ifdefs in a header, and conditionally define
‘static inline’ functions, or macros, which are used in the code.
Let the compiler optimize away the “no-op” case.

Simple example of poor code:

dev = alloc_etherdev(sizeof(struct funky_private));
if (!dev)
        return -ENODEV;
#ifdef CONFIG_NET_FUNKINESS
init_funky_net(dev);
#endif

Cleaned-up example:

(in header)
#ifndef CONFIG_NET_FUNKINESS
static inline void init_funky_net(struct net_device *d) {}
#endif

(in the code itself)
dev = alloc_etherdev(sizeof(struct funky_private));
if (!dev)
        return -ENODEV;
init_funky_net(dev);
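
A self-contained user-space version of the same pattern is sketched below. The struct, its funky_ready field, and the calloc() call standing in for alloc_etherdev() are hypothetical, and ENODEV assumes a POSIX <errno.h>. Building with -DCONFIG_NET_FUNKINESS selects the real init_funky_net(); otherwise the empty inline stub is used, and funky_probe() compiles unchanged either way.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct net_device. */
struct net_device {
	int funky_ready;
};

/* --- what would live in the header --- */
#ifdef CONFIG_NET_FUNKINESS
/* Real version; in the kernel it would normally be defined in a .c file. */
static void init_funky_net(struct net_device *d)
{
	d->funky_ready = 1;
}
#else
/* No-op stub: lets the compiler optimize the call away. */
static inline void init_funky_net(struct net_device *d) { (void)d; }
#endif

/* --- what would live in the driver: no #ifdef at the call site --- */
int funky_probe(struct net_device **out)
{
	struct net_device *dev;

	assert(out != NULL);
	dev = calloc(1, sizeof(*dev));	/* stands in for alloc_etherdev() */
	if (!dev)
		return -ENODEV;
	init_funky_net(dev);
	*out = dev;
	return 0;
}
```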

3) ‘static inline’ is better than a macro

Static inline functions are greatly preferred over macros.
They provide type safety, have no length limitations, no formatting
limitations, and under gcc they are as cheap as macros.

Macros should only be used for cases where a static inline is clearly
suboptimal [there are a few, isolated cases of this in fast paths],
or where it is impossible to use a static inline function [such as
string-izing].

‘static inline’ is preferred over ‘static __inline__’, ‘extern inline’,
and ‘extern __inline__’.
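
The evaluation difference can be seen in a small sketch: the macro pastes its argument in twice, so a side effect in the argument runs twice, while the static inline function evaluates its argument exactly once. SQUARE_MACRO, square_inline(), and the call counter are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Macro: the argument text is pasted twice, so it is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* static inline: the argument is type-checked and evaluated once. */
static inline int square_inline(int x)
{
	return x * x;
}

/* Hypothetical helper that counts how many times it is called. */
static int calls;

static int next_value(void)
{
	calls++;
	return 3;
}

/* How many times next_value() runs when squared via the macro. */
int macro_evaluations(void)
{
	int r;

	calls = 0;
	r = SQUARE_MACRO(next_value());
	assert(r == 9);
	return calls;
}

/* Same measurement for the static inline function. */
int inline_evaluations(void)
{
	int r;

	calls = 0;
	r = square_inline(next_value());
	assert(r == 9);
	return calls;
}
```

Here macro_evaluations() returns 2 and inline_evaluations() returns 1, which is exactly the kind of surprise the static inline version rules out.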

4) Don’t over-design.

Don’t try to anticipate nebulous future cases which may or may not
be useful: “Make it as simple as you can, and no simpler.”

----------------------
SECTION 3 – REFERENCES
----------------------

Andrew Morton, “The perfect patch” (tpp).
<http://userweb.kernel.org/~akpm/stuff/tpp.txt>

Jeff Garzik, “Linux kernel patch submission format”.
<http://linux.yyz.us/patch-format.html>

Greg Kroah-Hartman, “How to piss off a kernel subsystem maintainer”.
<http://www.kroah.com/log/linux/maintainer.html>
<http://www.kroah.com/log/linux/maintainer-02.html>
<http://www.kroah.com/log/linux/maintainer-03.html>
<http://www.kroah.com/log/linux/maintainer-04.html>
<http://www.kroah.com/log/linux/maintainer-05.html>

NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
<http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>

Kernel Documentation/CodingStyle:
<http://users.sosdg.org/~qiyong/lxr/source/Documentation/CodingStyle>

Linus Torvalds’s mail on the canonical patch format:
<http://lkml.org/lkml/2005/4/7/183>

Andi Kleen, “On submitting kernel patches”
Some strategies to get difficult or controversial changes in.
on-submitting-patches.pdf

Note-Documentation/CodingStyle in Kernel Source


Linux kernel coding style

This is a short document describing the preferred coding style for the
Linux kernel. Coding style is very personal, and I won’t _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I’d prefer it for most other things too. Please
at least consider the points made here.

First off, I’d suggest printing out a copy of the GNU coding standards,
and NOT reading it. Burn it; it’s a great symbolic gesture.

Anyway, here goes:

Chapter 1: Indentation

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you’ve been looking
at your screen for 20 straight hours, you’ll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on an
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you’re screwed anyway, and should fix
your program.

In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you’re nesting your functions too deep.
Heed that warning.
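
To make the column arithmetic concrete, here is a sketch that computes how wide a line appears when each tab advances to the next multiple of 8 columns. It assumes single-byte (ASCII) characters, and visual_width() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_WIDTH 8

/*
 * Displayed width of one line, expanding each tab to the next
 * multiple of TAB_WIDTH columns. Stops at '\n' or end of string.
 */
size_t visual_width(const char *line)
{
	size_t col = 0;

	assert(line != NULL);
	for (; *line && *line != '\n'; line++) {
		if (*line == '\t')
			col = (col / TAB_WIDTH + 1) * TAB_WIDTH;
		else
			col++;
	}
	return col;
}
```

Three levels of indentation already consume 24 columns, so a fourth or fifth level quickly crowds an 80-column line.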

The preferred way to ease multiple indentation levels in a switch statement is
to align the “switch” and its subordinate “case” labels in the same column
instead of “double-indenting” the “case” labels. E.g.:

switch (suffix) {
case 'G':
case 'g':
        mem <<= 30;
        break;
case 'M':
case 'm':
        mem <<= 20;
        break;
case 'K':
case 'k':
        mem <<= 10;
        /* fall through */
default:
        break;
}
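
The same switch can be wrapped in a complete function. This is a simplified, hypothetical helper in the spirit of the kernel’s memparse(), not its actual implementation; overflow and malformed input are deliberately ignored.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a decimal number with an optional K/M/G suffix (either case)
 * into bytes. Hypothetical sketch: no overflow or error checking.
 */
unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long mem;

	assert(s != NULL);
	mem = strtoull(s, &end, 10);

	switch (*end) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		/* fall through */
	default:
		break;
	}
	return mem;
}
```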

Don’t put multiple statements on a single line unless you have
something to hide:

        if (condition) do_this;
            do_something_everytime;

Don’t put multiple assignments on a single line either. Kernel coding style
is super simple. Avoid tricky expressions.

Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.

Get a decent editor and don’t leave whitespace at the end of lines.


Chapter 2: Breaking long lines and strings

Coding style is all about readability and maintainability using commonly
available tools.

The limit on the length of lines is 80 columns and this is a strongly
preferred limit.

Statements longer than 80 columns will be broken into sensible chunks, unless
exceeding 80 columns significantly increases readability and does not hide
information. Descendants are always substantially shorter than the parent and
are placed substantially to the right. The same applies to function headers
with a long argument list. However, never break user-visible strings such as
printk messages, because that breaks the ability to grep for them.
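
For instance, a long call can be split with its continuation indented well to the right while the user-visible format string stays whole and greppable. The sketch uses snprintf() in place of printk so it runs in user space; the function and its message are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a status line into buf. The parameter and argument lists are
 * broken across lines, but the format string is kept on one line.
 */
int format_link_status(char *buf, size_t len, const char *ifname,
		       unsigned int speed_mbps, int full_duplex)
{
	assert(buf != NULL && ifname != NULL);
	return snprintf(buf, len, "%s: link up, %u Mbps, %s duplex\n",
			ifname, speed_mbps, full_duplex ? "full" : "half");
}
```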

Chapter 3: Placing Braces and Spaces

The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:

if (x is true) {
        we do y
}

This applies to all non-function statement blocks (if, switch, for,
while, do). E.g.:

switch (action) {
case KOBJ_ADD:
        return "add";
case KOBJ_REMOVE:
        return "remove";
case KOBJ_CHANGE:
        return "change";
default:
        return NULL;
}

However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:

int function(int x)
{
        body of function
}

Heretic people all over the world have claimed that this inconsistency
is … well … inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right. Besides, functions are
special anyway (you can’t nest them in C).

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a “while” in a do-statement or an “else” in an if-statement, like
this:


do {
        body of do-loop
} while (condition);

and


if (x == y) {
        ..
} else if (x > y) {
        ...
} else {
        ....
}

Rationale: K&R.

Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.

Do not unnecessarily use braces where a single statement will do.

if (condition)
        action();

and


if (condition)
        do_this();
else
        do_that();

This does not apply if only one branch of a conditional statement is a single
statement; in the latter case use braces in both branches:


if (condition) {
        do_this();
        do_that();
} else {
        otherwise();
}

3.1: Spaces

Linux kernel style for use of spaces depends (mostly) on
function-versus-keyword usage. Use a space after (most) keywords. The
notable exceptions are sizeof, typeof, alignof, and __attribute__, which look
somewhat like functions (and are usually used with parentheses in Linux,
although they are not required in the language, as in: “sizeof info” after
“struct fileinfo info;” is declared).

So use a space after these keywords:
if, switch, case, for, do, while
but not with sizeof, typeof, alignof, or __attribute__. E.g.,
s = sizeof(struct file);

Do not add spaces around (inside) parenthesized expressions. This example is
*bad*:

s = sizeof( struct file );

When declaring pointer data or a function that returns a pointer type, the
preferred use of ‘*’ is adjacent to the data name or function name and not
adjacent to the type name. Examples:

char *linux_banner;
unsigned long long memparse(char *ptr, char **retptr);
char *match_strdup(substring_t *s);

Use one space around (on each side of) most binary and ternary operators,
such as any of these:

= + - < > * / % | & ^ <= >= == != ? :

but no space after unary operators:
& * + - ~ ! sizeof typeof alignof __attribute__ defined

no space before the postfix increment & decrement unary operators:
++ --

no space after the prefix increment & decrement unary operators:
++ --

and no space around the ‘.’ and “->” structure member operators.

Do not leave trailing whitespace at the ends of lines. Some editors with
“smart” indentation will insert whitespace at the beginning of new lines as
appropriate, so you can start typing the next line of code right away.
However, some such editors do not remove the whitespace if you end up not
putting a line of code there, such as if you leave a blank line. As a result,
you end up with lines containing trailing whitespace.

Git will warn you about patches that introduce trailing whitespace, and can
optionally strip the trailing whitespace for you; however, if applying a series
of patches, this may make later patches in the series fail by changing their
context lines.

Chapter 4: Naming

C is a Spartan language, and so should your naming be. Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable “tmp”, which is much easier to write, and not the least more
difficult to understand.

HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function “foo” is a
shooting offense.

GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
“count_active_users()” or similar, you should _not_ call it “cntusr()”.

Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged – the compiler knows the types anyway and can
check those, and it only confuses the programmer. No wonder MicroSoft
makes buggy programs.

LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called “i”.
Calling it “loop_counter” is non-productive, if there is no chance of it
being mis-understood. Similarly, “tmp” can be just about any type of
variable that is used to hold a temporary value.

If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).
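Here is a minimal sketch of these naming rules. It assumes a hypothetical `struct user` that has an `active` flag. The global function gets a descriptive name, and the local loop counter is simply `i`.

```c
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
struct user {
	int active;	/* non-zero if the user is currently logged in */
};

/*
 * Global function: the name says exactly what it does.
 * Returns how many of the @nr_users entries in @users are active.
 */
size_t count_active_users(const struct user *users, size_t nr_users)
{
	size_t count = 0;
	size_t i;	/* plain loop counter: "i" is enough */

	for (i = 0; i < nr_users; i++)
		if (users[i].active)
			count++;
	return count;
}
```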

Chapter 5: Typedefs

Please don’t use things like “vps_t”.

It’s a _mistake_ to use typedef for structures and pointers. When you see a

vps_t a;

in the source, what does it mean?

In contrast, if it says

struct virtual_container *a;

you can actually tell what “a” is.

Lots of people think that typedefs “help readability”. Not so. They are
useful only for:

(a) totally opaque objects (where the typedef is actively used to _hide_
what the object is).

Example: “pte_t” etc. opaque objects that you can only access using
the proper accessor functions.

NOTE! Opaqueness and “accessor functions” are not good in themselves.
The reason we have them for things like pte_t etc. is that there
really is absolutely _zero_ portably accessible information there.

(b) Clear integer types, where the abstraction _helps_ avoid confusion
whether it is “int” or “long”.

u8/u16/u32 are perfectly fine typedefs, although they fit into
category (d) better than here.

NOTE! Again – there needs to be a _reason_ for this. If something is
“unsigned long”, then there’s no reason to do

typedef unsigned long myflags_t;

but if there is a clear reason for why it under certain circumstances
might be an “unsigned int” and under other configurations might be
“unsigned long”, then by all means go ahead and use a typedef.

(c) when you use sparse to literally create a _new_ type for
type-checking.

(d) New types which are identical to standard C99 types, in certain
exceptional circumstances.

Although it would only take a short amount of time for the eyes and
brain to become accustomed to the standard types like ‘uint32_t’,
some people object to their use anyway.

Therefore, the Linux-specific ‘u8/u16/u32/u64’ types and their
signed equivalents which are identical to standard types are
permitted — although they are not mandatory in new code of your
own.

When editing existing code which already uses one or the other set
of types, you should conform to the existing choices in that code.

(e) Types safe for use in userspace.

In certain structures which are visible to userspace, we cannot
require C99 types and cannot use the ‘u32’ form above. Thus, we
use __u32 and similar types in all structures which are shared
with userspace.

Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.
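The sketch below applies that rule to a structure whose members callers access directly. The structure is hypothetical and is declared with a plain struct tag, so `struct virtual_container *a` tells the reader what `a` is. Its members use a fixed-width C99 type and a plain `unsigned long`, and neither is wrapped in a typedef.

```c
#include <stdint.h>

/*
 * Hypothetical structure with directly accessed members, so it is
 * deliberately NOT hidden behind a typedef.
 */
struct virtual_container {
	uint32_t id;		/* standard C99 fixed-width type */
	unsigned long flags;	/* plain "unsigned long": no typedef needed */
};

/* Hypothetical accessor; the parameter type is visible at a glance. */
unsigned long container_flags(const struct virtual_container *a)
{
	return a->flags;
}
```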

Chapter 6: Functions

Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80×24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function. So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it’s OK to have a longer function.

However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely. Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it’s performance-critical, and it will probably do a better job of it
than you would have done).

Another measure of the function is the number of local variables. They
shouldn’t exceed 5-10, or you’re doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you’re brilliant, but maybe you’d like
to understand what you did 2 weeks from now.

In source files, separate functions with one blank line. If the function is
exported, the EXPORT* macro for it should follow immediately after the closing
function brace line. E.g.:


int system_is_up(void)
{
        return system_state == SYSTEM_RUNNING;
}
EXPORT_SYMBOL(system_is_up);

In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.
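Here is a small sketch of both points: a prototype that names its parameter, and a short helper with a descriptive name that the compiler is free to inline. The leap-year functions are hypothetical examples and do not come from the kernel.

```c
#include <stdbool.h>

/* Prototypes carry parameter names: "year" documents intent for free. */
static bool is_leap_year(int year);
int days_in_year(int year);

/* Hypothetical helper: short, does one thing, name says what. */
static bool is_leap_year(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical caller built on the helper above. */
int days_in_year(int year)
{
	return is_leap_year(year) ? 366 : 365;
}
```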

Chapter 7: Centralized exiting of functions

Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.

The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.

The rationale is:

– unconditional statements are easier to understand and follow
– nesting is reduced
– errors by not updating individual exit points when making
modifications are prevented
– saves the compiler work to optimize redundant code away 😉


int fun(int a)
{
        int result = 0;
        char *buffer = kmalloc(SIZE, GFP_KERNEL);

        if (buffer == NULL)
                return -ENOMEM;

        if (condition1) {
                while (loop1) {
                        ...
                }
                result = 1;
                goto out;
        }
        ...
out:
        kfree(buffer);
        return result;
}
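The example above relies on kernel allocators. For a sketch that compiles on its own, the version below uses `malloc()`/`free()` in place of `kmalloc()`/`kfree()`, with a hypothetical `copy_twice()` function. It shows the same pattern: the success path and the later error paths all leave through one `out` label that does the cleanup.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical function: store in *out a new string holding @src twice.
 * Returns 0 on success or -ENOMEM. The temporary buffer "a" is freed
 * on every path that reaches it, via the single "out" label.
 */
int copy_twice(const char *src, char **out)
{
	int result = 0;
	size_t len = strlen(src) + 1;	/* includes the terminator */
	char *a;
	char *b;

	a = malloc(len);
	if (!a)
		return -ENOMEM;		/* nothing to clean up yet */

	b = malloc(2 * len - 1);
	if (!b) {
		result = -ENOMEM;
		goto out;		/* shared cleanup frees "a" */
	}

	memcpy(a, src, len);
	memcpy(b, a, len - 1);		/* first copy, no terminator */
	memcpy(b + len - 1, a, len);	/* second copy, with terminator */
	*out = b;			/* caller now owns "b" */

out:
	free(a);
	return result;
}
```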

Chapter 8: Commenting

Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it’s much better to
write the code so that the _working_ is obvious, and it’s a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.

When commenting the kernel API functions, please use the kernel-doc format.
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
for details.

Linux style for comments is the C89 “/* … */” style.
Don’t use C99-style “// …” comments.

The preferred style for long (multi-line) comments is:


        /*
         * This is the preferred style for multi-line
         * comments in the Linux kernel source code.
         * Please use it consistently.
         *
         * Description: A column of asterisks on the left side,
         * with beginning and ending almost-blank lines.
         */

It’s also important to comment data, whether they are basic types or derived
types. To this end, use just one data declaration per line (no commas for
multiple data declarations). This leaves you room for a small comment on each
item, explaining its use.
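As an illustration, here is a hypothetical ring-buffer structure with one declaration per line. Each member has room for its own short comment, and the function comment says what the function does rather than how.

```c
#include <stddef.h>

/* Hypothetical ring buffer, for illustration only. */
struct ring {
	unsigned char *data;	/* backing storage, @size bytes long */
	size_t size;		/* capacity in bytes */
	size_t head;		/* next index to write */
	size_t tail;		/* next index to read */
};

/* Return the number of bytes currently stored in @r. */
size_t ring_used(const struct ring *r)
{
	return (r->head + r->size - r->tail) % r->size;
}
```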

Chapter 9: You’ve made a mess of it

That’s OK, we all do. You’ve probably been told by your long-time Unix
user helper that “GNU emacs” automatically formats the C sources for
you, and you’ve noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing – an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:

(defun c-lineup-arglist-tabs-only (ignored)
  "Line up argument lists by tabs, not spaces"
  (let* ((anchor (c-langelem-pos c-syntactic-element))
         (column (c-langelem-2nd-pos c-syntactic-element))
         (offset (- (1+ column) anchor))
         (steps (floor offset c-basic-offset)))
    (* (max steps 1)
       c-basic-offset)))

(add-hook 'c-mode-common-hook
          (lambda ()
            ;; Add kernel style
            (c-add-style
             "linux-tabs-only"
             '("linux" (c-offsets-alist
                        (arglist-cont-nonempty
                         c-lineup-gcc-asm-reg
                         c-lineup-arglist-tabs-only))))))

(add-hook 'c-mode-hook
          (lambda ()
            (let ((filename (buffer-file-name)))
              ;; Enable kernel mode for the appropriate files
              (when (and filename
                         (string-match (expand-file-name "~/src/linux-trees")
                                       filename))
                (setq indent-tabs-mode t)
                (c-set-style "linux-tabs-only")))))

This will make emacs go better with the kernel coding style for C
files below ~/src/linux-trees.

But even if you fail in getting emacs to do sane formatting, not
everything is lost: use “indent”.

Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that’s not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren’t evil, they are
just severely misguided in this matter), so you just give indent the
options “-kr -i8” (stands for “K&R, 8 character indents”), or use
“scripts/Lindent”, which indents in the latest style.

“indent” has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page. But
remember: “indent” is not a fix for bad programming.

Chapter 10: Kconfig configuration files

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different. Lines under a “config” definition
are indented with one tab, while help text is indented an additional two
spaces. Example:

config AUDIT
        bool "Auditing support"
        depends on NET
        help
          Enable auditing infrastructure that can be used with another
          kernel subsystem, such as SELinux (which requires this for
          logging of avc messages output). Does not do system-call
          auditing without CONFIG_AUDITSYSCALL.

Features that might still be considered unstable should be defined as
dependent on “EXPERIMENTAL”:

config SLUB
        depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT
        bool "SLUB (Unqueued Allocator)"
        …

while seriously dangerous features (such as write support for certain
filesystems) should advertise this prominently in their prompt string:

config ADFS_FS_RW
        bool "ADFS write support (DANGEROUS)"
        depends on ADFS_FS
        …

For full documentation on the configuration files, see the file
Documentation/kbuild/kconfig-language.txt.

Chapter 11: Data structures

Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts. In the kernel, garbage collection doesn’t exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely _have_ to reference count all your uses.

Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel – and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.

Note that locking is _not_ a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique. Usually both are needed, and
they are not to be confused with each other.

Many data structures can indeed have two levels of reference counting,
when there are users of different “classes”. The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.

Examples of this kind of “multi-level-reference-counting” can be found in
memory management (“struct mm_struct”: mm_users and mm_count), and in
filesystem code (“struct super_block”: s_count and s_active).

Remember: if another thread can find your data structure, and you don’t
have a reference count on it, you almost certainly have a bug.
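The kernel provides ready-made helpers for this, such as `struct kref` and `refcount_t`, and real code should use those. The sketch below is only a userspace illustration of the idea, built on C11 atomics with a hypothetical `struct shared_obj`. The creator holds the first reference, every extra user takes one, and whoever drops the last reference frees the object.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object, for illustration only. */
struct shared_obj {
	atomic_int refcount;	/* number of live references */
	int payload;		/* hypothetical data */
};

struct shared_obj *obj_create(int payload)
{
	struct shared_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->refcount, 1);	/* creator's reference */
	obj->payload = payload;
	return obj;
}

/* Take an extra reference before handing @obj to another user. */
void obj_get(struct shared_obj *obj)
{
	atomic_fetch_add(&obj->refcount, 1);
}

/* Drop a reference; returns 1 if this call freed @obj, else 0. */
int obj_put(struct shared_obj *obj)
{
	if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
		free(obj);	/* last reference is gone */
		return 1;
	}
	return 0;
}
```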

Chapter 12: Macros, Enums and RTL

Names of macros defining constants and labels in enums are capitalized.


#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.

Generally, inline functions are preferable to macros resembling functions.

Macros with multiple statements should be enclosed in a do - while block:


#define macrofun(a, b, c) \
        do { \
                if (a == 5) \
                        do_this(b, c); \
        } while (0)
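The sketch below shows why the wrapper matters, using hypothetical counters and a hypothetical `open_and_close()` macro. Because of `do { } while (0)`, the two statements behave as one, so the macro can be the unbraced body of an `if` that is followed by an `else`. With a bare `{ }` block, the trailing semicolon would orphan the `else`. Without any grouping, only the first statement would be conditional.

```c
/* Hypothetical counters standing in for real side effects. */
static int opened;
static int closed;

/* Hypothetical two-statement macro, wrapped to act as one statement. */
#define open_and_close() \
	do { \
		opened++; \
		closed++; \
	} while (0)

int run(int cond)
{
	if (cond)
		open_and_close();
	else
		return -1;
	return opened + closed;
}
```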

Things to avoid when using macros:

1) macros that affect control flow:


#define FOO(x) \
        do { \
                if (blah(x) < 0) \
                        return -EBUGGERED; \
        } while (0)

is a _very_ bad idea. It looks like a function call but exits the “calling”
function; don’t break the internal parsers of those who will read the code.

2) macros that depend on having a local variable with a magic name:


#define FOO(val) bar(index, val)

might look like a good thing, but it’s confusing as hell when one reads the
code and it’s prone to breakage from seemingly innocent changes.

3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.

4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.


#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)
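The difference shows up once the macro is used inside a larger expression. In the sketch below, `BAD_CONSTEXP`, `DOUBLE` and `BAD_DOUBLE` are hypothetical macros added for contrast. The unparenthesized versions bind to neighbouring operators and silently change the result.

```c
#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)		/* parenthesized, as recommended */
#define BAD_CONSTEXP CONSTANT | 3	/* hypothetical: parentheses missing */

#define DOUBLE(x) ((x) * 2)		/* hypothetical: argument and result parenthesized */
#define BAD_DOUBLE(x) x * 2		/* hypothetical: no parentheses at all */

int good_constexp_times_two(void)
{
	return CONSTEXP * 2;		/* (0x4000 | 3) * 2 == 0x8006 */
}

int bad_constexp_times_two(void)
{
	return BAD_CONSTEXP * 2;	/* expands to 0x4000 | 3 * 2 == 0x4006 */
}
```

Similarly, `DOUBLE(1 + 1)` yields 4, but `BAD_DOUBLE(1 + 1)` expands to `1 + 1 * 2` and yields 3.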

The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.

Chapter 13: Printing kernel messages

Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like “dont”; use “do not” or “don’t” instead. Make the messages
concise, clear, and unambiguous.

Kernel messages do not have to be terminated with a period.

Printing numbers in parentheses (%d) adds no value and should be avoided.

There are a number of driver model diagnostic macros in <linux/device.h>
which you should use to make sure messages are matched to the right device
and driver, and are tagged with the right level: dev_err(), dev_warn(),
dev_info(), and so forth. For messages that aren’t associated with a
particular device, <linux/printk.h> defines pr_debug() and pr_info().

Coming up with good debugging messages can be quite a challenge; and once
you have them, they can be a huge help for remote troubleshooting. Such
messages should be compiled out when the DEBUG symbol is not defined (that
is, by default they are not included). When you use dev_dbg() or pr_debug(),
that’s automatic. Many subsystems have Kconfig options to turn on -DDEBUG.
A related convention uses VERBOSE_DEBUG to add dev_vdbg() messages to the
ones already enabled by DEBUG.
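The "compiled out unless DEBUG" behaviour can be sketched in userspace with a hypothetical `my_pr_debug()` macro. This is not the kernel's implementation, though it is similar in spirit to the kernel's `no_printk()` helper. When DEBUG is not defined, the `if (0)` branch generates no output code, but the compiler still type-checks the format string against its arguments.

```c
#include <stdio.h>

#ifdef DEBUG
/* Hypothetical: prints when built with -DDEBUG. */
#define my_pr_debug(...) \
	do { \
		fprintf(stderr, __VA_ARGS__); \
	} while (0)
#else
/* Hypothetical: compiled out, but arguments are still type-checked. */
#define my_pr_debug(...) \
	do { \
		if (0) \
			fprintf(stderr, __VA_ARGS__); \
	} while (0)
#endif

/* Hypothetical caller: returns 0 for a valid @id, -1 otherwise. */
int probe_device(int id)
{
	my_pr_debug("probing device %d\n", id);
	return id >= 0 ? 0 : -1;
}
```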

Chapter 14: Allocating memory

The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kcalloc(), vmalloc(), and vzalloc(). Please refer to
the API documentation for further information about them.

The preferred form for passing a size of a struct is the following:


        p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts readability and
introduces an opportunity for a bug when the pointer variable type is changed
but the corresponding sizeof that is passed to a memory allocator is not.

Casting the return value which is a void pointer is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.
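The sketch below applies both rules in userspace, with `malloc()` in place of `kmalloc()` and a hypothetical `struct packet`. `sizeof(*p)` follows the pointer's type automatically, and the `void *` result is assigned without a cast.

```c
#include <stdlib.h>

/* Hypothetical structure, for illustration only. */
struct packet {
	unsigned int len;		/* bytes used in payload */
	unsigned char payload[64];	/* fixed-size data area */
};

struct packet *packet_alloc(void)
{
	struct packet *p;

	/* If p's type ever changes, sizeof(*p) changes with it. */
	p = malloc(sizeof(*p));
	if (!p)
		return NULL;
	p->len = 0;
	return p;
}
```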

Chapter 15: The inline disease

There appears to be a common misperception that gcc has a magic “make me
faster” speedup option called “inline”. While the use of inlines can be
appropriate (for example as a means of replacing macros, see Chapter 12), it
very often is not. Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
that can go into these 5 milliseconds.

A reasonable rule of thumb is not to mark functions inline if they have more
than 3 lines of code in them. An exception to this rule is the case where
a parameter is known to be a compile time constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time. For a good example of this latter case, see
the kmalloc() inline function.

Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.

Chapter 16: Function return values and names

Functions can return values of many different kinds, and one of the
most common is a value indicating whether the function succeeded or
failed. Such a value can be represented as an error-code integer
(-Exxx = failure, 0 = success) or a “succeeded” boolean (0 = failure,
non-zero = success).

Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs. If the C language included a strong distinction
between integers and booleans then the compiler would find these mistakes
for us… but it doesn’t. To help prevent such bugs, always follow this
convention:

If the name of a function is an action or an imperative command,
the function should return an error-code integer. If the name
is a predicate, the function should return a “succeeded” boolean.

For example, “add work” is a command, and the add_work() function returns 0
for success or -EBUSY for failure. In the same way, “PCI device present” is
a predicate, and the pci_dev_present() function returns 1 if it succeeds in
finding a matching device or 0 if it doesn’t.

All EXPORTed functions must respect this convention, and so should all
public functions. Private (static) functions need not, but it is
recommended that they do.

Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule. Generally they indicate failure by returning some out-of-range
result. Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.
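Here is a minimal sketch of the convention, using a hypothetical fixed-size work queue. The `add_work()` below is a stand-in, not the kernel function mentioned above. The imperative `add_work()` returns 0 or a negative error code, while the predicate `work_pending()` returns a boolean.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_WORK 4		/* hypothetical queue capacity */

static int queue[MAX_WORK];	/* hypothetical work queue */
static int queued;		/* number of entries in queue[] */

/* Command name: returns 0 on success or -EBUSY if the queue is full. */
int add_work(int item)
{
	if (queued == MAX_WORK)
		return -EBUSY;
	queue[queued++] = item;
	return 0;
}

/* Predicate name: returns true if @item is queued, false otherwise. */
bool work_pending(int item)
{
	int i;

	for (i = 0; i < queued; i++)
		if (queue[i] == item)
			return true;
	return false;
}
```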

Chapter 17: Don’t re-invent the kernel macros

The header file include/linux/kernel.h contains a number of macros that
you should use, rather than explicitly coding some variant of them yourself.
For example, if you need to calculate the length of an array, take advantage
of the macro


#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

Similarly, if you need to calculate the size of some structure member, use


#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))

There are also min() and max() macros that do strict type checking if you
need them. Feel free to peruse that header file to see what else is already
defined that you shouldn’t reproduce in your code.
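The sketch below uses userspace copies of the two macros quoted above, with a hypothetical structure and array. One caveat: this plain `ARRAY_SIZE` gives a wrong answer if handed a pointer instead of an array. The kernel's own version adds a compile-time check against that, which this copy does not have.

```c
#include <stddef.h>

/* Userspace copies of the macros quoted above. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Hypothetical structure for the example. */
struct inode_info {
	char name[16];		/* fixed-size name buffer */
	unsigned long ino;	/* inode number */
};

static const int primes[] = { 2, 3, 5, 7, 11 };

/* Loop bound comes from ARRAY_SIZE, so adding a prime needs no edit here. */
int sum_primes(void)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < ARRAY_SIZE(primes); i++)
		sum += primes[i];
	return sum;
}
```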

Chapter 18: Editor modelines and other cruft

Some editors can interpret configuration information embedded in source files,
indicated with special markers. For example, emacs interprets lines marked
like this:

-*- mode: c -*-

Or like this:

/*
Local Variables:
compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c"
End:
*/

Vim interprets markers that look like this:

/* vim:set sw=8 noet */

Do not include any of these in source files. People have their own personal
editor configurations, and your source files should not override them. This
includes markers for indentation and mode configuration. People may use their
own custom mode, or may have some other magic method for making indentation
work correctly.

Chapter 19: Inline assembly

In architecture-specific code, you may need to use inline assembly to interface
with CPU or platform functionality. Don’t hesitate to do so when necessary.
However, don’t use inline assembly gratuitously when C can do the job. You can
and should poke hardware from C when possible.

Consider writing simple helper functions that wrap common bits of inline
assembly, rather than repeatedly writing them with slight variations. Remember
that inline assembly can use C parameters.

Large, non-trivial assembly functions should go in .S files, with corresponding
C prototypes defined in C header files. The C prototypes for assembly
functions should use “asmlinkage”.

You may need to mark your asm statement as volatile, to prevent GCC from
removing it if GCC doesn’t notice any side effects. You don’t always need to
do so, though, and doing so unnecessarily can limit optimization.

When writing a single inline assembly statement containing multiple
instructions, put each instruction on a separate line in a separate quoted
string, and end each string except the last with \n\t to properly indent the
next instruction in the assembly output:


        asm ("magic %reg1, #42\n\t"
                "more_magic %reg2, %reg3"
                : /* outputs */ : /* inputs */ : /* clobbers */);

Appendix I: References

The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
URL: http://cm.bell-labs.com/cm/cs/cbook/

The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.
URL: http://cm.bell-labs.com/cm/cs/tpop/

GNU manuals – where in compliance with K&R and this text – for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org/manual/

WG14 is the international standardization working group for the programming
language C, URL: http://www.open-std.org/JTC1/SC22/WG14/

Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/

Note-Documentation/BUG-HUNTING in Kernel Source


Table of contents
=================

Last updated: 20 December 2005

Contents
========

– Introduction
– Devices not appearing
– Finding patch that caused a bug
  – Finding using git-bisect
  – Finding it the old way
– Fixing the bug

Introduction
============

Always try the latest kernel from kernel.org and build from source. If you are
not confident in doing that please report the bug to your distribution vendor
instead of to a kernel developer.

Finding bugs is not always easy. Have a go though. If you can’t find it don’t
give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.

Before you submit a bug report read REPORTING-BUGS.

Devices not appearing
=====================

Often this is caused by udev. Check that first before blaming it on the
kernel.

Finding patch that caused a bug
===============================

Finding using git-bisect
------------------------

Using the provided tools with git makes finding bugs easy provided the bug is
reproducible.

Steps to do it:
– start using git for the kernel source
– read the man page for git-bisect
– have fun

Finding it the old way
----------------------

[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]

This is how to track down a bug if you know nothing about kernel hacking.
It’s a brute force approach but it works pretty well.

You need:

. A reproducible bug – it has to happen predictably (sorry)
. All the kernel tar files from a revision that worked to the
revision that doesn’t

You will then do:

. Rebuild a revision that you believe works, install, and verify that.
. Do a binary search over the kernels to figure out which one
introduced the bug. I.e., suppose 1.3.28 didn’t have the bug, but
you know that 1.3.69 does. Pick a kernel in the middle and build
that, like 1.3.50. Build & test; if it works, pick the mid point
between .50 and .69, else the mid point between .28 and .50.
. You’ll narrow it down to the kernel that introduced the bug. You
can probably do better than this but it gets tricky.

. Narrow it down to a subdirectory

– Copy kernel that works into “test”. Let’s say that 3.62 works,
but 3.63 doesn’t. So you diff -r those two kernels and come
up with a list of directories that changed. For each of those
directories:

Copy the non-working directory next to the working directory
as “dir.63”.
One directory at a time, try moving the working directory to
“dir.62” and moving dir.63 to dir. For example, try

mv dir dir.62
mv dir.63 dir
find dir -name '*.[oa]' -print | xargs rm -f

And then rebuild and retest. Assuming that all related
changes were contained in the sub directory, this should
isolate the change to a directory.

Problems: changes in header files may have occurred; I’ve
found in my case that they were self explanatory – you may
or may not want to give up when that happens.

. Narrow it down to a file

– You can apply the same technique to each file in the directory,
hoping that the changes in that file are self contained.

. Narrow it down to a routine

– You can take the old file and the new file and manually create
a merged file that has

#ifdef VER62
routine()
{

}
#else
routine()
{

}
#endif

And then walk through that file, one routine at a time and
prefix it with

#define VER62
/* both routines here */
#undef VER62

Then recompile, retest, move the ifdefs until you find the one
that makes the difference.
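The binary search over kernel versions in the first steps above can be sketched as follows. This is also the search that git bisect automates. `is_bad()` is a hypothetical stand-in for "build, install and test 1.3.<minor>", and it pretends the bug first appeared in 1.3.45.

```c
#include <stdbool.h>

/* Hypothetical test: true if 1.3.<minor> shows the bug. */
static bool is_bad(int minor)
{
	return minor >= 45;	/* pretend 1.3.45 introduced the bug */
}

/*
 * @good is known to work and @bad is known to fail (good < bad).
 * Each round tests the midpoint and halves the range, so about
 * log2(bad - good) builds are needed. Returns the first bad version.
 */
int find_first_bad(int good, int bad)
{
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

For the 1.3.28 to 1.3.69 example, `find_first_bad(28, 69)` needs about six test builds to reach 45.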

Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn’t such a bad idea if you’ve done some
work to narrow it down.

If you get it down to a routine, you’ll probably get a fix in 24 hours.

My apologies to Linus and the other kernel hackers for describing this
brute force approach, it’s hardly what a kernel hacker would do. However,
it does work and it lets non-hackers help fix bugs. And it is cool
because Linux snapshots will let you do this – something that you can’t
do with vendor supplied releases.

Fixing the bug
==============

Nobody is going to tell you how to fix bugs. Seriously. You need to work it
out. But below are some hints on how to use the tools.

To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example:

objdump -r -S -l --disassemble net/dccp/ipv4.o

NB.: you need to be at the top level of the kernel tree for this to pick up
your C files.

If you don’t have access to the code you can also debug on some crash dumps
e.g. crash dump output as shown by Dave Miller.

> EIP is at ip_queue_xmit+0x14/0x4c0
> …
> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
>
> Put the bytes into a “foo.s” file like this:
>
> .text
> .globl foo
> foo:
> .byte …. /* bytes from Code: part of OOPS dump */
>
> Compile it with “gcc -c -o foo.o foo.s” then look at the output of
> “objdump --disassemble foo.o”.
>
> Output:
>
> ip_queue_xmit:
> push %ebp
> push %edi
> push %esi
> push %ebx
> sub $0xbc, %esp
> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
> mov 0x8(%ebp), %ebx ! %ebx = skb->sk
> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt

In addition, you can use GDB to figure out the exact file and line
number of the OOPS from the vmlinux file. If you have
CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the
OOPS:

EIP: 0060:[<c021e50e>] Not tainted VLI

And use GDB to translate that to human-readable form:

gdb vmlinux
(gdb) l *0xc021e50e

If you don’t have CONFIG_DEBUG_INFO enabled, you use the function
offset from the OOPS:

EIP is at vt_ioctl+0xda8/0x1482

And recompile the kernel with CONFIG_DEBUG_INFO enabled:

make vmlinux
gdb vmlinux
(gdb) p vt_ioctl
(gdb) l *(0x<address of vt_ioctl> + 0xda8)
or, as one command
(gdb) l *(vt_ioctl + 0xda8)
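The `symbol+offset/length` notation above can be split mechanically. The sketch below uses `sscanf()` and a hypothetical `parse_oops_symbol()` helper with an assumed 64-byte name limit. It extracts the function name and the offset you would pass to gdb, plus the function's total length.

```c
#include <stdio.h>

#define SYM_MAX 64	/* hypothetical limit on symbol name length */

/*
 * Split e.g. "vt_ioctl+0xda8/0x1482" into the symbol, the offset into
 * the function, and the function length. Returns 0 on success, -1 if
 * the string does not have that shape.
 */
int parse_oops_symbol(const char *s, char sym[SYM_MAX],
		      unsigned long *offset, unsigned long *size)
{
	/* %63[^+] reads up to the '+'; %lx accepts the "0x" prefix. */
	if (sscanf(s, "%63[^+]+%lx/%lx", sym, offset, size) != 3)
		return -1;
	return 0;
}
```

The parsed symbol and offset then go into `(gdb) l *(vt_ioctl + 0xda8)` as shown above.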

If you have a call trace, such as:

>Call Trace:
> [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
> [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
> …

this shows the problem in the :jbd: module. You can load that module in gdb
and list the relevant code:

gdb fs/jbd/jbd.ko
(gdb) p log_wait_commit
(gdb) l *(0x<address> + 0xa3)
or
(gdb) l *(log_wait_commit + 0xa3)

Another very useful option of the Kernel Hacking section in menuconfig is
Debug memory allocations. This will help you see whether data has been
initialised and not set before use etc. To see the values that get assigned
with this look at mm/slab.c and search for POISON_INUSE. When using this an
Oops will often show the poisoned data instead of zero which is the default.
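
Recognising poison in a register dump is easy once you know the byte values. A minimal Python sketch, assuming the values defined in include/linux/poison.h in the kernels I know (older trees keep them in mm/slab.c), so check your own tree; poison_kind is a hypothetical name.

```python
from typing import Optional

# Poison bytes as defined in include/linux/poison.h (older trees: mm/slab.c).
# Assumption: verify these against the kernel you are debugging.
POISON_INUSE = 0x5A  # "use-uninitialised" poisoning
POISON_FREE = 0x6B   # "use-after-free" poisoning


def poison_kind(value: int, width: int = 4) -> Optional[str]:
    """Name the slab poison pattern that fills a width-byte value, if any."""
    for name, byte in (("POISON_INUSE", POISON_INUSE), ("POISON_FREE", POISON_FREE)):
        if value == int.from_bytes(bytes([byte]) * width, "little"):
            return name
    return None
```

If a register in the Oops holds 0x6b6b6b6b, the code has probably dereferenced freed memory. The check is an exact match on purpose: a poisoned pointer plus a structure offset (say 0x6b6b6b8b) will not be flagged, so also eyeball values that are merely close to the pattern.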

Once you have worked out a fix, please submit it upstream. After all, open
source is about sharing what you do, and don’t you want to be recognised for
your genius?

Please do read Documentation/SubmittingPatches first, though; it will help
your patch get accepted.

Windows 7 Can’t Access the Internet Despite a Good Network Connection


My Windows 7 machine couldn’t access the internet, although the network connection was fine.

I could still connect to the internet successfully from an Ubuntu VirtualBox guest running on the same Windows 7 machine.

The issue was fixed by running the command “netsh winsock reset catalog”, as described in this post: http://texhex.blogspot.fr/2009/01/dhcp-error-service-provider-could-not.html

The following is the ipconfig /all output of my machine:
Windows IP Configuration

Host Name . . . . . . . . . . . . : ********
Primary Dns Suffix . . . . . . . : ********
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : ********
                                    grenoble.eur.slb.com
                                    slb.com
System Quarantine State . . . . . : Not Restricted

Wireless LAN adapter Wireless Network Connection 2:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Virtual WiFi Miniport Adapter
Physical Address. . . . . . . . . : 24-77-03-2E-96-91
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes

Ethernet adapter Local Area Connection* 12:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Juniper Network Connect Virtual Adapter
Physical Address. . . . . . . . . : 00-FF-10-20-87-07
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes

Wireless LAN adapter Wireless Network Connection:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . : mobile.lan
Description . . . . . . . . . . . : Intel(R) Centrino(R) Ultimate-N 6300 AGN
Physical Address. . . . . . . . . : 24-77-03-2E-96-90
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . : ********
Description . . . . . . . . . . . : Intel(R) 82579LM Gigabit Network Connection
Physical Address. . . . . . . . . : D0-67-E5-33-18-D0
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 163.187.242.52(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Thursday, June 21, 2012 14:51:26
Lease Expires . . . . . . . . . . : Sunday, July 01, 2012 14:51:26
Default Gateway . . . . . . . . . : 163.187.242.1
DHCP Server . . . . . . . . . . . : 163.187.176.6
DNS Servers . . . . . . . . . . . : 163.187.176.6
                                    192.23.23.192
                                    192.23.23.193
Primary WINS Server . . . . . . . : 199.6.145.33
Secondary WINS Server . . . . . . : 199.6.193.214
NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter VirtualBox Host-Only Network:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : VirtualBox Host-Only Ethernet Adapter
Physical Address. . . . . . . . . : 08-00-27-00-38-44
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.56.1(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled