Embed-India: 12/18/11

Chapter 3: Numbers, Characters and Strings

What's in Chapter 3?

How are numbers represented on the computer
8-bit unsigned numbers
8-bit signed numbers
16-bit unsigned numbers
16-bit signed numbers
Big and little endian
Boolean (true/false)
Decimal numbers
Hexadecimal numbers
Octal numbers
Characters
Strings
Escape sequences

This chapter defines the various data types supported by the compiler. Since the objective of most computer systems is to process data, it is important to understand how data is stored and interpreted by the software. We define a literal as the direct specification of the number, character, or string. E.g.,

100 'a' "Hello World"

are examples of a number literal, a character literal and a string literal respectively. We will discuss the way data are stored on the computer as well as the C syntax for creating the literals. The Imagecraft and Metrowerks compilers recognize three types of literals (numeric, character, string). Numbers can be written in three bases (decimal, octal, and hexadecimal). Although the programmer can choose to specify numbers in these three bases, once loaded into the computer, all numbers are stored and processed as unsigned or signed binary. Although C does not support binary literals, if you wanted to specify a binary number, you should have no trouble using either the octal or hexadecimal format.

Binary representation

Numbers are stored on the computer in binary form. In other words, information is encoded as a sequence of 1’s and 0’s. On most computers, the memory is organized into 8-bit bytes. This means each 8-bit byte stored in memory will have a separate address. Precision is the number of distinct or different values. We express precision in alternatives, decimal digits, bytes, or binary bits. Alternatives are defined as the total number of possibilities. For example, an 8-bit number scheme can represent 256 different numbers. An 8-bit digital to analog converter can generate 256 different analog outputs. An 8-bit analog to digital converter (ADC) can measure 256 different analog inputs. We use the expression 4½ decimal digits to mean about 20,000 alternatives and the expression 4¾ decimal digits to mean more than 20,000 alternatives but less than 100,000 alternatives. The ½ decimal digit means twice the number of alternatives or one additional binary bit. For example, a voltmeter with a range of 0.00 to 9.99V has a three decimal digit precision. Let the operation [[x]] be the smallest integer greater than or equal to x, i.e., x rounded up. E.g., [[2.1]] is rounded up to 3. Tables 3-1a and 3-1b illustrate various representations of precision.

Binary bits	Bytes	Alternatives
8	1	256
10		1024
12		4096
16	2	65536
20		1,048,576
24	3	16,777,216
30		1,073,741,824
32	4	4,294,967,296
n	[[n/8]]	2ⁿ

Table 3-1a. Relationships between various representations of precision.
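
The last row of Table 3-1a can be sketched in C. This is a minimal illustration, not code from the compilers discussed here; the names bytes_for_bits and alternatives are hypothetical, and the fixed-width type uint64_t assumes a C99 <stdint.h> is available.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed to hold n bits, i.e. n/8 rounded up. */
unsigned bytes_for_bits(unsigned n) {
    return (n + 7u) / 8u;   /* integer division truncates, so add 7 first */
}

/* Hypothetical helper: number of alternatives for n bits (valid for n < 64). */
uint64_t alternatives(unsigned n) {
    return (uint64_t)1 << n;
}
```

For instance, bytes_for_bits(10) yields 2, since 10 bits do not fit in one byte.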

Decimal digits	Alternatives
3	1000
3½	2000
3¾	4000
4	10000
4½	20000
4¾	40000
5	100000
n	10ⁿ

Table 3-1b. Relationships between various representations of precision.

Observation: A good rule of thumb to remember is 2^(10•n) is about 10^(3•n).

For large numbers we use abbreviations, as shown in the following table. For example, 16K means 16*1024 which equals 16384. Computer engineers use the same symbols as other scientists, but with slightly different values.

abbreviation	pronunciation	Computer Engineering Value	Scientific Value
K	"kay"	2¹⁰ 1024	10³
M	"meg"	2²⁰ 1,048,576	10⁶
G	"gig"	2³⁰ 1,073,741,824	10⁹
T	"tera"	2⁴⁰ 1,099,511,627,776	10¹²
P	"peta"	2⁵⁰ 1,125,899,906,842,624	10¹⁵
E	"exa"	2⁶⁰ 1,152,921,504,606,846,976	10¹⁸

Table 3-2. Common abbreviations for large numbers.

8-bit unsigned numbers

A byte contains 8 bits, written b7 b6 b5 b4 b3 b2 b1 b0, where each bit b7,...,b0 is binary and has the value 1 or 0. We specify b7 as the most significant bit or MSB, and b0 as the least significant bit or LSB. If a byte is used to represent an unsigned number, then the value of the number is

N = 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0

There are 256 different unsigned 8-bit numbers. The smallest unsigned 8-bit number is 0 and the largest is 255. For example, 00001010₂ is 8+2 or 10. Other examples are shown in the following table.

binary	hex	Calculation	decimal
00000000	0x00		0
01000001	0x41	64+1	65
00010110	0x16	16+4+2	22
10000111	0x87	128+4+2+1	135
11111111	0xFF	128+64+32+16+8+4+2+1	255

Table 3-3. Example conversions from unsigned 8-bit binary to hexadecimal and to decimal.

The basis of a number system is a subset from which linear combinations of the basis elements can be used to construct the entire set. For the unsigned 8-bit number system, the basis is

{ 1, 2, 4, 8, 16, 32, 64, 128}

One way for us to convert a decimal number into binary is to use the basis elements. The overall approach is to start with the largest basis element and work towards the smallest. One by one we ask ourselves whether or not we need that basis element to create our number. If we do, then we set the corresponding bit in our binary result and subtract the basis element from our number. If we do not need it, then we clear the corresponding bit in our binary result. We will work through the algorithm with the example of converting 100 to 8-bit binary. We start with the largest basis element (in this case 128) and ask whether or not we need to include it to make 100. Since our number is less than 128, we do not need it, so bit 7 is zero. We go to the next largest basis element, 64, and ask whether we need it. We do need 64 to generate our 100, so bit 6 is one and we subtract 64 from 100 to get 36. Next we go to the next basis element, 32, and ask whether we need it. Again we do need 32 to generate our 36, so bit 5 is one and we subtract 32 from 36 to get 4. Continuing along, we need basis element 4 but not 16, 8, 2, or 1, so bits 4, 3, 2, 1, 0 are 0, 0, 1, 0, 0 respectively. Putting it together we get 01100100₂ (which means 64+32+4).

Observation: If the least significant binary bit is zero, then the number is even.

Observation: If the right most n bits (least significant) are zero, then the number is divisible by 2ⁿ.

Number	Basis	Need it	bit	Operation
100	128	no	bit7=0	none
100	64	yes	bit6=1	subtract 100-64
36	32	yes	bit5=1	subtract 36-32
4	16	no	bit4=0	none
4	8	no	bit3=0	none
4	4	yes	bit2=1	subtract 4-4
0	2	no	bit1=0	none
0	1	no	bit0=0	none

Table 3-4. Example conversion from decimal to unsigned 8-bit binary to hexadecimal.
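
The basis algorithm traced in Table 3-4 can be sketched in C as follows. This is only an illustration; the function name to_binary8 and the choice to return the result as a text string are assumptions made for the example.

```c
/* Convert n (0 to 255) to an 8-character binary string by walking the
   basis {128, 64, 32, 16, 8, 4, 2, 1} from largest to smallest.
   out must have room for 9 characters (8 digits plus the terminator). */
void to_binary8(unsigned char n, char out[9]) {
    unsigned basis = 128;
    int i;
    for (i = 0; i < 8; i++) {
        if (n >= basis) {                 /* we need this basis element */
            out[i] = '1';
            n = (unsigned char)(n - basis);
        } else {                          /* we do not need it */
            out[i] = '0';
        }
        basis >>= 1;                      /* next smaller basis element */
    }
    out[8] = '\0';
}
```

Calling to_binary8(100, buffer) fills buffer with "01100100", matching the worked example.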

We define an unsigned 8-bit number using the unsigned char format. When a number is stored into an unsigned char it is converted to 8-bit unsigned value. For example

    unsigned char data; // 0 to 255
    unsigned char function(unsigned char input){
        data=input+1;
        return data;
    }
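
One consequence of the conversion to 8 bits is that results wrap around modulo 256. The sketch below shows this with a fixed-width type; the name increment8 is hypothetical, and uint8_t assumes a C99 <stdint.h>, which older versions of the compilers discussed here may not provide.

```c
#include <stdint.h>

/* Same shape as function() above, written with a fixed-width type.
   The sum is computed as an int and then reduced to 8 bits. */
uint8_t increment8(uint8_t input) {
    return (uint8_t)(input + 1);   /* 255 + 1 wraps around to 0 */
}
```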

8-bit signed numbers

If a byte is used to represent a signed 2’s complement number, then the value of the number is

N = -128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0

There are also 256 different signed 8 bit numbers. The smallest signed 8-bit number is -128 and the largest is 127. For example, 10000010₂ is -128+2 or -126. Other examples are shown in the following table.

binary	hex	Calculation	decimal
00000000	0x00		0
01000001	0x41	64+1	65
00010110	0x16	16+4+2	22
10000111	0x87	-128+4+2+1	-121
11111111	0xFF	-128+64+32+16+8+4+2+1	-1

Table 3-5. Example conversions from signed 8-bit binary to hexadecimal and to decimal.
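
The rows of Table 3-5 can be reproduced by applying the formula for N directly. The following is a minimal sketch; the function names are hypothetical and uint8_t assumes a C99 <stdint.h>.

```c
#include <stdint.h>

/* Value of an 8-bit pattern read as a 2's complement number:
   bits 6..0 keep their usual weights, bit 7 weighs -128. */
int signed_value8(uint8_t bits) {
    int n = bits & 0x7F;
    if (bits & 0x80) {
        n -= 128;
    }
    return n;
}

/* Value of the same pattern read as an unsigned number. */
int unsigned_value8(uint8_t bits) {
    return bits;
}
```

The pattern 0x87 gives -121 through signed_value8 and 135 through unsigned_value8.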

For the signed 8-bit number system the basis is

{ 1, 2, 4, 8, 16, 32, 64, -128}

Observation: The most significant bit in a 2’s complement signed number will specify the sign.

Notice that the same binary pattern of 11111111₂ could represent either 255 or -1. It is very important for the software developer to keep track of the number format. The computer can not determine whether the 8-bit number is signed or unsigned. You, as the programmer, will determine whether the number is signed or unsigned by the specific assembly instructions you select to operate on the number. Some operations like addition, subtraction, and shift left (multiply by 2) use the same hardware (instructions) for both unsigned and signed operations. On the other hand, multiply, divide, and shift right (divide by 2) require separate hardware (instruction) for unsigned and signed operations. For example, the 6805/6808/6811 multiply instruction, mul, operates only on unsigned values. So if you use the mul instruction, you are implementing unsigned arithmetic. The Freescale 6812 has both unsigned, mul, and signed, smul, multiply instructions. So if you use the smul instruction, you are implementing signed arithmetic. The compiler will automatically choose the proper implementation.

It is always good programming practice to have clear understanding of the data type for each number, variable, parameter, etc. For some operations there is a difference between the signed and unsigned numbers while for others it does not matter.

operator	signed different from unsigned	operator	signed same as unsigned
/ %	division	+	addition
*	multiplication	-	subtraction
>	greater than	==	is equal to
<	less than	|	logical or
>=	greater than or equal to	&	logical and
<=	less than or equal to	^	logical exclusive or
>>	right shift	<<	left shift

Table 3-6. Operations either depend or don't depend on whether the number is signed/unsigned.
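
A short sketch may help show why comparison and division sit in the left column of Table 3-6. The function names are hypothetical, and int8_t and uint8_t assume a C99 <stdint.h>. The binary pattern 11111111₂ is 255 as a uint8_t and -1 as an int8_t, and the pattern 11111110₂ is 254 or -2.

```c
#include <stdint.h>

/* The compiler picks unsigned or signed operations from the declared type. */
int less_than_one_unsigned(uint8_t x) { return x < 1; }
int less_than_one_signed(int8_t x)    { return x < 1; }

int half_unsigned(uint8_t x) { return x / 2; }
int half_signed(int8_t x)    { return x / 2; }
```

With these definitions, less_than_one_unsigned(255) is false while less_than_one_signed(-1) is true, and half_unsigned(254) is 127 while half_signed(-2) is -1, even though each pair of arguments has the same 8-bit pattern.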

The point is that care must be taken when dealing with a mixture of numbers of different sizes and types.

Similar to the unsigned algorithm, we can use the basis to convert a decimal number into signed binary. We will work through the algorithm with the example of converting -100 to 8-bit binary. We start with the largest basis element (in this case -128) and decide whether we need to include it to make -100. Yes (without -128, we would be unable to add the other basis elements together to get any negative result), so we set bit 7 and subtract the basis element from our value. Our new value is -100 minus -128, which is 28. We go to the next largest basis element, 64, and ask whether we need it. We do not need 64 to generate our 28, so bit 6 is zero. Next we go to the next basis element, 32, and ask whether we need it. We do not need 32 to generate our 28, so bit 5 is zero. Now we need the basis element 16, so we set bit 4 and subtract 16 from our number 28 (28-16=12). Continuing along, we need basis elements 8 and 4 but not 2 or 1, so bits 3, 2, 1, 0 are 1, 1, 0, 0. Putting it together we get 10011100₂ (which means -128+16+8+4).

Number	Basis	Need it	bit	Operation
-100	-128	yes	bit7=1	subtract -100 - -128
28	64	no	bit6=0	none
28	32	no	bit5=0	none
28	16	yes	bit4=1	subtract 28-16
12	8	yes	bit3=1	subtract 12-8
4	4	yes	bit2=1	subtract 4-4
0	2	no	bit1=0	none
0	1	no	bit0=0	none

Table 3-7. Example conversion from decimal to signed 8-bit binary to hexadecimal.

Observation: To take the negative of a 2’s complement signed number we first complement (flip) all the bits, then add 1.

A second way to convert negative numbers into binary is to first convert them into unsigned binary, then do a 2’s complement negate. For example, we earlier found that +100 is 01100100₂. The 2’s complement negate is a two step process. First we do a logic complement (flip all bits) to get 10011011₂. Then add one to the result to get 10011100₂.

A third way to convert negative numbers into binary is to first subtract the number from 256, then convert the unsigned result to binary using the unsigned method. For example, to find -100, we subtract 256 minus 100 to get 156. Then we convert 156 to binary resulting in 10011100₂. This method works because in 8-bit binary math adding 256 to a number does not change the value. E.g., 256-100 is the same value as -100.
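
The second and third methods can be compared side by side in C. This is a sketch under the assumption that a C99 <stdint.h> is available; the function names are hypothetical.

```c
#include <stdint.h>

/* Second method: complement (flip) all the bits, then add 1. */
uint8_t negate_flip_add(uint8_t x) {
    return (uint8_t)(~x + 1);
}

/* Third method: subtract the number from 256. */
uint8_t negate_from_256(uint8_t x) {
    return (uint8_t)(256 - x);
}
```

Both functions map 100 (0x64) to 156 (0x9C), the 8-bit pattern for -100.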

Common Error: An error will occur if you use signed operations on unsigned numbers, or use unsigned operations on signed numbers.

Maintenance Tip: To improve the clarity of our software, always specify the format of your data (signed versus unsigned) when defining or accessing the data.

We define a signed 8-bit number using the char format. When a number is stored into a char it is converted to 8-bit signed value. For example

    char data; // -128 to 127
    char function(char input){
        data=input+1;
        return data;
    }

16-bit unsigned numbers

A word or double byte contains 16 bits, written b15 b14 ... b1 b0, where each bit b15,...,b0 is binary and has the value 1 or 0. If a word is used to represent an unsigned number, then the value of the number is

N = 32768•b15 + 16384•b14 + 8192•b13 + 4096•b12
+ 2048•b11 + 1024•b10 + 512•b9 + 256•b8
+ 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0

There are 65,536 different unsigned 16-bit numbers. The smallest unsigned 16-bit number is 0 and the largest is 65535. For example, 0010,0001,1000,0100₂ or 0x2184 is 8192+256+128+4 or 8580. Other examples are shown in the following table.

binary	hex	Calculation	decimal
0000,0000,0000,0000	0x0000		0
0000,0100,0000,0001	0x0401	1024+1	1025
0000,1100,1010,0000	0x0CA0	2048+1024+128+32	3232
1000,1110,0000,0010	0x8E02	32768+2048+1024+512+2	36354
1111,1111,1111,1111	0xFFFF	32768+16384+8192+4096+2048+1024 +512+256+128+64+32+16+8+4+2+1	65535

Table 3-8. Example conversions from unsigned 16-bit binary to hexadecimal and to decimal.

For the unsigned 16-bit number system the basis is

{ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768}

If a word is used to represent a signed 2’s complement number, then the value of the number is

N = -32768•b15 + 16384•b14 + 8192•b13 + 4096•b12
+ 2048•b11 + 1024•b10 + 512•b9 + 256•b8
+ 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0

We define an unsigned 16-bit number using the unsigned short format. When a number is stored into an unsigned short it is converted to 16-bit unsigned value. For example

    unsigned short data; // 0 to 65535
    unsigned short function(unsigned short input){
        data=input+1;
        return data;
    }

16-bit signed numbers

There are also 65,536 different signed 16-bit numbers. The smallest signed 16-bit number is -32768 and the largest is 32767. For example, 1101,0000,0000,0100₂ or 0xD004 is -32768+16384+4096+4 or -12284. Other examples are shown in the following table.

binary	hex	Calculation	decimal
0000,0000,0000,0000	0x0000		0
0000,0100,0000,0001	0x0401	1024+1	1025
0000,1100,1010,0000	0x0CA0	2048+1024+128+32	3232
1000,0100,0000,0010	0x8402	-32768+1024+2	-31742
1111,1111,1111,1111	0xFFFF	-32768+16384+8192+4096+2048+1024 +512+256+128+64+32+16+8+4+2+1	-1

Table 3-9. Example conversions from signed 16-bit binary to hexadecimal and to decimal.

For the signed 16-bit number system the basis is

{ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, -32768}

Maintenance Tip: To improve the quality of our software, we should always specify the precision of our data when defining or accessing the data.

We define a signed 16-bit number using the short format. When a number is stored into a short it is converted to 16-bit signed value. For example

    short data; // -32768 to 32767
    short function(short input){
        data=input+1;
        return data;
    }

Big and Little Endian

When we store 16-bit data into memory it requires two bytes. Since the memory systems on most computers are byte addressable (a unique address for each byte), there are two possible ways to store in memory the two bytes that constitute the 16-bit data. Freescale microcomputers implement the big endian approach that stores the most significant part first. Intel microcomputers implement the little endian approach that stores the least significant part first. The PowerPC is biendian, because it can be configured to efficiently handle both big and little endian. For example, assume we wish to store the 16-bit number 1000 (0x03E8) at locations 0x50,0x51, then

address	big endian	little endian
0x50	0x03	0xE8
0x51	0xE8	0x03

We also can use either the big or little endian approach when storing 32-bit numbers into memory that is byte (8-bit) addressable. If we wish to store the 32-bit number 0x12345678 at locations 0x50-0x53 then

address	big endian	little endian
0x50	0x12	0x78
0x51	0x34	0x56
0x52	0x56	0x34
0x53	0x78	0x12

In the above two examples we normally would not pick out individual bytes (e.g., the 0x12), but rather capture the entire multiple byte data as one indivisible piece of information. On the other hand, if each byte in a multiple byte data structure is individually addressable, then both the big and little endian schemes store the data in first to last sequence. For example, if we wish to store the 4 ASCII characters ‘6811’ which is 0x36383131 at locations 0x50-0x53, then the ASCII ‘6’=0x36 comes first in both big and little endian schemes.

The term "Big Endian" comes from Jonathan Swift’s satire Gulliver’s Travels. In Swift’s book, a Big Endian refers to a person who cracks their egg on the big end. The Lilliputians considered the big endians as inferiors. The big endians fought a long and senseless war with the Lilliputians who insisted it was only proper to break an egg on the little end.

Common Error: An error will occur when data is stored in Big Endian by one computer and read in Little Endian format on another.
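
One way to avoid this error is to pick a byte order explicitly when data leaves the computer, instead of copying the bytes as they happen to lie in memory. The sketch below stores the 16-bit number 1000 (0x03E8) in either order; the function names are hypothetical and the fixed-width types assume a C99 <stdint.h>.

```c
#include <stdint.h>

/* Big endian: most significant byte at the lower address. */
void store16_big(uint8_t mem[2], uint16_t value) {
    mem[0] = (uint8_t)(value >> 8);
    mem[1] = (uint8_t)(value & 0xFF);
}

/* Little endian: least significant byte at the lower address. */
void store16_little(uint8_t mem[2], uint16_t value) {
    mem[0] = (uint8_t)(value & 0xFF);
    mem[1] = (uint8_t)(value >> 8);
}

/* Reads a big endian value correctly on any host, whatever its own order. */
uint16_t load16_big(const uint8_t mem[2]) {
    return (uint16_t)((mem[0] << 8) | mem[1]);
}
```

Because the shifts operate on values rather than on memory layout, the same source code behaves identically on a big endian 6812 and a little endian x86.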

Boolean information

A boolean number has two states. The two values could represent the logical true or false. The positive logic representation defines true as a 1 or high, and false as a 0 or low. If you were controlling a motor, light, heater or air conditioner the boolean could mean on or off. In communication systems, we represent the information as a sequence of booleans: mark or space. For black or white graphic displays we use booleans to specify the state of each pixel. The most efficient storage of booleans on a computer is to map each boolean into one memory bit. In this way, we could pack 8 booleans into each byte. If we have just one boolean to store in memory, out of convenience we allocate an entire byte or word for it. Most C compilers including ICC11/ICC12/Metrowerks define:

False be all zeros, and
True be any nonzero value.

Many programmers add the following macros

    #define TRUE 1
    #define FALSE 0
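
Packing 8 booleans into one byte, as described above, is usually done with masks and shifts. The following is a minimal sketch; the function names are hypothetical, bit numbers run from 0 (LSB) to 7 (MSB), and uint8_t assumes a C99 <stdint.h>.

```c
#include <stdint.h>

/* Return flags with the given bit set to true. */
uint8_t flag_set(uint8_t flags, unsigned bit) {
    return (uint8_t)(flags | (1u << bit));
}

/* Return flags with the given bit cleared to false. */
uint8_t flag_clear(uint8_t flags, unsigned bit) {
    return (uint8_t)(flags & ~(1u << bit));
}

/* Return 1 if the given bit is true, 0 if it is false. */
int flag_test(uint8_t flags, unsigned bit) {
    return (int)((flags >> bit) & 1u);
}
```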

Decimal Numbers

Decimal numbers are written as a sequence of decimal digits (0 through 9). The number may be preceded by a plus or minus sign or followed by an L or U. Lower case l or u could also be used. The minus sign gives the number a negative value, otherwise it is positive. The plus sign is optional for positive values. Unsigned 16-bit numbers between 32768 and 65535 should be followed by U. You can place an L at the end of the number to signify it to be a 32-bit signed number. The range of a decimal number depends on the data type as shown in the following table.

type	range	precision	examples
unsigned char	0 to 255	8 bits	0 10 123
char	-127 to 127	8 bits	-123 0 10 +10
unsigned int	0 to 65535U	16 bits	0 2000 2000U 50000U
int	-32767 to 32767	16 bits	-1000 0 1000 +20000
unsigned short	0 to 65535U	16 bits	0 2000 2000U 50000U
short	-32767 to 32767	16 bits	-1000 0 1000 +20000
long	-2147483647L to 2147483647L	32 bits	-1234567L 0L 1234567L

Table 3-10. The range of decimal numbers.

Because the 6811 and 6812 microcomputers are most efficient for 16-bit data (and not 32-bit data), the unsigned int and int data types are 16 bits. On the other hand, on an x86-based machine, the unsigned int and int data types are 32 bits. In order to make your software more compatible with other machines, it is preferable to use the short type when needing 16-bit data and the long type for 32-bit data.

type	6811/6812	x86
unsigned char	8 bits	8 bits
char	8 bits	8 bits
unsigned int	16 bits	32 bits
int	16 bits	32 bits
unsigned short	16 bits	16 bits
short	16 bits	16 bits
long	32 bits	32 bits

Table 3-11. Differences between a 6811/6812 and an x86
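
On compilers that provide the C99 header <stdint.h>, the fixed-width types are another way to state the precision explicitly. Whether the ICC11/ICC12/Metrowerks versions discussed here ship that header is an assumption that should be checked; the variable and function names below are purely illustrative.

```c
#include <stdint.h>

int16_t  sample;    /* 16 bits signed on every machine   */
uint16_t counter;   /* 16 bits unsigned on every machine */
int32_t  total;     /* 32 bits signed on every machine   */

/* Illustrative use: add a 16-bit sample into a 32-bit running sum. */
int32_t accumulate(int32_t sum, int16_t x) {
    return sum + x;
}
```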

Since the 6811 and 6812 microcomputers do not have direct support of 32-bit numbers, the use of long data types should be minimized. On the other hand, a careful observation of the code generated yields the fact that these compilers are more efficient with 16 bit numbers than with 8 bit numbers.

Decimal numbers are reduced to their two's complement or unsigned binary equivalent and stored as 8/16/32-bit binary values.

The manner in which decimal literals are treated depends on the context. For example

    short I;
    unsigned short J;
    char K;
    unsigned char L;
    long M;
    void main(void){
        I=97;    /* 16 bits 0x0061 */
        J=97;    /* 16 bits 0x0061 */
        K=97;    /* 8 bits 0x61 */
        L=97;    /* 8 bits 0x61 */
        M=97;    /* 32 bits 0x00000061 */
    }

The 6812 code generated by the ICC12 compiler is as follows

        .area text
    _main::
        pshx
        tfr s,x
        movw #97,_I   ;16 bits
        movw #97,_J   ;16 bits
        movb #97,_K   ;8 bits
        movb #97,_L   ;8 bits
        ldy #L2
        jsr __ly2reg  ;32 bits
        ldy #_M
        jsr __lreg2y
        tfr x,s
        pulx
        rts
        .area bss
    _M:: .blkb 4
    _L:: .blkb 1
    _K:: .blkb 1
    _J:: .blkb 2
    _I:: .blkb 2
        .area text
    L2: .word 0,97

The 6812 code generated by the Metrowerks compiler is much more efficient when dealing with 32 bit long integers

        LDAB #97
        CLRA
        STD I
        STD J
        STAB K
        STAB L
        STD M:2
        CLRB
        STD M
        RTS

Octal Numbers

If a sequence of digits begins with a leading 0 (zero) it is interpreted as an octal value. There are only eight octal digits, 0 through 7. As with decimal numbers, octal numbers are converted to their binary equivalent in 8-bit bytes or 16-bit words. The range of an octal number depends on the data type as shown in the following table.

type	range	precision	examples
unsigned char	0 to 0377	8 bits	0 010 0123
char	-0200 to 0177	8 bits	-0123 0 010 +010
unsigned int	0 to 0177777	16 bits	0 02000 0150000U
int	-077777 to 077777	16 bits	-01000 0 01000 +020000
unsigned short	0 to 0177777	16 bits	0 02000 0150000U
short	-077777 to 077777	16 bits	-01000 0 01000 +020000
long	-017777777777L to 017777777777L	32 bits	-01234567L 0L 01234567L

Table 3-12. The range of octal numbers.

Notice that the octal values 0 through 07 are equivalent to the decimal values 0 through 7. One of the advantages of this format is that it is very easy to convert back and forth between octal and binary. Each octal digit maps directly to/from 3 binary digits.
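
The 3-bits-per-digit mapping can be sketched with shifts and masks. The function name to_octal8 is hypothetical. An 8-bit value needs three octal digits, the first of which covers only the top 2 bits.

```c
/* Write the three octal digits of an 8-bit value as text.
   out must have room for 4 characters (3 digits plus the terminator). */
void to_octal8(unsigned char n, char out[4]) {
    out[0] = (char)('0' + ((n >> 6) & 3));   /* bits 7-6 */
    out[1] = (char)('0' + ((n >> 3) & 7));   /* bits 5-3 */
    out[2] = (char)('0' + (n & 7));          /* bits 2-0 */
    out[3] = '\0';
}
```

For example, 01111010₂ splits into 01 111 010, which is the octal literal 0172 (decimal 122).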

Hexadecimal Numbers

The hexadecimal number system uses base 16 as opposed to our regular decimal number system that uses base 10. Like the octal format, the hexadecimal format is also a convenient mechanism for us humans to represent binary information, because it is extremely simple for us to convert back and forth between binary and hexadecimal. A nibble is defined as 4 binary bits. Each value of the 4-bit nibble is mapped into a unique hex digit.

Hex Digit	Decimal Value	Binary Value
0	0	0000
1	1	0001
2	2	0010
3	3	0011
4	4	0100
5	5	0101
6	6	0110
7	7	0111
8	8	1000
9	9	1001
A or a	10	1010
B or b	11	1011
C or c	12	1100
D or d	13	1101
E or e	14	1110
F or f	15	1111

Table 3-13. Definition of hexadecimal representation.

Computer programming environments use a wide variety of symbolic notations to specify the numbers in various bases. The following table illustrates various formats for numbers

environment	binary format	hexadecimal format	decimal format
Freescale assembly language	%01111010	$7A	122
Intel and TI assembly language	01111010B	7AH	122
C language	-	0x7A	122

Table 3-14. Various hexadecimal formats.

To convert from binary to hexadecimal we can:

1) divide the binary number into right justified nibbles;
2) convert each nibble into its corresponding hexadecimal digit.

To convert from hexadecimal to binary we can:

1) convert each hexadecimal digit into its corresponding 4 bit binary nibble;
2) combine the nibbles into a single binary number.
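
The binary to hexadecimal steps above can be sketched for one byte as follows. The function name to_hex8 is hypothetical, and upper case digits are an arbitrary choice.

```c
/* Split a byte into two nibbles and map each nibble to its hex digit.
   out must have room for 3 characters (2 digits plus the terminator). */
void to_hex8(unsigned char n, char out[3]) {
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[(n >> 4) & 0x0F];   /* most significant nibble  */
    out[1] = digits[n & 0x0F];          /* least significant nibble */
    out[2] = '\0';
}
```

For example, 01111010₂ splits into the nibbles 0111 and 1010, giving "7A".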

If a sequence of digits begins with 0x or 0X then it is taken as a hexadecimal value. In this case the word digits refers to hexadecimal digits (0 through F). As with decimal numbers, hexadecimal numbers are converted to their binary equivalent in 8-bit bytes or 16-bit words. The range of a hexadecimal number depends on the data type as shown in the following table.

type	range	precision	examples
unsigned char	0x00 to 0xFF	8 bits	0x01 0x3a 0xB3
char	-0x7F to 0x7F	8 bits	-0x01 0x3a -0x7B
unsigned int	0x0000 to 0xFFFF	16 bits	0x22 0Xabcd 0xF0A6
int	-0x7FFF to 0x7FFF	16 bits	-0x22 0X0 +0x70A6
unsigned short	0x0000 to 0xFFFF	16 bits	0x22 0Xabcd 0xF0A6
short	-0x7FFF to 0x7FFF	16 bits	-0x1234 0x0 +0x7abc
long	-0x7FFFFFFF to 0x7FFFFFFF	32 bits	-0x1234567 0xABCDEF

Table 3-15. The range of hexadecimal numbers.

Character Literals

Character literals consist of one or two characters surrounded by apostrophes. The manner in which character literals are treated depends on the context. For example

    short I;
    unsigned short J;
    char K;
    unsigned char L;
    long M;
    void main(void){
        I='a';    /* 16 bits 0x0061 */
        J='a';    /* 16 bits 0x0061 */
        K='a';    /* 8 bits 0x61 */
        L='a';    /* 8 bits 0x61 */
        M='a';    /* 32 bits 0x00000061 */
    }

The 6812 code generated by the ICC12 compiler is as follows

        .area text
    _main::
        pshx
        tfr s,x
        movw #97,_I   ;16 bits
        movw #97,_J   ;16 bits
        movb #97,_K   ;8 bits
        movb #97,_L   ;8 bits
        ldy #L2
        jsr __ly2reg  ;32 bits
        ldy #_M
        jsr __lreg2y
        tfr x,s
        pulx
        rts
        .area bss
    _M:: .blkb 4
    _L:: .blkb 1
    _K:: .blkb 1
    _J:: .blkb 2
    _I:: .blkb 2
        .area text
    L2: .word 0,97

The 6812 code generated by the Metrowerks compiler is as follows

        LDAB #97
        CLRA
        STD I
        STD J
        STAB K
        STAB L
        STD M:2
        CLRB
        STD M
        RTS

All standard ASCII characters are positive because the high-order bit is zero. In most cases it doesn't matter if we declare character variables as signed or unsigned. On the other hand, we have seen earlier that the compiler treats signed and unsigned numbers differently. Unless a character variable is specifically declared to be unsigned, its high-order bit will be taken as a sign bit. Therefore, we should not expect a character variable, which is not declared unsigned, to compare equal to the same character literal if the high-order bit is set. For more on this see Chapter 4 on Variables.
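
The mismatch described above can be sketched as follows. The function names are hypothetical. The example assumes the usual 2's complement behavior, where storing the pattern 0xE9 in a signed char yields -23; strictly speaking, the C standard leaves that conversion implementation-defined.

```c
/* The stored character is promoted to int before the comparison.
   A signed char with its high-order bit set promotes to a negative int. */
int matches_signed(signed char stored, int literal) {
    return stored == literal;
}

/* An unsigned char always promotes to a value from 0 to 255. */
int matches_unsigned(unsigned char stored, int literal) {
    return stored == literal;
}
```

With the pattern 0xE9 stored, matches_unsigned compares 233 with 233 and succeeds, while matches_signed compares -23 with 233 and fails. For standard ASCII characters such as 'a' both versions succeed.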

String Literals

Strictly speaking, C does not recognize character strings, but it does recognize arrays of characters and provides a way to write character arrays, which we call strings. Surrounding a character sequence with quotation marks, e.g., "Jon", sets up an array of characters and generates the address of the array. In other words, at the point in a program where it appears, a string literal produces the address of the specified array of character literals. The array itself is located elsewhere. ICC11 and ICC12 will place the strings into the text area. I.e., the string literals are considered constant and will be defined in the ROM of an embedded system. This is very important to remember. Notice that this differs from a character literal which generates the value of the literal directly. Just to be sure that this distinct feature of the C language is not overlooked, consider the following example:

    char *pt;
    void main(void){
        pt="Jon"; /* pointer to the string */
        printf(pt); /* passes the pointer not the data itself */
    }

The 6812 code generated by the ICC12 compiler is as follows

        .area text
    _main::
        movw #L2,_pt
        ldd _pt
        jsr _printf
        rts
        .area bss
    _pt:: .blkb 2
        .area text
    L2: .byte 'J,'o,'n,0

The 6812 code generated by the Metrowerks compiler is virtually the same as ICC12. Both compilers place the string in memory and use a pointer to it when calling printf. ICC12 will pass the parameter in RegD, while Metrowerks pushes the parameter on the stack.

        MOVW #"Jon",pt
        LDD pt
        PSHD
        JSR printf
        PULD
        RTS

Notice that the pointer, pt, is allocated in RAM (.area bss) and the string is stored in ROM (.area text). The assignment statement pt="Jon"; copies the address not the data. Similarly, the function printf() must receive the address of a string as its first (in this case, only) argument. First, the address of the string is assigned to the character pointer pt (ICC11/ICC12 use the 16 bit Register D for the first parameter). Unlike other languages, the string itself is not assigned to pt, only its address is. After all, pt is a 16-bit object and, therefore, cannot hold the string itself. The same program could be written better as

    void main(void){
        printf("Jon"); /* passes the pointer not the data itself */
    }

Notice again that the program passes a pointer to the string into printf(), and not the string itself. The 6812 code generated by the ICC12 compiler is as follows

        .area text
    _main::
        ldd #L2
        jsr _printf
        rts
        .area text
    L2: .byte 'J,'o,'n,0

Except for the parameter passing, the 6812 code generated by the Metrowerks compiler is virtually the same as ICC12.

        LDD #"Jon"
        PSHD
        JSR printf
        PULD
        RTS

In this case, it is tempting to think that the string itself is being passed to printf(); but, as before, only its address is.

Since strings may contain as few as one or two characters, they provide an alternative way of writing character literals in situations where the address, rather than the character itself, is needed.

It is a convention in C to identify the end of a character string with a null (zero) character. Therefore, C compilers automatically suffix character strings with such a terminator. Thus, the string "Jon" sets up an array of four characters ('J', 'o', 'n', and zero) and generates the address of the first character, for use by the program.
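
The automatic terminator can be observed directly: sizeof applied to a string literal counts the null character, while the library function strlen does not. The function names below are hypothetical.

```c
#include <string.h>

/* Size of the array set up by "Jon": 'J', 'o', 'n', and the terminator. */
unsigned jon_array_size(void) {
    return (unsigned)sizeof("Jon");
}

/* Number of characters before the terminator. */
unsigned jon_length(void) {
    return (unsigned)strlen("Jon");
}

/* The element just past 'n' is the null (zero) character. */
char jon_terminator(void) {
    return "Jon"[3];
}
```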

Remember that 'A' is different from "A". Consider the following example:

    char letter,*pt;
    void main(void){
        pt="A";      /* pointer to the string */
        letter='A';  /* the data itself ('A' ASCII 65=$41) */
    }

The 6812 code generated by the ICC12 compiler is as follows

        .area text
    _main::
        movw #L2,_pt
        movb #65,_letter
        rts
        .area bss
    _letter:: .blkb 1
    _pt:: .blkb 2
        .area text
    L2: .byte 'A,0

The 6812 code generated by the Metrowerks compiler is as follows

        MOVW #"A",pt
        LDAB #65
        STAB letter
        RTS

Escape Sequences

Sometimes it is desirable to code nongraphic characters in a character or string literal. This can be done by using an escape sequence--a sequence of two or more characters in which the first (escape) character changes the meaning of the following character(s). When this is done the entire sequence generates only one character. C uses the backslash (\) for the escape character. The following escape sequences are recognized by the ICC11/ICC12/Metrowerks compilers:

sequence	name	value
\n	newline, linefeed	$0A = 10
\t	tab	$09 = 9
\b	backspace	$08 = 8
\f	form feed	$0C = 12
\a	bell	$07 = 7
\r	return	$0D = 13
\v	vertical tab	$0B = 11
\0	null	$00 = 0
\"	ASCII quote	$22 = 34
\\	ASCII back slash	$5C = 92
\'	ASCII single quote	$27 = 39

Table 3-16. The escape sequences supported by ICC11 ICC12 and Metrowerks.

Other nonprinting characters can also be defined using the \ooo octal format. The digits ooo can define any 8-bit octal number. The following three lines are equivalent:

    printf("\tJon\n");
    printf("\11Jon\12");
    printf("\011Jon\012");
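
The equivalence of the three string literals can be checked with the standard library function strcmp. The helper name same_string is hypothetical, and the check assumes an ASCII execution character set, where tab is 9 (octal 11) and newline is 10 (octal 12).

```c
#include <string.h>

/* Returns 1 when the two strings hold exactly the same characters. */
int same_string(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

Under that assumption, same_string("\tJon\n", "\11Jon\12") and same_string("\tJon\n", "\011Jon\012") both return 1. The octal escape ends at the letter J because J is not an octal digit.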

The term newline refers to a single character which, when written to an output device, starts a new line. Some hardware devices use the ASCII carriage return (13) as the newline character while others use the ASCII line feed (10). It really doesn't matter which is the case as long as we write \n in our programs. Avoid using the ASCII value directly since that could produce compatibility problems between different compilers.

There is one other type of escape sequence: anything undefined. If the backslash is followed by any character other than the ones described above, then the backslash is ignored and the following character is taken literally. So the way to code the backslash is by writing a pair of backslashes and the way to code an apostrophe or a quote is by writing \' or \" respectively.

Sunday, December 18, 2011

Embedded C Tutorial (Chapter-3)