Chapter 2: Tokens
What's in Chapter 2?
Literals include numbers, characters, and strings
Keywords are predefined
Names are user-defined
Punctuation marks
Operators
To understand the syntax of a C program, we divide it into tokens separated by white space and punctuation. Remember that white space includes the space, tab, carriage return, and line feed. A token may be a single character or a sequence of characters that form a single item. The first step of a compiler is to process the program into a list of tokens and punctuation marks. The compiler then checks for proper syntax. Finally, it creates object code that performs the intended operations. Consider the following example:
void main(void){ short z;
  z = 0;
  while(1){
    z = z+1;
  }
}
This example includes the punctuation marks
( ) { } ;
and the compiler breaks it into the following sequence of tokens and punctuation marks:
void main ( void ) { short z ; z = 0 ; while ( 1 ) { z = z + 1 ; } }
Since tokens are the building blocks of programs, we begin our study of C language by defining its tokens.
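The splitting step described above can be sketched in a few lines of C. This is only an illustration under a simplified rule, not how ICC12 or Metrowerks actually scan a program: a run of letters, digits, or underscores counts as one token, and every other non-white-space character counts as a token by itself. The name CountTokens is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts tokens in a C source string under a simplified rule:
   a run of letters, digits, or underscores is one token, and every
   other non-white-space character is a token by itself.
   Multi-character operators, comments, and string literals are
   deliberately ignored in this sketch. */
size_t CountTokens(const char *src){
  size_t count = 0;
  size_t i = 0;
  while(src[i] != 0){
    unsigned char c = (unsigned char)src[i];
    if(isspace(c)){
      i++;                      /* white space only separates tokens */
    } else if(isalnum(c) || c == '_'){
      while(isalnum((unsigned char)src[i]) || src[i] == '_'){
        i++;                    /* consume the whole name or number */
      }
      count++;
    } else {
      i++;                      /* one punctuation mark or operator */
      count++;
    }
  }
  return count;
}
```

Applied to the example program above, this rule finds 26 items, which matches the token sequence shown.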
ASCII Character Set
Like most programming languages, C uses the standard ASCII character set. The following table shows the 128 standard ASCII codes. One or more white space characters can be used to separate tokens and/or punctuation marks. The white space characters in C include the horizontal tab (9=$09), the carriage return (13=$0D), the line feed (10=$0A), and the space (32=$20).
BITS 0 TO 3 (rows) and BITS 4 to 6 (columns)
     0    1    2    3    4    5    6    7
0    NUL  DLE  SP   0    @    P    `    p
1    SOH  DC1  !    1    A    Q    a    q
2    STX  DC2  "    2    B    R    b    r
3    ETX  DC3  #    3    C    S    c    s
4    EOT  DC4  $    4    D    T    d    t
5    ENQ  NAK  %    5    E    U    e    u
6    ACK  SYN  &    6    F    V    f    v
7    BEL  ETB  '    7    G    W    g    w
8    BS   CAN  (    8    H    X    h    x
9    HT   EM   )    9    I    Y    i    y
A    LF   SUB  *    :    J    Z    j    z
B    VT   ESC  +    ;    K    [    k    {
C    FF   FS   ,    <    L    \    l    |
D    CR   GS   -    =    M    ]    m    }
E    SO   RS   .    >    N    ^    n    ~
F    SI   US   /    ?    O    _    o    DEL
The printable characters (32 to 126, or $20 to $7E) are divided into
the space character (32=$20),
the numeric digits 0 to 9 (48 to 57 or $30 to $39),
the uppercase alphabet A to Z (65 to 90 or $41 to $5A),
the lowercase alphabet a to z (97 to 122 or $61 to $7A), and
the special characters (all the rest).
Literals
Numeric literals consist of an uninterrupted sequence of digits delimited by white space or special characters (operators or punctuation). Although ICC12 and Metrowerks do support floating point, this document will not cover it. The use of floating point requires a substantial amount of program memory and execution time; therefore most applications should be implemented using integer math. Consequently, the period will not appear in numbers as described in this document. For more information about numbers see the sections on decimals, octals, or hexadecimals in Chapter 3.
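To make the idea of an uninterrupted sequence of digits concrete, here is a minimal sketch of how such a literal could be read from a string. It handles decimal digits only and does not check for overflow. The name ParseDecimal is hypothetical and is not part of either compiler.

```c
#include <stddef.h>

/* Reads the run of decimal digits at the start of s.
   Stores the numeric value in *value and returns how many
   characters were used. The run ends at the first character
   that is not a digit (white space, operator, or punctuation).
   Overflow is not checked in this sketch. */
size_t ParseDecimal(const char *s, unsigned long *value){
  size_t n = 0;
  unsigned long v = 0;
  while(s[n] >= '0' && s[n] <= '9'){     /* ASCII 48 to 57 */
    v = 10*v + (unsigned long)(s[n] - '0');
    n++;
  }
  *value = v;
  return n;
}
```

For the text 123+4 the function uses three characters and stores 123, because the + delimits the literal.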
Character literals are written by enclosing an ASCII character in apostrophes (single quotes). We would write 'a' for a character with the ASCII value of the lowercase a (97). The control characters can also be defined as constants. For example, '\t' is the tab character. For more information about character literals see the section on characters in Chapter 3.
String literals are written as a sequence of ASCII characters bounded by quotation marks (double quotes). Thus, "ABC" describes a string of characters containing the first three letters of the alphabet in uppercase. For more information about string literals see the section on strings in Chapter 3.
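The short sketch below shows how character and string literals behave as ordinary values. The function names StringLength and IsLowercase are hypothetical helpers written for this illustration, and the code assumes an ASCII system.

```c
#include <stddef.h>

/* Counts the characters in a string up to, but not including,
   the zero byte that C places at the end of every string literal. */
size_t StringLength(const char *s){
  size_t n = 0;
  while(s[n] != 0){
    n++;
  }
  return n;
}

/* Returns 1 if c is a lowercase letter (ASCII 97 to 122), else 0. */
int IsLowercase(char c){
  return (c >= 'a') && (c <= 'z');
}
```

StringLength("ABC") returns 3, while sizeof("ABC") is 4 because the terminating zero is stored with the string. On an ASCII system 'a' has the value 97 and '\t' has the value 9.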
Keywords
There are some predefined tokens, called keywords, that have specific meaning in C programs. The reserved words we will cover in this document are:
keyword    meaning
asm        Insert assembly code
auto       Specifies a variable as automatic (created on the stack)
break      Causes the program control structure to finish
case       One possibility within a switch statement
char       8-bit integer
const      Defines global parameter as constant in ROM, and defines a local parameter as fixed value
continue   Causes the program to go to beginning of loop
default    Used in switch statement for all other cases
do         Used for creating program loops
double     Specifies variable as double precision floating point
else       Alternative part of a conditional
extern     Defined in another module
float      Specifies variable as single precision floating point
for        Used for creating program loops
goto       Causes program to jump to specified location
if         Conditional control structure
int        16-bit integer (same as short on the 6811 and 6812). It should be avoided in most cases because the implementation will vary from compiler to compiler.
long       32-bit integer
register   Specifies how to implement a local
return     Leave function
short      16-bit integer
signed     Specifies variable as signed (default)
sizeof     Built-in operator that returns the size of an object
static     Stored permanently in memory, accessed locally
struct     Used for creating data structures
switch     Complex conditional control structure
typedef    Used to create new data types
unsigned   Always greater than or equal to zero
void       Used in parameter list to mean no parameter
volatile   Can change implicitly outside the direct action of the software. It disables compiler optimization, forcing the compiler to fetch a new value each time.
while      Used for creating program loops
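A few of these keywords are easier to appreciate in use. The following sketch combines const, unsigned, sizeof, and static in one small fragment. The names StepTable, TableLength, and CallCount are hypothetical and chosen only for this illustration.

```c
/* const: a table that the program never modifies
   (on an embedded target such data can be placed in ROM). */
const unsigned char StepTable[4] = {10, 9, 5, 6};

/* sizeof: the number of entries, computed by the compiler. */
unsigned short TableLength(void){
  return (unsigned short)(sizeof(StepTable)/sizeof(StepTable[0]));
}

/* static: Count keeps its value between calls,
   but it can be accessed only inside this function. */
unsigned short CallCount(void){
  static unsigned short Count = 0;
  Count++;
  return Count;
}
```

Here TableLength returns 4, and successive calls to CallCount return 1, 2, 3 and so on, because the static variable is stored permanently rather than on the stack.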
Names
We use names to identify our variables, functions, and macros. ICC11/ICC12 names may be up to 31 characters long. Metrowerks names may be up to xxx characters long. Names must begin with a letter or underscore, and the remaining characters must be letters, digits, or underscores. We can use a mixture of upper and lower case or the underscore character to create self-explaining symbols. E.g.,
time_of_day go_left_then_stop
TimeOfDay GoLeftThenStop
The careful selection of names goes a long way to making our programs more readable. Names may be written with both upper and lowercase letters. The names are case sensitive. Therefore the following names are different:
thetemperature
THETEMPERATURE
TheTemperature
Every global name defined with the ICC11/ICC12 compiler generates an assembly language label of the same name, but preceded by an underscore. The purpose of the underscore is to avoid clashes with the assembler's reserved words. So, as a matter of practice, we should not ordinarily name globals with leading underscores. Metrowerks labels will not include the underscore. For examples of this naming convention, observe the assembly generated by the compiler (either the assembly itself in the *.s file or the listing in the *.lst file). These assembly names are important during the debugging stages. We can use the map file to get the absolute addresses for these labels, then use the debugger to observe and modify their contents.
Since the ImageCraft compiler adds its own underscore, names written with a leading underscore appear in the assembly file with two leading underscores.
Developing a naming convention will avoid confusion. Possible ideas to consider include:
1. Start every variable name with its type. E.g.,
n means 8 bit signed integer
u means 8 bit unsigned integer
m means 16 bit signed integer
v means 16 bit unsigned integer
c means 8 bit ASCII character
s means null terminated ASCII string
3. Start every global variable and function with the associated file or module name. In the following example the names all begin with Bit_. Notice how closely this naming convention recreates the look and feel of the modularity achieved by classes in C++. E.g.,
/* **********file=Bit.c*************
   Pointer implementation of the Bit_Fifo
   These routines can be used to save (Bit_Put) and
   recall (Bit_Get) binary data 1 bit at a time (bit streams)
   Information is saved/recalled in a first in first out manner
   Bit_FifoSize is the number of 16 bit words in the Bit_Fifo
   The Bit_Fifo is full when it has 16*Bit_FifoSize-1 bits */
#define Bit_FifoSize 4  // 16*4-1=63 bits of storage
unsigned short Bit_Fifo[Bit_FifoSize]; // storage for Bit Stream
struct Bit_Pointer{
  unsigned short Mask;   // 0x8000, 0x4000,...,2,1
  unsigned short *WPt;}; // Pointer to word containing bit
typedef struct Bit_Pointer Bit_PointerType;
Bit_PointerType Bit_PutPt; // Pointer of where to put next
Bit_PointerType Bit_GetPt; // Pointer of where to get next
/* Bit_FIFO is empty if Bit_PutPt==Bit_GetPt */
/* Bit_FIFO is full if Bit_PutPt+1==Bit_GetPt */
short Bit_Same(Bit_PointerType p1, Bit_PointerType p2){
  if((p1.WPt==p2.WPt)&&(p1.Mask==p2.Mask))
    return(1); // yes
  return(0);}  // no
void Bit_Init(void) {
  Bit_PutPt.Mask=Bit_GetPt.Mask=0x8000;
  Bit_PutPt.WPt=Bit_GetPt.WPt=&Bit_Fifo[0]; /* Empty */
}
// returns TRUE=1 if successful,
// FALSE=0 if full and data not saved
// input is boolean FALSE if data==0
short Bit_Put (short data) { Bit_PointerType myPutPt;
  myPutPt=Bit_PutPt;
  myPutPt.Mask=myPutPt.Mask>>1;
  if(myPutPt.Mask==0) {
    myPutPt.Mask=0x8000;
    if((++myPutPt.WPt)==&Bit_Fifo[Bit_FifoSize])
      myPutPt.WPt=&Bit_Fifo[0]; // wrap
  }
  if (Bit_Same(myPutPt,Bit_GetPt))
    return(0); /* Failed, Bit_Fifo was full */
  else {
    if(data)
      (*Bit_PutPt.WPt) |= Bit_PutPt.Mask; // set bit
    else
      (*Bit_PutPt.WPt) &= ~Bit_PutPt.Mask; // clear bit
    Bit_PutPt=myPutPt;
    return(1);
  }
}
// returns TRUE=1 if successful,
// FALSE=0 if empty and data not removed
// output is boolean 0 means FALSE, nonzero is true
short Bit_Get (unsigned short *datapt) {
  if (Bit_Same(Bit_PutPt,Bit_GetPt))
    return(0); /* Failed, Bit_Fifo was empty */
  else {
    *datapt=(*Bit_GetPt.WPt)&Bit_GetPt.Mask;
    Bit_GetPt.Mask=Bit_GetPt.Mask>>1;
    if(Bit_GetPt.Mask==0) {
      Bit_GetPt.Mask=0x8000;
      if((++Bit_GetPt.WPt)==&Bit_Fifo[Bit_FifoSize])
        Bit_GetPt.WPt=&Bit_Fifo[0]; // wrap
    }
    return(1);
  }
}
Punctuation
Punctuation marks (semicolons, colons, commas, apostrophes, quotation marks, braces, brackets, and parentheses) are very important in C. They are one of the most frequent sources of errors for both beginning and experienced programmers.
Semicolons
Semicolons are used as statement terminators. Strange and confusing syntax errors may be generated when you forget a semicolon, so this is one of the first things to check when trying to remove syntax errors. Notice that one semicolon is placed at the end of every simple statement in the following example
#define PORTB *(unsigned char volatile *)(0x1004)
void Step(void){
  PORTB = 10;
  PORTB = 9;
  PORTB = 5;
  PORTB = 6;
}
Preprocessor directives such as #define do not end with a semicolon. They begin with # and conclude at the end of the line. The following example will fill the array DataBuffer with data read from the input port (PORTC). We assume in this example that Port C has been initialized as an input. Semicolons are also used in the for loop statement (see also Chapter 6), as illustrated by
void Fill(void){ short j;
  for(j=0; j<100; j++){
    DataBuffer[j] = PORTC;
  }
}
Colons
We can define a label using the colon. Although C has a goto statement, I strongly discourage its use. I believe the software is easier to understand using the block-structured control statements (if, if else, for, while, do while, and switch case). The following example will return after the Port C input reads the same value 100 times in a row. Again we assume Port C has been initialized as an input. Notice that every time the current value on Port C is different from the previous value the counter is reinitialized.
char Debounce(void){ short Cnt; unsigned char LastData;
Start: Cnt=0;                     /* number of times Port C is the same */
  LastData=PORTC;
Loop: if(++Cnt==100) goto Done;   /* same thing 100 times */
  if(LastData!=PORTC) goto Start; /* changed */
  goto Loop;
Done: return(LastData);
}
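For comparison, the same behavior can be written with a while loop and no labels. The version below is only a sketch of one possible structured form, not code from the original listing. It takes the port address as a parameter (an assumption, so the function can also be tried off the target), and when the input changes it uses the new reading as the reference value rather than reading the port again. The name DebounceLoop is hypothetical.

```c
/* Returns once the port has read the same value 100 times in a row.
   port is a hypothetical parameter: the address of the input
   register, for example the address behind the PORTC macro. */
unsigned char DebounceLoop(volatile unsigned char *port){
  unsigned char lastData = *port;   /* first reading counts as one */
  short cnt = 1;
  while(cnt < 100){
    unsigned char now = *port;
    if(now == lastData){
      cnt++;                        /* same value again */
    } else {
      lastData = now;               /* changed, so start over */
      cnt = 1;
    }
  }
  return lastData;
}
```

The single loop condition makes the exit rule visible in one place, which is the readability benefit of the block-structured statements.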
Colons also terminate the case and default prefixes that appear in switch statements. For more information see the section on switch in Chapter 6. In the following example, the next stepper motor output is found (the proper sequence is 10,9,5,6). The default case is used to restart the pattern.
unsigned char NextStep(unsigned char step){ unsigned char theNext;
  switch(step){
    case 10: theNext=9; break;
    case 9: theNext=5; break;
    case 5: theNext=6; break;
    case 6: theNext=10; break;
    default: theNext=10;
  }
  return(theNext);
}
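Calling NextStep repeatedly walks through the pattern. The sketch below repeats the function from the listing so that it compiles by itself, and adds a hypothetical helper, StepAfter, that advances a given number of steps.

```c
/* Same function as in the listing above, repeated so that
   this sketch can be compiled alone. */
unsigned char NextStep(unsigned char step){ unsigned char theNext;
  switch(step){
    case 10: theNext=9; break;
    case 9: theNext=5; break;
    case 5: theNext=6; break;
    case 6: theNext=10; break;
    default: theNext=10;   /* any other value restarts the pattern */
  }
  return(theNext);
}

/* Hypothetical helper: the output after advancing n steps from start. */
unsigned char StepAfter(unsigned char start, unsigned short n){
  unsigned char step = start;
  while(n > 0){
    step = NextStep(step);
    n--;
  }
  return step;
}
```

Starting from 10, four steps return to 10, and an invalid value such as 3 is sent to 10 by the default case.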
In both applications of the colon (goto and switch), we see that a label is created that is a potential target for a transfer of control.
Commas
Commas separate items that appear in lists. We can create multiple variables of the same type. E.g.,
unsigned short beginTime,endTime,elapsedTime;
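Commas also separate the parameters in a function definition and the arguments in a function call. E.g.,
short add(short x, short y){ short z;
  z = x+y;
  if((x>0)&&(y>0)&&(z<0))z = 32767;
  if((x<0)&&(y<0)&&(z>0))z = -32768;
  return(z);
}
void main(void){ short a,b;
  a=add(2000,2000);
  b=0;
  while(1){
    b=add(b,1);
  }
}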
Lists can also be used in general expressions. Sometimes it adds clarity to a program if related variables are modified at the same place. The value of a list of expressions is always the value of the last expression in the list. In the following example, first thetime is incremented, thedate is decremented, then x is set to k+2.
x=(thetime++,--thedate,k+2);
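The effect of such a list can be checked with a small function. In this sketch the two variables are passed by address so the caller can observe the side effects. The function name ListValue is hypothetical.

```c
/* Returns the value of a comma-separated list of expressions.
   thetime and thedate are passed by address so that the
   increment and decrement are visible to the caller. */
short ListValue(short *thetime, short *thedate, short k){
  short x;
  x = ((*thetime)++, --(*thedate), k+2);   /* value of the list is k+2 */
  return x;
}
```

With thetime=5, thedate=9, and k=3, the function returns 5 and leaves thetime at 6 and thedate at 8.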
Apostrophes
Apostrophes are used to specify character literals. For more information about character literals see the section on characters in Chapter 3. Assuming the function OutChar will print a single ASCII character, the following example will print the lower case alphabet:
void Alphabet(void){ unsigned char mych;
  for(mych='a';mych<='z';mych++){
    OutChar(mych); /* Print next letter */
  }
}
Quotation marks
Quotation marks are used to specify string literals. For more information about string literals see the section on strings in Chapter 3. Example
unsigned const char Msg[12]= "Hello World"; /* Place for 11 characters and termination */
void PrintHelloWorld(void){
  SCI_OutString("Hello World");
  SCI_OutString(Msg);
}
The command Letter='A'; places the ASCII code (65) into the variable Letter. The command pt="A"; creates an ASCII string and places a pointer to it into the variable pt.
Braces
Braces {} are used throughout C programs. The most common application is for creating a compound statement. Each open brace { must be matched with a closing brace }. One approach that helps to match up braces is to use indenting. Each time an open brace is used, the source code is tabbed over. In this way, it is easy to see at a glance the brace pairs. Examples of this approach to tabbing are the Bit_Put function within Listing 2-2 and the median function in Listing 1-4.
Brackets
Square brackets enclose array dimensions (in declarations) and subscripts (in expressions). Thus,
short Fifo[100];
declares an array named Fifo consisting of 100 words numbered from 0 through 99, and
PutPt = &Fifo[0];
sets the variable PutPt to the address of the first entry of the array.
Parentheses
Parentheses enclose argument lists that are associated with function declarations and calls. They are required even if there are no arguments.
As with all programming languages, C uses parentheses to control the order in which expressions are evaluated. Thus, (11+3)/2 yields 7, whereas 11+3/2 yields 12. Parentheses are very important when writing expressions.
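The two expressions above can be wrapped in functions to see the difference for any pair of values. This is only a small illustration, and the names Average and NotAverage are hypothetical.

```c
/* Integer average of two values: the parentheses force
   the sum to be formed before the division. */
short Average(short a, short b){
  return (a+b)/2;
}

/* Without parentheses the division has higher precedence,
   so b/2 is evaluated first and then added to a. */
short NotAverage(short a, short b){
  return a+b/2;
}
```

Average(11,3) returns 7 and NotAverage(11,3) returns 12, matching the results stated above.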
Operators
The special characters used as expression operators are covered in the operator section in Chapter 5. There are many operators, some of which are single characters
~ ! @ % ^ & * - + = | / : ? < > ,
others require two characters
++ -- << >> <= += -= *= /= == |= %= &= ^= || && !=
and some require three characters
<<= >>=
The C syntax can be confusing to the beginning programmer. For example
z = x+y; /* sets z equal to the sum of x and y */
z = x_y; /* sets z equal to the value of x_y */