What commands does assembly language have? Assembly Language Basics


Structures in assembly language

The arrays we considered above are collections of elements of the same type. But applications often need to treat a set of data items of different types as a single type.

This is very important, for example, for database programs, where it is necessary to associate a collection of data of different types with one object.

For example, we previously looked at Listing 4, which worked with an array of three-byte elements. Each element in turn consisted of two fields of different types: a one-byte counter field and a two-byte field that could hold other information needed for storage and processing. A reader familiar with a high-level language will know that such an object is usually described with a special data type: the structure.

To make assembly language more convenient to use, this data type was introduced into it as well.

By definition, a structure is a data type consisting of a fixed number of elements of different types.

To use structures in a program, you must perform three steps:

    Describe the structure template.

    In essence, this means defining a new data type, which can then be used to define variables of this type.

    Define a structure instance.

    This step involves initializing a specific variable with a structure predefined by the template.

    Organize access to the structure's fields.

It is very important to understand from the very beginning the difference between describing a structure in a program and defining it.

Describing a structure in a program simply means specifying its outline, or template; no memory is allocated.

This template can only be considered as information for the translator about the layout of the fields and their default values.

Defining a structure means instructing the translator to allocate memory and assign a symbolic name to this memory area.

A structure can be described in a program only once, but defined any number of times.

Description of the structure template

The structure template description has the following syntax:

structure_name STRUC
<field descriptions>
structure_name ENDS

Here <field descriptions> is a sequence of the data description directives db, dw, dd, dq and dt.

Their operands determine the size of the fields and, optionally, their initial values. These values may be used to initialize the corresponding fields when the structure is defined.

As we already noted when describing the template, memory is not allocated, since this is just information for the translator.

The template can be placed anywhere in the program, but because the translator works in a single pass, it must appear before the point where a variable of that structure type is defined. That is, when a variable of a structure type is defined in a data segment, its template must be placed at the beginning of the data segment or before it.

Let's consider working with structures using the example of modeling a database about employees of a certain department.

For simplicity, and to avoid converting data on input, we assume that all fields are character fields.

Let's define the record structure of this database with the following template:

Defining data with structure type

To use a structure described using a template in a program, you must define a variable with the type of this structure. To do this, the following syntactic construction is used:

[variable name] structure_name <[list of values]>

    variable name: identifier of a variable of this structure type.

    Specifying a variable name is optional. If you omit it, a memory area whose size equals the sum of the lengths of all structure elements is simply allocated.

    list of values: a list of initial values of the structure elements, enclosed in angle brackets and separated by commas.

    Specifying it is also optional.

    If the list is omitted, all fields of the structure in this variable are initialized with the values from the template, if any are specified.

    Individual fields can be initialized, but in this case the omitted fields must still be separated by commas. Omitted fields are initialized with the values from the structure template. If, when defining a new variable of a given structure type, we accept all the default field values from its template, we just need to write empty angle brackets.

    For example: victor worker <>
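As a rough analogy only, the C sketch below models a template with default values and the instances built from it. C structs have no per-field defaults, so a hypothetical `worker_template()` function supplies them. The three fields (`nam`, `sex`, `age`) and their widths are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical employee record: every field is a fixed-width character
   field; the field names and widths are assumptions for illustration. */
struct worker {
    char nam[30];   /* full name, space-padded */
    char sex;       /* 'm' by default */
    char age[2];    /* two ASCII digits */
};

/* Plays the role of the STRUC template: returns a record holding the
   default value of every field (like "victor worker <>"). */
struct worker worker_template(void) {
    struct worker w;
    memset(w.nam, ' ', sizeof w.nam);
    w.sex = 'm';
    memset(w.age, ' ', sizeof w.age);
    return w;
}

/* Like a value list that gives only the first and third fields: the given
   fields override the defaults, and the omitted one (sex) keeps its default. */
struct worker make_worker(const char *name, const char age[2]) {
    struct worker w = worker_template();
    size_t n = strlen(name);
    assert(n <= sizeof w.nam);     /* the name must fit in the field */
    memcpy(w.nam, name, n);        /* the rest of nam stays space-padded */
    memcpy(w.age, age, 2);
    return w;
}
```

`make_worker("Ivanov", "25")` corresponds roughly to an angle-bracket list with the second value left empty between two commas.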

For example, let's define several variables with the type of structure described above.

Methods for working with structures

The idea of introducing a structured type into any programming language is to combine variables of different types into one object.

The language must provide a means of accessing these variables within a specific instance of the structure. To refer to a field of a structure in a command, a special operator is used: the "." (dot) symbol. It is used in the following syntax:

address_expression.structure_field_name

    address_expression: identifier of a variable of some structure type, or an expression in brackets following the syntax rules given below (Fig. 1);

    structure_field_name: name of a field from the structure template.

    This, in fact, is also an address, or rather, the offset of the field from the beginning of the structure.

Thus the "." (dot) operator evaluates the expression (address_expression) + (offset of structure_field_name from the beginning of the structure).

Fig. 5. Syntax of an address expression in a structure field access operator
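Here is a minimal C sketch of what the dot operator computes, assuming a hypothetical layout made only of character fields. The field's address is the structure's address plus the field's offset, which C exposes through `offsetof`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout (field names and widths are assumptions). */
struct worker {
    char nam[30];
    char sex;
    char age[2];
};

/* What "sotr1.age" means to the assembler: the address of sotr1 plus the
   offset of the age field from the start of the structure. */
const char *field_age(const struct worker *w) {
    const char *base = (const char *)w;
    return base + offsetof(struct worker, age);
}
```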

Let us demonstrate some techniques for working with structures using the worker structure defined above.

For example, let us extract the value of the age field into AX. A working person is unlikely to be older than 99. So once the contents of this character field are in AX, it is convenient to convert them to binary with the AAD command. AAD expects unpacked BCD digits, so the ASCII bias of each digit must be removed first (for example, with and ax,0F0Fh).

Be careful: because data is stored "low byte at the low address", the tens digit of the age ends up in AL and the units digit in AH.
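The following C sketch, for illustration only, models these steps at the byte level. It covers the little-endian word load, the swap done by `xchg al,ah`, the masking of the ASCII bias with `and ax,0F0Fh`, and AAD itself. The function names are hypothetical, and AAD's effect on the flags is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Load two bytes as a little-endian 16-bit word, as "mov ax,word ptr ..."
   does on x86: the byte at the lower address goes to AL. */
uint16_t load_word_le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Model of AAD (base 10): AL = AL + AH*10 (kept to 8 bits), AH = 0. */
uint16_t aad(uint16_t ax) {
    uint8_t al = (uint8_t)ax;
    uint8_t ah = (uint8_t)(ax >> 8);
    return (uint8_t)(al + ah * 10);
}

/* Convert a two-character ASCII age field such as "42" to binary. */
unsigned age_to_binary(const unsigned char field[2]) {
    uint16_t ax = load_word_le(field);          /* AL = tens, AH = units */
    ax = (uint16_t)((ax << 8) | (ax >> 8));     /* xchg al,ah */
    ax &= 0x0F0F;                               /* and ax,0F0Fh: ASCII -> unpacked BCD */
    return aad(ax);                             /* AL = binary value, AH = 0 */
}
```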

To correct this, just use the xchg al,ah command:

mov ax,word ptr sotr1.age ;al = tens digit, ah = units digit of sotr1's age
xchg al,ah                ;now ah = tens, al = units, as AAD expects

or you can do it like this:

Working with an array of structures is similar to working with a one-dimensional array. Several questions arise here:

How is the size of an element determined, and how are the array elements indexed?

As with other identifiers defined in the program, the translator assigns a type attribute to the name of a structure type and to the name of a variable of a structure type. The value of this attribute is the size in bytes occupied by the fields of the structure. You can retrieve this value with the TYPE operator.

Once the size of a structure instance is known, organizing indexing in an array of structures is not particularly difficult.
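Here is a minimal C sketch of this indexing. It assumes a hypothetical 77-byte record, the size the text uses below for `type worker`, with only the `nam` field named; `sizeof` plays the role of the TYPE operator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 77-byte record; the split into fields is an assumption. */
struct worker {
    char nam[30];
    char other[47];  /* remaining fields, lumped together for brevity */
};

/* Offset of element i in an array of structures: i * TYPE worker.
   This is the value one would place in an index register such as SI. */
size_t element_offset(size_t i) {
    return i * sizeof(struct worker);
}
```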

Eg:

How do you copy a field of one structure to the corresponding field of another structure, or copy an entire structure? Let us copy the nam field of the third employee to the nam field of the fifth employee:

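mas_sotr worker 10 dup()   ;array of 10 worker records
mov bx,offset mas_sotr     ;bx = address of the array
mov si,(type worker)*2     ;si=77*2, offset of the third record (index 2)
mov di,(type worker)*4     ;di=77*4, offset of the fifth record (index 4)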
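The fragment above only sets up the offsets. Below is a hedged C sketch of the whole copy. It assumes that `nam` is the first field of the record (offset 0) and is 30 bytes long; both values are assumptions. On x86 the byte copy itself would typically be done with a string instruction such as `rep movsb`, which the text does not show.

```c
#include <assert.h>
#include <string.h>

#define NAM_LEN 30   /* assumed width of the nam field */
#define REC_LEN 77   /* TYPE worker, as given in the text */

/* Copy the nam field of record src_idx into record dst_idx of an array of
   fixed-size records stored as raw bytes, mirroring the setup above:
   source = base + TYPE*src_idx, destination = base + TYPE*dst_idx.
   Assumes nam is the first field of the record (offset 0). */
void copy_nam(unsigned char *base, size_t src_idx, size_t dst_idx) {
    const unsigned char *si = base + (size_t)REC_LEN * src_idx;
    unsigned char *di = base + (size_t)REC_LEN * dst_idx;
    memcpy(di, si, NAM_LEN);
}
```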

It seems to me that being a programmer sooner or later makes a person resemble a good housewife. Like her, he is constantly looking for ways to save something, cut back, and make a wonderful dinner out of a minimum of ingredients. And when this succeeds, the moral satisfaction is no less, and maybe even greater, than what a housewife gets from a wonderful dinner. The degree of this satisfaction, it seems to me, depends on how much one loves one's profession.

On the other hand, advances in software and hardware make programmers somewhat complacent. Quite often the situation resembles the proverb about making an elephant out of a fly: heavy tools are used to solve some minor problem, although their effectiveness generally pays off only in relatively large projects.

The presence of the following two data types in the language is probably explained by the "housewife's" desire to use the working area of the table (RAM) as efficiently as possible when preparing food or placing products (program data).

Course work

Discipline: "System Programming"

Topic No. 4: "Solving problems using procedures"

Option 2

EAST SIBERIAN STATE UNIVERSITY OF TECHNOLOGY AND MANAGEMENT

____________________________________________________________________

COLLEGE OF TECHNOLOGY

ASSIGNMENT

for course work

Discipline: System Programming
Topic: Solving problems using procedures
Student(s): Arina Aleksandrovna Glavinskaya
Supervisor: Dambaeva Sesegma Viktorovna
Summary of work: study of subroutines in Assembly language,
solving problems using subroutines
1. Theoretical part: Basic information about the Assembly language (command set, etc.), Organization of subroutines, Methods of passing parameters in subroutines
2. Practical part: Develop two subroutines, one of which converts any given letter to uppercase (including Russian letters), and the other converts a given letter to lowercase.
Project deadlines according to schedule:
1. Theoretical part - 30% by week 7.
2. Practical part - 70% by week 11.
3. Defense - 100% by week 14.
Design requirements:
1. The calculation and explanatory note of the course project must be submitted in electronic and hard copy.
2. The report must be at least 20 typewritten pages, excluding appendices.
3. The explanatory note is prepared in accordance with GOST 7.32-91 and signed by the supervisor.

Supervisor __________________

Student __________________

Date of issue: 26 September 2017
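The letter-conversion task from the assignment can be sketched in C before it is written as assembly subroutines. This sketch assumes that Russian letters are encoded in code page 866, the usual DOS encoding. The function names are hypothetical, and other encodings (for example, Windows-1251) would need different ranges.

```c
#include <assert.h>

/* Convert a letter to uppercase. Latin letters use ASCII; Russian letters
   are assumed to be in code page 866, where 0x80-0x9F hold the capital
   letters, 0xA0-0xAF and 0xE0-0xEF the small letters, 0xF0/0xF1 the
   capital/small "yo". Any other byte is returned unchanged. */
unsigned char to_upper_866(unsigned char c) {
    if (c >= 'a' && c <= 'z') return (unsigned char)(c - 0x20);
    if (c >= 0xA0 && c <= 0xAF) return (unsigned char)(c - 0x20); /* first 16 small letters */
    if (c >= 0xE0 && c <= 0xEF) return (unsigned char)(c - 0x50); /* last 16 small letters */
    if (c == 0xF1) return 0xF0;                                   /* small yo -> capital */
    return c;
}

/* Convert a letter to lowercase under the same assumptions. */
unsigned char to_lower_866(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return (unsigned char)(c + 0x20);
    if (c >= 0x80 && c <= 0x8F) return (unsigned char)(c + 0x20); /* first 16 capitals */
    if (c >= 0x90 && c <= 0x9F) return (unsigned char)(c + 0x50); /* last 16 capitals */
    if (c == 0xF0) return 0xF1;                                   /* capital yo -> small */
    return c;
}
```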


Introduction

1.1 Basic information about the Assembly language

1.1.1 Command set

1.2 Organization of subroutines in Assembly language

1.3 Methods of passing parameters in subroutines

1.3.1 Passing parameters through registers

1.3.2 Passing parameters via the stack

2 PRACTICAL SECTION

2.1 Statement of the problem

2.2 Description of the solution to the problem

2.3 Testing the program

Conclusion

References


Introduction

Programming in assembly language is known to be difficult. There are now many different high-level languages that let you write programs with much less effort. Naturally, the question arises of when a programmer may need assembly language. Currently, two areas can be pointed out in which the use of assembly language is justified and often necessary.

First, there are the so-called machine-dependent system programs, which usually control various computer devices (such programs are called drivers). These system programs use special machine commands that are not needed in ordinary (so-called application) programs. These commands are impossible or very difficult to express in a high-level language.

The second area of application of assembly language is optimizing program execution. Compilers from high-level languages often produce inefficient machine-language programs. This usually applies to computational programs in which most of the time is spent executing a very small section of the program (about 3-5%, the main loop). To solve this problem, mixed-language programming systems can be used, which allow parts of a program to be written in different languages. Typically, the main part of the program is written in a high-level language (Fortran, Pascal, C, etc.), and the time-critical sections are written in assembly language. This can increase the speed of the whole program significantly; often it is the only way to make the program produce results in an acceptable time.

The purpose of this course work is to gain practical programming skills in assembly language.

Job objectives:

1. Study basic information about the Assembly language (structure and components of an Assembly program, command format, organization of subroutines, etc.);

2. Study the types of bit operations, the format and logic of operation of Assembler logical instructions;

3. Solve an individual problem on the use of subroutines in Assembly language;

4. Formulate a conclusion about the work done.

1 THEORETICAL SECTION

Assembly Language Basics

Assembly language is a low-level programming language: a notation for machine commands that is convenient for humans to read and write.

Assembly language commands correspond one to one to processor commands and, in fact, represent a convenient symbolic form of recording (mnemonic code) of commands and their arguments. Assembly language also provides basic programming abstractions: linking program parts and data through symbolically named labels and directives.

Assembly directives allow you to include blocks of data (described explicitly or read from a file) into a program; repeat a certain fragment a specified number of times; compile the fragment according to the condition; set the execution address of a fragment, change the values ​​of labels during the compilation process; use macro definitions with parameters, etc.

Advantages and disadvantages

Advantages:

· minimal redundant code (fewer commands and memory accesses), which results in high speed and small program size;

· direct access to hardware: input/output ports, special processor registers;

· maximum "fit" to the target platform (use of special instructions and technical features of the hardware).

Disadvantages:

· large amounts of code and a large number of additional small tasks;

· poor code readability and difficult maintenance (debugging, adding features);

· difficulty of implementing programming paradigms and other even moderately complex conventions, and difficulty of joint development;

· fewer available libraries, with low compatibility between them;

· non-portability to other platforms (except binary-compatible ones).

In addition to instructions, a program may contain directives: commands that are not translated directly into machine instructions but control the operation of the compiler. Their set and syntax vary considerably and depend not on the hardware platform but on the compiler used, which gives rise to dialects of the language within the same family of architectures. The set of directives includes:

· definition of data (constants and variables);

· management of program organization in memory and output file parameters;

· setting the compiler operating mode;

· all kinds of abstractions (i.e. elements of high-level languages) - from the design of procedures and functions (to simplify the implementation of the procedural programming paradigm) to conditional constructs and loops (for the structured programming paradigm);

· macros.

Command set

Typical assembly language commands are:

· Data transfer commands (mov, etc.)

· Arithmetic commands (add, sub, imul, etc.)

· Logical and bitwise operations (or, and, xor, shr, etc.)

· Program execution control commands (jmp, loop, ret, etc.)

· Interrupt commands (sometimes referred to as control commands): int

· I/O commands to ports (in, out)

Microcontrollers and microcomputers also have commands that perform checks and jumps based on conditions, for example:

· jne - jump if not equal;

· jge - jump if greater than or equal.

Assembly language commands (Lecture)

LECTURE PLAN

1. Main groups of operations.

2. Mnemonic codes of Pentium processor commands.

1. Main groups of operations

Microprocessors execute a set of commands that implement the following main groups of operations:

Data transfer operations;

Arithmetic operations;

Logical operations;

Shift operations;

Comparison and testing operations;

Bit operations;

Program control operations;

Processor control operations.

2. Mnemonic codes of Pentium processor commands

Commands are usually described by their mnemonic designations (mnemonic codes), which are also used to specify the command when programming in assembly language. The mnemonic codes of some commands may differ between assembler versions. For example, the subroutine call command uses the mnemonic CALL or JSR ("Jump to SubRoutine"). However, the mnemonic codes of most commands for the main types of microprocessors are the same or differ only slightly, since they are abbreviations of the English words that describe the operation. Let us look at the mnemonic codes adopted for Pentium processors.

Data transfer commands. The main command of this group is MOV, which transfers data between two registers or between a register and a memory cell. Some microprocessors also implement transfers between two memory cells, as well as bulk transfers of the contents of several registers to or from memory. For example, microprocessors of the Motorola 68xxx family execute the MOVE command, which transfers data from one memory cell to another, and the MOVEM command, which writes to memory or loads from memory the contents of a specified set of registers (up to 16 registers). The XCHG command exchanges the contents of two processor registers, or of a register and a memory cell.

The input command IN transfers data from an external device into a processor register, and the output command OUT sends data from a register to an external device. These commands specify the number of the interface device (input/output port) through which data is transferred. Note that many microprocessors have no special commands for accessing external devices. In this case, data input and output are performed with the MOV command, which specifies the address of the required interface device. The external device is thus addressed like a memory cell, and a section of the address space is reserved for the addresses of the interface devices (ports) connected to the system.

Arithmetic commands. The main commands in this group are addition, subtraction, multiplication and division, each of which has several variants. The addition command ADD and the subtraction command SUB operate on the contents of two registers, a register and a memory cell, or a register or memory cell and an immediate operand. The ADC and SBB commands perform addition and subtraction taking into account the carry flag C set by the previous operation. These commands implement multiword addition of operands whose width exceeds the processor word size. The NEG command changes the sign of the operand, converting it to two's complement.
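Here is a minimal C sketch of multiword addition with ADD and ADC on a hypothetical 16-bit processor. The low halves are added first, and the carry they produce is fed into the addition of the high halves. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit ADD/ADC model: returns the low 16 bits of a + b + carry_in and
   writes the new carry flag (CF) to *carry_out. */
uint16_t add16(uint16_t a, uint16_t b, unsigned carry_in, unsigned *carry_out) {
    uint32_t sum = (uint32_t)a + b + carry_in;
    *carry_out = sum >> 16;          /* 1 if the sum did not fit in 16 bits */
    return (uint16_t)sum;
}

/* Add two 32-bit numbers on a "16-bit processor":
   ADD the low words, then ADC the high words using the carry. */
uint32_t add32_via16(uint32_t x, uint32_t y) {
    unsigned cf;
    uint16_t lo = add16((uint16_t)x, (uint16_t)y, 0, &cf);                   /* ADD */
    uint16_t hi = add16((uint16_t)(x >> 16), (uint16_t)(y >> 16), cf, &cf);  /* ADC */
    return ((uint32_t)hi << 16) | lo;
}
```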

Multiplication and division can be performed on signed numbers (IMUL, IDIV) or unsigned numbers (MUL, DIV). One operand is always in a register; the other can be in a register or a memory cell, or be an immediate operand. The result is placed in registers. For multiplication (MUL, IMUL) the result is double width, so two registers are used. For division (DIV, IDIV) the dividend is a double-width operand placed in two registers, and the quotient and remainder are written to two registers.
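Here is a C sketch of the 16-bit forms, following x86 register conventions. MUL sets DX:AX = AX * source, and DIV sets AX = quotient and DX = remainder of DX:AX / source. The helper names are hypothetical, and the divide-error case is reported with a return value instead of a processor exception.

```c
#include <assert.h>
#include <stdint.h>

/* MUL r/m16 model: DX:AX = AX * src (the product needs 32 bits). */
void mul16(uint16_t ax, uint16_t src, uint16_t *dx_out, uint16_t *ax_out) {
    uint32_t product = (uint32_t)ax * src;
    *dx_out = (uint16_t)(product >> 16);   /* high half */
    *ax_out = (uint16_t)product;           /* low half */
}

/* DIV r/m16 model: divides the 32-bit dividend DX:AX by src,
   AX = quotient, DX = remainder. Returns 0 where the real CPU would raise
   a divide error (src == 0 or a quotient too large for 16 bits). */
int div16(uint16_t dx, uint16_t ax, uint16_t src, uint16_t *q, uint16_t *r) {
    uint32_t dividend = ((uint32_t)dx << 16) | ax;
    if (src == 0 || dividend / src > 0xFFFF) return 0;
    *q = (uint16_t)(dividend / src);
    *r = (uint16_t)(dividend % src);
    return 1;
}
```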

Logical commands. Almost all microprocessors perform the logical operations AND, OR and exclusive OR, applied to corresponding bits of the operands by the AND, OR and XOR commands. The operations are performed on the contents of two registers, a register and a memory cell, or with an immediate operand. The NOT command inverts every bit of the operand.

Shift commands. Microprocessors perform arithmetic, logical and cyclic shifts (rotates) of the addressed operand by one or more bits. The operand to be shifted can be in a register or a memory cell. The shift count is given either by an immediate operand in the instruction or by the contents of a specified register. The carry flag C in the status register (SR or EFLAGS) usually takes part in the shift: it receives the last bit shifted out of the register or memory cell.
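The following C sketch models one-bit shifts and rotates of an 8-bit operand. Each function returns the shifted value and reports the bit shifted out, as the carry flag would receive it. The names mirror the x86 mnemonics SHL, SHR, SAR, ROL and ROR. They are illustrative functions only and do not model every flag the real commands set.

```c
#include <assert.h>
#include <stdint.h>

/* One-bit shifts and rotates of an 8-bit operand; *cf receives the bit
   that leaves the operand, as the carry flag does. */
uint8_t shl1(uint8_t v, unsigned *cf) { *cf = v >> 7; return (uint8_t)(v << 1); }
uint8_t shr1(uint8_t v, unsigned *cf) { *cf = v & 1;  return (uint8_t)(v >> 1); }

/* Arithmetic right shift keeps the sign bit. */
uint8_t sar1(uint8_t v, unsigned *cf) { *cf = v & 1;  return (uint8_t)((v >> 1) | (v & 0x80)); }

/* Cyclic shifts (rotates): the bit leaving one end re-enters at the other. */
uint8_t rol1(uint8_t v, unsigned *cf) { *cf = v >> 7; return (uint8_t)((v << 1) | (v >> 7)); }
uint8_t ror1(uint8_t v, unsigned *cf) { *cf = v & 1;  return (uint8_t)((v >> 1) | (v << 7)); }
```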

Comparison and test commands. Operands are usually compared with the CMP command, which subtracts the operands and sets the flags N, Z, V, C in the status register according to the result. The result of the subtraction is not saved, and the operands are not changed. Subsequent analysis of the flag values determines the relation (>, <, =) between the operands, treated as signed or unsigned numbers. Various addressing modes make it possible to compare the contents of two registers, a register and a memory cell, or an immediate operand with the contents of a register or a memory cell.
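Here is a C sketch of CMP on 8-bit operands. It computes the flags N, Z, V, C (SF, ZF, OF, CF on Pentium) from the discarded difference, along with the conditions that a few conditional jumps test. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct flags { unsigned n, z, v, c; };

/* CMP a,b on 8-bit operands: compute a - b, set the flags, discard the result. */
struct flags cmp8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    struct flags f;
    f.n = r >> 7;                               /* sign of the result */
    f.z = (r == 0);                             /* result is zero */
    f.c = (a < b);                              /* borrow: a < b as unsigned */
    f.v = ((a ^ b) & (a ^ r) & 0x80) != 0;      /* signed overflow */
    return f;
}

/* Conditions read from those flags by typical conditional jumps. */
int is_equal(struct flags f)          { return f.z; }          /* JE/JZ */
int is_below_unsigned(struct flags f) { return f.c; }          /* JB/JC */
int is_less_signed(struct flags f)    { return f.n != f.v; }   /* JL */
int is_ge_signed(struct flags f)      { return f.n == f.v; }   /* JGE */
```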

Some microprocessors execute the test command TST, a single-operand version of the compare command. When it is executed, the flags N and Z are set according to the sign of the addressed operand and whether it is zero.

Bit operation commands. These commands set the carry flag C in the status register according to the value of the tested bit bn of the addressed operand. In some microprocessors the Z flag is set according to the result of the bit test. The number n of the tested bit is given either by the contents of a register specified in the command or by an immediate operand.

The commands of this group differ in how they change the tested bit. BT leaves the bit unchanged. After the test, BTS sets bn = 1, BTR resets it to bn = 0, and BTC inverts it.
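Here is a C sketch of the four bit-test commands on a 32-bit register operand. The return value plays the role of the carry flag. The function names follow the mnemonics but are only an illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-test commands on a 32-bit register operand: each returns the old
   value of bit n (copied to CF by the real command); BT leaves the bit,
   BTS sets it, BTR resets it and BTC inverts it. n is taken modulo 32. */
unsigned bt(uint32_t *op, unsigned n)  { return (unsigned)((*op >> (n % 32)) & 1u); }
unsigned bts(uint32_t *op, unsigned n) { unsigned cf = bt(op, n); *op |=  (UINT32_C(1) << (n % 32)); return cf; }
unsigned btr(uint32_t *op, unsigned n) { unsigned cf = bt(op, n); *op &= ~(UINT32_C(1) << (n % 32)); return cf; }
unsigned btc(uint32_t *op, unsigned n) { unsigned cf = bt(op, n); *op ^=  (UINT32_C(1) << (n % 32)); return cf; }
```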

Program control operations. Program control uses a large number of commands, including:

- unconditional control transfer commands;

- conditional jump commands;

- loop commands;

- interrupt commands;

- flag control commands.

Unconditional transfer of control is performed by the JMP command, which loads new contents into the program counter PC: the address of the next command to be executed. This address is either specified directly in the JMP command (direct addressing) or computed as the sum of the current contents of PC and a signed offset given in the command (relative addressing). Since PC contains the address of the next program command, the relative method specifies a jump address offset by the given number of bytes from the address of the next command. With a positive offset, control passes to later commands of the program; with a negative offset, to earlier ones.
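Here is a minimal C sketch of relative addressing. It assumes a 16-bit program counter and an 8-bit signed displacement, as in an x86 short jump. `jump_target` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a relative jump: the address of the next instruction plus the
   signed displacement encoded in the command (16-bit PC, 8-bit offset). */
uint16_t jump_target(uint16_t instr_addr, uint16_t instr_len, int8_t disp) {
    uint16_t next = (uint16_t)(instr_addr + instr_len);   /* PC after the fetch */
    return (uint16_t)(next + disp);
}
```

For example, a two-byte jump at address 0100h with displacement -2 jumps to itself.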

A subroutine is also called by an unconditional transfer of control, using the CALL (or JSR) command. In this case, however, the current value of PC (the address of the next command) must be saved before the address of the first command of the subroutine is loaded into PC. This ensures that control can return to the main program after the subroutine finishes (or to the calling subroutine when subroutines are nested). Conditional jump commands (program branches) load new contents into PC only if certain conditions are met; these conditions are usually determined by the current values of flags in the status register. If the condition is not met, the next command of the program is executed.

Flag control commands read and write the status register, in which the flags are stored, and change the values of individual flags. For example, Pentium processors implement the LAHF and SAHF commands. LAHF loads the low byte of EFLAGS, which contains the flags, into register AH, and SAHF stores the contents of AH into the low byte of EFLAGS. The CLC and STC commands set the carry flag to CF = 0 and CF = 1, and the CMC command inverts it. Since flags determine the flow of execution at conditional jumps, flag change commands are commonly used to control the program.
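Here is a C sketch of LAHF, SAHF, CLC, STC and CMC operating on a 32-bit EFLAGS value. The bit positions (CF 0, PF 2, AF 4, ZF 6, SF 7) are those of the x86 flags register. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the flags in the low byte of EFLAGS. */
enum { CF = 0, PF = 2, AF = 4, ZF = 6, SF = 7 };

#define LOW_FLAGS ((1u << SF) | (1u << ZF) | (1u << AF) | (1u << PF) | (1u << CF))

/* LAHF: AH = SF:ZF:0:AF:0:PF:1:CF (bit 1 of EFLAGS always reads as 1). */
uint8_t lahf(uint32_t eflags) {
    return (uint8_t)((eflags & LOW_FLAGS) | 0x02u);
}

/* SAHF: copy SF, ZF, AF, PF, CF from AH into EFLAGS; other bits are kept. */
uint32_t sahf(uint32_t eflags, uint8_t ah) {
    return (eflags & ~LOW_FLAGS) | (ah & LOW_FLAGS);
}

/* CLC, STC, CMC: clear, set and complement the carry flag. */
uint32_t clc(uint32_t f) { return f & ~(1u << CF); }
uint32_t stc(uint32_t f) { return f | (1u << CF); }
uint32_t cmc(uint32_t f) { return f ^ (1u << CF); }
```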

Processor control commands. This group includes the halt command, the no-operation command, and a number of commands that set the operating mode of the processor or its individual units. The HLT command stops program execution and puts the processor into a halt state, which it leaves when an interrupt or a reset signal (Reset) is received. The NOP ("empty") command performs no operation and is used to implement program delays or to fill gaps in the program.

The special commands CLI and STI disable and enable servicing of interrupt requests. In Pentium processors this is done with the control flag IF in the EFLAGS register.

Many modern microprocessors have an identification command that lets the user or other devices obtain information about the type of processor used in a given system. In Pentium processors this is the CPUID command, which places the processor data in registers EAX, EBX, ECX and EDX, from which the user or the operating system can read it.

Depending on the operating modes implemented by the processor and the specified types of data being processed, the set of executed commands can be significantly expanded.

Some processors perform arithmetic on binary-coded decimal (BCD) numbers or execute special commands that correct the result when processing such numbers. Many high-performance processors include an FPU, a floating-point unit.

A number of modern processors process several integers or floating-point numbers as a group with a single command, following the SIMD principle ("Single Instruction, Multiple Data"). Executing an operation on multiple operands at once significantly improves processor performance with video and audio data, and such operations are widely used for processing images, audio signals and other data. To perform them, processors include special units that implement the corresponding instruction sets. In different processor types (Pentium, Athlon) these sets are called MMX ("Multi-Media Extension"), SSE ("Streaming SIMD Extension") and 3D Extension (three-dimensional extension).

A characteristic feature of Intel processors, starting with the 80286, is privilege-level control of memory access. This control is provided when the processor runs in protected virtual address mode (Protected Mode). Special groups of commands implement this mode and organize memory protection according to the adopted privilege-based access scheme.

Introduction.

The language in which the source program is written is called the input language, and the language into which it is translated for execution by the processor is called the output language. The process of converting the input language into the output language is called translation. Processors can execute only programs in binary machine language, which is not used for programming, so every source program must be translated. There are two methods of translation: compilation and interpretation.

In compilation, the source program is first translated completely into an equivalent program in the output language, called the object program, which is then executed. This process is implemented by a special program called a compiler. A compiler whose input language is a symbolic representation of the machine (output) language of binary codes is called an assembler.

In interpretation, each line of the source program is analyzed (interpreted) and the command it specifies is executed immediately. This method is implemented by an interpreter program. Interpretation is slow. To make it more efficient, the interpreter does not process each line directly. Instead, it first converts all command lines into an intermediate sequence of symbols, which is then used to perform the functions of the original program.

The assembly language discussed below is implemented using compilation.

Features of the language.

Main features of the assembler:

● instead of binary codes, the language uses symbolic names called mnemonics. For example, the addition command uses the mnemonic ADD, subtraction SUB, multiplication MUL, division DIV, and so on. Symbolic names are also used to address memory cells. To program in assembly language, you need to know only the symbolic names, not binary codes and addresses; the assembler translates the names into binary codes;

● each statement corresponds to one machine command (code), i.e. there is a one-to-one correspondence between machine commands and the statements of an assembly language program;

● the language provides access to all objects and commands, which high-level languages do not. For example, assembly language lets you check bits of the flag register, whereas a high-level language does not. Note that systems programming languages (for example, C) often occupy an intermediate position: in terms of access they are closer to assembly language, but they have the syntax of a high-level language;

● assembly language is not a universal language. Each specific group of microprocessors has its own assembler. High-level languages ​​do not have this drawback.

Writing and debugging a program in assembly language takes much more time than in a high-level language. Despite this, assembly language is widely used for the following reasons:

● a program written in assembly language is usually much smaller and runs much faster than one written in a high-level language. For some applications these properties are of primary importance, for example many system programs (including compilers), programs in credit cards and cell phones, device drivers, etc.;

● Some procedures require full access to the hardware, which is usually not possible in a high-level language. This includes interrupts and interrupt handlers in operating systems, as well as device controllers in real-time embedded systems.

In most programs, a small percentage of the total code accounts for a large percentage of the execution time. Typically, 1% of the program accounts for 50% of the execution time, and 10% of the program for 90%. Therefore, real programs are often written using both assembly language and a high-level language.

Operator format in assembly language.

An assembly language program is a list of commands (statements, sentences), each of which occupies a separate line and contains four fields: a label field, an operation field, an operand field, and a comment field. Each field has a separate column.

Label field.

Column 1 is allocated to the label field. A label is a symbolic name, or identifier, of a memory address. It is needed so that you can:

● make a conditional or unconditional jump to the command;

● gain access to the location where the data is stored.

Statements that are referred to in these ways are given a label. A name consists of (capital) letters of the English alphabet and digits; it must begin with a letter and end with a colon separator. A label with its colon can be written on a separate line, with the operation code on the next line in column 2, which simplifies the compiler's work. Without the colon, a label could not be distinguished from an operation code when they are on separate lines.

In some versions of assembly language, colons are placed only after instruction labels, not after data labels, and the length of the label may be limited to 6 or 8 characters.

Labels must not repeat, since each label is associated with the address of a command. If a command or data item never needs to be referenced during program execution, its label field is left empty.
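The label rules above (a letter first, then letters or digits, a closing colon, no duplicates) can be checked mechanically. The following Python sketch is one possible check; the 8-character limit stands in for the 6- or 8-character limit mentioned above and, like the function name check_labels, is an assumption.

```python
import re

# Assumed rule: a letter first, then letters or digits, then a colon.
# Matching uses the upper-cased text, since many assemblers ignore case.
LABEL_RE = re.compile(r"[A-Z][A-Z0-9]*:")
MAX_LEN = 8  # hypothetical limit on the label length


def check_labels(labels):
    """Return error messages for malformed, too long or duplicate labels."""
    errors = []
    seen = set()
    for text in labels:
        name = text.upper().rstrip(":")
        if not LABEL_RE.fullmatch(text.upper()):
            errors.append(f"{name}: invalid label syntax")
        elif len(name) > MAX_LEN:
            errors.append(f"{name}: longer than {MAX_LEN} characters")
        elif name in seen:
            errors.append(f"{name}: duplicate label")
        seen.add(name)
    return errors


print(check_labels(["START:", "LOOP1:", "1BAD:", "VERYLONGNAME:", "START:"]))
```

For the sample list it reports 1BAD as invalid, VERYLONGNAME as too long, and the second START as a duplicate.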

Operation code field.

This field contains the mnemonic code of a command or pseudo-command (see below). Command mnemonics are chosen by the language designers. Some assemblers use separate mnemonics for loading a register from memory and for storing a register in memory (the SPARC assembler, for example, uses LD and ST), while others use the same name for both operations (MOV in the x86 assembler). Although the choice of mnemonic names is arbitrary, whether two separate machine instructions are needed is determined by the processor architecture.

Register mnemonics also depend on the assembler (Table 5.2.1).

Operand field.

This field holds the additional information needed to perform the operation. For jump commands it contains the address to jump to; for other machine commands it contains the addresses and registers that serve as operands. As an example, here are the operands that can be used for 8-bit processors:

● numerical data in different number systems. To indicate the number system, the constant is followed by one of the Latin letters B, Q, H or D, for binary, octal, hexadecimal and decimal respectively (D may be omitted). If a hexadecimal number begins with A, B, C, D, E or F, an insignificant leading 0 (zero) is added;

● codes of the internal microprocessor registers and of memory cell M (sources or destinations of information), written as the letters A, B, C, D, E, H, L or M, or as their addresses in any number system (for example, 10B is the address of register D in binary);

● identifiers: for the register pairs BC, DE and HL, their first letters B, D and H; for the pair formed by the accumulator and the flag register, PSW; for the program counter, PC; for the stack pointer, SP;

● labels giving the addresses of operands or of the next instruction in conditional (when the condition is met) and unconditional jumps. For example, operand M1 in the command JMP M1 means an unconditional jump to the command whose label field contains the identifier M1;

● expressions, built by combining the data described above with arithmetic and logical operators.

Note that the directives for reserving data space depend on the assembler: for example, some assemblers use DW (Define Word), while others use DC (Define Constant).
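The suffix rules for numeric constants described in the list above can be expressed as a small conversion function. This Python sketch assumes the suffixes B (binary), Q or O (octal), H (hexadecimal) and D (decimal, optional); accepting O as well as Q and the name parse_constant are assumptions.

```python
# Assumed suffixes: B = binary, Q or O = octal, H = hexadecimal, D = decimal.
BASES = {"B": 2, "Q": 8, "O": 8, "H": 16, "D": 10}


def parse_constant(text: str) -> int:
    """Convert a numeric constant such as 10B, 17Q, 0FFH or 25 to an int."""
    text = text.upper()
    if not text[:1].isdigit():
        # FFH would look like a name, so hex constants need a leading 0.
        raise ValueError(f"{text}: a numeric constant must start with a digit")
    suffix = text[-1]
    if suffix in BASES:
        return int(text[:-1], BASES[suffix])
    return int(text, 10)  # no suffix: decimal is the default


print(parse_constant("10B"), parse_constant("0FFH"))
```

The leading-digit check is what forces a hexadecimal constant such as FFH to be written 0FFH: without it, the assembler could not tell the constant from a symbolic name.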

Processors handle operands of different lengths, and assembler designers have indicated the length in different ways, for example:

● in the x86 assembler, registers of different lengths have different names: EAX holds 32-bit operands (type long), AX 16-bit operands (type word), and AH 8-bit operands (type byte);

● in other assemblers (for example, for the Motorola 680x0), a suffix is added to each opcode: .L for type long, .W for type word and .B for type byte;

● in still others, different opcodes are used for operands of different lengths; for example, the SPARC loads a byte, a halfword and a word into a 64-bit register with the opcodes LDSB, LDSH and LDSW, respectively.

Comments field.

This field explains what the program does. Comments do not affect how the program runs; they are intended for people. They may be needed when modifying a program, which without them may be incomprehensible even to experienced programmers. A comment begins with a special character and is used to explain and document the program. Depending on the assembler, this character can be:

● a semicolon (;) in assemblers for Intel processors;

● an exclamation mark (!) in assemblers for SPARC processors.

Each separate comment line must begin with this character.

Pseudo-commands (directives).

Assembly language has two main kinds of commands:

● basic instructions, which are the equivalent of the processor's machine instructions; they perform all the processing the program is meant to do;

● pseudo-commands, or directives, which control the translation of the program into machine code. As an example, Table 5.2.2 shows some pseudo-commands of an assembler for one processor family.

In programming, the algorithm often requires the same sequence of commands to be repeated many times. In that case you can:

● write out the sequence every time it is needed, which makes the program larger;

● put the sequence into a procedure (subroutine) and call it when needed. This solution has its drawbacks: each use requires a procedure call command and a return command, which can noticeably slow the program down if the sequence is short and used often.

The simplest and most efficient way to repeat a sequence of commands is a macro, a pseudo-command that makes the assembler translate again a group of commands that occurs often in the program.

A macro (macro command) involves three aspects: macro definition, macro call and macro expansion.

Macro definition.

A macro definition assigns a name to a frequently repeated sequence of program commands; the name is then used to refer to that sequence in the program text.

A macro definition has the following structure:

name MACRO list of parameters ; macro definition header
. . .                         ; body of the macro
ENDM                          ; end of the macro definition

This structure has three parts:

● the header of the macro, consisting of the macro name, the MACRO pseudo-command and a set of parameters;

● the body of the macro, shown as dots;

● the ENDM command, which ends the macro definition.

The parameter set of a macro definition lists all the parameters that appear in the operand fields of the selected group of instructions. Parameters that were defined earlier in the program need not be listed in the macro definition header.

To have the selected group of commands assembled again, a call is written that consists of the macro name and a list of parameters with other values.

When the assembler encounters a macro definition during translation, it stores it in the macro definition table. Whenever the name of the macro later appears in the program as an opcode, the assembler replaces it with the body of the macro.

Using a macro name as an opcode is called a macro call (macro invocation), and replacing it with the body of the macro is called macro expansion.

If a program is represented as a sequence of characters (letters, numbers, spaces, punctuation marks and carriage returns to move to a new line), then macro expansion consists of replacing some chains from this sequence with other chains.

Macro expansion takes place during assembly, not while the program runs. Macro facilities are therefore tools for manipulating character strings.

The assembly process is carried out in two passes:

● on the first pass, macro definitions are stored and macro calls are expanded: the source program is read and converted into a program from which all macro definitions have been removed and in which every macro call has been replaced by the body of the macro;

● the second pass processes the resulting program without macros.
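The macro pass just described can be sketched for the simplest case, macros without parameters. This Python sketch implements only the first pass: it stores each definition in a table and replaces each call with the stored body. The MASM-style NAME MACRO ... ENDM syntax, the function expand_macros and the sample macro SAVE are assumptions, and nested definitions are not handled.

```python
def expand_macros(lines):
    """First pass: store macro definitions, replace calls with their bodies."""
    table = {}          # macro definition table: name -> list of body lines
    output = []         # program with definitions removed, calls expanded
    current = None      # name of the macro currently being defined, if any
    for line in lines:
        words = line.split()
        if current is not None:
            if words and words[0].upper() == "ENDM":
                current = None                  # end of the definition
            else:
                table[current].append(line)     # store a body line
        elif len(words) == 2 and words[1].upper() == "MACRO":
            current = words[0].upper()          # header: start a new entry
            table[current] = []
        elif words and words[0].upper() in table:
            output.extend(table[words[0].upper()])  # macro call -> body
        else:
            output.append(line)                 # ordinary statement
    return output


source = ["SAVE MACRO", "PUSH EAX", "PUSH EBX", "ENDM",
          "SAVE", "ADD EAX, EBX", "SAVE"]
print(expand_macros(source))
```

The result contains no macro definitions and no macro calls, only ordinary statements, which is what the second pass expects to receive.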

Macros with parameters.

To work with repeated sequences of commands, the parameters of which can take different values, macro definitions are provided:

● with actual parameters that are placed in the operand field of the macro call;

● with formal parameters. During macro expansion, each formal parameter appearing in the body of the macro is replaced by the corresponding actual parameter.

Let us look at an example of using macros with parameters.

Program 1 contains two similar sequences of commands that differ in that the first swaps P and Q and the second swaps R and S. Program 2 contains a macro with two formal parameters, P1 and P2. During macro expansion, every occurrence of P1 in the macro body is replaced by the first actual parameter (P or R), and every occurrence of P2 by the second actual parameter (Q or S), so the expansion reproduces Program 1. In the first macro call of Program 2, P is the first actual parameter and Q is the second.

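Program 1

MOV EAX,P
MOV EBX,Q
MOV Q,EAX
MOV P,EBX

MOV EAX,R
MOV EBX,S
MOV S,EAX
MOV R,EBX

Program 2

CHANGE MACRO P1,P2
MOV EAX,P1
MOV EBX,P2
MOV P2,EAX
MOV P1,EBX
ENDM

CHANGE P,Q
CHANGE R,S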
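Parameter substitution can be sketched in a few lines of Python. This illustrates the idea rather than how any particular assembler is written. It assumes that the second sequence of Program 1 swaps R and S, and expand_call is a hypothetical name. Each formal parameter is matched as a whole word, so P1 is never found inside a longer name such as P10.

```python
import re

# Body of the two-parameter macro from Program 2 (formal parameters P1, P2).
BODY = ["MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX"]
FORMALS = ["P1", "P2"]


def expand_call(body, formals, actuals):
    """Replace every formal parameter in the body by the matching actual one."""
    if len(actuals) != len(formals):
        raise ValueError("wrong number of actual parameters")
    mapping = dict(zip(formals, actuals))
    # \b...\b matches whole words only, so P1 is not found inside P10.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


program1 = expand_call(BODY, FORMALS, ["P", "Q"]) + expand_call(BODY, FORMALS, ["R", "S"])
print("\n".join(program1))
```

All formals are replaced in a single scan of each line, so an inserted actual parameter is never rescanned and mistaken for a formal one.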

Extended capabilities.

Let us look at some advanced features of macro assemblers.

If a macro that contains a conditional jump and the label it jumps to is called two or more times, the label is duplicated (the duplicate label problem), which causes an assembly error. One solution is for the programmer to pass a different label as a parameter in each call. In some assemblers (MASM, for example), the label can instead be declared local with the LOCAL directive, and the assembler then automatically generates a different label each time the macro is expanded.
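The effect of declaring a label local can be imitated by renaming the label on every expansion. In this Python sketch the new names have the form SKIP_1, SKIP_2 and so on; real assemblers use their own naming schemes, so the format, the function expand_with_locals and the sample macro body are all assumptions.

```python
import itertools
import re

_expansion_no = itertools.count(1)  # counts expansions across the whole program


def expand_with_locals(body, local_labels):
    """Expand a macro body, renaming each local label uniquely per expansion."""
    n = next(_expansion_no)
    if not local_labels:
        return list(body)
    mapping = {label: f"{label}_{n}" for label in local_labels}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, local_labels)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


# Hypothetical macro body: replace EAX by its absolute value.
ABS_BODY = ["CMP EAX, 0", "JGE SKIP", "NEG EAX", "SKIP: NOP"]
first = expand_with_locals(ABS_BODY, ["SKIP"])
second = expand_with_locals(ABS_BODY, ["SKIP"])
print(first + second)
```

Because each expansion gets its own label, both copies of the macro can appear in one program without a duplicate label error.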

Some assemblers also allow macros to be defined inside other macros. This advanced feature is very useful in combination with conditional assembly. Consider the following example:

M1 MACRO
IF WORDSIZE GT 16
M2 MACRO
. . .
ENDM
ELSE
M2 MACRO
. . .
ENDM
ENDIF
ENDM

Macro M2 is defined in both branches of the IF statement, but which definition takes effect depends on whether the program is assembled for a 16-bit or a 32-bit processor. If M1 is never called, M2 is not defined at all.

Another advanced feature is that macros can call other macros, including themselves - recursive call. In the latter case, to avoid an endless loop, the macro must pass a parameter to itself that changes with each expansion, and also check this parameter and end the recursion when the parameter reaches a certain value.
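The stop condition of a recursive macro can be illustrated with a Python sketch of the expansion. The macro ZEROS n, which emits one DB 0 line and then calls itself with n - 1, is hypothetical, as is the MAX_DEPTH guard that stands in for an assembler's limit on nesting depth.

```python
MAX_DEPTH = 100  # hypothetical guard against a macro that never stops


def expand_zeros(n: int, depth: int = 0) -> list:
    """Expand the hypothetical recursive macro ZEROS n into n 'DB 0' lines."""
    if depth > MAX_DEPTH:
        raise RecursionError("macro expansion nested too deeply")
    if n == 0:                      # stop condition checked inside the macro
        return []
    # Emit one line, then 'call' the same macro with a changed parameter.
    return ["DB 0"] + expand_zeros(n - 1, depth + 1)


print(expand_zeros(3))
```

A call with a negative argument never reaches 0; the depth guard then reports the runaway recursion instead of expanding forever.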

Implementing macro facilities in an assembler.

To support macros, the assembler must perform two functions: store macro definitions and expand macro calls.

Saving macro definitions.

All macro names are stored in a table. Each name is accompanied by a pointer to the corresponding macro so that it can be called if necessary. Some assemblers have a separate table for macro names, others have a general table in which, along with macro names, all machine instructions and directives are located.

When a macro definition is encountered during assembly, the assembler creates:

● a new table entry with the name of the macro, the number of parameters and a pointer to the macro definition table, where the body of the macro will be stored;

● a list of formal parameters.

The body of the macro, which is simply a string of characters, is then read and stored in the macro definition table. Formal parameters found in the body of the macro are marked with a special character.

The internal representation of the macro from Program 2 above is:

MOV EAX,&P1; MOV EBX,&P2; MOV &P2,EAX; MOV &P1,EBX;

where the semicolon stands for the carriage return and the ampersand & marks a formal parameter.

Expanding macro calls.

Each macro definition encountered during assembly is stored in the macro table. When a macro is called, the assembler temporarily stops reading input from the input device and starts reading the stored macro body instead. The formal parameters found in the body are replaced by the actual parameters supplied in the call; the ampersand & in front of the parameters lets the assembler recognize them.
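The storage scheme with ; and & can be sketched as two Python functions: store_macro packs a body into one string, marking formal parameters with &, and expand_stored substitutes the actual parameters into that string and splits it back into lines. Both names are hypothetical, and the packed string omits spaces after the semicolons.

```python
import re


def store_macro(body_lines, formals):
    """Pack a macro body into one string: ';' ends a line, '&' marks a formal."""
    if not formals:
        return "".join(line + ";" for line in body_lines)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return "".join(pattern.sub(r"&\1", line) + ";" for line in body_lines)


def expand_stored(stored, formals, actuals):
    """Replace each &name with the matching actual parameter and unpack lines."""
    mapping = dict(zip(formals, actuals))
    text = re.sub(r"&(\w+)", lambda m: mapping[m.group(1)], stored)
    return text.split(";")[:-1]  # the final ';' leaves an empty last item


stored = store_macro(["MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX"], ["P1", "P2"])
print(stored)
print(expand_stored(stored, ["P1", "P2"], ["P", "Q"]))
```

For this body the packed form is MOV EAX,&P1;MOV EBX,&P2;MOV &P2,EAX;MOV &P1,EBX; and expanding it with P and Q gives the first swap of Program 1.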

Although there are many assemblers, the assembly process has common features and is similar in many ways. The operation of a two-pass assembler is discussed below.

Two-pass assembler.

A program consists of a sequence of statements, so at first glance the assembler could simply repeat the following steps:

● read one statement and translate it into machine language;

● write the resulting machine code to one file and the corresponding part of the listing to another file;

● repeat these steps until the whole program has been translated.

However, this approach does not work as is, because of the so-called forward reference problem. If the first statement is a jump to statement P at the very end of the program, the assembler cannot translate it: it must first determine the address of P, and to do that it must read the entire program. Each complete reading of the source program is called a pass. The forward reference problem can be solved with two passes in either of two ways:

● on the first pass, collect all symbol definitions (including labels) and store them in a table; on the second pass, read and assemble each statement. This method is relatively simple, but the second pass through the source program takes additional time for I/O operations;

● on the first pass, convert the program into an intermediate form and store it in a table, then perform the second pass over the table rather than over the source program. This saves time, since the second pass performs no I/O operations.
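The first of these two methods can be sketched as a toy two-pass assembler in Python. It handles only three instructions; their opcode values (C3, 00 and 76 hexadecimal) and the low-byte-first 16-bit jump address follow the Intel 8080 encodings of JMP, NOP and HLT, but everything else, including the function names, is a simplifying assumption.

```python
# Opcode value and instruction length in bytes (8080 encodings of three
# instructions; nothing else about the 8080 is modelled).
OPCODES = {"JMP": (0xC3, 3), "NOP": (0x00, 1), "HLT": (0x76, 1)}


def split_label(line):
    """Return (label or None, rest of the statement)."""
    if ":" in line:
        label, rest = line.split(":", 1)
        return label.strip(), rest
    return None, line


def assemble(lines):
    # Pass 1: build the symbol table by tracking each statement's address.
    symbols, address = {}, 0
    for line in lines:
        label, rest = split_label(line)
        if label:
            symbols[label] = address
        words = rest.split()
        if words:
            address += OPCODES[words[0]][1]
    # Pass 2: every label now has an address, so forward jumps can be encoded.
    code = []
    for line in lines:
        words = split_label(line)[1].split()
        if not words:
            continue
        opcode, length = OPCODES[words[0]]
        code.append(opcode)
        if length == 3:
            target = symbols[words[1]]
            code += [target & 0xFF, target >> 8]  # low byte first
    return symbols, code


symbols, code = assemble(["JMP END", "NOP", "NOP", "END: HLT"])
print(symbols, [f"{b:02X}" for b in code])
```

The jump at address 0 refers to END, which is defined only at address 5; because pass one has already recorded END in the symbol table, pass two can emit C3 05 00.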

First pass.

The goal of the first pass is to build the symbol table. As noted above, the first pass also stores all macro definitions and expands calls as they appear, so symbol definition and macro expansion happen in the same pass. A symbol is either a label or a value that has been given a name with the EQU directive:

BUFSIZE EQU 8192 ; value: buffer size

By assigning values to the symbolic names in the label field, the assembler in effect determines the address each command will have when the program runs. For this purpose, during assembly it maintains an instruction location counter as a special variable. At the start of the first pass this variable is set to 0, and after each command is processed it is incremented by the length of that command. As an example, Table 5.2.3 shows a program fragment with the length of each command and the counter values. On the first pass the assembler builds a symbol table, a directive table and an opcode table and, if needed, a literal table. A literal is a constant for which the assembler automatically reserves memory. Note that modern processors have instructions with immediate operands, so their assemblers usually have no need for literals.
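The first pass with an instruction location counter can be sketched as follows. The instruction lengths in LENGTHS are assumptions (they happen to match common 32-bit x86 encodings, with JMP as a short jump); a real assembler would take them from its opcode table. EQU gives a name a value without occupying memory, so it does not advance the counter.

```python
# Hypothetical instruction lengths in bytes.
LENGTHS = {"MOV": 5, "ADD": 2, "JMP": 2, "RET": 1}


def first_pass(lines):
    ilc = 0          # instruction location counter, starts at 0
    symbols = {}     # symbol table: name -> value
    listing = []     # (address, statement) pairs
    for line in lines:
        words = line.split()
        if not words:
            continue
        if len(words) == 3 and words[1].upper() == "EQU":
            symbols[words[0]] = int(words[2])   # named value, no memory used
            continue
        if words[0].endswith(":"):
            symbols[words[0][:-1]] = ilc        # a label gets the current address
            words = words[1:]
        listing.append((ilc, line))
        if words:
            ilc += LENGTHS[words[0].upper()]    # advance by the command length
    return symbols, listing


symbols, listing = first_pass([
    "BUFSIZE EQU 8192",
    "START: MOV EAX, 1",
    "LOOP1: ADD EAX, EAX",
    "JMP LOOP1",
    "RET",
])
for address, statement in listing:
    print(address, statement)
print(symbols)
```

The sketch prints the addresses 0, 5, 7 and 9 for the four commands and the symbol table {'BUFSIZE': 8192, 'START': 0, 'LOOP1': 5}.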

Symbol table.

The symbol table contains one entry for each name (Table 5.2.4). Each entry holds the name itself (or a pointer to it), its numerical value and sometimes additional information, which may include:

● the length of the data field associated with the symbol;

● relocation bits (which indicate whether the value of a symbol changes if the program is loaded at an address other than the one the assembler assumed);

● information about whether the symbol can be accessed from outside the procedure.

Symbolic names can be labels or names defined with directives (for example, EQU).

Directive table.

This table lists all the directives, or pseudo-commands, that are encountered when assembling a program.

Operation code table.

For each operation code, the table has separate columns: operation code designation, operand 1, operand 2, hexadecimal value of the operation code, command length and command type (Table 5.2.5). Operation codes are divided into groups depending on the number and type of operands. The command type determines the group number and specifies the procedure that is called to process all commands in that group.
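An opcode table organized by command type can be sketched as a dictionary plus one handler per group. The three entries and their encodings are modelled on 32-bit x86 (NOP = 90h, INC reg = 40h plus the register number, MOV reg, imm32 = B8h plus the register number) and are not taken from Table 5.2.5; OpEntry, encode and the handler functions are hypothetical names.

```python
from typing import NamedTuple, Tuple


class OpEntry(NamedTuple):
    operands: Tuple[str, ...]   # operand kinds expected
    code: int                   # base opcode value
    length: int                 # instruction length in bytes
    group: int                  # command type: selects the handling routine


OPCODE_TABLE = {
    "NOP": OpEntry((), 0x90, 1, 1),
    "INC": OpEntry(("reg",), 0x40, 1, 2),
    "MOV": OpEntry(("reg", "imm"), 0xB8, 5, 3),
}
REG_CODES = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3}


def group1(entry, ops):          # no operands: the opcode alone
    return [entry.code]


def group2(entry, ops):          # register number added to the opcode
    return [entry.code + REG_CODES[ops[0]]]


def group3(entry, ops):          # register in the opcode, then a 32-bit immediate
    imm = int(ops[1]).to_bytes(4, "little")
    return [entry.code + REG_CODES[ops[0]]] + list(imm)


HANDLERS = {1: group1, 2: group2, 3: group3}


def encode(mnemonic, *ops):
    entry = OPCODE_TABLE[mnemonic]
    if len(ops) != len(entry.operands):
        raise ValueError(f"{mnemonic}: expected {len(entry.operands)} operand(s)")
    code = HANDLERS[entry.group](entry, ops)
    if len(code) != entry.length:
        raise RuntimeError("handler produced the wrong length")
    return code


print([f"{b:02X}" for b in encode("MOV", "EBX", "1")])
```

With this layout, adding another instruction of an existing type only needs a new table entry; a new type needs a new handler.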

Second pass.

The goal of the second pass is to generate the object program and, if required, print the assembly listing, and to output the information the linker needs to combine procedures assembled at different times into one executable file.

On the second pass, as on the first, the lines containing the statements are read and processed one at a time. The original statement and the hexadecimal object code generated from it can be printed or placed in a buffer for later printing. After the instruction location counter is updated, the next statement is read.

The source program may contain errors, for example:

● a symbol is not defined or is defined more than once;

● an opcode has an invalid name (for example, because of a typo), has too few operands or has too many operands;

● the END statement is missing.

Some assemblers can detect an undefined symbol and replace it. In most cases, however, when the assembler encounters an erroneous statement, it displays an error message and tries to continue assembling.
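The error checks listed above can be sketched as one function that collects messages instead of stopping, in the spirit of an assembler that reports an error and carries on. The statement format (label, opcode, operands), the operand counts in OPERAND_COUNTS and the name find_errors are assumptions; for simplicity only jump targets are treated as symbol references.

```python
# Hypothetical opcode table: number of operands each opcode takes.
OPERAND_COUNTS = {"MOV": 2, "ADD": 2, "JMP": 1, "RET": 0, "END": 0}


def find_errors(statements):
    """statements: list of (label, opcode, operands) tuples."""
    errors, defined, used = [], set(), set()
    for label, opcode, operands in statements:
        if label:
            if label in defined:
                errors.append(f"symbol {label} defined more than once")
            defined.add(label)
        if opcode not in OPERAND_COUNTS:
            errors.append(f"illegal opcode {opcode}")
            continue                       # keep going after an error
        expected = OPERAND_COUNTS[opcode]
        if len(operands) < expected:
            errors.append(f"{opcode}: too few operands")
        elif len(operands) > expected:
            errors.append(f"{opcode}: too many operands")
        if opcode == "JMP" and operands:   # simplification: only jump targets
            used.add(operands[0])          # count as symbol references
    for name in sorted(used - defined):
        errors.append(f"symbol {name} is not defined")
    if not statements or statements[-1][1] != "END":
        errors.append("END statement is missing")
    return errors


program = [
    ("START", "MOV", ["EAX", "1"]),
    ("START", "ADD", ["EAX"]),
    (None, "JMP", ["DONE"]),
    (None, "MOVE", ["EAX", "EBX"]),
]
for message in find_errors(program):
    print(message)
```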


General information about assembly language

Symbolic assembly language can largely eliminate the disadvantages of machine language programming.

Its main advantage is that all program elements are represented in symbolic form. Converting symbolic command names into their binary codes is the job of a special program, the assembler, which relieves the programmer of laborious work and eliminates errors that would otherwise be inevitable.

Symbolic names introduced when programming in assembly language usually reflect the meaning of the program, and command abbreviations reflect their main function, for example: PARAM for parameter, TABLE for table, MASK for mask, ADD for addition, SUB for subtraction, and so on. Such names are easy for a programmer to remember.

Programming in assembly language requires more sophisticated tools than programming in machine language: a computer system based on a microcomputer or PC with a set of peripheral devices (alphanumeric keyboard, character display, floppy disk drive and printer), together with resident or cross-development systems for the required types of microprocessors. Assembly language makes it possible to write and debug much more complex programs than machine language (up to 1 to 4 KB).

Assembly languages are machine-oriented, that is, they depend on the machine language and structure of the corresponding microprocessor, since each microprocessor instruction is assigned its own symbolic name.

Assembly languages significantly increase programmer productivity compared to machine languages while retaining the ability to use all of the microprocessor's programmer-accessible hardware resources. This enables skilled programmers to write programs that run faster and occupy less memory than programs written in a high-level language.

For this reason, programs that control input/output devices (drivers) have traditionally been written largely in assembly language, even though a wide range of high-level languages is available.

An assembly language defines:

● the mnemonic (symbolic name) of each machine instruction of the microprocessor;

● a standard format for the lines of an assembly language program;

● a format for specifying different addressing modes and instruction variants;

● a format for specifying character constants and integer constants in various number systems;

● pseudo-commands that control the assembly (translation) of a program.

In assembly language, a program is written line by line, that is, one line is allocated for each command.

For microcomputers built on the most common types of microprocessors, there may be several variants of assembly language, but usually one of them, the so-called standard assembly language, is widely used in practice.

Programming at the machine instruction level is the minimum level at which programs can be written. The system of machine instructions must be sufficient to implement the required actions by issuing instructions to the computer hardware.

Each machine instruction consists of two parts:

· the operation part, which determines "what to do";

· the operand part, which defines the objects of processing, "what to do it with".

A microprocessor machine instruction written in assembly language is a single line with the following syntax:

label command/directive operand(s) ;comments

The only mandatory field in the line is the command or directive.

The label, command/directive, and operands (if any) are separated by at least one space or tab character.

If a command or directive needs to be continued on the next line, the backslash character is used: \.

By default, assembly language does not distinguish between uppercase and lowercase letters when writing commands or directives.

Direct addressing: The effective address is determined directly by the offset field of the machine instruction, which can be 8, 16, or 32 bits in size.

mov eax, sum ; eax = sum

The assembler replaces sum with the address (offset) of this variable in the data segment, which is addressed by the ds register by default; when the instruction executes, the value stored at sum is loaded into the eax register.

Indirect addressing in turn has the following types:

· indirect base (register) addressing;

· indirect base (register) addressing with offset;

· indirect index addressing;

· indirect base index addressing.

Indirect base (register) addressing. In this mode the effective address of the operand can be held in any general-purpose register except sp/esp and bp/ebp (these are special registers for working with the stack segment). Syntactically, this mode is expressed by enclosing the register name in square brackets:

mov eax, [esi] ; eax = *esi, the value at the address held in esi
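How the effective address is formed in these modes can be simulated in a few lines of Python. This is not a CPU model: register contents and memory are made-up dictionaries, segment registers are ignored as in a flat 32-bit model, and the variable sum is assumed to be at address 1004h.

```python
# Made-up register contents and memory (address -> value).
regs = {"ebx": 0x1000, "esi": 0x2000, "edi": 0x0008}
memory = {0x1004: 22, 0x1010: 55, 0x2000: 33, 0x2010: 44}


def effective_address(base=None, index=None, scale=1, disp=0):
    """EA = base register + index register * scale + displacement."""
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index] * scale
    return ea


def load(**mode):
    """Simulate mov eax, <operand>: read the value at the effective address."""
    return memory[effective_address(**mode)]


examples = {
    "mov eax, sum           (direct)": load(disp=0x1004),
    "mov eax, [esi]         (base)": load(base="esi"),
    "mov eax, [ebx+4]       (base + offset)": load(base="ebx", disp=4),
    "mov eax, [edi*2+2000h] (index)": load(index="edi", scale=2, disp=0x2000),
    "mov eax, [ebx+edi*2]   (base + index)": load(base="ebx", index="edi", scale=2),
}
for syntax, value in examples.items():
    print(syntax, "->", value)
```

Only the way the effective address is computed differs between the modes; the final memory read is the same in every case.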