A word assembly language - Word and Excel - help with using the programs

I’ve done some research.
A byte is 8 bits and a word is the smallest unit that can be addressed on memory. The exact length of a word varies. What I don’t understand is what’s the point of having a byte? Why not say 8 bits?

I asked a prof this question and he said most machines these days are byte-addressable, but what would that make a word?

Peter Cordes


asked Oct 13, 2011 at 6:17

Byte: Today, a byte is almost always 8 bits. However, that wasn’t always the case, and no «standard» dictates it. Since 8 bits is a convenient number to work with, it became the de facto standard.

Word: The natural size with which a processor handles data (the register size). The most common word sizes encountered today are 8, 16, 32 and 64 bits, but other sizes are possible. For example, there were a few 36-bit machines, and even 12-bit machines.

The byte is the smallest addressable unit for a CPU. If you want to set/clear single bits, you first need to fetch the corresponding byte from memory, mess with the bits and then write the byte back to memory.
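
The fetch, modify, write-back sequence described above can be sketched in Python. This is only an illustration: memory is modeled as a `bytearray`, bytes are assumed to be 8 bits, and the helper names `set_bit` and `clear_bit` are made up for this example.

```python
def set_bit(memory: bytearray, bit_index: int) -> None:
    """Set one bit by fetching, modifying and writing back its byte."""
    byte_addr, bit_in_byte = divmod(bit_index, 8)  # assumes 8-bit bytes
    value = memory[byte_addr]        # fetch the whole byte
    value |= 1 << bit_in_byte        # change a single bit in the copy
    memory[byte_addr] = value        # write the whole byte back


def clear_bit(memory: bytearray, bit_index: int) -> None:
    """Clear one bit using the same read-modify-write sequence."""
    byte_addr, bit_in_byte = divmod(bit_index, 8)
    memory[byte_addr] &= ~(1 << bit_in_byte) & 0xFF
```

Note that even though only one bit changes, the smallest thing read or written is always a full byte.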

By contrast, one definition for word is the biggest chunk of bits with which a processor can do processing (like addition and subtraction) at a time – typically the width of an integer register. That definition is a bit fuzzy, as some processors might have different register sizes for different tasks (integer vs. floating point processing for example) or are able to access fractions of a register. The word size is the maximum register size that the majority of operations work with.

There are also a few processors whose pointer size differs from the word size: for example, the 8086 is a 16-bit processor, which means its registers are 16 bits wide. But its pointers (addresses) are 20 bits wide and were calculated by combining two 16-bit registers in a certain way.
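
To make the 8086 case concrete: in real mode the 16-bit segment value is shifted left by 4 bits and the 16-bit offset is added, giving a 20-bit address. A minimal sketch follows; the function name `physical_address` is just for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine two 16-bit values into a 20-bit 8086 real-mode address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # then wrap to 20 bits because the 8086 has only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, `physical_address(0x1234, 0x0010)` gives `0x12350`, a value that does not fit in any single 16-bit register.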

In some manuals and APIs, the term «word» may be «stuck» on a former legacy size and might differ from what’s the actual, current word size of a processor when the platform evolved to support larger register sizes. For example, the Intel and AMD x86 manuals still use «word» to mean 16 bits with DWORD (double-word, 32 bit) and QWORD (quad-word, 64 bit) as larger sizes. This is then reflected in some APIs, like Microsoft’s WinAPI.

answered Oct 13, 2011 at 6:51

DarkDust

What I don’t understand is what’s the point of having a byte? Why not say 8 bits?

Apart from the technical point that a byte isn’t necessarily 8 bits, the reason for having a term is simple human nature:

economy of effort (aka laziness) — it is easier to say «byte» rather than «eight bits»
tribalism — groups of people like to use jargon / a private language to set them apart from others.

Just go with the flow. You are not going to change 50+ years of accumulated IT terminology and cultural baggage by complaining about it.

FWIW — the correct term to use when you mean «8 bits independent of the hardware architecture» is «octet».

answered Oct 13, 2011 at 6:47

Stephen C

BYTE

I am trying to answer this question from C++ perspective.

The C++ standard defines ‘byte’ as “Addressable unit of data large enough to hold any member of the basic character set of the execution environment.”

What this means is that the byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters.
In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits.
Hence it is guaranteed that a byte will have at least 8 bits.

In other words, a byte is the amount of memory required to store a single character.

If you want to verify ‘number of bits’ in your C++ implementation, check the file ‘limits.h’. It should have an entry like below.

#define CHAR_BIT      8         /* number of bits in a char */

WORD

A word is defined as a specific number of bits that can be processed together (i.e. in one operation) by the machine/system.
Alternatively, we can say that a word defines the amount of data that can be transferred between CPU and RAM in a single operation.

The hardware registers in a computer are word sized.
The word size also defines the largest possible memory address (each memory address points to a byte-sized unit of memory).

Note – In C++ programs, memory addresses point to a byte of memory and not to a word.

answered May 29, 2012 at 18:12

It seems all the answers assume high level languages and mainly C/C++.

But the question is tagged «assembly» and in all assemblers I know (for 8bit, 16bit, 32bit and 64bit CPUs), the definitions are much more clear:

byte  = 8 bits 
word  = 2 bytes
dword = 4 bytes = 2Words (dword means "double word")
qword = 8 bytes = 2Dwords = 4Words ("quadruple word")
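
As a quick check of these sizes, here is a small sketch using Python's standard `struct` module. The mapping of names to format codes assumes the x86 convention listed above, and `X86_SIZES` and `pack_dword` are names invented for this example.

```python
import struct

# Standard-size struct codes ("<" = little-endian, no padding):
# B = 1 byte, H = 2 bytes, I = 4 bytes, Q = 8 bytes.
X86_SIZES = {
    "byte": struct.calcsize("<B"),
    "word": struct.calcsize("<H"),
    "dword": struct.calcsize("<I"),
    "qword": struct.calcsize("<Q"),
}


def pack_dword(value: int) -> bytes:
    """Lay out a 32-bit value as 4 little-endian bytes, as x86 stores it."""
    return struct.pack("<I", value)
```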

answered Feb 3, 2013 at 18:38

johnfound

Why not say 8 bits?

Because not all machines have 8-bit bytes. Since you tagged this C, look up CHAR_BIT in limits.h.

answered Oct 13, 2011 at 6:19

cnicutar

A word is the size of the registers in the processor. This means processor instructions such as add and mul operate on word-sized inputs.

But most modern architectures have memory that is addressable in 8-bit chunks, so it is convenient to use the word «byte».

answered Oct 13, 2011 at 6:21

VoidStar

In this context, a word is the unit that a machine uses when working with memory. For example, on a 32-bit machine the word is 32 bits long, and on a 64-bit machine it is 64 bits long. The word size determines the address space.

In programming (C/C++), the word is typically represented by the intptr_t type, which has the same length as a pointer, thereby abstracting these details.

Some APIs might confuse you though, such as the Win32 API, because it has types such as WORD (16 bits) and DWORD (32 bits). The reason is that the API initially targeted 16-bit machines, then was ported to 32-bit machines, then to 64-bit machines. To store a pointer, you can use INT_PTR.

answered Oct 13, 2011 at 6:39

npclaudiu

The exact length of a word varies. What I don’t understand is what’s the point of having a byte? Why not say 8 bits?

Even though the length of a word varies, on all modern machines and even all older architectures that I’m familiar with, the word size is still a multiple of the byte size. So there is no particular downside to using «byte» over «8 bits» in relation to the variable word size.

Beyond that, here are some reasons to use byte (or octet¹) over «8 bits»:

Larger units are just convenient to avoid very large or very small numbers: you might as well ask «why say 3 nanoseconds when you could say 0.000000003 seconds» or «why say 1 kilogram when you could say 1,000 grams», etc.
Beyond the convenience, the unit of a byte is somehow as fundamental as 1 bit, since many operations typically work not at the bit level but at the byte level: addressing memory, allocating dynamic storage, reading from a file or socket, etc.
Even if you were to adopt «8 bit» as a type of unit, so you could say «two 8-bits» instead of «two bytes», it would often be very confusing to have your new unit start with a number. For example, if someone said «one-hundred 8-bits» it could easily be interpreted as 108 bits, rather than 800 bits.

¹ Although I’ll consider a byte to be 8 bits for this answer, this isn’t universally true: on older machines a byte may have a different size (such as 6 bits). Octet always means 8 bits, regardless of the machine (so this term is often used in defining network protocols). In modern usage, byte is overwhelmingly used as synonymous with 8 bits.

answered Feb 10, 2018 at 22:17

BeeOnRope

Whatever the terminology present in datasheets and compilers, a ‘Byte’ is eight bits. Let’s not try to confuse enquirers and generalities with the more obscure exceptions, particularly as the word ‘Byte’ comes from the expression «By Eight». I’ve worked in the semiconductor/electronics industry for over thirty years and not once known ‘Byte’ used to express anything more than eight bits.

answered Feb 3, 2013 at 18:04

A group of 8 bits is called a byte ( with the exception where it is not for certain architectures )

A word is a fixed sized group of bits that are handled as a unit by the instruction set and/or hardware of the processor. That means the size of a general purpose register ( which is generally more than a byte ) is a word

In C, a word is most often called an integer => int

answered Oct 13, 2011 at 6:23

tolitius

Reference: https://www.os-book.com/OS9/slide-dir/PPT-dir/ch1.ppt

The basic unit of computer storage is the bit. A bit can contain one of two values, 0 and 1. All other storage in a computer is based on collections of bits. Given enough bits, it is amazing how many things a computer can represent: numbers, letters, images, movies, sounds, documents, and programs, to name a few. A byte is 8 bits, and on most computers it is the smallest convenient chunk of storage. For example, most computers don’t have an instruction to move a bit but do have one to move a byte. A less common term is word, which is a given computer architecture’s native unit of data. A word is made up of one or more bytes. For example, a computer that has 64-bit registers and 64-bit memory addressing typically has 64-bit (8-byte) words. A computer executes many operations in its native word size rather than a byte at a time.

Computer storage, along with most computer throughput, is generally measured and manipulated in bytes and collections of bytes.
A kilobyte, or KB, is 1,024 bytes
a megabyte, or MB, is 1,024^2 bytes
a gigabyte, or GB, is 1,024^3 bytes
a terabyte, or TB, is 1,024^4 bytes
a petabyte, or PB, is 1,024^5 bytes
Computer manufacturers often round off these numbers and say that a megabyte is 1 million bytes and a gigabyte is 1 billion bytes. Networking measurements are an exception to this general rule; they are given in bits (because networks move data a bit at a time).
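
The unit sizes quoted above are successive powers of 1,024, which a few lines of Python can illustrate. The names `UNITS`, `unit_in_bytes` and `bytes_to_bits` are invented for this sketch, and 8-bit bytes are assumed, as in the quoted text.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]  # in increasing order of size


def unit_in_bytes(unit: str) -> int:
    """Return the size of a binary storage unit in bytes (KB = 1,024 bytes)."""
    exponent = UNITS.index(unit) + 1   # KB -> 1, MB -> 2, ..., PB -> 5
    return 1024 ** exponent


def bytes_to_bits(n_bytes: int, bits_per_byte: int = 8) -> int:
    """Convert a byte count to bits, as used for networking figures."""
    return n_bytes * bits_per_byte
```

The gap between `unit_in_bytes("MB")` (1,048,576) and the rounded figure of 1 million bytes is the rounding the text mentions.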

answered Apr 13, 2020 at 9:00

LiLi

If a machine is byte-addressable and a word is the smallest unit that can be addressed on memory then I guess a word would be a byte!

answered Oct 13, 2011 at 6:19

K-ballo

The terms of BYTE and WORD are relative to the size of the processor that is being referred to. The most common processors are/were 8 bit, 16 bit, 32 bit or 64 bit. These are the WORD lengths of the processor. Actually half of a WORD is a BYTE, whatever the numerical length is. Ready for this, half of a BYTE is a NIBBLE.

answered Feb 9, 2018 at 17:59

In fact, in common usage, word has become synonymous with 16 bits, much like byte has with 8 bits. Can get a little confusing since the «word size» on a 32-bit CPU is 32-bits, but when talking about a word of data, one would mean 16-bits. Microcontrollers with a 32-bit word size have taken to calling their instructions «longs» (supposedly to try and avoid the word/doubleword confusion).

answered Oct 13, 2011 at 12:52

Brian Knoblauch


Assembly language

Typical secondary output from an assembler—showing original assembly language (right) for the Motorola MC6800 and the assembled form

Paradigm: Imperative, unstructured, often metaprogramming (through macros); certain assemblers are object-oriented and/or structured
First appeared: 1947
Typing discipline: None
Filename extensions: `.asm`, `.s`, `.inc`, `.wla`, `.SRC` and several others depending on the assembler

In computer programming, assembly language (alternatively assembler language^[1] or symbolic machine code),^[2]^[3]^[4] often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence between the instructions in the language and the architecture’s machine code instructions.^[5] Assembly language usually has one statement per machine instruction (1:1), but constants, comments, assembler directives,^[6] symbolic labels of, e.g., memory locations, registers, and macros^[7]^[1] are generally also supported.

The first assembly code in which a language is used to represent machine code instructions is found in Kathleen and Andrew Donald Booth’s 1947 work, Coding for A.R.C..^[8] Assembly code is converted into executable machine code by a utility program referred to as an assembler. The term «assembler» is generally attributed to Wilkes, Wheeler and Gill in their 1951 book The Preparation of Programs for an Electronic Digital Computer,^[9] who, however, used the term to mean «a program that assembles another program consisting of several sections into a single program».^[10] The conversion process is referred to as assembly, as in assembling the source code. The computational step when an assembler is processing a program is called assembly time.

Because assembly depends on the machine code instructions, each assembly language^{[nb 1]} is specific to a particular computer architecture.^[11]^[12]^[13]

Sometimes there is more than one assembler for the same architecture, and sometimes an assembler is specific to an operating system or to particular operating systems. Most assembly languages do not provide specific syntax for operating system calls, and most assembly languages can be used universally with any operating system,^{[nb 2]} as the language provides access to all the real capabilities of the processor, upon which all system call mechanisms ultimately rest. In contrast to assembly languages, most high-level programming languages are generally portable across multiple architectures but require interpreting or compiling, much more complicated tasks than assembling.

In the first decades of computing, it was commonplace for both systems programming and application programming to take place entirely in assembly language. While still irreplaceable for some purposes, the majority of programming is now conducted in higher-level interpreted and compiled languages. In «No Silver Bullet», Fred Brooks summarised the effects of the switch away from assembly language programming: «Surely the most powerful stroke for software productivity, reliability, and simplicity has been the progressive use of high-level languages for programming. Most observers credit that development with at least a factor of five in productivity, and with concomitant gains in reliability, simplicity, and comprehensibility.»^[14]

Today, it is typical to use small amounts of assembly language code within larger systems implemented in a higher-level language, for performance reasons or to interact directly with hardware in ways unsupported by the higher-level language. For instance, just under 2% of version 4.9 of the Linux kernel source code is written in assembly; more than 97% is written in C.^[15]

Assembly language syntax

Assembly language uses a mnemonic to represent, e.g., each low-level machine instruction or opcode, each directive, typically also each architectural register, flag, etc. Some of the mnemonics may be built in and some user defined. Many operations require one or more operands in order to form a complete instruction. Most assemblers permit named constants, registers, and labels for program and memory locations, and can calculate expressions for operands. Thus, programmers are freed from tedious repetitive calculations and assembler programs are much more readable than machine code. Depending on the architecture, these elements may also be combined for specific instructions or addressing modes using offsets or other data as well as fixed addresses. Many assemblers offer additional mechanisms to facilitate program development, to control the assembly process, and to aid debugging.

Some assemblers are column oriented, with specific fields in specific columns; this was very common for machines using punched cards in the 1950s and early 1960s. Some assemblers have free-form syntax, with fields separated by delimiters, e.g., punctuation, white space. Some assemblers are hybrid, with, e.g., labels, in a specific column and other fields separated by delimiters; this became more common than column oriented syntax in the 1960s.

IBM System/360

All of the IBM assemblers for System/360, by default, have a label in column 1, fields separated by delimiters in columns 2-71, a continuation indicator in column 72 and a sequence number in columns 73-80. The delimiter for label, opcode, operands and comments is spaces, while individual operands are separated by commas and parentheses.
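
The default column layout just described can be sketched as a small parser. This is a simplified illustration rather than how any IBM assembler is implemented: the function name `parse_card` is made up, comment cards and quoted strings containing spaces are not handled, and every statement is assumed to have an operand field if it has a comment.

```python
def parse_card(card: str) -> dict:
    """Split one 80-column card image into the default System/360 fields."""
    card = card.ljust(80)[:80]            # pad or trim to exactly 80 columns
    statement = card[0:71]                # columns 1-71: label and other fields
    continuation = card[71] != " "        # column 72: continuation indicator
    sequence = card[72:80].strip()        # columns 73-80: sequence number
    # A label, if present, starts in column 1; a blank there means no label.
    has_label = statement[0] != " "
    # Fields are space-delimited; everything after the operands is comment.
    fields = statement.split(None, 3 if has_label else 2)
    label = fields.pop(0) if has_label and fields else ""
    return {
        "label": label,
        "opcode": fields[0] if len(fields) > 0 else "",
        "operands": fields[1] if len(fields) > 1 else "",
        "comment": fields[2].rstrip() if len(fields) > 2 else "",
        "continuation": continuation,
        "sequence": sequence,
    }
```

Operands stay together as one field because, as noted above, they are separated from each other by commas and parentheses rather than by spaces.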

Terminology

A macro assembler is an assembler that includes a macroinstruction facility so that (parameterized) assembly language text can be represented by a name, and that name can be used to insert the expanded text into other code.
- Open code refers to any assembler input outside of a macro definition.
A cross assembler (see also cross compiler) is an assembler that is run on a computer or operating system (the host system) of a different type from the system on which the resulting code is to run (the target system). Cross-assembling facilitates the development of programs for systems that do not have the resources to support software development, such as an embedded system or a microcontroller. In such a case, the resulting object code must be transferred to the target system, via read-only memory (ROM, EPROM, etc.), a programmer (when the read-only memory is integrated in the device, as in microcontrollers), or a data link using either an exact bit-by-bit copy of the object code or a text-based representation of that code (such as Intel hex or Motorola S-record).
A high-level assembler is a program that provides language abstractions more often associated with high-level languages, such as advanced control structures (IF/THEN/ELSE, DO CASE, etc.) and high-level abstract data types, including structures/records, unions, classes, and sets.
A microassembler is a program that helps prepare a microprogram, called firmware, to control the low level operation of a computer.
A meta-assembler is «a program that accepts the syntactic and semantic description of an assembly language, and generates an assembler for that language»,^[16] or that accepts an assembler source file along with such a description and assembles the source file in accordance with that description. «Meta-Symbol» assemblers for the SDS 9 Series and SDS Sigma series of computers are meta-assemblers.^[17]^{[nb 3]} Sperry Univac also provided a Meta-Assembler for the UNIVAC 1100/2200 series.^[18]
An inline assembler (or embedded assembler) is assembler code contained within a high-level language program.^[19] This is most often used in systems programs which need direct access to the hardware.

Key concepts

Assembler

An assembler program creates object code by translating combinations of mnemonics and syntax for operations and addressing modes into their numerical equivalents. This representation typically includes an operation code («opcode») as well as other control bits and data. The assembler also calculates constant expressions and resolves symbolic names for memory locations and other entities.^[20] The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution – e.g., to generate common short sequences of instructions as inline, instead of called subroutines.

Some assemblers may also be able to perform some simple types of instruction set-specific optimizations. One concrete example is the ubiquitous x86 assemblers from various vendors: most of them can perform jump-instruction replacements (long jumps replaced by short or relative jumps), known as jump-sizing,^[20] in any number of passes, on request. Others may even do simple rearrangement or insertion of instructions, such as some assemblers for RISC architectures that can help optimize instruction scheduling to exploit the CPU pipeline as efficiently as possible.^[21]

Assemblers have been available since the 1950s, as the first step above machine language and before high-level programming languages such as Fortran, Algol, COBOL and Lisp. There have also been several classes of translators and semi-automatic code generators with properties similar to both assembly and high-level languages, with Speedcode as perhaps one of the better-known examples.

There may be several assemblers with different syntax for a particular CPU or instruction set architecture. For instance, an instruction to add memory data to a register in a x86-family processor might be add eax,[ebx], in original Intel syntax, whereas this would be written addl (%ebx),%eax in the AT&T syntax used by the GNU Assembler. Despite different appearances, different syntactic forms generally generate the same numeric machine code. A single assembler may also have different modes in order to support variations in syntactic forms as well as their exact semantic interpretations (such as FASM-syntax, TASM-syntax, ideal mode, etc., in the special case of x86 assembly programming).

Number of passes

There are two types of assemblers based on how many passes through the source are needed (how many times the assembler reads the source) to produce the object file.

One-pass assemblers process the source code once. For symbols used before they are defined, the assembler will emit «errata» after the eventual definition, telling the linker or the loader to patch the locations where the as yet undefined symbols had been used.
Multi-pass assemblers create a table with all symbols and their values in the first passes, then use the table in later passes to generate code.

In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to calculate the addresses of subsequent symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary, pad it with one or more «no-operation» instructions in a later pass or the errata. In an assembler with peephole optimization, addresses may be recalculated between passes to allow replacing pessimistic code with code tailored to the exact distance from the target.

The original reason for the use of one-pass assemblers was memory size and speed of assembly – often a second pass would require storing the symbol table in memory (to handle forward references), rewinding and rereading the program source on tape, or rereading a deck of cards or punched paper tape. Later computers with much larger memories (especially disc storage), had the space to perform all necessary processing without such re-reading. The advantage of the multi-pass assembler is that the absence of errata makes the linking process (or the program load if the assembler directly produces executable code) faster.^[22]

Example: in the following code snippet, a one-pass assembler would be able to determine the address of the backward reference BKWD when assembling statement S2, but would not be able to determine the address of the forward reference FWD when assembling the branch statement S1; indeed, FWD may be undefined. A two-pass assembler would determine both addresses in pass 1, so they would be known when generating code in pass 2.

S1   B    FWD
  ...
FWD   EQU *
  ...
BKWD  EQU *
  ...
S2    B   BKWD
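
The two-pass approach applied to this snippet can be sketched as follows. It is a toy model under stated assumptions: every instruction is taken to occupy 4 bytes (`INSTRUCTION_SIZE` is an assumed value), the elided `...` lines are each stood in for by a single placeholder `NOP`, only `EQU *` is supported, and the names `assemble` and `PROGRAM` are invented for the example.

```python
INSTRUCTION_SIZE = 4  # assumed fixed size, in bytes, of every instruction

# (label, operation, operand) triples mirroring the snippet above;
# each "..." line is represented by one placeholder NOP.
PROGRAM = [
    ("S1", "B", "FWD"),
    ("", "NOP", ""),
    ("FWD", "EQU", "*"),
    ("", "NOP", ""),
    ("BKWD", "EQU", "*"),
    ("", "NOP", ""),
    ("S2", "B", "BKWD"),
]


def assemble(source):
    """Two-pass sketch: return (symbol_table, resolved_branches)."""
    # Pass 1: walk the source once, assigning an address to every label.
    symbols = {}
    location = 0
    for label, op, _operand in source:
        if label:
            symbols[label] = location
        if op != "EQU":                  # EQU * names a location, emits nothing
            location += INSTRUCTION_SIZE

    # Pass 2: every symbol is now known, so forward references resolve.
    resolved = []
    location = 0
    for _label, op, operand in source:
        if op == "EQU":
            continue
        if op == "B":
            if operand not in symbols:
                raise KeyError(f"undefined symbol: {operand}")
            resolved.append((location, op, symbols[operand]))
        location += INSTRUCTION_SIZE
    return symbols, resolved
```

A one-pass design would reach the branch at S1 before FWD had been entered in `symbols`, which is exactly the case the errata mechanism exists to handle.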

High-level assemblers

More sophisticated high-level assemblers provide language abstractions such as:

High-level procedure/function declarations and invocations
Advanced control structures (IF/THEN/ELSE, SWITCH)
High-level abstract data types, including structures/records, unions, classes, and sets
Sophisticated macro processing (although available on ordinary assemblers since the late 1950s for, e.g., the IBM 700 series and IBM 7000 series, and since the 1960s for IBM System/360 (S/360), amongst other machines)
Object-oriented programming features such as classes, objects, abstraction, polymorphism, and inheritance^[23]

See Language design below for more details.

Assembly language

A program written in assembly language consists of a series of mnemonic processor instructions and meta-statements (known variously as declarative operations, directives, pseudo-instructions, pseudo-operations and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by an operand, which might be a list of data, arguments or parameters.^[24] Some instructions may be «implied,» which means the data upon which the instruction operates is implicitly defined by the instruction itself—such an instruction does not take an operand. The resulting statement is translated by an assembler into machine language instructions that can be loaded into memory and executed.

For example, the instruction below tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for the AL register is 000, so the following machine code loads the AL register with the data 01100001.^[24]

10110000 01100001

This binary computer code can be made more human-readable by expressing it in hexadecimal as follows.

B0 61

Here, B0 means ‘Move a copy of the following value into AL’, and 61 is a hexadecimal representation of the value 01100001, which is 97 in decimal. Assembly language for the 8086 family provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so the machine code above can be written as follows in assembly language, complete with an explanatory comment if required, after the semicolon. This is much easier to read and to remember.

MOV AL, 61h       ; Load AL with 97 decimal (61 hex)

In some assembly languages (including this one) the same mnemonic, such as MOV, may be used for a family of related instructions for loading, copying and moving data, whether these are immediate values, values in registers, or memory locations pointed to by values in registers or by immediate (a.k.a. direct) addresses. Other assemblers may use separate opcode mnemonics such as L for «move memory to register», ST for «move register to memory», LR for «move register to register», MVI for «move immediate operand to memory», etc.

If the same mnemonic is used for different instructions, that means that the mnemonic corresponds to several different binary instruction codes, excluding data (e.g. the 61h in this example), depending on the operands that follow the mnemonic. For example, for the x86/IA-32 CPUs, the Intel assembly language syntax MOV AL, AH represents an instruction that moves the contents of register AH into register AL. The^{[nb 4]} hexadecimal form of this instruction is:

88 E0

The first byte, 88h, identifies a move between a byte-sized register and either another register or memory, and the second byte, E0h, is encoded (with three bit-fields) to specify that both operands are registers, the source is AH, and the destination is AL.
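
The three bit-fields of the second byte can be pulled apart with a short sketch. It handles only the register-to-register form of opcode 88h described here; the table name `REG8` and the function name `decode_mov_rm8_r8` are illustrative, and the register numbering is the standard x86 8-bit register encoding.

```python
# 3-bit codes 0..7 for the 8-bit registers
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def decode_mov_rm8_r8(code: bytes) -> str:
    """Decode the two-byte register-to-register form of opcode 88h."""
    opcode, second = code[0], code[1]
    if opcode != 0x88:
        raise ValueError("only opcode 88h is handled in this sketch")
    mod = second >> 6             # top 2 bits: 11 means both operands are registers
    reg = (second >> 3) & 0b111   # middle 3 bits: the source register for 88h
    rm = second & 0b111           # low 3 bits: the destination when mod is 11
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    return f"MOV {REG8[rm]}, {REG8[reg]}"
```

Feeding it the bytes 88 E0 splits E0h (binary 11 100 000) into mod 11, source 100 (AH) and destination 000 (AL).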

In a case like this where the same mnemonic can represent more than one binary instruction, the assembler determines which instruction to generate by examining the operands. In the first example, the operand 61h is a valid hexadecimal numeric constant and is not a valid register name, so only the B0 instruction can be applicable. In the second example, the operand AH is a valid register name and not a valid numeric constant (hexadecimal, decimal, octal, or binary), so only the 88 instruction can be applicable.

Assembly languages are always designed so that this sort of unambiguousness is universally enforced by their syntax. For example, in the Intel x86 assembly language, a hexadecimal constant must start with a numeral digit, so that the hexadecimal number ‘A’ (equal to decimal ten) would be written as 0Ah or 0AH, not AH, specifically so that it cannot appear to be the name of register AH. (The same rule also prevents ambiguity with the names of registers BH, CH, and DH, as well as with any user-defined symbol that ends with the letter H and otherwise contains only characters that are hexadecimal digits, such as the word «BEACH».)
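
The leading-digit rule can be expressed as a small check. This is a sketch of the rule as stated above, not the lexer of any particular assembler; `is_hex_constant` is a made-up name and only the `h`-suffixed form is considered.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def is_hex_constant(token: str) -> bool:
    """True if token is an h-suffixed hex constant under the leading-digit rule."""
    token = token.upper()
    if len(token) < 2 or not token.endswith("H"):
        return False
    digits = token[:-1]                       # everything before the H suffix
    # The first character must be 0-9, so AH or BEACH can never qualify.
    return digits[0].isdigit() and all(ch in HEX_DIGITS for ch in digits)
```

With this rule `0Ah` and `61h` are constants, while `AH` and `BEACH` remain available as a register name and a user-defined symbol.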

Returning to the original example, while the x86 opcode 10110000 (B0) copies an 8-bit value into the AL register, 10110001 (B1) moves it into CL and 10110010 (B2) does so into DL. Assembly language examples for these follow.^[24]

MOV AL, 1h        ; Load AL with immediate value 1
MOV CL, 2h        ; Load CL with immediate value 2
MOV DL, 3h        ; Load DL with immediate value 3

The syntax of MOV can also be more complex as the following examples show.^[25]

MOV EAX, [EBX]    ; Move the 4 bytes in memory at the address contained in EBX into EAX
MOV [ESI+EAX], CL ; Move the contents of CL into the byte at address ESI+EAX
MOV DS, DX        ; Move the contents of DX into segment register DS

In each case, the MOV mnemonic is translated directly into one of the opcodes 88-8C, 8E, A0-A3, B0-BF, C6 or C7 by an assembler, and the programmer normally does not have to know or remember which.^[24]

Transforming assembly language into machine code is the job of an assembler, and the reverse can at least partially be achieved by a disassembler. Unlike high-level languages, there is a one-to-one correspondence between many simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a «branch if greater or equal» instruction, an assembler may provide a pseudoinstruction that expands to the machine’s «set if less than» and «branch if zero (on the result of the set instruction)». Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences. Since the information about pseudoinstructions and macros defined in the assembler environment is not present in the object program, a disassembler cannot reconstruct the macro and pseudoinstruction invocations but can only disassemble the actual machine instructions that the assembler generated from those abstract assembly-language entities. Likewise, since comments in the assembly language source file are ignored by the assembler and have no effect on the object code it generates, a disassembler is always completely unable to recover source comments.
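
The «branch if greater or equal» case can be sketched as a simple text expansion. The mnemonics `bge`, `slt` and `beq` and the registers `$at` and `$zero` follow a MIPS-like convention that is assumed here for illustration, and `expand_pseudo` is a hypothetical helper rather than part of any real assembler.

```python
def expand_pseudo(mnemonic: str, operands: list) -> list:
    """Expand a 'bge' pseudoinstruction into two real instructions."""
    if mnemonic == "bge":
        left, right, target = operands
        return [
            f"slt $at, {left}, {right}",   # $at = 1 if left < right, else 0
            f"beq $at, $zero, {target}",   # branch when $at == 0: left >= right
        ]
    # Anything else is assumed to be a real machine instruction already.
    return [f"{mnemonic} {', '.join(operands)}"]
```

Only the two expanded instructions reach the object code, which is why a disassembler sees `slt` and `beq` and cannot tell that the programmer originally wrote `bge`.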

Each computer architecture has its own machine language. Computers differ in the number and type of operations they support, in the different sizes and numbers of registers, and in the representations of data in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the CPU manufacturer and used in its documentation.

Two examples of CPUs that have two different sets of mnemonics are the Intel 8080 family and the Intel 8086/8088. Because Intel claimed copyright on its assembly language mnemonics (on each page of their documentation published in the 1970s and early 1980s, at least), some companies that independently produced CPUs compatible with Intel instruction sets invented their own mnemonics. The Zilog Z80 CPU, an enhancement of the Intel 8080A, supports all the 8080A instructions plus many more; Zilog invented an entirely new assembly language, not only for the new instructions but also for all of the 8080A instructions. For example, where Intel uses the mnemonics MOV, MVI, LDA, STA, LXI, LDAX, STAX, LHLD, and SHLD for various data transfer instructions, the Z80 assembly language uses the mnemonic LD for all of them. A similar case is the NEC V20 and V30 CPUs, enhanced copies of the Intel 8086 and 8088, respectively. Like Zilog with the Z80, NEC invented new mnemonics for all of the 8086 and 8088 instructions, to avoid accusations of infringement of Intel’s copyright. (It is questionable whether such copyrights can be valid, and later CPU companies such as AMD^{[nb 5]} and Cyrix republished Intel’s x86/IA-32 instruction mnemonics exactly with neither permission nor legal penalty.) It is doubtful whether in practice many people who programmed the V20 and V30 actually wrote in NEC’s assembly language rather than Intel’s; since any two assembly languages for the same instruction set architecture are isomorphic (somewhat like English and Pig Latin), there is no requirement to use a manufacturer’s own published assembly language with that manufacturer’s products.

Language design

Basic elements

There is a large degree of diversity in the way the authors of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of 3 types of instruction statements that are used to define program operations:

Opcode mnemonics
Data definitions
Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value or a pair of values. Operands can be immediate (value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand, e.g., the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP («NO OPeration» – do nothing for one step) for BC with a mask of 0.

Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode to encode the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions.^[26]
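
One way to picture extended mnemonics is as a lookup table that maps each one to a base instruction plus a fixed operand. The sketch below uses the System/360 examples from the text (B, NOP, NOPR); the table layout and the function name resolve are hypothetical and only illustrate the idea.

```python
# Extended mnemonics as a lookup table: each entry names the real
# instruction and the fixed mask operand it implies. The entries mirror
# the System/360 examples in the text; the function name is hypothetical.
EXTENDED = {
    "B":    ("BC", 15),   # mask 15: branch on every condition
    "NOP":  ("BC", 0),    # mask 0: never branches
    "NOPR": ("BCR", 0),   # register form, never branches
}

def resolve(mnemonic, operand):
    """Rewrite an extended mnemonic into its base instruction."""
    if mnemonic in EXTENDED:
        base, mask = EXTENDED[mnemonic]
        return f"{base} {mask},{operand}"
    return f"{mnemonic} {operand}"  # ordinary mnemonics are left alone
```

Calling resolve("B", "LOOP") yields the base form with mask 15, while a mnemonic that is not in the table is returned unchanged.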

Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b.^[27] These are sometimes known as pseudo-opcodes.

Mnemonics are arbitrary symbols; in 1985 the IEEE published Standard 694 for a uniform set of mnemonics to be used by all assemblers. The standard has since been withdrawn.

Data directives

There are instructions used to define data elements to hold data and variables. They define the type of data, the length and the alignment of data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify these as pseudo-ops.

Assembly directives

Assembly directives, also called pseudo-opcodes, pseudo-operations or pseudo-ops, are commands given to an assembler «directing it to perform operations other than assembling instructions».^[20] Directives affect how the assembler operates and «may affect the object code, the symbol table, the listing file, and the values of internal assembler parameters». Sometimes the term pseudo-opcode is reserved for directives that generate object code, such as those that generate data.^[28]

The names of pseudo-ops often start with a dot to distinguish them from machine instructions. Pseudo-ops can make the assembly of the program dependent on parameters input by a programmer, so that one program can be assembled in different ways, perhaps for different applications. Or, a pseudo-op can be used to manipulate presentation of a program to make it easier to read and maintain. Another common use of pseudo-ops is to reserve storage areas for run-time data and optionally initialize their contents to known values.

Symbolic assemblers let programmers associate arbitrary names (labels or symbols) with memory locations and various constants. Usually, every constant and variable is given a name so instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any calls to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are often lexically distinct from normal symbols (e.g., the use of «10$» as a GOTO destination).

Some assemblers, such as NASM, provide flexible symbol management, letting programmers manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.
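
How an assembler turns labels into addresses can be sketched with a toy two-pass routine. The assumptions are loud ones: every statement occupies exactly one addressable unit, a label is written as name: at the start of a line, and the mnemonics are invented. Real assemblers handle variable instruction lengths, expressions and relocation.

```python
# Toy two-pass label resolution. Assumptions: every statement occupies
# one addressable unit, a label is written "name:" at the start of a
# line, and operands are separated from the mnemonic by whitespace.
def assemble(lines, origin=0):
    symbols = {}
    statements = []
    # Pass 1: record the address of each label.
    address = origin
    for line in lines:
        text = line.strip()
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = address
            text = text.strip()
        if text:
            statements.append(text)
            address += 1
    # Pass 2: replace symbolic operands with their numeric addresses.
    resolved = []
    for text in statements:
        mnemonic, *operands = text.split()
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append(" ".join([mnemonic, *operands]))
    return symbols, resolved

source = [
    "start: load counter",
    "loop:  dec",
    "       jnz loop",
    "       jmp done",
    "done:  halt",
]
symbols, resolved = assemble(source, origin=100)
```

The jump to done is a forward reference: its address is not known when the statement is first read, which is the reason for collecting all labels in a first pass. The operand counter is left untouched because the sketch never defines it.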

Assembly languages, like most other computer languages, allow comments to be added to program source code that will be ignored during assembly. Judicious commenting is essential in assembly language programs, as the meaning and purpose of a sequence of binary machine instructions can be difficult to determine. The «raw» (uncommented) assembly language generated by compilers or disassemblers is quite difficult to read when changes must be made.

Macros

Many assemblers support predefined macros, and others support programmer-defined (and repeatedly re-definable) macros involving sequences of text lines in which variables and constants are embedded. The macro definition is most commonly^{[nb 6]} a mixture of assembler statements, e.g., directives, symbolic machine instructions, and templates for assembler statements. This sequence of text lines may include opcodes or directives. Once a macro has been defined its name may be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the text lines associated with that macro, then processes them as if they existed in the source code file (including, in some assemblers, expansion of any macros existing in the replacement text). Macros in this sense date to IBM autocoders of the 1950s.^[29]^{[nb 7]}
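
The replace-then-reprocess behavior described above can be sketched as follows. The macro names and bodies are made up for illustration, and the depth limit is an added safeguard of this sketch rather than a feature of any specific assembler.

```python
# Minimal macro expander: a statement whose first word names a macro is
# replaced by that macro's body, and the body is expanded in turn.
# Macro names and bodies here are made up for illustration.
MACROS = {
    "SAVE":    ["push ax", "push bx"],
    "RESTORE": ["pop bx", "pop ax"],
    "WRAPPED": ["SAVE", "call work", "RESTORE"],  # uses other macros
}

def expand(lines, depth=0):
    if depth > 16:  # guard against accidentally recursive definitions
        raise RecursionError("macro nesting too deep")
    out = []
    for line in lines:
        words = line.split()
        name = words[0] if words else ""
        if name in MACROS:
            out.extend(expand(MACROS[name], depth + 1))
        else:
            out.append(line)
    return out

result = expand(["mov cx, 3", "WRAPPED", "ret"])
```

A single WRAPPED statement in the source becomes five machine-level statements, two of them produced by macros nested inside the replacement text.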

Macro assemblers typically have directives to, e.g., define macros, define variables, set variables to the result of an arithmetic, logical or string expression, iterate, conditionally generate code. Some of those directives may be restricted to use within a macro definition, e.g., MEXIT in HLASM, while others may be permitted within open code (outside macro definitions), e.g., AIF and COPY in HLASM.

In assembly language, the term «macro» represents a more comprehensive concept than it does in some other contexts, such as the pre-processor in the C programming language, where its #define directive typically is used to create short single line macros. Assembler macro instructions, like macros in PL/I and some other languages, can be lengthy «programs» by themselves, executed by interpretation by the assembler during assembly.

Since macros can have ‘short’ names but expand to several or indeed many lines of code, they can be used to make assembly language programs appear to be far shorter, requiring fewer lines of source code, as with higher level languages. They can also be used to add higher levels of structure to assembly programs, optionally introduce embedded debugging code via parameters and other similar features.

Macro assemblers often allow macros to take parameters. Some assemblers include quite sophisticated macro languages, incorporating such high-level language elements as optional parameters, symbolic variables, conditionals, string manipulation, and arithmetic operations, all usable during the execution of a given macro, and allowing macros to save context or exchange information. Thus a macro might generate numerous assembly language instructions or data definitions, based on the macro arguments. This could be used to generate record-style data structures or «unrolled» loops, for example, or could generate entire algorithms based on complex parameters. For instance, a «sort» macro could accept the specification of a complex sort key and generate code crafted for that specific key, not needing the run-time tests that would be required for a general procedure interpreting the specification. An organization using assembly language that has been heavily extended using such a macro suite can be considered to be working in a higher-level language since such programmers are not working with a computer’s lowest-level conceptual elements. Underlining this point, macros were used to implement an early virtual machine in SNOBOL4 (1967), which was written in the SNOBOL Implementation Language (SIL), an assembly language for a virtual machine. The target machine would translate this to its native code using a macro assembler.^[30] This allowed a high degree of portability for the time.

Macros were used to customize large scale software systems for specific customers in the mainframe era and were also used by customer personnel to satisfy their employers’ needs by making specific versions of manufacturer operating systems. This was done, for example, by systems programmers working with IBM’s Conversational Monitor System / Virtual Machine (VM/CMS) and with IBM’s «real time transaction processing» add-ons, Customer Information Control System (CICS) and ACP/TPF, the airline/financial system that began in the 1970s and still runs many large computer reservation systems (CRS) and credit card systems today.

It is also possible to use solely the macro processing abilities of an assembler to generate code written in completely different languages, for example, to generate a version of a program in COBOL using a pure macro assembler program containing lines of COBOL code inside assembly time operators instructing the assembler to generate arbitrary code. IBM OS/360 uses macros to perform system generation. The user specifies options by coding a series of assembler macros. Assembling these macros generates a job stream to build the system, including job control language and utility control statements.

This is because, as was realized in the 1960s, the concept of «macro processing» is independent of the concept of «assembly», the former being in modern terms more word processing, text processing, than generating object code. The concept of macro processing appeared, and appears, in the C programming language, which supports «preprocessor instructions» to set variables, and make conditional tests on their values. Unlike certain previous macro processors inside assemblers, the C preprocessor is not Turing-complete because it lacks the ability to either loop or «go to», the latter allowing programs to loop.

Despite the power of macro processing, it fell into disuse in many high level languages (major exceptions being C, C++ and PL/I) while remaining a perennial for assemblers.

Macro parameter substitution is strictly by name: at macro processing time, the value of a parameter is textually substituted for its name. The most famous class of bugs resulting was the use of a parameter that itself was an expression and not a simple name when the macro writer expected a name. In the macro:

foo: macro a
load a*b

the intention was that the caller would provide the name of a variable, and the «global» variable or constant b would be used to multiply «a». If foo is called with the parameter a-c, the macro expansion of load a-c*b occurs. To avoid any possible ambiguity, users of macro processors can parenthesize formal parameters inside macro definitions, or callers can parenthesize the input parameters.^[31]
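
The precedence problem can be reproduced with plain textual substitution. In this sketch the helper substitute and the sample values for a, b and c are assumptions made for the demonstration; eval is used only to show how the two expanded texts parse.

```python
import re

def substitute(body, param, argument):
    """Replace whole-word occurrences of param with the argument text."""
    return re.sub(rf"\b{re.escape(param)}\b", lambda m: argument, body)

values = {"a": 10, "b": 3, "c": 4}

naive = substitute("a*b", "a", "a-c")      # body written without parentheses
safe = substitute("(a)*b", "a", "a-c")     # formal parameter parenthesized

# eval is used only to show how the two expansions parse differently.
naive_value = eval(naive, {}, values)      # parsed as a - (c*b)
safe_value = eval(safe, {}, values)        # parsed as (a - c) * b
```

With these sample values the unparenthesized expansion multiplies only c by b, while the parenthesized form multiplies the whole difference, which is what the macro writer intended.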

Support for structured programming

Packages of macros have been written providing structured programming elements to encode execution flow. The earliest example of this approach was in the Concept-14 macro set,^[32] originally proposed by Harlan Mills (March 1970), and implemented by Marvin Kessler at IBM’s Federal Systems Division, which provided IF/ELSE/ENDIF and similar control flow blocks for OS/360 assembler programs. This was a way to reduce or eliminate the use of GOTO operations in assembly code, one of the main factors causing spaghetti code in assembly language. This approach was widely accepted in the early 1980s (the latter days of large-scale assembly language use). IBM’s High Level Assembler Toolkit^[33] includes such a macro package.

A curious design was A-Natural, a «stream-oriented» assembler for 8080/Z80 processors^[34] from Whitesmiths Ltd. (developers of the Unix-like Idris operating system, and what was reported to be the first commercial C compiler). The language was classified as an assembler because it worked with raw machine elements such as opcodes, registers, and memory references; but it incorporated an expression syntax to indicate execution order. Parentheses and other special symbols, along with block-oriented structured programming constructs, controlled the sequence of the generated instructions. A-Natural was built as the object language of a C compiler, rather than for hand-coding, but its logical syntax won some fans.

There has been little apparent demand for more sophisticated assemblers since the decline of large-scale assembly language development.^[35] In spite of that, they are still being developed and applied in cases where resource constraints or peculiarities in the target system’s architecture prevent the effective use of higher-level languages.^[36]

Assemblers with a strong macro engine allow structured programming via macros, such as the switch macro provided with the Masm32 package (this code is a complete program):

include \masm32\include\masm32rt.inc	; use the Masm32 library

.code
demomain:
  REPEAT 20
	switch rv(nrandom, 9)	; generate a number between 0 and 8
	mov ecx, 7
	case 0
		print "case 0"
	case ecx				; in contrast to most other programming languages,
		print "case 7"		; the Masm32 switch allows "variable cases"
	case 1 .. 3
		.if eax==1
			print "case 1"
		.elseif eax==2
			print "case 2"
		.else
			print "cases 1 to 3: other"
		.endif
	case 4, 6, 8
		print "cases 4, 6 or 8"
	default
		mov ebx, 19		     ; print 20 stars
		.Repeat
			print "*"
			dec ebx
		.Until Sign?		 ; loop until the sign flag is set
	endsw
	print chr$(13, 10)
  ENDM
  exit
end demomain

Use of assembly language

Historical perspective

Assembly languages were not available at the time when the stored-program computer was introduced. Kathleen Booth «is credited with inventing assembly language»^[37]^[38] based on theoretical work she began in 1947, while working on the ARC2 at Birkbeck, University of London following consultation by Andrew Booth (later her husband) with mathematician John von Neumann and physicist Herman Goldstine at the Institute for Advanced Study.^[38]^[39]

In late 1948, the Electronic Delay Storage Automatic Calculator (EDSAC) had an assembler (named «initial orders») integrated into its bootstrap program. It used one-letter mnemonics developed by David Wheeler, who is credited by the IEEE Computer Society as the creator of the first «assembler».^[20]^[40]^[41] Reports on the EDSAC introduced the term «assembly» for the process of combining fields into an instruction word.^[42] SOAP (Symbolic Optimal Assembly Program) was an assembly language for the IBM 650 computer written by Stan Poley in 1955.^[43]

Assembly languages eliminate much of the error-prone, tedious, and time-consuming first-generation programming needed with the earliest computers, freeing programmers from tedium such as remembering numeric codes and calculating addresses. They were once widely used for all sorts of programming. However, by the late 1950s,^{[citation needed]} their use had largely been supplanted by higher-level languages, in the search for improved programming productivity. Today, assembly language is still used for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues.^[44] Typical uses are device drivers, low-level embedded systems, and real-time systems (see § Current usage).

Numerous programs have been written entirely in assembly language. The Burroughs MCP (1961) was the first operating system that was not developed entirely in assembly language; it was written in Executive Systems Problem Oriented Language (ESPOL), an Algol dialect. Many commercial applications were written in assembly language as well, including a large amount of the IBM mainframe software written by large corporations. COBOL, FORTRAN and some PL/I eventually displaced much of this work, although a number of large organizations retained assembly-language application infrastructures well into the 1990s.

Assembly language has long been the primary development language for 8-bit home computers such as the Atari 8-bit family, Apple II, MSX, ZX Spectrum, and Commodore 64. Interpreted BASIC dialects on these systems offer insufficient execution speed and insufficient facilities to take full advantage of the available hardware. These systems have severe resource constraints, idiosyncratic memory and display architectures, and provide limited system services. There are also few high-level language compilers suitable for microcomputer use. Similarly, assembly language is the default choice for 8-bit consoles such as the Atari 2600 and Nintendo Entertainment System.

Key software for IBM PC compatibles was written in assembly language, such as MS-DOS, Turbo Pascal, and the Lotus 1-2-3 spreadsheet. As computer speed grew exponentially, assembly language became a tool for speeding up parts of programs, such as the rendering of Doom, rather than a dominant development language. In the 1990s, assembly language was used to get performance out of systems such as the Sega Saturn^[45] and as the primary language for arcade hardware based on the TMS34010 integrated CPU/GPU such as Mortal Kombat and NBA Jam.

Current usage

There has been debate over the usefulness and performance of assembly language relative to high-level languages.^[46]

Although assembly language has specific niche uses where it is important (see below), there are other tools for optimization.^[47]

As of July 2017, the TIOBE index of programming language popularity ranks assembly language at 11, ahead of Visual Basic, for example.^[48] Assembler can be used to optimize for speed or optimize for size. In the case of speed optimization, modern optimizing compilers are claimed^[49] to render high-level languages into code that can run as fast as hand-written assembly, despite the counter-examples that can be found.^[50]^[51]^[52] The complexity of modern processors and memory sub-systems makes effective optimization increasingly difficult for compilers, as well as for assembly programmers.^[53]^[54] Moreover, increasing processor performance has meant that most CPUs sit idle most of the time,^[55] with delays caused by predictable bottlenecks such as cache misses, I/O operations and paging. This has made raw code execution speed a non-issue for many programmers.

There are some situations in which developers might choose to use assembly language:

Writing code for systems with older processors^{[clarification needed]} that have limited high-level language options such as the Atari 2600, Commodore 64, and graphing calculators.^[56] Programs for these computers of the 1970s and 1980s are often written in the context of demoscene or retrogaming subcultures.
Code that must interact directly with the hardware, for example in device drivers and interrupt handlers.
In an embedded processor or DSP, high-repetition interrupts, such as an interrupt that occurs 1000 or 10000 times a second, must be serviced in the fewest possible cycles per interrupt.
Programs that need to use processor-specific instructions not implemented in a compiler. A common example is the bitwise rotation instruction at the core of many encryption algorithms, as well as querying the parity of a byte or the 4-bit carry of an addition.
A stand-alone executable of compact size is required that must execute without recourse to the run-time components or libraries associated with a high-level language. Examples have included firmware for telephones, automobile fuel and ignition systems, air-conditioning control systems, security systems, and sensors.
Programs with performance-sensitive inner loops, where assembly language provides optimization opportunities that are difficult to achieve in a high-level language. For example, linear algebra with BLAS^[50]^[57] or discrete cosine transformation (e.g. SIMD assembly version from x264^[58]).
Programs that create vectorized functions for programs in higher-level languages such as C. In the higher-level language this is sometimes aided by compiler intrinsic functions which map directly to SIMD mnemonics, but nevertheless result in a one-to-one assembly conversion specific for the given vector processor.
Real-time programs such as simulations, flight navigation systems, and medical equipment. For example, in a fly-by-wire system, telemetry must be interpreted and acted upon within strict time constraints. Such systems must eliminate sources of unpredictable delays, which may be created by (some) interpreted languages, automatic garbage collection, paging operations, or preemptive multitasking. However, some higher-level languages incorporate run-time components and operating system interfaces that can introduce such delays. Choosing assembly or lower level languages for such systems gives programmers greater visibility and control over processing details.
Cryptographic algorithms that must always take strictly the same time to execute, preventing timing attacks.
Video encoders and decoders such as rav1e (an encoder for AV1)^[59] and dav1d (the reference decoder for AV1)^[60] contain assembly to leverage AVX2 and ARM Neon instructions when available.
Modify and extend legacy code written for IBM mainframe computers.^[61]^[62]
Situations where complete control over the environment is required, in extremely high-security situations where nothing can be taken for granted.
Computer viruses, bootloaders, certain device drivers, or other items very close to the hardware or low-level operating system.
Instruction set simulators for monitoring, tracing and debugging where additional overhead is kept to a minimum.
Situations where no high-level language exists, on a new or specialized processor for which no cross compiler is available.
Reverse-engineering and modifying program files such as:
- Existing binaries that may or may not have originally been written in a high-level language, for example when trying to recreate programs for which source code is not available or has been lost, or cracking copy protection of proprietary software.
- Video games (also termed ROM hacking), which is possible via several methods. The most widely employed method is altering program code at the assembly language level.
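
To illustrate the rotation item in the list above: where a language offers no rotate operator, the operation has to be composed from shifts, an OR and a mask, whereas many processors provide it as a single instruction. The function names rotl32 and rotr32 and the fixed 32-bit width are choices made for this sketch.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, count):
    """Rotate a 32-bit value left by count bits using shifts and a mask."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

def rotr32(value, count):
    """Rotate right, expressed as a left rotation by the complement."""
    return rotl32(value, (32 - count) % 32)
```

Whether a given compiler recognizes such a shift-and-mask pattern and emits a single rotate instruction depends on the compiler and target; writing the instruction directly in assembly language removes that uncertainty.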

Assembly language is still taught in most computer science and electronic engineering programs. Although few programmers today regularly work with assembly language as a tool, the underlying concepts remain important. Such fundamental topics as binary arithmetic, memory allocation, stack processing, character set encoding, interrupt processing, and compiler design would be hard to study in detail without a grasp of how a computer operates at the hardware level. Since a computer’s behavior is fundamentally defined by its instruction set, the logical way to learn such concepts is to study an assembly language. Most modern computers have similar instruction sets. Therefore, studying a single assembly language is sufficient to learn: I) the basic concepts; II) to recognize situations where the use of assembly language might be appropriate; and III) to see how efficient executable code can be created from high-level languages.^[23]

Typical applications

Assembly language is typically used in a system’s boot code, the low-level code that initializes and tests the system hardware prior to booting the operating system and is often stored in ROM. (The BIOS on IBM-compatible PC systems and on CP/M systems is an example.)
Assembly language is often used for low-level code, for instance for operating system kernels, which cannot rely on the availability of pre-existing system calls and must indeed implement them for the particular processor architecture on which the system will be running.
Some compilers translate high-level languages into assembly first before fully compiling, allowing the assembly code to be viewed for debugging and optimization purposes.
Some compilers for relatively low-level languages, such as Pascal or C, allow the programmer to embed assembly language directly in the source code (so called inline assembly). Programs using such facilities can then construct abstractions using different assembly language on each hardware platform. The system’s portable code can then use these processor-specific components through a uniform interface.
Assembly language is useful in reverse engineering. Many programs are distributed only in machine code form, which is straightforward to translate into assembly language with a disassembler but more difficult to translate into a higher-level language with a decompiler. Tools such as the Interactive Disassembler make extensive use of disassembly for such a purpose. This technique is used by hackers to crack commercial software, and by competitors to produce software that gives results similar to those of a rival company’s product.
Assembly language is used to enhance speed of execution, especially in early personal computers with limited processing power and RAM.
Assemblers can be used to generate blocks of data, with no high-level language overhead, from formatted and commented source code, to be used by other code.^[63]^[64]

Notes

^ Other than meta-assemblers
^ However, that does not mean that the assembler programs implementing those languages are universal.
^ «Used as a meta-assembler, it enables the user to design his own programming languages and to generate processors for such languages with a minimum of effort.»
^ This is one of two redundant forms of this instruction that operate identically. The 8086 and several other CPUs from the late 1970s/early 1980s have redundancies in their instruction sets, because it was simpler for engineers to design these CPUs (to fit on silicon chips of limited sizes) with the redundant codes than to eliminate them (see don’t-care terms). Each assembler will typically generate only one of two or more redundant instruction encodings, but a disassembler will usually recognize any of them.
^ AMD manufactured second-source Intel 8086, 8088, and 80286 CPUs, and perhaps 8080A and/or 8085A CPUs, under license from Intel, but starting with the 80386, Intel refused to share their x86 CPU designs with anyone—AMD sued about this for breach of contract—and AMD designed, made, and sold 32-bit and 64-bit x86-family CPUs without Intel’s help or endorsement.
^ In 7070 Autocoder, a macro definition is a 7070 macro generator program that the assembler calls; Autocoder provides special macros for macro generators to use.
^ «The following minor restriction or limitation is in effect with regard to the use of 1401 Autocoder when coding macro instructions …»

References

^ ^a ^b «Assembler language». High Level Assembler for z/OS & z/VM & z/VSE Language Reference Version 1 Release 6. IBM. 2014 [1990]. SC26-4940-06.
^ «Assembly: Review» (PDF). Computer Science and Engineering. College of Engineering, Ohio State University. 2016. Archived (PDF) from the original on 2020-03-24. Retrieved 2020-03-24.
^ Archer, Benjamin (November 2016). Assembly Language For Students. North Charleston, South Carolina, USA: CreateSpace Independent Publishing. ISBN 978-1-5403-7071-6. Assembly language may also be called symbolic machine code.


Markers

Each type of text (argumentative text, descriptive text) has its own sequence for presenting information; that is, there is a basic scheme on which each particular text is built. Knowing the signal words (markers), which highlight what is important, link ideas, and maintain the overall direction of meaning, gives reliable support for comprehension and an effective means of finding the information you need.

Markers signalling enumeration or sequence: one, three …, first, second …, next, then, finally, to begin with, eventually, subsequently, to conclude, in the end, etc.

Markers signalling additional information:

emphasizing similarity: in the same way, likewise, equally, similarly;
confirming what was said earlier: moreover, also, furthermore, in addition, above all, again;
moving on to new information: by the way, now;
contrast: but, then, on the one hand, on the other hand, alternatively, rather, instead, on the contrary;
introducing unexpected information that does not follow from what precedes it but contradicts it: anyway, anyhow, however, nevertheless, though, yet, in spite of, at the same time.

Markers indicating a summary of what has been said, or a conclusion or result that follows from the preceding information: as a result, consequently, in consequence, hence, to conclude, then, thus, therefore, so, so far.

Markers indicating an explanation, an interpretation, or a restatement in other words of what was said earlier: in other words, namely, rather, that is to say, etc.

Markers indicating illustration by example: for instance, for example, to illustrate.

As a rule, when a sentence contains a marker, the marker is translated first. Memorize some of them:

again: besides; in addition
also: in addition
alternatively: on the other hand
consequently: hence; therefore; as a result
for instance; for example: as an illustration
further: next; then
furthermore: in addition
however: but; nevertheless
now: at present; at that time
therefore: for that reason; consequently
thus: so; in this way
on the one hand: from one point of view
on the other hand: from the other point of view
conversely: on the contrary

3. Read the following text. Pick out the markers and say what they signal.

Why Assembly Language?

Many people write all of their computer programs in one of the so-called «high-level languages», particularly BASIC. BASIC is easy to learn, easy to use, and fast enough for most computing tasks. That being the case, why would anyone want to use any other language? One reason is that BASIC, like human languages, is not well-suited to everything. Some tasks are much easier in other languages. Special computing tasks like graphics, music, or word processing are often easier in special languages. Furthermore, BASIC is quite slow. The term slow may surprise the beginner, since short programs seem to run instantaneously. However, problems occur in the following situations:

When large amounts of data are involved. You will notice how slow BASIC is when a program must, for example, sort a long list of names and addresses or accounts. Similarly, BASIC will be quite slow when a program must search through a 50-page report or keep inventory records on thousands of items.
When graphics is involved. If a program is drawing a picture on the screen, it must work quickly or the delay will be intolerable. If objects in the picture are supposed to move, the program must be fast enough to make the motion look natural. This is particularly difficult when the picture contains many objects (such as space ships, base stations, and alien invaders), all of which are moving in different directions.
When a lot of decisions or «thinking» is required. This is often necessary in complex games like checkers or chess. The program has to try many possibilities and decide on a reasonable move. Obviously, the more possibilities there are and the more analysis required, the longer it will take the computer to move.

Why is BASIC slow? In the first place, the computer actually translates each BASIC statement into simple internal commands (so-called machine or assembly language). It does this every time it runs the BASIC program. Thus, much of the computer’s time is spent translating the program, not running it.

There are versions of BASIC called compilers that perform the translation once and then save the translated version. However, BASIC would still be slow because of its mechanical nature. It is really like an automobile with an automatic transmission; no amount of coaxing can ever get you the performance or fuel economy that a skilful driver can achieve with a manual transmission. The human being is simply a more flexible, more skilful, and smarter operator than is the automatic transmission or the BASIC interpreter or compiler.

Assembly language is the computer equivalent of a manual transmission. It gives the programmer greater control over the computer at the cost of more work, more detail, and less convenience. Like an automatic transmission, BASIC is good enough most of the time for most programmers. But for those who must get maximum performance from their computers, assembly language is essential. You will find that most complex games, graphics programs, and large business programs are written at least partially in assembly language.

Even if assembly language is your likely choice, you may be wondering whether you have enough background to learn assembly language programming. You do if you have done some programming of any kind. If you know BASIC or some other high-level language, that’s fine. If you have developed programs in an assembly language, that’s even better.

Like BASIC, assembly language is a set of words that tell the computer what to do. However, the words in the assembly language instruction refer to computer components directly. It’s like the difference between telling someone to walk to the mailbox and telling them precisely how to move their muscles and maneuver past obstacles. Obviously, a simple command is sufficient most of the time; only athletes or mountain climbers need the more detailed instructions.

Assembly language programs give the computer detailed commands, such as «load 32 into the AX register», «transfer the contents of the CL register into the DL register», and «store the number in the DL register into memory location 3,456». As you see, BASIC and assembly language differ in how you instruct the computer. With BASIC, you speak in generalities; with assembly language, you speak in specifics.

Although assembly language programs take more time and effort to write than BASIC programs, they also run much faster. The level of detail is the key here. The idea is the same as an athlete who runs faster or jumps farther by watching every step of what he or she does. Precise form is essential to achieving maximum performance.

Because assembly language requires you to operate on the computer’s internal components, you must understand the features and capabilities of the integrated circuit (or «chip») that holds these components: the computer’s microprocessor.

Notes:

suit: to be appropriate for
instantaneously: immediately; at once
search: to look for; to go through
keep inventory records: to maintain stock records
item: entry (in an inventory list)
delay: a wait; a lag
intolerable: unbearable
alien invaders: invaders
statement: assertion; (in programming) a single command
flexible: adaptable
smart: clever; intelligent
convenience: ease of use
background: grounding; foundation
obstacle: barrier; obstruction
generality: a general statement
specific: a particular detail

If you have difficulty, do exercise No. 1 in Unit 4.

Fill in the table:

	BASIC	Assembly language
advantages
disadvantages

Do you have experience working with BASIC or assembly language? Share your experience and your opinion of these programming languages.

4. Read the text. How well do you know Structured Programming?


If you want to know how computers work and avoid a few programming pitfalls, it’s helpful to learn the basics. So, let’s take a quick look at assembly language and a few of the benefits of understanding it.

If you’re a software developer, you’re probably aware that there are multiple types of computer languages that developers and systems use. Computers understand machine-level language, while humans use higher-level languages like JavaScript, C++, etc. In between sits assembly language, which helps bridge the divide between the two. But what is an assembly language and where does it fit into the programming hierarchy?

What Is Assembly Language? A Look at Machine-Specific Programming

A basic graphic that illustrates how assemblers convert assembly language into machine language.

Assembly language, also known as assembler language, is a low-level programming language that’s designed to communicate instructions to specific computer hardware and direct the flow of information. It does this using human-readable mnemonics (for example, “LDA” to represent load accumulator) to form short codes that make the work easier for the person writing them. These short codes are converted into machine language (binary, i.e., 1s and 0s) through the use of programs called assemblers.

In a nutshell, machine language uses binary code, which is almost impossible for humans to decipher, whereas assembly language uses mnemonic codes to write a program. Mnemonic codes make it simpler for humans to understand or remember something, and so make the language a bit easier for humans to use than machine code.

Assembly language is nothing new; it’s been around almost as long as computers themselves. The first assembly language was invented in 1947 by mathematician Kathleen Booth at Birkbeck College, London, with assistance from Andrew Booth (her husband), John von Neumann, and Herman Goldstine.

Is Assembly Commonly Used by Developers?

No way. The truth is that most developers don’t use assembly language anymore because it’s often regarded as being cumbersome and challenging to use. This is so much the case that assembly language has basically gone the way of Javan Rhinoceroses — it’s virtually extinct as it’s something most professionals nowadays never learn. (In fact, many colleges only offer it as an elective course [if they offer it at all]!)

There are other languages that are easier to use — namely high-level languages, which we’ll speak more about momentarily. And many of these high-level languages bypass the need for knowing assembly language altogether because they convert a developer’s written code to a standard intermediate language on the backend.

Occasionally, you’ll find the rare hardcore developer who still uses assembly language. But you might want to snap a picture when you see it because it’s truly a rare occurrence.

Assembly Language Is One of Multiple Types of Computer Languages

To understand the concept of assembly language, it helps to take a holistic approach and learn about the main types of computer language. It’s important to note that some people say there are three main groups of languages, some say four, some say even more than that. The answer varies depending on whom you ask and how they categorize these languages. This article uses four.

1. Machine Language

Every device has a processor responsible for all the functions performed by that device. These processors use machine language to “talk” to the different parts of the device, for example, giving and receiving instructions to/from the keyboard and mouse. As we touched on earlier, machine language is binary (0s and 1s), although people usually write it out in hexadecimal because that is shorter to read, and each family of processors has its own version. Although processors speak machine language fluently, it’s extremely challenging for a human to read or use, and virtually impossible for someone to do so quickly. That’s where assembly language comes in.
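
As a small illustration of the binary and hexadecimal point, the Python sketch below prints the same three bytes both ways. The byte values are arbitrary examples picked for this sketch, not the output of any real program.

```python
# Machine code is just a sequence of bytes. Binary and hexadecimal are
# two ways of writing the same values down.
machine_code = bytes([0xA9, 0x20, 0x60])  # illustrative byte values only


def as_binary(data: bytes) -> str:
    """Write each byte as eight 0s and 1s."""
    return " ".join(f"{b:08b}" for b in data)


def as_hex(data: bytes) -> str:
    """Write each byte as two hexadecimal digits."""
    return " ".join(f"{b:02X}" for b in data)


print(as_binary(machine_code))
print(as_hex(machine_code))
```

Both lines describe exactly the same bytes; the hexadecimal form is simply a quarter of the length.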

2. Assembly Languages

In general, assembly language is a bit more user-friendly than machine-level language but more difficult than high-level language. It uses short codes to instruct the machine to perform certain operations. Whereas machine language uses 0s and 1s to instruct the computer, assembly language uses mnemonics that are easier for humans to work with.

As we already know, processors only speak machine language. So, to translate assembly language into machine code, a program called an assembler is required (going the other way, from machine code back to assembly, is the job of a disassembler). The translation process is known as assembling, and the time required to translate the language is called assembling time. As different families of processors use different machine codes, the assembly language for each family is also different. Some assembly languages work across different operating systems, whereas others are specific to one OS or platform.

Here’s a quick look at what assembly language looks like and how it translates to machine code via the assembler:
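
One way to picture the translation is with a toy assembler written in Python. This is only a sketch under a few assumptions: it handles just four mnemonics, the opcode numbers follow the MOS 6502 encodings for the addressing modes used here, and the function names (assemble_line, assemble) are made up for this example. A real assembler does far more (labels, directives, error checking).

```python
# Toy assembler: turns a handful of mnemonics into machine-code bytes.
# Opcode values follow the MOS 6502 encoding for these addressing modes;
# everything else (names, structure) is a simplification for illustration.
OPCODES = {
    ("CLC", None): 0x18,   # clear the carry flag, no operand
    ("LDA", "imm"): 0xA9,  # load accumulator with an immediate value
    ("ADC", "imm"): 0x69,  # add an immediate value to the accumulator
    ("STA", "abs"): 0x8D,  # store accumulator at a 16-bit address
}


def assemble_line(line: str) -> bytes:
    parts = line.split()
    mnemonic = parts[0].upper()
    if len(parts) == 1:
        return bytes([OPCODES[(mnemonic, None)]])
    operand = parts[1]
    if operand.startswith("#$"):   # immediate value, e.g. #$20
        value = int(operand[2:], 16)
        return bytes([OPCODES[(mnemonic, "imm")], value])
    if operand.startswith("$"):    # absolute address, e.g. $0200
        address = int(operand[1:], 16)
        # the 6502 stores addresses low byte first
        return bytes([OPCODES[(mnemonic, "abs")], address & 0xFF, address >> 8])
    raise ValueError(f"unsupported operand: {operand}")


def assemble(source: str) -> bytes:
    out = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and blank lines
        if line:
            out += assemble_line(line)
    return bytes(out)


program = """
    CLC          ; clear the carry flag
    LDA #$20     ; load 32 into the accumulator
    ADC #$05     ; add 5
    STA $0200    ; store the result in memory
"""
code = assemble(program)
print(" ".join(f"{b:02X}" for b in code))
```

Four readable lines of assembly become eight bytes of machine code, which is all the processor ever sees.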

3. Middle-Level Languages (i.e., Common Intermediate Language)

As we mentioned earlier, developers typically don’t use assembly languages anymore. Often, developers use high-level languages that are then compiled into intermediate language (IL) that gets translated into machine language code. Basically, this means that it bypasses the need for assembly altogether.
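
Python happens to offer an easy way to peek at an intermediate form. The sketch below uses the standard dis module to list the bytecode instructions behind a one-line function. The function total_price is a made-up example, and the exact instruction names vary between Python versions. Keep in mind that Python bytecode is only one example of an intermediate language; it is not CIL, and it is run by an interpreter.

```python
import dis


def total_price(quantity, unit_price):
    # One readable line of high-level code...
    return quantity * unit_price + 5


# ...becomes several lower-level bytecode instructions.
instructions = list(dis.get_instructions(total_price))
for ins in instructions:
    print(ins.opname, ins.argrepr)
```

The developer never writes these instructions by hand; the toolchain produces them behind the scenes.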

4. High-level Languages

So, what are higher-level languages? Assembly language requires multi-line, detailed instructions to carry out simple functions. Higher-level computer languages are processor agnostic and are designed to give instructions in a human-readable manner. Programs written in higher-level languages can be read by humans (as long as they know the language) and are concise and easy to operate.

For example, what could take eight lines worth of code in assembly might take three short commands in a high-level language. A compiler then converts that code into machine code.

However, code written in a high-level language can take longer to execute than hand-written low-level code. Although the difference is often minimal, some programmers prefer to use assembly language when a short program is needed, and time is of the essence.

Machine Language	Assembly Languages	Middle-Level Language	High-Level Language
Machine language is the most basic language to interact with the computer.	Assembly language is a low-level language that relies on codes to interact with the computer. It’s a little easier to work with than machine language but not as easy to use as high-level languages.	A middle-level language, often called common intermediate language (CIL) or bytecode, is a more universal language that compilers will use to run code cross-platform.	A high-level language is a computer language that uses commands easily understood by humans. This is often a developer’s go-to language because it’s easy to use.
Machine language is understood by the processor. It’s a binary language consisting of 0s and 1s.	Assembly language contains slightly more human-friendly short codes. It relies on assemblers to convert the code into machine language.	It’s the easiest low-level programming language for people to read (easier than assembly).	High-level language has very user-friendly commands and syntax. In some cases, a compiler translates these commands into assembly, which is then translated into machine language for processors to understand. But this isn’t a technical requirement; many compilers translate to CIL instead (which is then converted to machine language).
The execution speed of machine language is very high as the processor doesn’t have to translate it.	Assembly code must be translated by an assembler, but that happens before the program runs; once assembled, it executes as machine code.	Intermediate code still has to be compiled or interpreted before the processor can execute it, which can add overhead.	High-level code goes through the most translation before it reaches the processor, so it can be the slowest of the four, depending on the compiler or interpreter.
Writing executable programs directly in machine language is impractical for humans.	By and large, this language is rarely written by hand anymore. It’s rare to see in the wild.	This is something that’s done on the backend via a compiler that doesn’t require the developer to do anything.	The syntax of high-level language can be easily read by humans and hence, programming is faster and easier.
Example(s) of machine languages: Binary language	Examples of architectures that have their own assembly language: ARM, MIPS, x86	Example of a CIL: Microsoft intermediate language (MSIL), which later became standardized as CIL	Examples of high-level languages: Java, JavaScript, C++, C#, F#

Assembly Language Short Code Consists of Opcode and Operands

Assembly language instructions use mnemonic code and meta statements to give instructions to the computer (once translated into machine language). A computer can perform arithmetic functions like addition and subtraction, and logical functions like comparison and conditional functions. To instruct the computer, each assembly language instruction consists of one operation code (opcode), usually followed by one or more arguments (operands). The number of operands depends on the instruction and the architecture, and some instructions take none.

Opcode: The operation code (opcode) instructs the machine about the type of operation to be performed.

Operands: Operands can represent the data to be processed, or the constants on which the operation is to be performed. Operands can either be a constant value or an address pointing to where the data is stored (e.g., a specific register value or memory location). They can define the address of the data using:

Register values – Registers are a small part of a computer processor used to store data for immediate, temporary use. They are built to enhance efficiency in program and operation executions.
Stack values – A stack is a linear abstract data type (ADT) that follows the last-in-first-out (LIFO) principle: the last item pushed onto it is the first one taken off. Homogeneous data are stored in a stack.
Memory values – Random access memory (RAM) is the place where the computer stores temporary data like the variables in a program. Every byte has a unique memory location. So, if the program wants to retrieve the variable during the execution, the memory address would be used.
Input/output ports – In this context, input/output ports are the addresses through which the processor exchanges data with attached devices, such as external drives or network adapters. An operand can name a port so that the instruction reads data from, or writes data to, that device.
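
To see how opcodes and operands work together, here is a tiny simulated machine in Python. Everything in it is hypothetical: the opcode names (LOAD, MOVE, STORE, PUSH, POP), the three registers, and the memory size were chosen for this sketch and do not belong to any real processor. Input/output ports are left out to keep it short.

```python
# A tiny simulated machine. Each instruction is (opcode, destination, source).
# Opcode names, register names and memory size are made up for illustration.
def run(program):
    registers = {"AX": 0, "CL": 0, "DL": 0}
    memory = [0] * 4096
    stack = []
    for opcode, dst, src in program:
        if opcode == "LOAD":       # constant value -> register
            registers[dst] = src
        elif opcode == "MOVE":     # register -> register
            registers[dst] = registers[src]
        elif opcode == "STORE":    # register -> memory address
            memory[dst] = registers[src]
        elif opcode == "PUSH":     # register -> top of the stack
            stack.append(registers[src])
        elif opcode == "POP":      # top of the stack -> register
            registers[dst] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return registers, memory, stack


program = [
    ("LOAD", "CL", 32),      # operand is a constant value
    ("MOVE", "DL", "CL"),    # operands are registers
    ("STORE", 3456, "DL"),   # operand is a memory address
    ("PUSH", None, "DL"),    # operand goes onto the stack (last in...)
    ("POP", "AX", None),     # ...first out
]
registers, memory, stack = run(program)
print(registers["AX"], memory[3456], stack)
```

Each tuple pairs one opcode with its operands, and the operand kind (constant, register, memory address, stack) decides where the data comes from or goes to.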

The following table shows how the same basic operations are written in the assembly languages of different architectures:

Operation	Syntax in ARM	Syntax in x86	Syntax in MIPS
Addition	ADD	ADD	ADD
Subtraction	SUB	SUB	SUB
Multiplication	MUL	MUL	MULT

A Breakdown of the Two Types of Assemblers

Assembly languages need an assembler to translate assembly source code into machine code. Assemblers are categorized based on the number of cycles (passes) they perform to generate the object code. We might use a one-pass or a multi-pass assembler depending on the situation and requirements.

Let’s take a look at the two types of assemblers.

One-pass assemblers (single-pass assemblers): A one-pass assembler reads the source code only once to generate the object file. It’s faster than a two-pass assembler but is also more complex, because a symbol may be used before it has been defined and the assembler must go back and fill in its address later.
Two-pass assemblers (multi-pass assemblers): A multi-pass assembler is an older style of assembler that uses more than one pass to generate the object file. In the two-pass case, the assembler reads the source files twice, with each pass handling specific tasks: typically the first pass collects the symbol definitions and the second generates the code.
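
The two-pass idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular assembler is written: the instruction set (JMP, INC, HALT) is made up, every instruction is assumed to occupy exactly one address, and the function names first_pass and second_pass are invented for the example.

```python
# Two-pass sketch for a made-up instruction set. Every instruction takes
# exactly one address, and a "name:" line labels the next instruction.
def first_pass(lines):
    """Pass 1: record the address of every label."""
    symbols, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1
    return symbols


def second_pass(lines, symbols):
    """Pass 2: emit instructions with label operands replaced by addresses."""
    output = []
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *operands = line.split()
        resolved = [symbols.get(op, op) for op in operands]
        output.append((mnemonic, *resolved))
    return output


source = [
    "JMP end",     # forward reference: 'end' has not been defined yet
    "start:",
    "INC",
    "JMP start",
    "end:",
    "HALT",
]
symbols = first_pass(source)
object_code = second_pass(source, symbols)
print(symbols)
print(object_code)
```

The first line, JMP end, is the interesting case. A one-pass assembler reaches it before it knows where end is, so it has to leave a gap and patch it later, whereas the two-pass version simply looks the address up during the second pass.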

Of course, there are other differences between one-pass and two-pass assemblers, but we’re not going to get into all of that here. Check out this Quora post for a more granular comparison.

And just remember: in many cases, assemblers are no longer used. This is because many high-level languages compile into CIL and then directly into machine language; they bypass the need for assembly altogether.

Pros and Cons of Learning and Using Assembly Languages

Assembly language is typically used in boot codes and operating system kernels but can also be used at higher levels to create many different kinds of programs. Let’s look at some of the pros and cons of learning and using assembly language:

Pros of Learning/Using Assembly Language	Cons of Learning/Using Assembly Language
Assembly language makes it faster to run complex code.	It is incredibly difficult to write code in assembly language compared to high-level languages.
Simpler for humans to understand than machine-level language.	The syntax is hard to learn and can be difficult to remember or use quickly.
Convenient to use in testing, debugging, optimization, and development.	The code is longer and more complex than with a high-level language.
Requires little memory and works well with devices with low random access memory (RAM).	It lacks portability between devices with different processors.
Assembly language can be used to reverse engineer software where the source code is not available.	It’s often bypassed by compilers that convert high-level languages straight to machine language.

Final Words on Assembly Language

Assembly language fits between high-level and machine-level languages, helping to facilitate communication between users and computers that use different languages. While high-level languages are easier for a programmer to use, machines don’t understand them. On the other hand, machine code is easier for devices but very difficult for humans to understand. So, historically, assembly language has served as a way to bridge that gap.

Although assembly language is easier for us to use than machine language, that doesn’t mean it’s easy to use. This is why most high-level programming languages and programs nowadays bypass the need for assembly and go straight to CIL and machine language after that.


An assembly language is a programming language that can be used to directly tell the computer what to do. An assembly language is almost exactly like the machine code that a computer can understand, except that it uses words in place of numbers. A computer cannot really understand an assembly program directly. However, it can easily change the program into machine code by replacing the words of the program with the numbers that they stand for. A program that does that is called an assembler.

Programs written in assembly language are usually made of instructions, which are small tasks that the computer performs when it is running the program. They are called instructions because the programmer uses them to instruct the computer what to do. The part of the computer that follows the instructions is the processor.

The assembly language of a computer is a low-level language, which means that it can only be used to do the simple tasks that a computer can understand directly. In order to perform more complex tasks, one must tell the computer each of the simple tasks that are part of the complex task. For example, a computer does not understand how to print a sentence on its screen. Instead, a program written in assembly must tell it how to do all of the small steps that are involved in printing the sentence.

Such an assembly program would be composed of many instructions that together do something that seems very simple and basic to a human. This makes it hard for humans to read an assembly program. In contrast, a high-level programming language may have a single instruction such as PRINT "Hello, world!" that will tell the computer to perform all of the small tasks for you.

Development of assembly language

When computer scientists first built programmable machines, they programmed them directly in machine code, which is a series of numbers that instructed the computer what to do. Writing machine language was very hard to do and took a long time, so eventually assembly language was made. Assembly language is easier for a human to read and can be written faster, but it is still much harder for a human to use than a high-level programming language which tries to mimic human language.

Programming in machine code

To program in machine code, the programmer needs to know what each instruction looks like in binary (or hexadecimal). Although it is easy for a computer to quickly figure out what machine code means, it is hard for a programmer. Each instruction can have several forms, all of which just look like a bunch of numbers to people. Any mistake that someone makes while writing machine code will only be noticed when the computer does the wrong thing. Figuring out the mistake is hard because most people cannot tell what machine code means by looking at it. An example of what machine code looks like:

05 2A 00

This hexadecimal machine code tells an x86 computer processor to add 42 to the accumulator. It is very difficult for a person to read and understand it even if that person knows machine code.

Using assembly language instead

With assembly language, each instruction can be written as a short word, called a mnemonic, followed by other things like numbers or other short words. The mnemonic is used so that the programmer does not have to remember the exact numbers in machine code needed to tell the computer to do something. Examples of mnemonics in assembly language include add, which adds data, and mov, which moves data from one place to another. Because ‘mnemonic’ is an uncommon word, the phrase instruction type or just instruction is sometimes used instead, often incorrectly. The words and numbers after the first word give more information about what to do. For instance, things following an add might be what two things to add together and the things following mov say what to move and where to put it.

For example, the machine code in the previous section (05 2A 00) can be written in assembly as:

	add ax, 42

Assembly language also allows programmers to write the actual data the program uses in easier ways. Most assembly languages have support for easily making numbers and text. In machine code, each different type of number like positive, negative or decimal, would have to be manually converted into binary and text would have to be defined one letter at a time, as numbers.

Assembly language provides what is called an abstraction of machine code. When using assembly, programmers do not need to know the details of what numbers mean to the computer; the assembler figures that out instead. Assembly language still lets the programmer use all the features of the processor that they could use with machine code. In this sense, assembly language has a very good, rare trait: it has the same ability to express things as the thing it is abstracting (machine code) while being much easier to use. Because of this, machine code is almost never used as a programming language.
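To make the idea of replacing words with numbers concrete, here is a minimal Python sketch of an assembler that understands only three instruction forms. The names `OPCODES`, `parse_number` and `assemble_line` are invented for this illustration, and a real assembler handles far more syntax. The opcode bytes are the usual 16-bit x86 encodings for adding a number to ax and for loading a number into ax or bx.

```python
# Toy assembler: translates "mnemonic register, number" into x86 machine code.
# Only three instruction forms are supported; everything here is illustrative.

# Opcode byte for each (mnemonic, register) pair with a 16-bit number operand.
OPCODES = {
    ("add", "ax"): 0x05,  # add ax, imm16
    ("mov", "ax"): 0xB8,  # mov ax, imm16
    ("mov", "bx"): 0xBB,  # mov bx, imm16
}


def parse_number(text):
    """Accept decimal (42) or assembler-style hexadecimal (2Ah)."""
    text = text.strip().lower()
    if text.endswith("h"):
        return int(text[:-1], 16)
    return int(text)


def assemble_line(line):
    """Return the machine code bytes for one supported instruction."""
    mnemonic, operands = line.strip().split(None, 1)
    register, number = [part.strip() for part in operands.split(",")]
    opcode = OPCODES[(mnemonic.lower(), register.lower())]
    value = parse_number(number) & 0xFFFF
    # x86 stores the low byte of a 16-bit number first (little-endian).
    return bytes([opcode]) + value.to_bytes(2, "little")


print(assemble_line("add ax, 42").hex(" ").upper())  # prints: 05 2A 00
```

The table lookup is the whole trick: the programmer remembers `add`, and the program remembers `05`.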

Disassembly and debugging

When programs are finished, they have already been transformed into machine code so that the processor can actually run them. Sometimes, however, if the program has a bug (mistake) in it, programmers will want to be able to tell what each part of the machine code is doing. Disassemblers are programs that help programmers do that by transforming the machine code of the program back into assembly language, which is much easier to understand. In other words, a disassembler does the opposite of an assembler.
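The reverse direction can be sketched in the same spirit. The Python below is only an illustration: `MNEMONICS` and `disassemble` are made-up names, the table covers three opcodes, and every instruction is assumed to be one opcode byte followed by a 2-byte number. Real x86 instructions vary in length, which is a large part of what makes real disassemblers complicated.

```python
# Toy disassembler: turns a few x86 machine code bytes back into assembly text.
# MNEMONICS is an illustrative table, not a complete x86 opcode map.

MNEMONICS = {
    0x05: "add ax",  # add ax, imm16
    0xB8: "mov ax",  # mov ax, imm16
    0xBB: "mov bx",  # mov bx, imm16
}


def disassemble(code):
    """Return one line of assembly text per 3-byte instruction in code."""
    lines = []
    position = 0
    while position < len(code):
        opcode = code[position]
        if opcode not in MNEMONICS:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {position}")
        # The two bytes after the opcode hold a 16-bit number, low byte first.
        operand = int.from_bytes(code[position + 1:position + 3], "little")
        lines.append(f"{MNEMONICS[opcode]}, {operand}")
        position += 3
    return lines


for line in disassemble(bytes.fromhex("BB 14 00 05 2A 00")):
    print(line)
```

Run on the six bytes shown, this prints `mov bx, 20` followed by `add ax, 42`.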

Computer organization

An understanding of how computers are organized, how they seem to work at a very low level, is needed to understand how an assembly language program works. At the most simplistic level, computers have three main parts:

main memory or RAM which holds data and instructions,
a processor, which processes the data by executing the instructions, and
input and output (sometimes shortened to I/O), which allow the computer to communicate with the outside world and store data outside of main memory so it can get the data back later.

Main memory

In most computers, memory is divided up into bytes. Each byte contains 8 bits. Each byte in memory also has an address, which is a number that says where the byte is in memory. The first byte in memory has an address of 0, the next one has an address of 1, and so on. Dividing memory into bytes makes it byte addressable because each byte gets a unique address. An address in byte-addressable memory cannot refer to a single bit of a byte: a byte is the smallest piece of memory that can be addressed.

Even though an address refers to a particular byte in memory, processors allow for using several bytes of memory in a row. The most common use of this feature is to use either 2 or 4 bytes in a row to represent a number, usually an integer. Single bytes are sometimes also used to represent integers, but because they are only 8 bits long, they can only hold 2⁸ or 256 different possible values. Using 2 or 4 bytes in a row raises the number of different possible values to 2¹⁶ (65,536) or 2³² (4,294,967,296), respectively.
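As a rough model of the two paragraphs above, memory can be pictured as a Python `bytearray` whose index is the address. The sketch stores a number across two neighboring bytes and reads it back. It assumes little-endian byte order (low byte first), as used on x86; other processors may use the opposite order, and the size of 16 bytes is arbitrary.

```python
# Model of byte-addressable memory: index = address, element = one byte (0..255).
memory = bytearray(16)  # 16 bytes, addresses 0 through 15

# How many different values fit in 1, 2 and 4 bytes.
for size in (1, 2, 4):
    print(size, "byte(s):", 2 ** (8 * size), "possible values")

# Store the number 1000 as a 2-byte integer starting at address 4.
# Little-endian order (low byte first) is assumed, as on x86.
memory[4:6] = (1000).to_bytes(2, "little")
print(memory[4], memory[5])  # the two individual bytes: 232 3

# Read the same two bytes back as one number.
value = int.from_bytes(memory[4:6], "little")
print(value)  # 1000
```

Neither byte on its own is the number 1000; only the pair, read in the agreed order, is.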

When a program uses a byte or a number of bytes in a row to represent something like a letter, number, or anything else, those bytes are called an object because they are all part of the same thing. Even though objects are all stored in identical bytes of memory, they are treated as though they have a ‘type’, which says how the bytes should be understood: either as an integer or a character or some other type (like a non-integer value). Machine code can also be thought of as a type that is interpreted as instructions. The notion of a type is very, very important because it defines what things can and can’t be done to the object and how to interpret the bytes of the object. For instance, it is not valid to store a negative number in a positive number object and it is not valid to store a fraction in an integer.

An address that points to (is the address of) a multi-byte object is the address of the first byte of that object, the byte that has the lowest address. As an aside, one important thing to note is that you can’t tell what the type of an object is, or even its size, by its address. In fact, you can’t even tell what type an object is by looking at it. An assembly language program needs to keep track of which memory addresses hold which objects, and how big those objects are. A program that does so is type safe because it only does things to objects that are safe to do on their type. A program that doesn’t will probably not work properly. Note that most programs do not explicitly store what the type of an object is; they just access objects consistently, so the same object is always treated as the same type.

The processor
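The point that bytes carry no type of their own can be seen with a short Python sketch using the standard `struct` module. The two byte values are arbitrary examples, and little-endian order is again an assumption.

```python
import struct

# The same two bytes in memory, read as three different types.
raw = bytes([0x41, 0xFF])

unsigned_value = struct.unpack("<H", raw)[0]  # one unsigned 16-bit integer
signed_value = struct.unpack("<h", raw)[0]    # one signed 16-bit integer
first_char = chr(raw[0])                      # first byte as an ASCII character

print(unsigned_value, signed_value, first_char)  # 65345 -191 A
```

Nothing in `raw` says which reading is the right one. The program has to know and stay consistent.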

The processor runs (executes) instructions, which are stored as machine code in main memory. As well as being able to access memory for storage, most processors have a few small, fast, fixed-size spaces for holding objects that are currently being worked with. These spaces are called registers. Processors usually execute three types of instructions, although some instructions can be a combination of these types. Below are some examples of each type in x86 assembly language.

Instructions that read or write memory

The following x86 assembly language instruction reads (loads) a 2-byte object starting at the byte at address 4096 (0x1000 in hexadecimal) into a 16-bit register called ‘ax’:

	mov ax, [1000h]

In this assembly language, square brackets around a number (or a register name) mean that the number is to be treated as the address of the data to use. The use of an address to point to data is called indirection. In this next example, without the square brackets, another register, bx, actually gets the value 20 loaded into it.

	mov bx, 20

Because no indirection was used, the actual value itself was put into the register.

If the operands (the things that come after the mnemonic) appear in the reverse order, an instruction that loads something from memory instead writes it to memory:

	mov [1000h], bx

Here, the memory at address 1000h gets the value of bx. If this example is executed right after the previous one, the 2 bytes at 1000h and 1001h will be a 2-byte integer with the value of 20.
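The difference between using a number directly and using it as an address can be imitated in a few lines of Python. This is only a sketch of the idea: the `registers` dictionary and the helpers `load_word` and `store_word` are invented names, and the memory size of 64 KiB is an assumption chosen to match 16-bit addresses.

```python
# Sketch of a tiny machine: two 16-bit registers and 64 KiB of byte memory.
registers = {"ax": 0, "bx": 0}
memory = bytearray(0x10000)


def load_word(address):
    """Read the 2-byte little-endian integer that starts at address."""
    return int.from_bytes(memory[address:address + 2], "little")


def store_word(address, value):
    """Write value as a 2-byte little-endian integer starting at address."""
    memory[address:address + 2] = (value & 0xFFFF).to_bytes(2, "little")


registers["bx"] = 20                     # value used directly, no indirection
store_word(0x1000, registers["bx"])      # write bx to the memory at 1000h
registers["ax"] = load_word(0x1000)      # read the memory at 1000h into ax

print(registers["ax"], memory[0x1000], memory[0x1001])  # 20 20 0
```

The low byte (20) lands at 1000h and the high byte (0) at 1001h, matching the description above.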

Instructions that perform mathematical or logical operations

Some instructions do things like subtraction or logical operations like not.

The machine code example earlier in this article would be this in assembly language:

	add ax, 42

Here, 42 and ax are added together and the result is stored back in ax. In x86 assembly it is also possible to combine a memory access and mathematical operation like this:

	add ax, [1000h]

This instruction adds the value of the 2-byte integer stored at 1000h to ax and stores the answer in ax. Logical operations are written the same way:

	or ax, bx

This instruction computes the or of the contents of the registers ax and bx and stores the result back into ax.
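Because a register such as ax has a fixed size, the results of these operations are limited to 16 bits. The Python sketch below imitates that by masking every result. The helper names are invented, and the processor's flags are ignored here.

```python
MASK = 0xFFFF  # a 16-bit register can only hold values 0..65535


def add16(a, b):
    """16-bit add: the result wraps around instead of growing past 16 bits."""
    return (a + b) & MASK


def sub16(a, b):
    """16-bit subtract: going below zero wraps around to the top of the range."""
    return (a - b) & MASK


def or16(a, b):
    """Bitwise or of two 16-bit values."""
    return (a | b) & MASK


def not16(a):
    """Bitwise not: every one of the 16 bits is flipped."""
    return ~a & MASK


ax, bx = 0x00F0, 0x000F
print(add16(ax, 42))      # 282
print(or16(ax, bx))       # 255
print(not16(ax))          # 65295
print(add16(0xFFFF, 1))   # 0, the result wrapped around
```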

Instructions that decide what the next instruction is going to be

Usually, instructions are executed in the order they appear in memory, which is the order they are typed in the assembly code. The processor just executes them one after another. However, in order for processors to do complicated things, they need to execute different instructions depending on the data they are given. The ability of processors to execute different instructions depending on something’s outcome is called branching. Instructions that decide what the next instruction should be are called branch instructions.

In this example, suppose someone wants to calculate the amount of paint they will need to paint a square with a certain side length. However, due to economies of scale, the paint store will not sell them less than the amount of paint needed to paint a 100 x 100 square.

To figure out the amount of paint they will need to get based on the length of the square they want to paint, they come up with this set of steps:

subtract 100 from the side length
if the answer is less than zero, set the side length to 100
multiply the side length by itself

That algorithm can be expressed in the following code, where ax holds the side length.

	mov bx, ax
	sub bx, 100
	jge continue
	mov ax, 100
continue:
	mul ax

This example introduces several new things, but the first two instructions are familiar. They copy the value of ax into bx and then subtract 100 from bx.

One of the new things in this example is called a label, a concept found in assembly languages in general. Labels can be anything the programmer wants (unless it is the name of an instruction, which would confuse the assembler). In this example, the label is ‘continue’. It is interpreted by the assembler as the address of an instruction. In this case, it is the address of mul ax.

Another new concept is that of flags. On x86 processors, many instructions set ‘flags’ in the processor that can be used by the next instruction to decide what to do. In this case, if bx was less than 100, sub will set a flag that says the result was less than zero.

The next instruction is jge which is short for ‘jump if greater than or equal to’. It is a branch instruction. If the flags in the processor specify that the result was greater than or equal to zero, instead of just going to the next instruction the processor will jump to the instruction at the continue label, which is mul ax.
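The text above describes the flag in simplified terms. On x86, jge looks at two flags, the sign flag and the overflow flag, and jumps when they are equal. The Python sketch below models a 16-bit subtraction and that test. The function names are invented, and only the flags relevant to this example are computed.

```python
def sub_flags(a, b):
    """Subtract two 16-bit values; return (result, zero, sign, overflow)."""
    result = (a - b) & 0xFFFF
    zero = result == 0
    sign = bool(result & 0x8000)  # top bit set: negative when read as signed
    # Signed overflow: the operands had different signs and the result's
    # sign differs from the sign of the first operand.
    overflow = bool((a ^ b) & (a ^ result) & 0x8000)
    return result, zero, sign, overflow


def jge_taken(sign, overflow):
    """jge jumps when the sign and overflow flags are equal."""
    return sign == overflow


for side in (50, 100, 150):
    result, zero, sign, overflow = sub_flags(side, 100)
    print(side, jge_taken(sign, overflow))
```

For side lengths 50, 100 and 150 this prints False, True and True: the jump is skipped only when the side length is below 100.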

This example works fine, but it is not what most programmers would write. The subtract instruction sets the flags correctly, but it also changes the value it operates on, which is why ax had to be copied into bx first. Most assembly languages provide a comparison instruction that does not change any of its arguments but still sets the flags properly, and x86 assembly is no exception.

	cmp ax, 100
	jge continue
	mov ax, 100
continue:
	mul ax

Now, instead of subtracting 100 from ax, seeing if that number is less than zero, and assigning it back to ax, ax is left unchanged. The flags are still set the same way, and the jump is still taken in the same situations.
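For comparison, here are the same three steps written in a high-level language. This is an illustrative Python version rather than a translation of any particular program. The function name `paint_area` is made up, and the comments show which assembly instruction each line corresponds to.

```python
def paint_area(side):
    """Area to buy paint for: side squared, with a minimum side of 100."""
    if side < 100:      # cmp ax, 100 / jge continue
        side = 100      # mov ax, 100
    return side * side  # mul ax


print(paint_area(30), paint_area(100), paint_area(250))  # 10000 10000 62500
```

One difference is hidden by Python: on x86, `mul ax` leaves its 32-bit result split across the registers dx and ax, whereas a Python integer simply grows as large as needed.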

Input and output

While input and output are a fundamental part of computing, there is no one way they are done in assembly language. This is because the way I/O works depends on the setup of the computer and the operating system it is running, not just what kind of processor it has. In the example section below, the Hello World example uses MS-DOS operating system calls and the example after it uses BIOS calls.

It is possible to do I/O in assembly language. Indeed, assembly language can generally express anything that a computer is capable of doing. However, even though there are instructions to add and branch that always do the same thing, there are no instructions that do I/O the same way on every computer.

The important thing to note is that the way that I/O works is not part of any assembly language because it is not part of how the processor works.

Assembly languages and portability

Even though assembly language is not directly run by the processor (machine code is), it still has a lot to do with it. Each processor family supports different features, instructions, rules for what the instructions can do, and rules for what combinations of instructions are allowed where. Because of this, different types of processors still need different assembly languages.

Because each version of assembly language is tied to a processor family, it lacks something called portability. Something that has portability or is portable can be easily transferred from one type of computer to another. While other types of programming languages are portable, assembly language, in general, is not.

Assembly language and high-level languages

Although assembly language allows for an easy way to use all the processor’s features, it is not used for most modern software projects for several reasons:

It takes a lot of effort to express a simple program in assembly.
Although not as error-prone as machine code, assembly language still offers very little protection against errors. Almost no assembly language enforces type safety.
Assembly language does not promote good programming practices like modularity.
While each individual assembly language instruction is easy to understand, it is hard to tell what the intent of the programmer who wrote it was. In fact, the assembly language of a program is so hard to understand that companies do not worry about people disassembling (getting the assembly language of) their programs.

As a result of these drawbacks, high-level languages like Pascal, C, and C++ are used for most projects instead. They allow programmers to express their ideas more directly instead of having to worry about telling the processor what to do every step of the way. They’re called high-level because the ideas the programmer can express in the same amount of code are more complicated.

Programmers writing code in compiled high-level languages use a program called a compiler to transform their code into assembly language. Compilers are much harder to write than assemblers are. Also, high-level languages do not always allow programmers to use all the features of the processor. This is because high-level languages are designed to support all processor families. Unlike assembly languages, which only support one type of processor, high-level languages are portable.

Even though compilers are more complicated than assemblers, decades of making and researching compilers have made them very good. Now, there is not much reason to use assembly language for most projects, because compilers can usually figure out how to express programs in assembly language as well as or better than programmers.

Example programs

A Hello, world! program written in x86 assembly for MS-DOS:

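dosseg                          ; use the standard DOS segment ordering
.model small                    ; one code segment and one data segment
.stack 100h                     ; reserve 256 bytes of stack

.data
hello_message db 'Hello, World!',0dh,0ah,'$'   ; text, CR, LF, '$' terminator

.code
main  proc
      mov    ax,@data           ; get the address of the data segment
      mov    ds,ax              ; ds cannot take an immediate value, so go through ax

      mov    ah,9               ; DOS function 9: print a '$'-terminated string
      mov    dx,offset hello_message   ; ds:dx points to the string
      int    21h                ; call DOS

      mov    ax,4C00h           ; DOS function 4Ch: exit with return code 0
      int    21h
main  endp
end   main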
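The '$' at the end of hello_message is there because DOS function 9 prints characters until it meets a '$'. A small Python sketch of that rule, with the invented name `dos_print_string` and a byte string standing in for the data segment, might look like this:

```python
def dos_print_string(data, offset):
    """Return the text that DOS function 9 would print, stopping before '$'."""
    end = data.index(b"$", offset)
    return data[offset:end].decode("ascii")


# Same bytes as hello_message: the text, carriage return, line feed, '$'.
data_segment = b"Hello, World!\r\n$"
print(repr(dos_print_string(data_segment, 0)))
```

The '$' itself is never printed, which also means this DOS function cannot print a string that contains one.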

A function, written in NASM x86 assembly, that prints a number to the screen using BIOS interrupts. Modular code is possible to write in assembly, but it takes extra effort. Note that anything that comes after a semicolon on a line is a comment and is ignored by the assembler. Putting comments in assembly language code is very important because large assembly language programs are so hard to understand.

; void printn(int number, int base);

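printn:
	push	bp		; save the frame pointer of the caller
	mov	bp, sp		; arguments are now at [bp + 4] and [bp + 6]
	push	ax		; save every register this function changes
	push	bx
	push	cx
	push	dx
	push	si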

	mov	si, 0
	mov	ax, [bp + 4]	; number
	mov	cx, [bp + 6]	; base

gloop:	inc	si		; length of string
	mov	dx, 0		; zero dx
	div	cx		; divide by base
	cmp	dx, 10		; is it ge 10?
	jge	num
	add	dx, '0'		; add zero to dx
	jmp	anum
num:	add	dx, ('A'- 10)	; hex value, add 'A' to dx - 10.
anum:	push	dx		; put dx onto stack.
	cmp	ax, 0		; should we continue?
	jne	gloop

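	mov	bx, 7h		; bh = page 0, bl = color 7, for the interrupt
tloop:	pop	ax		; get the next character from the stack
	mov	ah, 0eh		; BIOS function 0Eh: teletype output
	int	10h		; write the character in al
	dec	si		; one fewer character left to print
	jnz	tloop
	
	pop	si		; restore the saved registers in reverse order
	pop	dx
	pop	cx
	pop	bx
	pop	ax
	pop	bp
	ret	4		; return and remove the 4 bytes of arguments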
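The algorithm inside printn is easier to follow in a high-level language. The Python sketch below uses the same approach: divide repeatedly by the base, push each digit, then pop the digits off in reverse. It returns the text instead of calling the BIOS, and it assumes a non-negative number and a base between 2 and 36. A list stands in for the processor stack.

```python
def printn(number, base):
    """Return number written in the given base (2..36), like the routine above."""
    digits = []  # plays the role of the stack
    while True:
        number, remainder = divmod(number, base)  # div cx
        if remainder >= 10:
            digits.append(chr(ord("A") + remainder - 10))  # letters for 10 and up
        else:
            digits.append(chr(ord("0") + remainder))
        if number == 0:
            break
    # Digits were produced lowest first, so take them off in reverse order.
    text = ""
    while digits:
        text += digits.pop()
    return text


print(printn(255, 16), printn(255, 10), printn(5, 2), printn(0, 10))
```

This prints `FF 255 101 0`. As in the assembly version, the loop body runs at least once, so the number 0 still produces one digit.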
