1. Bit order / bit numbering / bit endianness
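A byte has 8 bits, numbered from bit 0 to bit 7. Bit order (bit numbering) describes which physical bit of the byte each of those numbers refers to. From http://en.wikipedia/wiki/Bit_numbering we can draw the following conclusions about bit order:
(1) There are two bit-numbering conventions: LSB 0 and MSB 0. LSB stands for least significant bit and MSB for most significant bit. Under LSB 0 numbering, bit 0 of the byte holds the least significant bit of the data, that is, the lowest-order bit of the value is stored in bit 0. Under MSB 0 numbering, bit 0 of the byte holds the most significant bit of the data, that is, the highest-order bit of the value is stored in bit 0.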
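So for the code: unsigned char ch = 0x96; // 0x96 = 1001 0110
which bit of ch is bit 0? (C can only address whole bytes, so the question is about how the bits are numbered, not about where a pointer points.) Under LSB 0 numbering, bit 0 is the rightmost, lowest-order bit, which is 0 here. Under MSB 0 numbering, bit 0 is the leftmost, highest-order bit, which is 1 here.
LSB 0: A container for 8-bit binary number with the highlighted least significant bit assigned the bit number 0
MSB 0: A container for 8-bit binary number with the highlighted most significant bit assigned the bit number 0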
(2) Little-endian CPUs usually employ "LSB 0" bit numbering, whereas big-endian machines may use either LSB 0 or MSB 0; both conventions can be seen on them.
(3) The style recommended for Request for Comments (RFC) documents is "MSB 0" bit numbering.
(4) Bit numbering is usually transparent to the software.
2. Endianness and byte order
http://en.wikipedia/wiki/Endianess
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory (or, sometimes, as sent on a serial connection). Each sub-component in the representation has a unique degree of significance, like the place value of digits in a decimal number. These sub-components are typically 16- or 32-bit words, 8-bit bytes, or even bits. Endianness is a difference in data representation at the hardware level and may or may not be transparent at higher levels, depending on factors such as the type of high level language used.
In other words, a larger data item in memory is made up of parts that can each be addressed on their own, and endianness answers the question of the order in which those parts are stored. Whether the CPU is big-endian or little-endian determines that order.
The most common cases refer to how bytes are ordered within a single 16-, 32-, or 64-bit word. (Whenever we access only part of a 16-, 32- or 64-bit value, that is, one or a few of its bytes, we have to pay particular attention to the endianness of the machine.)
A big-endian machine stores the most significant byte first, and a little-endian machine stores the least significant byte first.
Endian | First Byte (lowest address) | Middle Bytes | Last Byte (highest address) | Summary |
---|---|---|---|---|
big | most significant | ... | least significant | Similar to a number written on paper (in Arabic numerals) |
little | least significant | ... | most significant | Arithmetic calculation order (see carry propagation) |
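
Big-endian, storing the 32-bit value 0A0B0C0Dh. Atomic element size 8-bit, address increment 1-byte (octet):
increasing addresses → | | | |
---|---|---|---|
0Ah | 0Bh | 0Ch | 0Dh |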
The most significant byte (MSB) value, which is 0Ah in our example, is stored at the memory location with the lowest address, the next byte value in significance, 0Bh, is stored at the following memory location and so on. This is akin to Left-to-Right reading in hexadecimal order.
Atomic element size 16-bit:
increasing addresses → | |
---|---|
0A0Bh | 0C0Dh |
The most significant atomic element stores now the value 0A0Bh, followed by 0C0Dh.
Little-endian. Atomic element size 8-bit, address increment 1-byte (octet):
increasing addresses → | | | |
---|---|---|---|
0Dh | 0Ch | 0Bh | 0Ah |
The least significant byte (LSB) value, 0Dh, is at the lowest address. The other bytes follow in increasing order of significance.
Atomic element size 16-bit:
increasing addresses → | |
---|---|
0C0Dh | 0A0Bh |
The least significant 16-bit unit stores the value 0C0Dh, immediately followed by 0A0Bh. Note that 0C0Dh and 0A0Bh represent integers, not bit layouts (see bit numbering).
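Clearly, a little-endian machine follows the "high to high, low to low" rule: the more significant bytes or words are stored at higher addresses, and the less significant bytes or words at lower addresses. In addition, on a little-endian machine the byte at offset n of a value in memory is also byte n of that value in a register, counting from the least significant end, so the two orders agree.
Byte addresses increasing from right to left
When we write 0xFF86, addresses clearly increase from right to left: the low-order part is written on the right and the high-order part on the left. But when we write a string such as char *str = "Hello world!", the characters at lower addresses are written on the left and those at higher addresses on the right.
With 8-bit atomic elements: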
← increasing addresses | | | |
---|---|---|---|
0Ah | 0Bh | 0Ch | 0Dh |
The least significant byte (LSB) value, 0Dh, is at the lowest address. The other bytes follow in increasing order of significance. (This clearly matches the way we normally write numbers.)
With 16-bit atomic elements:
← increasing addresses | |
---|---|
0A0Bh | 0C0Dh |
The least significant 16-bit unit stores the value 0C0Dh, immediately followed by 0A0Bh.
The display of text is reversed from the normal display of languages such as English that read from left to right. For example, the word "XRAY" displayed in this manner, with each character stored in an 8-bit atomic element:
← increasing addresses | | | |
---|---|---|---|
"Y" | "A" | "R" | "X" |
If pairs of characters are stored in 16-bit atomic elements (using 8 bits per character), it could look even stranger:
← increasing addresses | |
---|---|
"AY" | "XR" |
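
Example 1:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char a[] = {'a', 'b', 'c'};   /* 3 bytes, no terminating '\0' */
        char b[] = {'d', 'e', 'f'};   /* 3 bytes, no terminating '\0' */

        /* Deliberate out-of-bounds write: a only has elements 0..2, so this
           is undefined behavior. It is kept to expose the stack layout. */
        a[3] = 0;
        /* strlen(b) also reads past the end of b (undefined behavior). */
        printf("strlen(a)=%zu, strlen(b)=%zu\n", strlen(a), strlen(b));
        printf("a=%s, b=%s\n", a, b);
        printf("sizeof(a)=%zu, sizeof(b)=%zu\n", sizeof(a), sizeof(b));
        return 0;
    }
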
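Output:
strlen(a)=3, strlen(b)=6
a=abc, b=defabc
sizeof(a)=3, sizeof(b)=3
Analysis: The character arrays a and b are both allocated on the stack, a first. Inside an array, the elements always sit at increasing addresses, just as a string is written with its lowest-address character on the left: 'a' is at the lowest address, 'b' in the middle and 'c' at the highest. This holds on big-endian and little-endian CPUs alike, because endianness only orders the bytes inside a multi-byte scalar, not the elements of an array. The stack grows from high addresses toward low addresses, so b, which is allocated after a, ends up directly below a. The statement a[3] = 0; then writes to the address just above 'c'. After it has executed, the stack looks like this:
high address <<---- low address
\0 c b a f e d
So a prints as abc, while b prints as defabc. strlen computes the length of a string and only stops when it finds the terminating '\0', so the length of a is 3 and the length of b is 6. sizeof does not look for the terminator at all; it reports the size of the array. Note that this layout is not guaranteed: writing a[3] and reading past the end of b are undefined behavior, and another compiler or optimization level may place the arrays differently.
Example 2: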
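
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        /* uint32_t instead of unsigned long, so that every element is
           4 bytes even where unsigned long is 8 bytes wide. */
        uint32_t array[] = {0x12345678, 0xabcdef01, 0x456789ab};
        uint16_t ret;

        /* Read the two bytes at byte offsets 7 and 8. memcpy avoids the
           misaligned access of *(unsigned short *)(address + 7). */
        memcpy(&ret, (const unsigned char *)array + 7, sizeof ret);
        /* Little-endian: bytes 0xab, 0xab -> prints 0xabab.
           Big-endian:    bytes 0x01, 0x45 -> prints 0x145. */
        printf("0x%x\n", (unsigned)ret);

        return 0;
    }
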
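
Example 3:

    #include <stdio.h>

    int main(void)
    {
        int a[5] = {1, 2, 3, 4, 5};
        /* &a has type int (*)[5], so &a + 1 points just past the whole
           array; ptr - 1 therefore points at a[4]. */
        int *ptr = (int *)(&a + 1);

        printf("%d,%d\n", *(a + 1), *(ptr - 1));   /* prints 2,5 */
        return 0;
    }
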
There are several ways to determine whether a CPU is big-endian or little-endian:
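
    #include <stdio.h>
    #include <assert.h>

    int main(void)
    {
        unsigned short x = 0xff01;
        /* Byte of x stored at the lowest address. unsigned char keeps 0xff
           from turning negative, as it could through a plain char. */
        unsigned char first = *(unsigned char *)&x;

        assert(sizeof(x) >= 2);
        /* A value cast such as (char)x always yields the low-order byte,
           whatever the endianness, so only the memory view can be tested. */
        if (first == 0x01)
            printf("little-endian\n");
        else if (first == 0xff)
            printf("big-endian\n");
        else
            printf("unknown\n");
        return 0;
    }
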
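
    #include <stdio.h>

    int main(void)
    {
        /* A 2-byte member is used instead of int (assuming a 2-byte
           unsigned short): with a 4-byte int, 0x0201 is stored as
           00 00 02 01 on a big-endian machine, so the first byte would
           be 0 and the test for 2 could never succeed. */
        union {
            unsigned char c;
            unsigned short s;
        } u;
        u.s = 0x0201;

        if (u.c == 1)
            printf("little-endian\n");
        else if (u.c == 2)
            printf("big-endian\n");
        else
            printf("unknown\n");

        return 0;
    }
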
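
    #include <stdio.h>

    /* The allocation order of bit-fields is implementation-defined, so
       the printed value depends on the compiler and target. */
    union u {
        struct {
            unsigned char i:1;   /* unsigned, so a 1-bit field can hold 1 */
            unsigned char j:2;
            unsigned char m:3;
        } s;
        unsigned char c;
    } r;

    int main(void)
    {
        r.s.i = 1; /*   1 */
        r.s.j = 2; /*  10 */
        r.s.m = 3; /* 011 */
        /* First field in the lowest-order bits (as GCC does on
           little-endian x86): 0001 1101 = 0x1d.
           Fields allocated from the high-order end: 1100 1100 = 0xcc. */
        printf("0x%x\n", r.c);
        return 0;
    }
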
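
    #include <stdio.h>

    union {
        struct {
            unsigned char a1:2;
            unsigned char a2:3;
            unsigned char a3:3;
        } x;
        unsigned char b;
    } d;

    int main(void)
    {
        d.b = 100; /* 100 == 0110 0100 */

        /* First field in the lowest-order bits (as GCC does on
           little-endian x86): a1 = 00 -> 0x0, a2 = 001 -> 0x1, a3 = 011 -> 0x3.
           Fields allocated from the high-order end:
           a1 = 01 -> 0x1, a2 = 100 -> 0x4, a3 = 100 -> 0x4. */
        printf("0x%x\n0x%x\n0x%x\n", d.x.a1, d.x.a2, d.x.a3);
        return 0;
    }
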
Published by admin. Please credit the source when reposting: http://www.yc00.com/web/1754939964a5217969.html