Some problems with using cat to read and echo to write kernel file nodes

Platform: busybox-1.24.2, Linux-4.10.17, QEMU + vexpress-ca9

Overview: When writing a driver we often export file nodes to user space, and user space then reads them with the cat command, which completes the kernel-to-user communication. Sometimes, however, if the node's read callback is written incorrectly, a single cat causes that callback to be invoked over and over and the log floods the screen, even though we only want read to be called once; the same goes for echo and the write callback. What is the underlying reason, and how do we fix it? Below we use reads and writes on a debugfs node as the example.

1. An introduction to read and write

1.1 The read system call

ssize_t read(int fd, void *buf, size_t count);

This call reads up to count bytes from the file referred to by the descriptor fd into the buffer buf. Its return value falls into the following cases: a value greater than 0 is the number of bytes actually read (which may be smaller than count), and the file position is advanced accordingly; 0 means end of file was reached; -1 means the read failed and errno is set to the corresponding error.

1.2 The write system call

ssize_t write(int fd, const void *buf, size_t count);

This call writes count bytes starting at buf to the file referred to by fd. Return values: a positive number is the number of bytes actually written, and the file position is advanced; 0 means nothing was written; -1 means the write failed and errno is set accordingly.

1.3 LDD3 on the driver's read callback

Prototype:

ssize_t (*read) (struct file *fp, char __user *user_buf, size_t count, loff_t *ppos);

fp: the struct file of the opened node;
user_buf: the start of a user-space buffer into which the data read from the kernel must be placed;
count: the number of bytes the user wants to read;
*ppos: the current file position. The driver has to update it itself; it starts at 0.

If the return value equals the count argument passed to the read system call, the requested number of bytes was transferred; this is the ideal case. If the return value is positive but smaller than count, only part of the data was transferred; depending on the device there can be many reasons for this, and in most cases the program simply reads again. For example, if the data is read with fread, that library function keeps issuing the system call until the requested amount has been transferred. A return value of 0 means end of file was reached. A negative value means an error occurred, and the value identifies the error; the error codes are defined in <linux/errno.h>, for example -EINTR (interrupted system call) or -EFAULT (bad address).

1.4 LDD3 on the driver's write callback

Prototype:

ssize_t (*write) (struct file *fp, const char __user *user_buf, size_t count, loff_t *ppos);

fp: the struct file of the kernel node being written;
user_buf: the start of a user-space buffer holding the data the user wants to pass to the kernel;
count: the number of bytes the user wants to write;
*ppos: the file position, which the driver must update itself.

If the return value equals count, the requested number of bytes was transferred. If it is positive but smaller than count, only part of the data was written, and the program will most likely try again to write the rest. If it is 0, nothing was written; this is not an error and there is no reason to return an error code, the standard library simply calls write again. A negative value means an error occurred; as with read, the valid error codes are defined in <linux/errno.h>.

This retry behaviour (the caller issuing read or write again after a partial or zero-byte transfer) is exactly what causes the driver's read or write to be invoked over and over.

2. A quick look at how the read and write system calls are implemented

After user space calls read(), the kernel function vfs_read() is invoked:

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;

    if (!(file->f_mode & FMODE_READ))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_READ))
        return -EINVAL;
    if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
        return -EFAULT;

    ret = rw_verify_area(READ, file, pos, count);
    if (!ret) {
        if (count > MAX_RW_COUNT)
            count = MAX_RW_COUNT;
        ret = __vfs_read(file, buf, count, pos);
        if (ret > 0) {
            fsnotify_access(file);
            add_rchar(current, ret);
        }
        inc_syscr(current);
    }

    return ret;
}

The points worth noting: the access_ok(VERIFY_WRITE, buf, count) check verifies that the user-space buffer buf can be written to; count is clamped to MAX_RW_COUNT (defined as INT_MAX & PAGE_MASK), which caps how much a single read() can transfer; and __vfs_read() is what eventually calls the read callback in our driver. Its parameters are the same as those of the driver's read, and the value ret returned by the driver's read is what is handed back to the user. Note that pos is not updated anywhere here, so the driver has to update it itself.

After user space calls write(), the kernel function vfs_write() is invoked:

ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;

    if (!(file->f_mode & FMODE_WRITE))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_WRITE))
        return -EINVAL;
    if (unlikely(!access_ok(VERIFY_READ, buf, count)))
        return -EFAULT;

    ret = rw_verify_area(WRITE, file, pos, count);
    if (!ret) {
        if (count > MAX_RW_COUNT)
            count = MAX_RW_COUNT;
        file_start_write(file);
        ret = __vfs_write(file, buf, count, pos);
        if (ret > 0) {
            fsnotify_modify(file);
            add_wchar(current, ret);
        }
        inc_syscw(current);
        file_end_write(file);
    }

    return ret;
}

Here note: the access_ok(VERIFY_READ, buf, count) check verifies that the user-space buffer buf can be read; count is clamped to MAX_RW_COUNT in the same way; and __vfs_write() takes the same parameters as the driver's write. Its return value ret is what the user's write() returns, i.e. the number of bytes actually written. Again there is no code here that updates pos, so we have to do that ourselves in the driver's write.
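Before digging into cat and echo themselves, it helps to see this retry pattern from the user-space side. The following is a minimal sketch added here for illustration (it is not the busybox code): it keeps calling read() in 4KB chunks (the same chunk size cat happens to use) and only stops when read() returns 0 or a negative value. A driver whose read callback never returns 0 therefore keeps a loop like this running forever.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    char buf[4096];     /* 4KB per request, like cat's internal buffer */
    ssize_t n;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* keep reading until read() reports end of file (0) or an error (< 0) */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, n);   /* forward only the bytes we actually got */

    if (n < 0)
        perror("read");

    close(fd);
    return 0;
}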
3. A quick look at how cat and echo are implemented

Since the root filesystem here was built with busybox, the implementations of cat and echo live in the busybox sources, in the cat and echo applets under coreutils/. Let's briefly walk through cat. By default cat uses sendfile(); sendfile avoids unnecessary memory copies and therefore improves I/O efficiency, the so-called Linux "zero copy". To make the code easier to analyse, that feature can be switched off, after which cat falls back to plain read() and write():

Busybox Settings --->
    General Configuration --->
        [ ] Use sendfile system call

Below is cat's core function, using "cat xxx" as the example. Here src_fd is the file descriptor of the opened kernel node, dst_fd is the standard-output descriptor, and size is 0.

static off_t bb_full_fd_action(int src_fd, int dst_fd, off_t size)
{
    int status = -1;
    off_t total = 0;
    bool continue_on_write_error = 0;
    ssize_t sendfile_sz;
    char buffer[4 * 1024];  /* user-space buffer, 4KB */
    enum { buffer_size = sizeof(buffer) };  /* number of bytes requested by each read */

    sendfile_sz = 0;
    if (!size) {
        size = (16 * 1024 * 1024);  /* if the caller passed size == 0, start with 16MB */
        status = 1; /* read all the way to end of file, i.e. until read() returns 0 */
    }

    while (1) {
        ssize_t rd;
        rd = safe_read(src_fd, buffer, buffer_size);    /* this is the read() call: ask for 4KB; rd is the number of bytes actually read */
        if (rd < 0) {
            bb_perror_msg(bb_msg_read_error);
            break;
        }
 read_ok:
        if (!rd) {  /* end of file reached, stop the loop */
            status = 0;
            break;
        }

        /* write what was just read to the descriptor dst_fd */
        if (dst_fd >= 0 && !sendfile_sz) {
            ssize_t wr = full_write(dst_fd, buffer, rd);
            if (wr < rd) {
                if (!continue_on_write_error) {
                    bb_perror_msg(bb_msg_write_error);
                    break;
                }
                dst_fd = -1;
            }
        }
        total += rd;    /* total accumulates the number of bytes read so far */
        if (status < 0) {   /* if the caller passed a non-zero size, status is -1 and the copy stops only after size bytes; with size == 0 this branch never triggers */
            size -= rd;
            if (!size) {
                /* 'size' bytes copied - all done */
                status = 0;
                break;
            }
        }
    }
 out:
    return status ? -1 : total; /* when reading finished cleanly, status is 0 and the accumulated byte count is returned */
}

The walkthrough above tells us the following. When running "cat xxx", the function is entered with size equal to 0, so the while loop keeps calling read() until an error occurs or read() returns 0, i.e. end of file. On error the function returns -1; otherwise it returns the total number of bytes read. At this point it should be clear why the driver's read keeps being called: the value returned by the driver's read is wrong.

Now for echo. The core function of echo is full_write(). Here fd is the kernel node to be written, the buffer buf holds the data to write, and len is the number of bytes stored in buf.

ssize_t FAST_FUNC full_write(int fd, const void *buf, size_t len)
{
    ssize_t cc;
    ssize_t total;

    total = 0;

    while (len) {
        cc = safe_write(fd, buf, len);

        if (cc < 0) {
            if (total) {
                /* we already wrote some! */
                /* user can do another write to know the error code */
                return total;
            }
            return cc;  /* write() returns -1 on failure. */
        }

        total += cc;
        buf = ((const char *)buf) + cc;
        len -= cc;
    }

    return total;
}

This function is simple and tells us two things: if the return value cc of a write() call is smaller than len, it keeps calling write() until an error occurs or all len bytes have been written; cc here corresponds exactly to the return value of our driver's write. Finally it returns either the number of bytes actually written or an error code. So it should now also be clear why a single echo makes the driver's write get called again and again: once more, the problem is the driver's write return value.

With the cause understood, let's look at a simple driver.

4. A worked example

4.1 First, two examples that flood the log

This driver creates a node named demo under /sys/kernel/debug that supports both reading and writing.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static struct dentry *demo_dir;

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int ret, wrinten;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    ret = copy_to_user(user_buf, kbuf, wrinten + 1);
    if (ret != 0) {
        printk(KERN_ERR "read error");
        return -EIO;
    }
    *ppos += wrinten;

    return wrinten;
}

static ssize_t demo_write(struct file *fp, const char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10] = {0};
    int ret;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    ret = copy_from_user(kbuf, user_buf, count);
    if (ret) {
        pr_err("%s: write error\n", __func__);
        return -EIO;
    }
    *ppos += count;

    return 0;
}

static const struct file_operations demo_fops = {
    .read  = demo_read,
    .write = demo_write,
};

static int __init debugfs_demo_init(void)
{
    int ret = 0;

    demo_dir = debugfs_create_file("demo", 0444, NULL, NULL, &demo_fops);

    return ret;
}

static void __exit debugfs_demo_exit(void)
{
    if (demo_dir)
        debugfs_remove(demo_dir);
}

module_init(debugfs_demo_init);
module_exit(debugfs_demo_exit);
MODULE_LICENSE("GPL");

Let's see how it behaves. First, try writing:

[root@vexpress mnt]# echo 1 > /d/demo

The command never returns (it just hangs), and a look at the kernel log shows it is already being flooded:

[ 1021.547015] user_buf: 00202268, count: 2, ppos: 0
[ 1021.547181] user_buf: 00202268, count: 2, ppos: 2
[ 1021.547319] user_buf: 00202268, count: 2, ppos: 4
[ 1021.547466] user_buf: 00202268, count: 2, ppos: 6
... ...
[ 1022.008736] user_buf: 00202268, count: 2, ppos: 6014
[ 1022.008880] user_buf: 00202268, count: 2, ppos: 6016
[ 1022.009012] user_buf: 00202268, count: 2, ppos: ...

Now try reading:

[root@vexpress mnt]# cat /d/demo
...

The terminal fills up with "Hello", and the kernel log is flooded as well:

[ 1832.074616] user_buf: becb6be8, count: 4096, ppos: 0
[ 1832.075033] user_buf: becb6be8, count: 4096, ppos: 5
[ 1832.075240] user_buf: becb6be8, count: 4096, ppos: 10
[ 1832.075898] user_buf: becb6be8, count: 4096, ppos: 15
[ 1832.076093] user_buf: becb6be8, count: 4096, ppos: 20
[ 1832.076282] user_buf: becb6be8, count: 4096, ppos: 25
[ 1832.076468] user_buf: becb6be8, count: 4096, ppos: 30
[ 1832.076653] user_buf: becb6be8, count: 4096, ppos: 35
[ 1832.076841] user_buf: becb6be8, count: 4096, ppos: 40

A pattern is visible. For write, count is 2 on every call, because what is written is the string "1" followed by a newline, and ppos climbs in steps of 2. user_buf is also identical on every call; looking at the echo source, the user buffer is allocated on the heap, which is why the address is comparatively low. For read, every call asks for 4KB (count is 4096), ppos climbs in steps of 5 (exactly strlen("Hello")), and user_buf is again the same on every call; looking at the cat source, its buffer lives on the stack, which is why the address is comparatively high.

[Figure omitted: memory layout of a Linux process address space on x86, shown only as an illustration.]

Now let's fix write and read in turn.

4.2 Fixing write

write, version 2: The analysis above showed that write is called repeatedly because the number of bytes actually written is smaller than the number the user asked for, and the return value of the user's write() comes from the driver's write, so we can simply have the driver return count.

static ssize_t demo_write(struct file *fp, const char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10] = {0};
    int ret;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    ret = copy_from_user(kbuf, user_buf, count);
    if (ret) {
        pr_err("%s: write error\n", __func__);
        return -EIO;
    }
    *ppos += count;

    return count;
}

Verification:

[root@vexpress mnt]# echo 1 > /d/demo

The command returns as soon as enter is pressed, and the kernel log prints only once:

[ 2444.363351] user_buf: 00202408, count: 2, ppos: 0

write, version 3: The kernel actually provides a very convenient helper, simple_write_to_buffer(), dedicated to copying data from user space into a kernel buffer:

static ssize_t demo_write(struct file *fp, const char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10] = {0};

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    return simple_write_to_buffer(kbuf, sizeof(kbuf), ppos, user_buf, count);
}

Verification:

[root@vexpress mnt]# echo 1 > /d/demo

Again the command returns immediately, and the kernel log prints only once:

[ 2739.984844] user_buf: 00202340, count: 2, ppos: 0
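One caveat about the hand-rolled version 2 (an observation added here, not a point made in the original walkthrough): it copies count bytes into a 10-byte stack buffer, so a write larger than the buffer would overflow it. A more defensive variant, shown below purely as an illustrative sketch rather than as the demo's actual code, clamps the copy first while still reporting the whole write as consumed:

static ssize_t demo_write(struct file *fp, const char __user *user_buf,
                          size_t count, loff_t *ppos)
{
    char kbuf[10] = {0};
    size_t len = count;

    /* never copy more than the kernel buffer can hold (keep one byte for '\0') */
    if (len > sizeof(kbuf) - 1)
        len = sizeof(kbuf) - 1;

    if (copy_from_user(kbuf, user_buf, len))
        return -EFAULT;

    *ppos += count;
    return count;   /* claim the whole write so user space does not retry; only len bytes are kept */
}

simple_write_to_buffer() performs a similar clamp internally, as its implementation shows next.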
Let's take a quick look at how simple_write_to_buffer() is implemented:

/**
 * simple_write_to_buffer - copy data from user space to the buffer
 * @to: the buffer to write to
 * @available: the size of the buffer
 * @ppos: the current position in the buffer
 * @from: the user space buffer to read from
 * @count: the maximum number of bytes to read
 *
 * The simple_write_to_buffer() function reads up to @count bytes from the user
 * space address starting at @from into the buffer @to at offset @ppos.
 *
 * On success, the number of bytes written is returned and the offset @ppos is
 * advanced by this number, or negative value is returned on error.
 **/
ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
        const void __user *from, size_t count)
{
    loff_t pos = *ppos;
    size_t res;

    if (pos < 0)
        return -EINVAL;
    if (pos >= available || !count)
        return 0;
    if (count > available - pos)
        count = available - pos;
    res = copy_from_user(to + pos, from, count);
    if (res == count)
        return -EFAULT;
    count -= res;
    *ppos = pos + count;
    return count;
}
EXPORT_SYMBOL(simple_write_to_buffer);

As you can see, it returns count in the end; if copy_from_user() did not manage to copy everything, the returned value is smaller than requested and write will still be called again later.

4.3 Fixing read

We know read gets called repeatedly because the value it returns is smaller than the amount the user asked for, which here is 4KB. cat requests 4KB on every read, and, leaving errors aside, it only stops once read returns 0.

read, version 2:

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int ret, wrinten;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    ret = copy_to_user(user_buf, kbuf, wrinten + 1);
    if (ret != 0) {
        printk(KERN_ERR "read error");
        return -EIO;
    }
    *ppos += wrinten;

    return 0;
}

Verification:

[root@vexpress mnt]# cat /d/demo

After pressing enter, "Hello" is not printed, although the driver's read was called exactly once:

[ 118.837456] user_buf: beeb0be8, count: 4096, ppos: 0

Why is that? Look back at cat's core function bb_full_fd_action(): when read() returns 0, whatever was just read is never written to standard output, so cat displays nothing.

If returning 0 does not work, how about returning count, the 4KB the user asked for?

read, version 3:

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int ret, wrinten;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    ret = copy_to_user(user_buf, kbuf, wrinten + 1);
    if (ret != 0) {
        printk(KERN_ERR "read error");
        return -EIO;
    }
    *ppos += wrinten;

    return count;
}

Verification:

[root@vexpress mnt]# cat /d/demo
ȸT�/mnt/busybox�u0�@ξ���вu����ξl����@����

The output contains all sorts of garbage, and the kernel log is flooded again:

[ 339.079698] user_buf: bece4be8, count: 4096, ppos: 0
[ 339.080124] user_buf: bece4be8, count: 4096, ppos: 5
[ 339.085525] user_buf: bece4be8, count: 4096, ppos: 10
[ 339.085886] user_buf: bece4be8, count: 4096, ppos: 15
[ 339.087018] user_buf: bece4be8, count: 4096, ppos: 20
[ 339.098798] user_buf: bece4be8, count: 4096, ppos: 25
... ...

What is going on? If the driver's read returns 4KB, user space believes it received 4KB of data, but in reality only the first 5 bytes of the user buffer were filled by the kernel; the rest is whatever garbage happened to be sitting in the buffer. And because read keeps returning 4KB and never 0, cat keeps calling it, hence the flood.

The kernel does provide a function for clearing the user buffer, clear_user(); with it the garbage disappears, but the flood remains:

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int ret, wrinten;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    if (clear_user(user_buf, count)) {
        printk(KERN_ERR "clear error\n");
        return -EIO;
    }

    ret = copy_to_user(user_buf, kbuf, wrinten + 1);
    if (ret != 0) {
        printk(KERN_ERR "read error\n");
        return -EIO;
    }
    *ppos += wrinten;

    return count;
}

This change only stops the garbled output; the log still gets flooded.

read, version 4: So what should we do? Let's try the kernel's simple_read_from_buffer(), a helper dedicated to copying data from kernel space to user space:

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int wrinten;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    if (clear_user(user_buf, count)) {
        printk(KERN_ERR "clear error\n");
        return -EIO;
    }

    return simple_read_from_buffer(user_buf, count, ppos, kbuf, wrinten);
}

Verification:

[root@vexpress mnt]# cat /d/demo
Hello

cat no longer floods the terminal and prints exactly what we wanted. What about the kernel log?

[ 479.457637] user_buf: bec61be8, count: 4096, ppos: 0
[ 479.458268] user_buf: bec61be8, count: 4096, ppos: 5

Not bad: this time the driver's read was called only twice. Why?
Let's look at the implementation of simple_read_from_buffer() to see why:

/**
 * simple_read_from_buffer - copy data from the buffer to user space
 * @to: the user space buffer to read to
 * @count: the maximum number of bytes to read
 * @ppos: the current position in the buffer
 * @from: the buffer to read from
 * @available: the size of the buffer
 *
 * The simple_read_from_buffer() function reads up to @count bytes from the
 * buffer @from at offset @ppos into the user space address starting at @to.
 *
 * On success, the number of bytes read is returned and the offset @ppos is
 * advanced by this number, or negative value is returned on error.
 **/
ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos,
        const void *from, size_t available)
{
    loff_t pos = *ppos;
    size_t ret;

    if (pos < 0)
        return -EINVAL;
    if (pos >= available || !count)
        return 0;
    if (count > available - pos)
        count = available - pos;
    ret = copy_to_user(to, from + pos, count);
    if (ret == count)
        return -EFAULT;
    count -= ret;
    *ppos = pos + count;
    return count;
}
EXPORT_SYMBOL(simple_read_from_buffer);

On the first read, ppos is 0, and after the copy it becomes 5. As we know, cat is not satisfied, because read did not return 0, so it immediately issues another read, this time with ppos equal to 5. Now the "if (pos >= available || !count)" check takes effect: available is 5, so the function simply returns 0, and cat obediently exits.

read, version 5: Since we want the driver's read to be invoked only once per cat, while still letting cat print what it read, we can make the following change:

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int wrinten;

    if (*ppos)
        return 0;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    if (clear_user(user_buf, count)) {
        printk(KERN_ERR "clear error\n");
        return -EIO;
    }

    return simple_read_from_buffer(user_buf, count, ppos, kbuf, wrinten);
}

At the top of the function we first check *ppos: on the first call to the driver's read it is 0, and once the data has been copied it is updated, so on the second call *ppos is no longer 0 and we return 0 straight away.

Verification:

[root@vexpress mnt]# cat /d/demo
Hello

User space is not flooded, which is exactly what we were after, and the kernel log contains just one line:

[ 1217.948729] user_buf: beb88be8, count: 4096, ppos: 0

So the driver's read really did its work only once. Strictly speaking it is still called twice, but the second call does no real work and returns immediately, which does not disturb the driver's logic.

It can also be done without the kernel helper:

read, version 6:

static ssize_t demo_read(struct file *fp, char __user *user_buf, size_t count, loff_t *ppos)
{
    char kbuf[10];
    int ret, wrinten;

    if (*ppos)
        return 0;

    printk(KERN_INFO "user_buf: %p, count: %d, ppos: %lld\n", user_buf, count, *ppos);

    wrinten = snprintf(kbuf, 10, "%s", "Hello");

    if (clear_user(user_buf, count)) {
        printk(KERN_ERR "clear error\n");
        return -EIO;
    }

    ret = copy_to_user(user_buf, kbuf, wrinten);
    if (ret != 0) {
        printk(KERN_ERR "copy error\n");
        return -EIO;
    }
    *ppos += wrinten;

    return wrinten;
}

This behaves exactly like the previous version, and that wraps things up.
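As a closing aside, the one-shot behaviour can also be confirmed without cat. The minimal user-space sketch below is an addition for illustration (it assumes debugfs is mounted at /d, as in the shell sessions above): it issues two consecutive read() calls on the node and prints their return values; with the final read versions the first call should return 5 and the second 0.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[4096];
    int fd = open("/d/demo", O_RDONLY);   /* the debugfs node created by the demo driver */

    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* first read: expect the 5 bytes of "Hello" */
    printf("first read returned %zd\n", read(fd, buf, sizeof(buf)));
    /* second read: *ppos is now non-zero, so the driver reports end of file */
    printf("second read returned %zd\n", read(fd, buf, sizeof(buf)));

    close(fd);
    return 0;
}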