Incomplete output from printf() called on device
Problem description
To test printf() calls on the device, I wrote a simple program that copies a moderately sized array to the device and prints the values of the device array to the screen. Although the array is copied to the device correctly, the printf() output is incomplete: the first several hundred numbers are missing. The array size in the code is 4096. Is this a bug, or am I not using this function properly? Thanks in advance.
My GPU is a GeForce GTX 550 Ti, with compute capability 2.1.
My code:
#include <stdio.h>
#include <stdlib.h>

#define N 4096

// Prints every element of d_Array from a single device thread.
__global__ void Printcell(float *d_Array, int n){
    int k = 0;
    printf("\n=========== data of d_Array on device==============\n");
    for( k = 0; k < n; k++ ){
        printf("%f ", d_Array[k]);
        if((k+1)%6 == 0) printf("\n");
    }
    printf("\nTotally %d elements has been printed", k);
}

int main(){
    int i = 0;
    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;

    for(i = 0; i < N; i++)
        Array[i] = i;

    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();

    Printcell<<<1,1>>>(d_Array, N); // Print the device array from a kernel
    cudaDeviceSynchronize();        // Device printf output is flushed here

    /* Copy the device array back to host to see if it was correctly copied */
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);
    printf("\n");
    for(i = 0; i < N; i++){
        printf("%f ", rArray[i]);
        if((i+1)%6 == 0) printf("\n");
    }

    cudaFree(d_Array);
    return 0;
}
Recommended answer
printf from the device has a limited queue. It is intended for small-scale, debug-style output, not large-scale output.
From the programming guide:
The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.
Your in-kernel printf output overran the buffer, so the first printed elements were lost (overwritten) before the buffer was dumped into the standard I/O queue.
The linked documentation also indicates that the buffer size can be increased.
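As a rough sketch of what increasing the buffer might look like, the CUDA runtime provides cudaDeviceSetLimit() with the cudaLimitPrintfFifoSize limit, which has to be set before the kernel is launched. The 8 MB value and the kernel name PrintAll below are assumptions chosen for illustration, not values from the question or the guide; the size actually needed depends on how much output the kernel produces.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 4096

// Hypothetical kernel name; prints all n elements from one device thread.
__global__ void PrintAll(const float *d_Array, int n){
    for (int k = 0; k < n; k++){
        printf("%f ", d_Array[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
    printf("\n%d elements printed\n", n);
}

int main(){
    static float Array[N];
    for (int i = 0; i < N; i++) Array[i] = (float)i;

    float *d_Array = NULL;
    cudaMalloc((void**)&d_Array, N * sizeof(float));
    cudaMemcpy(d_Array, Array, N * sizeof(float), cudaMemcpyHostToDevice);

    // Query the current size of the device printf buffer.
    size_t fifoBytes = 0;
    cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    printf("printf FIFO size before: %zu bytes\n", fifoBytes);

    // Assumption: 8 MB is an arbitrary illustrative size, not a recommendation.
    // The limit must be set before the kernel that prints is launched.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize,
                                         (size_t)8 * 1024 * 1024);
    if (err != cudaSuccess){
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n",
                cudaGetErrorString(err));
        cudaFree(d_Array);
        return 1;
    }

    PrintAll<<<1,1>>>(d_Array, N);
    cudaDeviceSynchronize();   // device printf output is flushed at this point

    cudaFree(d_Array);
    return 0;
}
```

Even with a larger buffer, device-side printf is still best kept for debugging; for bulk output, copying the data back with cudaMemcpy and printing from the host, as the question's program already does, avoids the buffer entirely.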
