
How was the ~0.004 s single-image inference time measured? #44

Closed
PikachuRX78 opened this issue May 11, 2023 · 6 comments

Comments

@PikachuRX78

Hi, I've recently been reproducing this model,
running inference on a single 3090,
timing inference with img_demo.py,
and loading the weights from best_Epoch_lol_v1.pth.

With the code below, inference on a single 600*400 image measures at around 100 ms, and the FLOPs come out to 5.27941248 GFLOPs.

from fvcore.nn import FlopCountAnalysis, parameter_count_table
import time

...
# time a single forward pass
start_time = time.time_ns()
_, _, enhanced_img = model(input)
end_time = time.time_ns()
total_time = end_time - start_time
print("Time taken by network is : %f ms" % (total_time / 1000000))
...
# count FLOPs for the same input
flops = FlopCountAnalysis(model, input)
print("GFLOPs: ", flops.total() / 1e9)
...

I'd like to ask how the 0.004 s inference time and the 1.44 GFLOPs reported in the paper were computed and measured.
Thanks!

@cuiziteng
Owner

cuiziteng commented May 11, 2023

Please try timing multiple images and taking the average. Because of how GPUs behave, the very first image always takes noticeably longer. I timed the 15 test images of the LOL dataset, and the 0.004 figure is the average over those 15 images. You can verify this yourself by running the LOL-V1 test set.
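For reference, a minimal timing sketch with warm-up iterations and explicit GPU synchronization (PyTorch CUDA kernels launch asynchronously, so timing without torch.cuda.synchronize() can be misleading). This is only an illustration under the setup in this thread: model and val_loader are the objects from the snippets above, and warmup_input is a hypothetical tensor at the test resolution:

import time
import torch

# warm up the GPU so the first measured image is not penalized
with torch.no_grad():
    for _ in range(5):
        model(warmup_input)          # warmup_input: any tensor of the test resolution (placeholder)
torch.cuda.synchronize()

total_time, n_images = 0.0, 0
with torch.no_grad():
    for imgs in val_loader:
        low_img = imgs[0].cuda()
        torch.cuda.synchronize()     # make sure previous GPU work has finished
        start = time.time_ns()
        _, _, enhanced_img = model(low_img)
        torch.cuda.synchronize()     # wait for this forward pass to finish before stopping the clock
        total_time += time.time_ns() - start
        n_images += 1

print("Average time per image: %f ms" % (total_time / 1e6 / n_images))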

As for the FLOPs, they were measured at a 256x256 input size rather than 400x600, following the same calculation method as the MAXIM paper (CVPR 2022). Computed at 400*600, the number is indeed about 5.27 GFLOPs as you said. This was our oversight and was not made clear in the paper; we apologize. See the screenshot below:
[screenshot of the FLOPs calculation]
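As an illustration of the two conventions, a small fvcore sketch (assuming model is already built and on the GPU; the exact numbers depend on the architecture and are not taken from this sketch):

import torch
from fvcore.nn import FlopCountAnalysis

# FLOPs at the 256x256 protocol used in the MAXIM paper
x_256 = torch.randn(1, 3, 256, 256).cuda()
print("GFLOPs @256x256:", FlopCountAnalysis(model, x_256).total() / 1e9)

# FLOPs at the full LOL resolution (400x600), for comparison
x_full = torch.randn(1, 3, 400, 600).cuda()
print("GFLOPs @400x600:", FlopCountAnalysis(model, x_full).total() / 1e9)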

Feel free to ask if you have any other questions, and thanks a lot for pointing this out~

@cuiziteng
Owner

[image] This has now been explained in readme.md.

@PikachuRX78
Author

Thanks for the reply.
I just gave it a try
with evaluation_lol_v1.py:
the 15 images take about 143 ms in total,
which is roughly 9.53 ms per image. That is indeed much shorter, though there still seems to be some gap.
Here is the code I used for the test:

total_time = 0
with torch.no_grad():
    for i, imgs in tqdm(enumerate(val_loader)):
        low_img, high_img, name = imgs[0].cuda(), imgs[1].cuda(), str(imgs[2][0])
        # time only the forward pass
        start_time = time.time_ns()
        mul, add, enhanced_img = model(low_img)
        end_time = time.time_ns()
        total_time += end_time - start_time
        if config.save:
            torchvision.utils.save_image(enhanced_img, result_path + str(name) + '.png')

        ssim_value = ssim(enhanced_img, high_img, as_loss=False).item()
        psnr_value = psnr(enhanced_img, high_img).item()

        ssim_list.append(ssim_value)
        psnr_list.append(psnr_value)

print("Average time taken by network is : %f ms" % (total_time / 1000000 / 15))

Output:

Total examples: 15
15it [00:00, 45.14it/s]
Average time taken by network is : 9.364251 ms
The SSIM Value is: 0.8089913686116537
The PSNR Value is: 23.382731374104818

@cuiziteng
Owner

cuiziteng commented May 11, 2023

Please also try measuring the inference speed on the LOL-V2 dataset; it has 100 images, so the average should be more reasonable. With only 15 images, the slow inference of the first few (due to machine warm-up) can still skew the overall average. The 0.004 figure was originally computed on LOL-V2.
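If it helps, torch.utils.benchmark is designed to take care of warm-up and CUDA synchronization, so it can serve as a cross-check on the hand-rolled timing above. A sketch, where model and low_img are the objects from the snippets in this thread:

import torch.utils.benchmark as benchmark

# Timer repeatedly runs the statement and synchronizes CUDA around the measurement
timer = benchmark.Timer(
    stmt="model(low_img)",
    globals={"model": model, "low_img": low_img},
)
result = timer.timeit(100)            # 100 timed runs after an internal warm-up run
print("Average time per image: %f ms" % (result.mean * 1000))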

Also, if someone else is currently running code on the same machine, that will affect inference speed as well.

I don't have a spare 3090 at hand right now; if I get one later I'll upload a screenshot. Anyway, thanks a lot for your interest~

@PikachuRX78
Author

Thanks, it does indeed look like a GPU warm-up issue.

Using evaluation_lol_v2.py,
the average is about 2.74 ms per image.
Test code:

total_time = 0
with torch.no_grad():
    for i, imgs in tqdm(enumerate(val_loader)):
        low_img, high_img = imgs[0].cuda(), imgs[1].cuda()
        # time only the forward pass
        start_time = time.time_ns()
        mul, add, enhanced_img = model(low_img)
        end_time = time.time_ns()
        total_time += end_time - start_time

        ssim_value = ssim(enhanced_img, high_img, as_loss=False).item()
        psnr_value = psnr(enhanced_img, high_img).item()

        ssim_list.append(ssim_value)
        psnr_list.append(psnr_value)
print("Average time taken by network is : %f ms" % (total_time / 1000000 / 100))

Output:

Total examples: 100
100it [00:01, 71.37it/s]
Average time taken by network is : 2.745986 ms
The SSIM Value is: 0.8237025141716003
The PSNR Value is: 23.499295816421508

@cuiziteng
Owner

OK~ Looks like this is even faster than 4 ms.
