Last edited by nihui on 2024-4-14 14:44
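Introducing the Beetle ESP32-C6
The Beetle ESP32-C6 is a tiny, low-power IoT development board built around the ESP32-C6 chip.
- Ultra-compact: only 25 × 20.5 mm
- ESP32-C6 chip with support for Wi-Fi, BLE, Zigbee and Thread
- Wi-Fi 6 support for lower latency and lower power consumption
- Ultra-low power: 14 µA in deep sleep
- Integrated lithium battery charging
- Battery voltage monitoring, so you can check the remaining charge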
Setting up the build environment, building ncnn and running inference with ncnn
These steps are omitted here; please see the previous article :D
https://mc.dfrobot.com.cn/thread-318402-1-1.html
https://zhuanlan.zhihu.com/p/690982179
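Quantizing the ncnn model
Follow the ncnn quantization tool tutorial:
https://github.com/Tencent/ncnn/wiki/quantized-int8-inference
The ncnn quantization tool needs a set of images for calibration. It uses them to derive the quantization scale factors. The test-set images are usually used for this.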
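Download the MNIST test set and convert it to PNG images:
http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
For convenience, you can instead download this ready-made PNG archive and use it directly:
https://github.com/myleott/mnist_png/blob/master/mnist_png.tar.gz?raw=true
Extract it, then generate the image file list with find + shuf: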
```shell
find mnist_png/ -type f | grep testing | shuf > imagelist.txt
```
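For readers without a Unix shell, the filter-and-shuffle part of that pipeline can be sketched in C++ as below. This is only an illustration: `make_image_list` is a hypothetical helper. It assumes the file paths have already been enumerated (the job `find` does above). It also takes an explicit seed so the order is reproducible, which `shuf` does not do by default.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: keep only test-set paths (like `grep testing`)
// and shuffle them (like `shuf`), using a fixed seed for reproducibility.
std::vector<std::string> make_image_list(const std::vector<std::string>& paths, unsigned seed)
{
    std::vector<std::string> out;
    for (const auto& p : paths)
        if (p.find("testing") != std::string::npos)
            out.push_back(p);

    std::mt19937 rng(seed);
    std::shuffle(out.begin(), out.end(), rng);
    return out;
}
```

Each resulting path would then be written on its own line to imagelist.txt.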
imagelist.txt looks like this:
```
mnist_png/testing/7/5655.png
mnist_png/testing/5/9234.png
mnist_png/testing/3/1962.png
mnist_png/testing/4/7741.png
mnist_png/testing/0/7231.png
mnist_png/testing/8/4398.png
mnist_png/testing/4/7283.png
...
```
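Use the ncnn2table tool. It takes the fp32 model and the calibration dataset as input and outputs a calibration table.
The mnist model expects floating-point input in the range 0 to 1, but decoding a PNG yields integers in 0 to 255. So set norm to 1/255 as the preprocessing transform.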
```shell
ncnn2table mnist.param mnist.bin imagelist.txt mnist.table mean=[0] norm=[0.00392156862745] shape=[28,28,1] pixel=GRAY method=kl
```
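The preprocessing that `mean` and `norm` describe amounts to `(pixel - mean) * norm` per pixel. This is how ncnn's `Mat::substract_mean_normalize` applies them. The standalone sketch below uses a hypothetical helper, `preprocess_gray`, which is not part of ncnn. It shows why `norm=[0.00392156862745]`, i.e. 1/255, maps pixel values 0..255 onto the 0..1 range the model expects.

```cpp
#include <vector>

// Hypothetical helper mirroring mean=[0] norm=[1/255]:
// each output value is (pixel - mean) * norm.
std::vector<float> preprocess_gray(const std::vector<unsigned char>& pixels,
                                   float mean = 0.f, float norm = 1.f / 255.f)
{
    std::vector<float> out;
    out.reserve(pixels.size());
    for (unsigned char p : pixels)
        out.push_back((static_cast<float>(p) - mean) * norm);
    return out;
}
```

With the defaults, a black pixel (0) becomes 0.0 and a white pixel (255) becomes 1.0.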
Output:
```
mean = [0.000000]
norm = [0.003922]
shape = [28,28,1]
pixel = GRAY
thread = 32
method = kl
---------------------------------------
count the absmax 0.00% [ 0 / 10000 ]
count the absmax 1.00% [ 100 / 10000 ]
...
count the absmax 99.00% [ 9900 / 10000 ]
count the absmax 98.00% [ 9800 / 10000 ]
build histogram 0.00% [ 0 / 10000 ]
build histogram 1.00% [ 100 / 10000 ]
...
build histogram 96.00% [ 9600 / 10000 ]
build histogram 97.00% [ 9700 / 10000 ]
conv0 : max = 1.000000 threshold = 0.997314 scale = 127.341980
conv1 : max = 3.666233 threshold = 3.002981 scale = 42.291306
fc : max = 11.118896 threshold = 10.828436 scale = 11.728379
ncnn int8 calibration table create success, best wish for your int8 inference has a low accuracy loss...\(^0^)/...233...
```
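The three `scale` values above can be checked by hand. Each one is consistent with `scale = 127 / threshold`: the calibrated threshold is mapped to the largest int8 magnitude, 127, and activations beyond it are clipped. The tiny sketch below expresses that relation. The function name is illustrative only and is not an ncnn API.

```cpp
// Illustrative only: the activation scale maps the calibrated threshold T
// to 127, so values in [-T, T] use the full symmetric int8 range.
inline float activation_scale(float threshold)
{
    return 127.0f / threshold;
}
```

For example, 127 / 3.002981 ≈ 42.2913, which matches the printed scale for conv1.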
Use the ncnn2int8 tool. It takes the fp32 model and the calibration table as input and generates the quantized int8 model.
```shell
ncnn2int8 mnist.param mnist.bin mnist-int8.param mnist-int8.bin mnist.table
```
The output shows that both convolution layers and the single fc layer in the mnist model have been quantized:
```
quantize_convolution conv0
quantize_convolution conv1
quantize_innerproduct fc
```
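The generic sketch below roughly illustrates what a quantized convolution or fc layer computes, using symmetric int8 quantization:

- Weights are scaled so that their absolute maximum maps to 127.
- Inputs use an activation scale like the ones in mnist.table.
- The int8 products are accumulated in int32.
- The sum is converted back to float by dividing by both scales.

This is not ncnn's actual kernel, and ncnn's real implementation differs in detail. All function names here are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric int8 quantization: q = round(x * scale), clamped to [-127, 127].
inline int8_t quantize(float x, float scale)
{
    int q = static_cast<int>(std::lround(x * scale));
    return static_cast<int8_t>(std::max(-127, std::min(127, q)));
}

// Weight scale chosen so that the largest |w| maps to 127.
float weight_scale(const std::vector<float>& w)
{
    float absmax = 0.f;
    for (float v : w)
        absmax = std::max(absmax, std::fabs(v));
    return absmax > 0.f ? 127.f / absmax : 1.f;
}

// One output of a hypothetical int8 fc layer (x and w must have equal length):
// int8 x int8 products are summed in int32, then dequantized to float by
// dividing by (input_scale * weight_scale) before the fp32 bias is added.
float int8_dot(const std::vector<float>& x, float input_scale,
               const std::vector<float>& w, float bias)
{
    const float ws = weight_scale(w);
    int32_t acc = 0;
    for (std::size_t i = 0; i < x.size(); i++)
        acc += int32_t(quantize(x[i], input_scale)) * int32_t(quantize(w[i], ws));
    return static_cast<float>(acc) / (input_scale * ws) + bias;
}
```

For inputs {0.5, -0.25, 1.0} with input scale 127 and weights {0.2, 0.4, -0.1}, the result is close to the exact fp32 value of -0.1. Storing each weight in 1 byte instead of 4 is also why the weight data shrinks to roughly a quarter.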
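The quantized model is roughly 1/4 the size of the original.
Loading the quantized model on the ESP32-C6
As with the fp32 model, use ncnn2mem to convert the model into static arrays, then load it from memory: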
```shell
ncnn2mem mnist-int8.param mnist-int8.bin mnist-int8.id.h mnist-int8.mem.h
```
Only the 2 lines that load the model need to change; everything else stays the same:
```cpp
#include "mnist-int8.mem.h" // static arrays generated by ncnn2mem

extern "C" void app_main(void)
{
    ncnn::Net net;

    // the fp32 model loading is replaced by the int8 model:
    // net.load_param(mnist_param_bin);
    // net.load_model(mnist_bin);
    net.load_param(mnist_int8_param_bin);
    net.load_model(mnist_int8_bin);

    // ... the rest of the inference code is unchanged
}
```
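Results and performance comparison
Compared with the fp32 model, the int8 model runs 9.1x faster on the ESP32-C3 and 5.4x faster on the ESP32-C6, a clear speedup!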
```
ESP-ROM:esp32c6-20220919
Build:Sep 19 2022
rst:0xc (SW_CPU),boot:0xc (SPI_FAST_FLASH_BOOT)
Saved PC:0x4001975a
SPIWP:0xee
mode:DIO, clock div:2
load:0x40875720,len:0x1804
load:0x4086c110,len:0xe2c
load:0x4086e610,len:0x2e30
entry 0x4086c11a
I (22) boot: ESP-IDF v5.3-dev-2815-gbe06a6f5ff 2nd stage bootloader
I (23) boot: compile time Apr 14 2024 11:52:36
I (24) boot: chip revision: v0.0
I (26) boot.esp32c6: SPI Speed : 80MHz
I (31) boot.esp32c6: SPI Mode : DIO
I (36) boot.esp32c6: SPI Flash Size : 2MB
I (41) boot: Enabling RNG early entropy source...
I (46) boot: Partition Table:
I (50) boot: ## Label Usage Type ST Offset Length
I (57) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (64) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (72) boot: 2 factory factory app 00 00 00010000 00100000
I (79) boot: End of partition table
I (83) esp_image: segment 0: paddr=00010020 vaddr=420b0020 size=0a9e0h ( 43488) map
I (110) esp_image: segment 1: paddr=0001aa08 vaddr=40800000 size=05610h ( 22032) load
I (122) esp_image: segment 2: paddr=00020020 vaddr=42000020 size=aa960h (698720) map
I (407) esp_image: segment 3: paddr=000ca988 vaddr=40805610 size=03e74h ( 15988) load
I (416) esp_image: segment 4: paddr=000ce804 vaddr=40809490 size=00f64h ( 3940) load
I (424) boot: Loaded app from partition at offset 0x10000
I (425) boot: Disabling RNG early entropy source...
I (436) cpu_start: Unicore app
I (446) cpu_start: Pro cpu start user code
I (446) cpu_start: cpu freq: 160000000 Hz
I (447) app_init: Application information:
I (449) app_init: Project name: main
I (454) app_init: App version: a36153d-dirty
I (459) app_init: Compile time: Apr 14 2024 11:52:33
I (465) app_init: ELF file SHA256: 0a0e007ce...
I (470) app_init: ESP-IDF: v5.3-dev-2815-gbe06a6f5ff
I (477) efuse_init: Min chip rev: v0.0
I (482) efuse_init: Max chip rev: v0.99
I (487) efuse_init: Chip rev: v0.0
I (491) heap_init: Initializing. RAM available for dynamic allocation:
I (499) heap_init: At 4080B3B0 len 00071260 (452 KiB): RAM
I (505) heap_init: At 4087C610 len 00002F54 (11 KiB): RAM
I (511) heap_init: At 50000000 len 00003FE8 (15 KiB): RTCRAM
I (518) spi_flash: detected chip: generic
I (522) spi_flash: flash io: dio
W (526) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (539) sleep: Configure to isolate all GPIO pins in sleep state
I (546) sleep: Enable automatic switching of GPIO sleep configuration
I (553) coexist: coex firmware version: d96c1e51f
I (558) coexist: coexist rom version 5b8dcfa
I (563) main_task: Started on CPU0
I (563) main_task: Calling app_main()
Loading ncnn mnist model...Done.
Preparing input...Start Mesuring!
Done!
0: -1.63
1: 3.05
2: 12.36
3: 7.95
4: -17.89
5: -9.99
6: -15.13
7: 16.36
8: 0.45
9: -3.34
I think it is number 7!
Latency, avg: 78.77ms, max: 79.48, min: 78.68. Avg Flops: 9.90MFlops
Restarting now.
```
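The line `I think it is number 7!` comes from picking the class with the largest of the 10 scores. The sketch below assumes the program simply takes the argmax of the outputs; the actual code in the repository may be organized differently, and `argmax` here is a hypothetical helper. It shows that step on its own.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the largest score.
// For the MNIST model, that index is the predicted digit.
std::size_t argmax(const std::vector<float>& scores)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); i++)
        if (scores[i] > scores[best])
            best = i;
    return best;
}
```

Applied to the scores printed in the log above, it selects index 7 (score 16.36), which matches the reported prediction.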
The code has been updated at https://github.com/nihui/ncnn_on_esp32