
[M10 Project] UNIHIKER M10 Xiaozhi AI Chatbot

Last edited by 云天 on 2025-2-26 11:04

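【Project Background】
Recently, many "Xiaozhi AI chatbot" replica projects based on ESP32-S3 devices have appeared online. Most of them, however, simply flash a prebuilt firmware once the hardware is assembled, so users cannot modify the program any further. (Alternatively, you can set up an ESP-IDF 5.3 development environment on Windows and compile Xiaozhi yourself.)
More recently, someone open-sourced a Python implementation of "Xiaozhi AI" that runs on Windows PCs and on the Raspberry Pi. Since the UNIHIKER M10 ships with a Python programming environment, I ported that code to the board and adapted it so that it runs well there.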

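【Xiaozhi AI on ESP32-S3】

The Xiaozhi AI chatbot is an interactive voice assistant built on AI technology. It provides speech recognition, natural language processing, and speech synthesis, and it converses with users by voice. It is usually implemented on hardware such as an ESP32-S3 development board, supports Wi-Fi or 4G connectivity, and is equipped with a microphone, a speaker, a display, and other modules.
Open-source platform: https://xiaozhi.me/
UNIHIKER M10 Xiaozhi AI Chatbot, Figure 1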


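【Xiaozhi AI in Python】
Open-source repository: https://github.com/zhh827/py-xiaozhi, a Xiaozhi client implemented in Python, intended for studying the code and for trying out Xiaozhi's voice features when no hardware is available. You can first try running it in the Python environment of Mind+.
Notes:
1. Use the Mind+ library manager to install the libraries listed in requirements.txt.
2. Copy opus.dll into the C:\Windows\System32 directory.
3. Manually change the global variable MAC_ADDR in the py-xiaozhi.py script (use lowercase letters). After running py-xiaozhi.py, take the registration code printed in the window and register the device in the console at https://xiaozhi.me/.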
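Note 3 asks for MAC_ADDR as a lowercase, colon-separated string. As a minimal sketch, the standard-library call `uuid.getnode()` can supply a 48-bit value to format that way. The helper name `format_mac` is hypothetical, and whether the service expects the machine's real hardware address is an assumption the original post does not confirm.

```python
import uuid


def format_mac(node: int) -> str:
    """Format a 48-bit integer as a lowercase, colon-separated MAC string."""
    raw = f"{node & 0xFFFFFFFFFFFF:012x}"  # 12 lowercase hex digits
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    # uuid.getnode() returns the hardware address as an integer, or a random
    # 48-bit value when no address can be determined.
    print(format_mac(uuid.getnode()))
```

The printed string can be pasted into MAC_ADDR; the value already in the listing, '7a:da:6b:5c:76:50', follows the same format.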
【Xiaozhi AI on UNIHIKER】
1. Use the A and B buttons on the UNIHIKER to start and stop recording and talk with "Xiaozhi".

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import json
import time
import requests
import paho.mqtt.client as mqtt
import threading
import pyaudio
import opuslib  # On Windows, copy opus.dll to C:\Windows\System32
import socket
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
from os import urandom
import logging
from pynput import keyboard as pynput_keyboard
from unihiker import GUI
import webrtcvad
from pinpong.board import Pin
from pinpong.board import Board
from pinpong.board import NeoPixel

# Initialize the UNIHIKER hardware interface
Board().begin()
gui = GUI()
pin1 = Pin(Pin.D23)
np1 = NeoPixel(pin1, 24)  # 24 pixels
np1.brightness(128)
np1.clear()
gui.clear()
fontSize = 20
max_lines = 16
max_chars = 8
# GUI elements (the on-screen text is Chinese; '初始化中...' means "Initializing...")
status_label = gui.draw_text(x=80, y=10, text='初始化中...', color='red')
log_text = gui.draw_text(x=10, y=100, text='', font_size=fontSize, color='blue')
emotion = gui.draw_text(x=110, y=50, text='', font_size=fontSize, color='green')
OTA_VERSION_URL = 'https://api.tenclass.net/xiaozhi/ota/'
MAC_ADDR = '7a:da:6b:5c:76:50'
# {"mqtt":{"endpoint":"post-cn-apg3xckag01.mqtt.aliyuncs.com","client_id":"GID_test@@@cc_ba_97_20_b4_bc",
# "username":"Signature|LTAI5tF8J3CrdWmRiuTjxHbF|post-cn-apg3xckag01","password":"0mrkMFELXKyelhuYy2FpGDeCigU=",
# "publish_topic":"device-server","subscribe_topic":"devices"},"firmware":{"version":"0.9.9","url":""}}
mqtt_info = {}
aes_opus_info = {"type": "hello", "version": 3, "transport": "udp",
                 "udp": {"server": "120.24.160.13", "port": 8884, "encryption": "aes-128-ctr",
                         "key": "263094c3aa28cb42f3965a1020cb21a7", "nonce": "01000000ccba9720b4bc268100000000"},
                 "audio_params": {"format": "opus", "sample_rate": 24000, "channels": 1, "frame_duration": 60},
                 "session_id": "b23ebfe9"}
iot_msg = {"session_id": "635aa42d", "type": "iot",
           "descriptors": [{"name": "Speaker", "description": "当前 AI 机器人的扬声器",
                            "properties": {"volume": {"description": "当前音量值", "type": "number"}},
                            "methods": {"SetVolume": {"description": "设置音量",
                                                      "parameters": {
                                                          "volume": {"description": "0到100之间的整数", "type": "number"}
                                                      }
                                                      }
                                        }
                            },
                           {"name": "Lamp", "description": "一个测试用的灯",
                            "properties": {"power": {"description": "灯是否打开", "type": "boolean"}},
                            "methods": {"TurnOn": {"description": "打开灯", "parameters": {}},
                                        "TurnOff": {"description": "关闭灯", "parameters": {}}
                                        }
                            }
                           ]
           }
iot_status_msg = {"session_id": "635aa42d", "type": "iot", "states": [
    {"name": "Speaker", "state": {"volume": 50}}, {"name": "Lamp", "state": {"power": False}}]}
goodbye_msg = {"session_id": "b23ebfe9", "type": "goodbye"}
local_sequence = 0
listen_state = None
tts_state = None
key_state = None
audio = None
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# udp_socket.setblocking(False)
conn_state = False
recv_audio_thread = threading.Thread()
send_audio_thread = threading.Thread()
mqttc = None

def get_ota_version():
    global mqtt_info
    header = {
        'Device-Id': MAC_ADDR,
        'Content-Type': 'application/json'
    }
    post_data = {"flash_size": 16777216, "minimum_free_heap_size": 8318916, "mac_address": f"{MAC_ADDR}",
                 "chip_model_name": "esp32s3", "chip_info": {"model": 9, "cores": 2, "revision": 2, "features": 18},
                 "application": {"name": "xiaozhi", "version": "0.9.9", "compile_time": "Jan 22 2025T20:40:23Z",
                                 "idf_version": "v5.3.2-dirty",
                                 "elf_sha256": "22986216df095587c42f8aeb06b239781c68ad8df80321e260556da7fcf5f522"},
                 "partition_table": [{"label": "nvs", "type": 1, "subtype": 2, "address": 36864, "size": 16384},
                                     {"label": "otadata", "type": 1, "subtype": 0, "address": 53248, "size": 8192},
                                     {"label": "phy_init", "type": 1, "subtype": 1, "address": 61440, "size": 4096},
                                     {"label": "model", "type": 1, "subtype": 130, "address": 65536, "size": 983040},
                                     {"label": "storage", "type": 1, "subtype": 130, "address": 1048576,
                                      "size": 1048576},
                                     {"label": "factory", "type": 0, "subtype": 0, "address": 2097152, "size": 4194304},
                                     {"label": "ota_0", "type": 0, "subtype": 16, "address": 6291456, "size": 4194304},
                                     {"label": "ota_1", "type": 0, "subtype": 17, "address": 10485760,
                                      "size": 4194304}],
                 "ota": {"label": "factory"},
                 "board": {"type": "bread-compact-wifi", "ssid": "mzy", "rssi": -58, "channel": 6,
                           "ip": "192.168.124.38", "mac": "cc:ba:97:20:b4:bc"}}
    response = requests.post(OTA_VERSION_URL, headers=header, data=json.dumps(post_data))
    print('=========================')
    print(response.text)
    logging.info(f"get version: {response}")
    mqtt_info = response.json()['mqtt']

def aes_ctr_encrypt(key, nonce, plaintext):
    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend())
    encryptor = cipher.encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()


def aes_ctr_decrypt(key, nonce, ciphertext):
    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend())
    decryptor = cipher.decryptor()
    plaintext = decryptor.update(ciphertext) + decryptor.finalize()
    return plaintext

def send_audio():
    global aes_opus_info, udp_socket, local_sequence, listen_state, audio
    key = aes_opus_info['udp']['key']
    nonce = aes_opus_info['udp']['nonce']
    server_ip = aes_opus_info['udp']['server']
    server_port = aes_opus_info['udp']['port']
    # Initialize the Opus encoder
    encoder = opuslib.Encoder(16000, 1, opuslib.APPLICATION_AUDIO)
    # Open the microphone stream; the buffer size should match the Opus frame size
    # (960 samples = 60 ms at 16 kHz)
    mic = audio.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=960)
    try:
        while True:
            if listen_state == "stop":
                # Sleep before retrying so the paused state does not spin the CPU
                time.sleep(0.1)
                continue
            # Read audio data
            data = mic.read(960)
            # Encode audio data
            encoded_data = encoder.encode(data, 960)
            # Print the encoded size
            # print(f"Encoded data: {len(encoded_data)}")
            # Insert the payload size and local_sequence into the nonce
            local_sequence += 1
            new_nonce = nonce[0:4] + format(len(encoded_data), '04x') + nonce[8:24] + format(local_sequence, '08x')
            # Encrypt the data and prepend the nonce
            encrypt_encoded_data = aes_ctr_encrypt(bytes.fromhex(key), bytes.fromhex(new_nonce), bytes(encoded_data))
            data = bytes.fromhex(new_nonce) + encrypt_encoded_data
            sent = udp_socket.sendto(data, (server_ip, server_port))
    except Exception as e:
        print(f"send audio err: {e}")
    finally:
        print("send audio exit()")
        local_sequence = 0
        udp_socket = None
        # Close the microphone stream
        mic.stop_stream()
        mic.close()

def recv_audio():
    global aes_opus_info, udp_socket, audio
    key = aes_opus_info['udp']['key']
    nonce = aes_opus_info['udp']['nonce']
    sample_rate = aes_opus_info['audio_params']['sample_rate']
    frame_duration = aes_opus_info['audio_params']['frame_duration']
    # Samples per frame, computed with integer arithmetic to avoid float rounding
    frame_num = sample_rate * frame_duration // 1000
    print(f"recv audio: sample_rate -> {sample_rate}, frame_duration -> {frame_duration}, frame_num -> {frame_num}")
    # Initialize the Opus decoder
    decoder = opuslib.Decoder(sample_rate, 1)
    spk = audio.open(format=pyaudio.paInt16, channels=1, rate=sample_rate, output=True, frames_per_buffer=frame_num)
    try:
        while True:
            data, server = udp_socket.recvfrom(4096)
            # print(f"Received from server {server}: {len(data)}")
            encrypt_encoded_data = data
            # Split off the 16-byte nonce, then decrypt the rest
            split_encrypt_encoded_data_nonce = encrypt_encoded_data[:16]
            # Print the nonce in hexadecimal
            # print(f"split_encrypt_encoded_data_nonce: {split_encrypt_encoded_data_nonce.hex()}")
            split_encrypt_encoded_data = encrypt_encoded_data[16:]
            decrypt_data = aes_ctr_decrypt(bytes.fromhex(key),
                                           split_encrypt_encoded_data_nonce,
                                           split_encrypt_encoded_data)
            # Decode and play the audio data
            spk.write(decoder.decode(decrypt_data, frame_num))
    # except BlockingIOError:
    #     # Sleep briefly when there is no data to reduce CPU usage
    #     time.sleep(0.1)
    except Exception as e:
        print(f"recv audio err: {e}")
    finally:
        udp_socket = None
        spk.stop_stream()
        spk.close()

def wrap_hanzi(text, first_line_width=5, other_line_width=16):
    """Wrap text so the first line and the following lines each have a given width."""
    lines = []

    # Handle the first line
    if len(text) > first_line_width:
        lines.append(text[:first_line_width])
        remaining_text = text[first_line_width:]
    else:
        lines.append(text)
        remaining_text = ""

    # Handle the following lines
    for i in range(0, len(remaining_text), other_line_width):
        lines.append(remaining_text[i:i + other_line_width])

    return "\n".join(lines)

# Map each emotion name to an ASCII emoticon
ASCII_EMOTIONS = {
    "happy": ":)",
    "sad": ":(",
    "winking": ";)",
    "surprised": ":O",
    "angry": ">:((",
    "laughing": ":D",
    "cool": "B-)",
    "crying": ":'(",
    "shy": "^_^",
    "thinking": ":|",
    "love": "<3",
    "sleepy": "-.-",
    "neutral": ":|",
    "excited": ":D",
    "confused": ":S",
}


def get_ascii_emotion(emotion):
    """Return the ASCII emoticon for the given emotion name."""
    return ASCII_EMOTIONS.get(emotion, ":(")  # default emoticon

def on_message(client, userdata, message):
    global aes_opus_info, udp_socket, tts_state, recv_audio_thread, send_audio_thread, max_chars
    msg = json.loads(message.payload)
    print(f"recv msg: {msg}")
    if msg['type'] == 'hello':
        aes_opus_info = msg
        # The audio threads set udp_socket to None when they exit, so recreate it if needed
        if udp_socket is None:
            udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        udp_socket.connect((msg['udp']['server'], msg['udp']['port']))
        # Start the audio receiving thread unless it is already running
        if not recv_audio_thread.is_alive():
            recv_audio_thread = threading.Thread(target=recv_audio)
            recv_audio_thread.start()
        else:
            print("recv_audio_thread is alive")
        # Start the audio sending thread unless it is already running
        if not send_audio_thread.is_alive():
            send_audio_thread = threading.Thread(target=send_audio)
            send_audio_thread.start()
        else:
            print("send_audio_thread is alive")
    if msg['type'] == 'llm':
        ascii_emotion = get_ascii_emotion(msg['emotion'])
        emotion.config(text=ascii_emotion)
    if msg['type'] == 'tts' and msg['state'] == 'start':
        status_label.config(text="讲话中……")  # "Speaking..."
    if msg['type'] == 'tts' and msg['state'] == 'stop':
        status_label.config(text="就绪")  # "Ready"
    if msg['type'] == 'tts' and msg['state'] == 'sentence_start':
        tts_state = msg['state']
        text = msg['text']
        text = wrap_hanzi(text, 5, max_chars)
        log_text.config(text="小智: " + text)
        # Switch the NeoPixel LEDs when the spoken sentence is exactly
        # '开灯' ("lamp on") or '关灯' ("lamp off")
        if msg['text'] == '开灯':
            np1.range_color(0, 23, 0x0000FF)
        if msg['text'] == '关灯':
            np1.clear()
    if msg['type'] == 'stt':
        text = msg['text']
        text = wrap_hanzi(text, 5, max_chars)
        log_text.config(text="我: " + text)  # "我" means "Me"
    if msg['type'] == 'goodbye' and udp_socket and msg['session_id'] == aes_opus_info['session_id']:
        print(f"recv good bye msg")
        aes_opus_info['session_id'] = None

def on_connect(client, userdata, flags, reason_code, properties):
    subscribe_topic = mqtt_info['subscribe_topic'].split("/")[0] + '/p2p/GID_test@@@' + MAC_ADDR.replace(':', '_')
    print(f"subscribe topic: {subscribe_topic}")
    # Subscribe to the topic
    client.subscribe(subscribe_topic)


def push_mqtt_msg(message):
    global mqtt_info, mqttc
    mqttc.publish(mqtt_info['publish_topic'], json.dumps(message))

def listen_start():
    global key_state, udp_socket, aes_opus_info, listen_state, conn_state
    if key_state == "press":
        return
    key_state = "press"
    # Decide whether a hello message needs to be sent
    if conn_state is False or aes_opus_info['session_id'] is None:
        conn_state = True
        # Send the hello message to set up the UDP connection
        hello_msg = {"type": "hello", "version": 3, "transport": "udp",
                     "audio_params": {"format": "opus", "sample_rate": 16000, "channels": 1, "frame_duration": 60}}
        push_mqtt_msg(hello_msg)
        print(f"send hello message: {hello_msg}")
    if tts_state == "start" or tts_state == "sentence_start":
        # Send an abort message while speech is being played
        push_mqtt_msg({"type": "abort"})
        print(f"send abort message")
    if aes_opus_info['session_id'] is not None:
        # Send the start-listen message
        msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "start", "mode": "manual"}
        print(f"send start listen message: {msg}")
        status_label.config(text="聆听中……")  # "Listening..."
        push_mqtt_msg(msg)


def listen_stop():
    global aes_opus_info, key_state
    key_state = "release"
    # Send the stop-listen message
    if aes_opus_info['session_id'] is not None:
        msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "stop"}
        print(f"send stop listen message: {msg}")
        push_mqtt_msg(msg)

def run():
    global mqtt_info, mqttc
    # Fetch the MQTT and version information
    get_ota_version()
    # Create the client instance
    mqttc = mqtt.Client(callback_api_version=mqtt.CallbackAPIVersion.VERSION2, client_id=mqtt_info['client_id'])
    mqttc.username_pw_set(username=mqtt_info['username'], password=mqtt_info['password'])
    mqttc.tls_set(ca_certs=None, certfile=None, keyfile=None, cert_reqs=mqtt.ssl.CERT_REQUIRED,
                  tls_version=mqtt.ssl.PROTOCOL_TLS, ciphers=None)
    mqttc.on_connect = on_connect
    mqttc.on_message = on_message
    mqttc.connect(host=mqtt_info['endpoint'], port=8883)
    # Button A starts listening, button B stops listening
    gui.on_a_click(listen_start)
    gui.on_b_click(listen_stop)
    mqttc.loop_forever()


if __name__ == "__main__":
    audio = pyaudio.PyAudio()
    run()
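In the listing above, send_audio derives a fresh 16-byte nonce for every UDP packet by splicing the encoded payload length and a sequence counter into the nonce received in the hello message, then sends the nonce followed by the ciphertext. The sketch below reproduces only that byte layout with `struct`, under the assumption that the hex-string slicing in the listing is the whole scheme. `build_nonce` and `split_packet` are hypothetical helper names, not part of the original code.

```python
import struct


def build_nonce(base_nonce_hex: str, payload_len: int, sequence: int) -> bytes:
    """Build the per-packet nonce the same way the listing's hex slicing does.

    Layout (16 bytes): base[0:2] | payload length (2 bytes, big-endian)
                       | base[4:12] | sequence number (4 bytes, big-endian)
    """
    base = bytes.fromhex(base_nonce_hex)
    if len(base) != 16:
        raise ValueError("base nonce must be 16 bytes")
    return base[0:2] + struct.pack(">H", payload_len) + base[4:12] + struct.pack(">I", sequence)


def split_packet(packet: bytes):
    """Split a received packet into its 16-byte nonce and the encrypted payload."""
    return packet[:16], packet[16:]


if __name__ == "__main__":
    base = "01000000ccba9720b4bc268100000000"  # nonce value taken from the listing
    print(build_nonce(base, payload_len=123, sequence=1).hex())
```

Working on bytes rather than hex strings makes the field widths explicit: a payload longer than 65535 bytes or a sequence number above 2**32 - 1 raises `struct.error` instead of silently producing a nonce of the wrong length.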

(In the demo video, Xiaozhi turns the lamp on and off according to the conversation.)
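The listing switches the lamp only when a spoken sentence is exactly '开灯' or '关灯'. As a hedged sketch, the same idea can be written as a lookup table from sentence text to action, which makes new commands easy to add. Stripping surrounding punctuation before the lookup is an assumption (the post does not say whether the server adds any), and `normalize`, `dispatch`, and the `lamp` dictionary are hypothetical names used for illustration.

```python
from typing import Callable, Dict

# Characters removed from both ends of a sentence before the lookup (assumption)
PUNCTUATION = " \t\n。!!.,,??~"


def normalize(text: str) -> str:
    """Strip whitespace and common punctuation from both ends of the text."""
    return text.strip(PUNCTUATION)


def dispatch(text: str, actions: Dict[str, Callable[[], None]]) -> bool:
    """Run the action registered for the text; return True if one matched."""
    action = actions.get(normalize(text))
    if action is None:
        return False
    action()
    return True


if __name__ == "__main__":
    lamp = {"on": False}  # stand-in for the NeoPixel strip
    actions = {
        "开灯": lambda: lamp.update(on=True),   # "lamp on"
        "关灯": lambda: lamp.update(on=False),  # "lamp off"
    }
    dispatch("开灯。", actions)
    print(lamp)
```

On the board, the two lambdas would be replaced by the calls the listing already uses, `np1.range_color(0, 23, 0x0000FF)` and `np1.clear()`.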

2. Talk with "Xiaozhi" using an external button connected to pin 24 of the UNIHIKER.
UNIHIKER M10 Xiaozhi AI Chatbot, Figure 2
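A push-to-talk button is usually handled by polling the pin and reacting only to level changes, so that holding the button down does not retrigger the start action. The sketch below shows that edge detection in isolation, driven by a simulated sequence of levels. `watch_button` is a hypothetical helper and not the author's button thread. It also assumes that the pin reads 1 while the button is pressed, which depends on the wiring.

```python
import time
from typing import Callable, Iterable


def watch_button(levels: Iterable[int],
                 on_press: Callable[[], None],
                 on_release: Callable[[], None],
                 poll_interval: float = 0.0) -> None:
    """Call on_press on a 0 -> 1 transition and on_release on a 1 -> 0 transition."""
    previous = 0
    for level in levels:
        if level and not previous:
            on_press()
        elif previous and not level:
            on_release()
        previous = level
        if poll_interval:
            time.sleep(poll_interval)  # pause between polls on real hardware


if __name__ == "__main__":
    # Simulated samples: idle, pressed for two polls, then released
    watch_button([0, 1, 1, 0],
                 on_press=lambda: print("press"),
                 on_release=lambda: print("release"))
```

On the board, the level sequence could come from repeatedly reading the input pin (for example a generator that yields `p_p24_in.read_digital()`), with listen_start and listen_stop passed as the two callbacks and a small poll interval such as 0.02 s; those choices are assumptions rather than part of the original post.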


#!/usr/bin/python
# -*- coding: UTF-8 -*-
# Adds a button on pin 24 and uses servos to simulate two arms
import json
import time
import requests
import paho.mqtt.client as mqtt
import threading
import pyaudio
import opuslib  # On Windows, copy opus.dll to C:\Windows\System32
import socket
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
from os import urandom
import logging
from pynput import keyboard as pynput_keyboard
from unihiker import GUI
from pinpong.board import Pin
from pinpong.board import Board
from pinpong.board import Servo

# Initialize the UNIHIKER hardware interface
Board().begin()
gui = GUI()
gui.clear()
# Servo pins and button pin
pin2 = Pin(Pin.D21)
pin3 = Pin(Pin.D22)
servo1 = Servo(pin2)
servo2 = Servo(pin3)
p_p24_in = Pin(Pin.P24, Pin.IN)
# Display settings
fontSize = 20
max_lines = 16
max_chars = 8
# GUI elements (the on-screen text is Chinese; '初始化中...' means "Initializing...")
status_label = gui.draw_text(x=80, y=10, text='初始化中...', color='red')
log_text = gui.draw_text(x=10, y=100, text='', font_size=fontSize, color='blue')
emotion = gui.draw_text(x=110, y=50, text='', font_size=fontSize, color='green')
OTA_VERSION_URL = 'https://api.tenclass.net/xiaozhi/ota/'
MAC_ADDR = '7a:da:6b:5c:76:50'
# {"mqtt":{"endpoint":"post-cn-apg3xckag01.mqtt.aliyuncs.com","client_id":"GID_test@@@cc_ba_97_20_b4_bc",
# "username":"Signature|LTAI5tF8J3CrdWmRiuTjxHbF|post-cn-apg3xckag01","password":"0mrkMFELXKyelhuYy2FpGDeCigU=",
# "publish_topic":"device-server","subscribe_topic":"devices"},"firmware":{"version":"0.9.9","url":""}}
mqtt_info = {}
aes_opus_info = {"type": "hello", "version": 3, "transport": "udp",
                 "udp": {"server": "120.24.160.13", "port": 8884, "encryption": "aes-128-ctr",
                         "key": "263094c3aa28cb42f3965a1020cb21a7", "nonce": "01000000ccba9720b4bc268100000000"},
                 "audio_params": {"format": "opus", "sample_rate": 24000, "channels": 1, "frame_duration": 60},
                 "session_id": None}
iot_msg = {"session_id": "635aa42d", "type": "iot",
           "descriptors": [{"name": "Speaker", "description": "当前 AI 机器人的扬声器",
                            "properties": {"volume": {"description": "当前音量值", "type": "number"}},
                            "methods": {"SetVolume": {"description": "设置音量",
                                                      "parameters": {
                                                          "volume": {"description": "0到100之间的整数", "type": "number"}
                                                      }
                                                      }
                                        }
                            },
                           {"name": "Lamp", "description": "一个测试用的灯",
                            "properties": {"power": {"description": "灯是否打开", "type": "boolean"}},
                            "methods": {"TurnOn": {"description": "打开灯", "parameters": {}},
                                        "TurnOff": {"description": "关闭灯", "parameters": {}}
                                        }
                            }
                           ]
           }
iot_status_msg = {"session_id": "635aa42d", "type": "iot", "states": [
    {"name": "Speaker", "state": {"volume": 50}}, {"name": "Lamp", "state": {"power": False}}]}
goodbye_msg = {"session_id": "b23ebfe9", "type": "goodbye"}
local_sequence = 0
listen_state = None
tts_state = None
key_state = None
audio = None
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# udp_socket.setblocking(False)
conn_state = False
recv_audio_thread = threading.Thread()
send_audio_thread = threading.Thread()
anjian_thread = threading.Thread()
mqttc = None

# Raise-arms and lower-arms functions
def handup():
    servo1.write_angle(170)
    servo2.write_angle(10)


def handdown():
    servo1.write_angle(10)
    servo2.write_angle(170)

def get_ota_version():
    global mqtt_info
    header = {
        'Device-Id': MAC_ADDR,
        'Content-Type': 'application/json'
    }
    post_data = {"flash_size": 16777216, "minimum_free_heap_size": 8318916, "mac_address": f"{MAC_ADDR}",
                 "chip_model_name": "esp32s3", "chip_info": {"model": 9, "cores": 2, "revision": 2, "features": 18},
                 "application": {"name": "xiaozhi", "version": "0.9.9", "compile_time": "Jan 22 2025T20:40:23Z",
                                 "idf_version": "v5.3.2-dirty",
                                 "elf_sha256": "22986216df095587c42f8aeb06b239781c68ad8df80321e260556da7fcf5f522"},
                 "partition_table": [{"label": "nvs", "type": 1, "subtype": 2, "address": 36864, "size": 16384},
                                     {"label": "otadata", "type": 1, "subtype": 0, "address": 53248, "size": 8192},
                                     {"label": "phy_init", "type": 1, "subtype": 1, "address": 61440, "size": 4096},
                                     {"label": "model", "type": 1, "subtype": 130, "address": 65536, "size": 983040},
                                     {"label": "storage", "type": 1, "subtype": 130, "address": 1048576,
                                      "size": 1048576},
                                     {"label": "factory", "type": 0, "subtype": 0, "address": 2097152, "size": 4194304},
                                     {"label": "ota_0", "type": 0, "subtype": 16, "address": 6291456, "size": 4194304},
                                     {"label": "ota_1", "type": 0, "subtype": 17, "address": 10485760,
                                      "size": 4194304}],
                 "ota": {"label": "factory"},
                 "board": {"type": "bread-compact-wifi", "ssid": "mzy", "rssi": -58, "channel": 6,
                           "ip": "192.168.124.38", "mac": "cc:ba:97:20:b4:bc"}}
    response = requests.post(OTA_VERSION_URL, headers=header, data=json.dumps(post_data))
    print('=========================')
    print(response.text)
    logging.info(f"get version: {response}")
    mqtt_info = response.json()['mqtt']

def aes_ctr_encrypt(key, nonce, plaintext):
    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend())
    encryptor = cipher.encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()


def aes_ctr_decrypt(key, nonce, ciphertext):
    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend())
    decryptor = cipher.decryptor()
    plaintext = decryptor.update(ciphertext) + decryptor.finalize()
    return plaintext

def send_audio():
    global aes_opus_info, udp_socket, local_sequence, listen_state, audio
    key = aes_opus_info['udp']['key']
    nonce = aes_opus_info['udp']['nonce']
    server_ip = aes_opus_info['udp']['server']
    server_port = aes_opus_info['udp']['port']
    # Initialize the Opus encoder
    encoder = opuslib.Encoder(16000, 1, opuslib.APPLICATION_AUDIO)
    # Open the microphone stream; the buffer size should match the Opus frame size
    # (960 samples = 60 ms at 16 kHz)
    mic = audio.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=960)
    try:
        while True:
            if listen_state == "stop":
                # Sleep before retrying so the paused state does not spin the CPU
                time.sleep(0.1)
                continue
            # Read audio data
            data = mic.read(960)
            # Encode audio data
            encoded_data = encoder.encode(data, 960)
            # Print the encoded size
            # print(f"Encoded data: {len(encoded_data)}")
            # Insert the payload size and local_sequence into the nonce
            local_sequence += 1
            new_nonce = nonce[0:4] + format(len(encoded_data), '04x') + nonce[8:24] + format(local_sequence, '08x')
            # Encrypt the data and prepend the nonce
            encrypt_encoded_data = aes_ctr_encrypt(bytes.fromhex(key), bytes.fromhex(new_nonce), bytes(encoded_data))
            data = bytes.fromhex(new_nonce) + encrypt_encoded_data
            sent = udp_socket.sendto(data, (server_ip, server_port))
    except Exception as e:
        print(f"send audio err: {e}")
    finally:
        print("send audio exit()")
        local_sequence = 0
        udp_socket = None
        # Close the microphone stream
        mic.stop_stream()
        mic.close()

def recv_audio():
    global aes_opus_info, udp_socket, audio
    key = aes_opus_info['udp']['key']
    nonce = aes_opus_info['udp']['nonce']
    sample_rate = aes_opus_info['audio_params']['sample_rate']
    frame_duration = aes_opus_info['audio_params']['frame_duration']
    # Samples per frame, computed with integer arithmetic to avoid float rounding
    frame_num = sample_rate * frame_duration // 1000
    print(f"recv audio: sample_rate -> {sample_rate}, frame_duration -> {frame_duration}, frame_num -> {frame_num}")
    # Initialize the Opus decoder
    decoder = opuslib.Decoder(sample_rate, 1)
    spk = audio.open(format=pyaudio.paInt16, channels=1, rate=sample_rate, output=True, frames_per_buffer=frame_num)
    try:
        while True:
            data, server = udp_socket.recvfrom(4096)
            # print(f"Received from server {server}: {len(data)}")
            encrypt_encoded_data = data
            # Split off the 16-byte nonce, then decrypt the rest
            split_encrypt_encoded_data_nonce = encrypt_encoded_data[:16]
            # Print the nonce in hexadecimal
            # print(f"split_encrypt_encoded_data_nonce: {split_encrypt_encoded_data_nonce.hex()}")
            split_encrypt_encoded_data = encrypt_encoded_data[16:]
            decrypt_data = aes_ctr_decrypt(bytes.fromhex(key),
                                           split_encrypt_encoded_data_nonce,
                                           split_encrypt_encoded_data)
            # Decode and play the audio data
            spk.write(decoder.decode(decrypt_data, frame_num))
    # except BlockingIOError:
    #     # Sleep briefly when there is no data to reduce CPU usage
    #     time.sleep(0.1)
    except Exception as e:
        print(f"recv audio err: {e}")
    finally:
        udp_socket = None
        spk.stop_stream()
        spk.close()

def wrap_hanzi(text, first_line_width=5, other_line_width=16):
    """Wrap text so the first line and the following lines each have a given width."""
    lines = []

    # Handle the first line
    if len(text) > first_line_width:
        lines.append(text[:first_line_width])
        remaining_text = text[first_line_width:]
    else:
        lines.append(text)
        remaining_text = ""

    # Handle the following lines
    for i in range(0, len(remaining_text), other_line_width):
        lines.append(remaining_text[i:i + other_line_width])

    return "\n".join(lines)

# Map each emotion name to an ASCII emoticon
ASCII_EMOTIONS = {
    "happy": ":)",
    "sad": ":(",
    "winking": ";)",
    "surprised": ":O",
    "angry": ">:((",
    "laughing": ":D",
    "cool": "B-)",
    "crying": ":'(",
    "shy": "^_^",
    "thinking": ":|",
    "love": "<3",
    "sleepy": "-.-",
    "neutral": ":|",
    "excited": ":D",
    "confused": ":S",
}


def get_ascii_emotion(emotion):
    """Return the ASCII emoticon for the given emotion name."""
    return ASCII_EMOTIONS.get(emotion, ":(")  # default emoticon

def on_message(client, userdata, message):
    global aes_opus_info, udp_socket, tts_state, recv_audio_thread, send_audio_thread, max_chars, listen_state
    msg = json.loads(message.payload)
    print(f"recv msg: {msg}")
    if msg['type'] == 'hello':
        aes_opus_info = msg
        # The audio threads set udp_socket to None when they exit, so recreate it if needed
        if udp_socket is None:
            udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        udp_socket.connect((msg['udp']['server'], msg['udp']['port']))
        # Start the audio receiving thread unless it is already running
        if not recv_audio_thread.is_alive():
            recv_audio_thread = threading.Thread(target=recv_audio)
            recv_audio_thread.start()
        else:
            print("recv_audio_thread is alive")
        # Start the audio sending thread unless it is already running
        if not send_audio_thread.is_alive():
            send_audio_thread = threading.Thread(target=send_audio)
            send_audio_thread.start()
        else:
            print("send_audio_thread is alive")
    if listen_state is None:
        listen_state = "hello"
        # Send the start-listen message; a separate name keeps the received msg
        # intact for the checks below
        listen_msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "start", "mode": "manual"}
        print(f"send start listen message: {listen_msg}")
        status_label.config(text="聆听中……")  # "Listening..."
        push_mqtt_msg(listen_msg)
    if msg['type'] == 'tts':
        tts_state = msg['state']
    if msg['type'] == 'llm':
        ascii_emotion = get_ascii_emotion(msg['emotion'])
        emotion.config(text=ascii_emotion)
    if msg['type'] == 'tts' and msg['state'] == 'start':
        status_label.config(text="讲话中……")  # "Speaking..."
    if msg['type'] == 'tts' and msg['state'] == 'stop':
        status_label.config(text="就绪")  # "Ready"
    if msg['type'] == 'tts' and msg['state'] == 'sentence_start':
        text = msg['text']
        text = wrap_hanzi(text, 5, max_chars)
        log_text.config(text="小智: " + text)
        # Move the servos when the spoken sentence is exactly
        # '举手' ("raise hands") or '放下' ("put down")
        if msg['text'] == '举手':
            handup()
        if msg['text'] == '放下':
            handdown()
    if msg['type'] == 'stt':
        text = msg['text']
        text = wrap_hanzi(text, 5, max_chars)
        log_text.config(text="我: " + text)  # "我" means "Me"
    if msg['type'] == 'goodbye' and udp_socket and msg['session_id'] == aes_opus_info['session_id']:
        print(f"recv good bye msg")
        aes_opus_info['session_id'] = None
        log_text.config(text="")
        status_label.config(text="休息中")  # "Resting"
def on_connect(client, userdata, flags, reason_code, properties):
    subscribe_topic = mqtt_info['subscribe_topic'].split("/")[0] + '/p2p/GID_test@@@' + MAC_ADDR.replace(':', '_')
    print(f"subscribe topic: {subscribe_topic}")
    # Subscribe to the device topic
    client.subscribe(subscribe_topic)
    status_label.config(text="就绪")
def push_mqtt_msg(message):
    global mqtt_info, mqttc
    mqttc.publish(mqtt_info['publish_topic'], json.dumps(message))
def listen_start():
    global key_state, udp_socket, aes_opus_info, listen_state, conn_state
    if key_state == "press":
        return
    key_state = "press"
    # Check whether a hello message has to be sent first
    if conn_state is False or aes_opus_info['session_id'] is None:
        conn_state = True
        # Send the hello message to set up the UDP connection
        hello_msg = {"type": "hello", "version": 3, "transport": "udp",
                     "audio_params": {"format": "opus", "sample_rate": 16000, "channels": 1, "frame_duration": 60}}
        push_mqtt_msg(hello_msg)
        print(f"send hello message: {hello_msg}")
    if tts_state in ("start", "sentence_start"):
        # Xiaozhi is still speaking: send an abort message
        push_mqtt_msg({"type": "abort"})
        print("send abort message")
    if aes_opus_info['session_id'] is not None:
        # Send the "start listen" message
        msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "start", "mode": "manual"}
        print(f"send start listen message: {msg}")
        status_label.config(text="聆听中……")
        push_mqtt_msg(msg)
def listen_stop():
    global aes_opus_info, key_state
    key_state = "release"
    # Send the "stop listen" message
    if aes_opus_info['session_id'] is not None:
        msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "stop"}
        print(f"send stop listen message: {msg}")
        push_mqtt_msg(msg)
def run():
    global mqtt_info, mqttc
    # Fetch the MQTT settings and version info
    get_ota_version()
    # Create the MQTT client instance
    mqttc = mqtt.Client(callback_api_version=mqtt.CallbackAPIVersion.VERSION2, client_id=mqtt_info['client_id'])
    mqttc.username_pw_set(username=mqtt_info['username'], password=mqtt_info['password'])
    mqttc.tls_set(ca_certs=None, certfile=None, keyfile=None, cert_reqs=mqtt.ssl.CERT_REQUIRED,
                  tls_version=mqtt.ssl.PROTOCOL_TLS, ciphers=None)
    mqttc.on_connect = on_connect
    mqttc.on_message = on_message
    mqttc.connect(host=mqtt_info['endpoint'], port=8883)
    # Poll the button in a background thread
    anjian_thread = threading.Thread(target=anjian)
    anjian_thread.start()
    mqttc.loop_forever()
def anjian():
    # "anjian" means "button": each press toggles between start and stop listening
    bs = 1
    while True:
        if p_p24_in.read_digital():
            if bs == 1:
                listen_start()
            else:
                listen_stop()
            bs = 1 - bs
            time.sleep(1)  # crude debounce
        else:
            time.sleep(0.01)  # avoid spinning at full speed while the button is idle
if __name__ == "__main__":
    audio = pyaudio.PyAudio()
    run()
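The button loop above alternates between listen_start() and listen_stop() and relies on a one-second sleep to avoid double triggers. As a minimal sketch only, the same toggle can be written with rising-edge detection so that a held button fires once. `ToggleButton` and its callbacks are hypothetical names, not part of the original script.

```python
class ToggleButton:
    """Alternate between two callbacks on each rising edge of a button level."""

    def __init__(self, on_start, on_stop):
        self.on_start = on_start
        self.on_stop = on_stop
        self.prev_level = 0      # last sampled button level (0 or 1)
        self.listening = False   # True between a start and the following stop

    def update(self, level):
        """Feed one sampled level; fire a callback only on a 0 -> 1 transition."""
        rising = bool(level) and not self.prev_level
        self.prev_level = 1 if level else 0
        if not rising:
            return
        if self.listening:
            self.on_stop()
        else:
            self.on_start()
        self.listening = not self.listening


events = []
button = ToggleButton(lambda: events.append("start"), lambda: events.append("stop"))
for level in [0, 1, 1, 1, 0, 0, 1, 0]:   # a long press, then a short press
    button.update(level)
print(events)
```

In the polling thread this would be called as `button.update(p_p24_in.read_digital())` followed by a short sleep, with listen_start and listen_stop passed in as the two callbacks.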

(In the demo video, Xiaozhi is asked to "raise hand" (举手) and "put down" (放下) according to the conversation.)
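The script triggers the servos only when a spoken sentence equals '举手' or '放下' exactly. If the server ever appends punctuation or whitespace to the sentence (an assumption, the post does not say), the comparison would miss. A small sketch of a more tolerant lookup follows; `COMMANDS` and `match_command` are hypothetical names, and the returned strings stand in for the script's handup() and handdown() functions.

```python
import re

# Keyword -> action name; the keywords are the ones the script compares against.
COMMANDS = {"举手": "handup", "放下": "handdown"}


def match_command(sentence):
    """Return the action name for a TTS sentence, or None if it is not a command."""
    # Drop whitespace and common Chinese/ASCII punctuation before comparing.
    cleaned = re.sub(r"[\s,。!?、,.!?~]+", "", sentence)
    # Exact match on the cleaned text, so longer sentences never trigger the servos.
    return COMMANDS.get(cleaned)


print(match_command("举手。"))
print(match_command("今天天气不错"))
```

Matching the whole cleaned sentence rather than a substring keeps ordinary replies that merely mention the word from moving the arms.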

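3. Use the webrtcvad library for voice activity detection: recording starts when speech is detected and stops on silence, so you can hold a continuous conversation with Xiaozhi without any manual operation.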

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# pip install paho-mqtt pyaudio opuslib cryptography webrtcvad
import json
import time
import requests
import paho.mqtt.client as mqtt
import threading
import pyaudio
import opuslib  # on Windows, copy opus.dll to C:\Windows\System32
import socket
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
from os import urandom
import logging
from pynput import keyboard as pynput_keyboard
from unihiker import GUI
from pinpong.board import Pin
from pinpong.board import Board
from pinpong.board import Servo
import webrtcvad
vad = webrtcvad.Vad(3)  # aggressiveness 0-3; 3 filters out non-speech most strictly
# Initialize the UNIHIKER hardware interface
Board().begin()
gui=GUI()
gui.clear()
# Servo pins and button pin
pin2 = Pin(Pin.D21)
pin3 = Pin(Pin.D22)
servo1 = Servo(pin2)
servo2 = Servo(pin3)
p_p24_in=Pin(Pin.P24, Pin.IN)
# Display settings
fontSize=20
max_lines = 16
max_chars=8
# GUI elements
status_label = gui.draw_text(x=80, y=10, text='初始化中...', color='red')
log_text = gui.draw_text(x=10, y=100, text='', font_size=fontSize,color='blue')
emotion = gui.draw_text(x=110, y=50, text='', font_size=fontSize,color='green')
OTA_VERSION_URL = 'https://api.tenclass.net/xiaozhi/ota/'
MAC_ADDR = '7a:da:6b:5c:76:50'
# Example OTA response:
# {"mqtt":{"endpoint":"post-cn-apg3xckag01.mqtt.aliyuncs.com","client_id":"GID_test@@@cc_ba_97_20_b4_bc",
# "username":"Signature|LTAI5tF8J3CrdWmRiuTjxHbF|post-cn-apg3xckag01","password":"0mrkMFELXKyelhuYy2FpGDeCigU=",
# "publish_topic":"device-server","subscribe_topic":"devices"},"firmware":{"version":"0.9.9","url":""}}
mqtt_info = {}
aes_opus_info = {"type": "hello", "version": 3, "transport": "udp",
                 "udp": {"server": "120.24.160.13", "port": 8884, "encryption": "aes-128-ctr",
                         "key": "263094c3aa28cb42f3965a1020cb21a7", "nonce": "01000000ccba9720b4bc268100000000"},
                 "audio_params": {"format": "opus", "sample_rate": 24000, "channels": 1, "frame_duration": 60},
                 "session_id": None}
iot_msg = {"session_id": "635aa42d", "type": "iot",
           "descriptors": [{"name": "Speaker", "description": "当前 AI 机器人的扬声器",
                            "properties": {"volume": {"description": "当前音量值", "type": "number"}},
                            "methods": {"SetVolume": {"description": "设置音量",
                                                      "parameters": {
                                                          "volume": {"description": "0到100之间的整数", "type": "number"}
                                                      }
                                                      }
                                        }
                            },
                           {"name": "Lamp", "description": "一个测试用的灯",
                            "properties": {"power": {"description": "灯是否打开", "type": "boolean"}},
                            "methods": {"TurnOn": {"description": "打开灯", "parameters": {}},
                                        "TurnOff": {"description": "关闭灯", "parameters": {}}
                                        }
                            }
                           ]
           }
iot_status_msg = {"session_id": "635aa42d", "type": "iot", "states": [
    {"name": "Speaker", "state": {"volume": 50}}, {"name": "Lamp", "state": {"power": False}}]}
goodbye_msg = {"session_id": "b23ebfe9", "type": "goodbye"}
local_sequence = 0
listen_state = None
tts_state = None
key_state = None
audio = None
vaddata=None
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# udp_socket.setblocking(False)
conn_state = False
recv_audio_thread = threading.Thread()
send_audio_thread = threading.Thread()
anjian_thread = threading.Thread()
mqttc = None
# Raise-hand and lower-hand functions
def handup():
    servo1.write_angle(170)
    servo2.write_angle(10)
def handdown():
    servo1.write_angle(10)
    servo2.write_angle(170)
def get_ota_version():
    global mqtt_info
    header = {
        'Device-Id': MAC_ADDR,
        'Content-Type': 'application/json'
    }
    post_data = {"flash_size": 16777216, "minimum_free_heap_size": 8318916, "mac_address": f"{MAC_ADDR}",
                 "chip_model_name": "esp32s3", "chip_info": {"model": 9, "cores": 2, "revision": 2, "features": 18},
                 "application": {"name": "xiaozhi", "version": "0.9.9", "compile_time": "Jan 22 2025T20:40:23Z",
                                 "idf_version": "v5.3.2-dirty",
                                 "elf_sha256": "22986216df095587c42f8aeb06b239781c68ad8df80321e260556da7fcf5f522"},
                 "partition_table": [{"label": "nvs", "type": 1, "subtype": 2, "address": 36864, "size": 16384},
                                     {"label": "otadata", "type": 1, "subtype": 0, "address": 53248, "size": 8192},
                                     {"label": "phy_init", "type": 1, "subtype": 1, "address": 61440, "size": 4096},
                                     {"label": "model", "type": 1, "subtype": 130, "address": 65536, "size": 983040},
                                     {"label": "storage", "type": 1, "subtype": 130, "address": 1048576,
                                      "size": 1048576},
                                     {"label": "factory", "type": 0, "subtype": 0, "address": 2097152, "size": 4194304},
                                     {"label": "ota_0", "type": 0, "subtype": 16, "address": 6291456, "size": 4194304},
                                     {"label": "ota_1", "type": 0, "subtype": 17, "address": 10485760,
                                      "size": 4194304}],
                 "ota": {"label": "factory"},
                 "board": {"type": "bread-compact-wifi", "ssid": "mzy", "rssi": -58, "channel": 6,
                           "ip": "192.168.124.38", "mac": "cc:ba:97:20:b4:bc"}}
    response = requests.post(OTA_VERSION_URL, headers=header, data=json.dumps(post_data))
    print('=========================')
    print(response.text)
    logging.info(f"get version: {response}")
    mqtt_info = response.json()['mqtt']
def aes_ctr_encrypt(key, nonce, plaintext):
    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend())
    encryptor = cipher.encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()
def aes_ctr_decrypt(key, nonce, ciphertext):
    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend())
    decryptor = cipher.decryptor()
    plaintext = decryptor.update(ciphertext) + decryptor.finalize()
    return plaintext
def send_audio():
    global aes_opus_info, udp_socket, local_sequence, listen_state, audio, vaddata
    key = aes_opus_info['udp']['key']
    nonce = aes_opus_info['udp']['nonce']
    server_ip = aes_opus_info['udp']['server']
    server_port = aes_opus_info['udp']['port']
    # Initialize the Opus encoder
    encoder = opuslib.Encoder(16000, 1, opuslib.APPLICATION_AUDIO)
    # Open the microphone stream; 480 samples is 30 ms at 16 kHz, a frame length webrtcvad accepts
    mic = audio.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=480)
    try:
        while True:
            if listen_state == "stop":
                time.sleep(0.1)  # avoid busy-waiting while sending is paused
                continue
            # Read audio; the first 30 ms frame is also shared with the VAD thread
            vaddata = mic.read(480)
            data = vaddata + mic.read(480)
            # Encode 960 samples (60 ms) as one Opus frame
            encoded_data = encoder.encode(data, 960)
            # print(f"Encoded data: {len(encoded_data)}")
            # Write the payload size and local_sequence into the nonce
            local_sequence += 1
            new_nonce = nonce[0:4] + format(len(encoded_data), '04x') + nonce[8:24] + format(local_sequence, '08x')
            # Encrypt the payload and prepend the nonce
            encrypt_encoded_data = aes_ctr_encrypt(bytes.fromhex(key), bytes.fromhex(new_nonce), bytes(encoded_data))
            data = bytes.fromhex(new_nonce) + encrypt_encoded_data
            udp_socket.sendto(data, (server_ip, server_port))
    except Exception as e:
        print(f"send audio err: {e}")
    finally:
        print("send audio exit()")
        local_sequence = 0
        udp_socket = None
        # Close the microphone stream
        mic.stop_stream()
        mic.close()
def recv_audio():
    global aes_opus_info, udp_socket, audio, speekstoptime
    key = aes_opus_info['udp']['key']
    nonce = aes_opus_info['udp']['nonce']
    sample_rate = aes_opus_info['audio_params']['sample_rate']
    frame_duration = aes_opus_info['audio_params']['frame_duration']
    # Samples per frame: frame duration in ms divided by the sample period in ms
    frame_num = int(frame_duration / (1000 / sample_rate))
    print(f"recv audio: sample_rate -> {sample_rate}, frame_duration -> {frame_duration}, frame_num -> {frame_num}")
    # Initialize the Opus decoder
    decoder = opuslib.Decoder(sample_rate, 1)
    spk = audio.open(format=pyaudio.paInt16, channels=1, rate=sample_rate, output=True, frames_per_buffer=frame_num)
    try:
        while True:
            data, server = udp_socket.recvfrom(4096)
            print(f"Received from server {server}: {len(data)}")
            encrypt_encoded_data = data
            # Split off the 16-byte nonce, then decrypt the rest
            split_encrypt_encoded_data_nonce = encrypt_encoded_data[:16]
            # print(f"split_encrypt_encoded_data_nonce: {split_encrypt_encoded_data_nonce.hex()}")
            split_encrypt_encoded_data = encrypt_encoded_data[16:]
            decrypt_data = aes_ctr_decrypt(bytes.fromhex(key),
                                           split_encrypt_encoded_data_nonce,
                                           split_encrypt_encoded_data)
            # Decode and play the audio
            spk.write(decoder.decode(decrypt_data, frame_num))
            # Remember when playback last happened, so the VAD thread can ignore Xiaozhi's own voice
            speekstoptime = time.time()
    # except BlockingIOError:
    #     # Sleep briefly when there is no data to reduce CPU usage
    #     time.sleep(0.1)
    except Exception as e:
        print(f"recv audio err: {e}")
    finally:
        udp_socket = None
        spk.stop_stream()
        spk.close()
def wrap_hanzi(text, first_line_width=5, other_line_width=16):
    """Wrap a string so the first line and the following lines each have a given width."""
    lines = []
    # First line
    if len(text) > first_line_width:
        lines.append(text[:first_line_width])
        remaining_text = text[first_line_width:]
    else:
        lines.append(text)
        remaining_text = ""
    # Following lines
    for i in range(0, len(remaining_text), other_line_width):
        lines.append(remaining_text[i:i + other_line_width])
    return "\n".join(lines)
# Emotion name -> ASCII emoticon
ASCII_EMOTIONS = {
    "happy": ":)",
    "sad": ":(",
    "winking": ";)",
    "surprised": ":O",
    "angry": ">:((",
    "laughing": ":D",
    "cool": "B-)",
    "crying": ":'(",
    "shy": "^_^",
    "thinking": ":|",
    "love": "<3",
    "sleepy": "-.-",
    "neutral": ":|",
    "excited": ":D",
    "confused": ":S",
}
def get_ascii_emotion(emotion):
    """Return the ASCII emoticon for the given emotion type."""
    return ASCII_EMOTIONS.get(emotion, ":(")  # ":(" is the default emoticon
def on_message(client, userdata, message):
    global aes_opus_info, udp_socket, tts_state, recv_audio_thread, send_audio_thread, max_chars, listen_state, speekstate, speekstoptime
    msg = json.loads(message.payload)
    print(f"recv msg: {msg}")
    if msg['type'] == 'hello':
        aes_opus_info = msg
        udp_socket.connect((msg['udp']['server'], msg['udp']['port']))
        # Start the audio receiving thread if it is not already running
        if not recv_audio_thread.is_alive():
            recv_audio_thread = threading.Thread(target=recv_audio)
            recv_audio_thread.start()
        else:
            print("recv_audio_thread is alive")
        # Start the audio sending thread if it is not already running
        if not send_audio_thread.is_alive():
            send_audio_thread = threading.Thread(target=send_audio)
            send_audio_thread.start()
        else:
            print("send_audio_thread is alive")
    if listen_state is None:
        listen_state = "hello"
        # Send the "start listen" message; a separate name keeps the incoming msg intact
        listen_msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "start", "mode": "manual"}
        print(f"send start listen message: {listen_msg}")
        status_label.config(text="聆听中……")
        push_mqtt_msg(listen_msg)
    if msg['type'] == 'tts':
        tts_state = msg['state']
    if msg['type'] == 'llm':
        # Show the emotion reported by the server as an ASCII emoticon
        ascii_emotion = get_ascii_emotion(msg['emotion'])
        emotion.config(text=ascii_emotion)
    if msg['type'] == 'tts' and msg['state'] == 'start':
        status_label.config(text="讲话中……")
        speekstate = 0  # Xiaozhi is speaking: the VAD thread must not start listening
    if msg['type'] == 'tts' and msg['state'] == 'stop':
        status_label.config(text="就绪")
        speekstate = 1  # Xiaozhi has finished: the VAD thread may start listening again
    if msg['type'] == 'tts' and msg['state'] == 'sentence_start':
        # Show the sentence Xiaozhi is speaking, wrapped to fit the screen
        text = msg['text']
        text = wrap_hanzi(text, 5, max_chars)
        log_text.config(text="小智: " + text)
        # Drive the servos when the spoken sentence is exactly a command word
        if msg['text'] == '举手':
            handup()
        if msg['text'] == '放下':
            handdown()
    if msg['type'] == 'stt':
        # Show the recognized user speech
        text = msg['text']
        text = wrap_hanzi(text, 5, max_chars)
        log_text.config(text="我: " + text)
    if msg['type'] == 'goodbye' and udp_socket and msg['session_id'] == aes_opus_info['session_id']:
        print("recv good bye msg")
        aes_opus_info['session_id'] = None
        listen_state = None
        log_text.config(text="")
        status_label.config(text="休息中")
        # Close the UDP connection
        if udp_socket:
            udp_socket.close()
            udp_socket = None
        # Give the audio threads up to one second each to exit
        for thread in (recv_audio_thread, send_audio_thread):
            if thread and thread.is_alive():
                thread.join(timeout=1)
def on_connect(client, userdata, flags, reason_code, properties):
    subscribe_topic = mqtt_info['subscribe_topic'].split("/")[0] + '/p2p/GID_test@@@' + MAC_ADDR.replace(':', '_')
    print(f"subscribe topic: {subscribe_topic}")
    # Subscribe to the device topic
    client.subscribe(subscribe_topic)
    status_label.config(text="就绪")
def push_mqtt_msg(message):
    global mqtt_info, mqttc
    mqttc.publish(mqtt_info['publish_topic'], json.dumps(message))
def listen_start():
    global key_state, udp_socket, aes_opus_info, listen_state, conn_state
    if key_state == "press":
        return
    key_state = "press"
    # Check whether a hello message has to be sent first
    if conn_state is False or aes_opus_info['session_id'] is None:
        # Clean up the old connection
        if udp_socket:
            udp_socket.close()
            udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        conn_state = True
        # Send the hello message to set up the UDP connection
        hello_msg = {"type": "hello", "version": 3, "transport": "udp",
                     "audio_params": {"format": "opus", "sample_rate": 16000, "channels": 1, "frame_duration": 60}}
        push_mqtt_msg(hello_msg)
        print(f"send hello message: {hello_msg}")
    if tts_state in ("start", "sentence_start"):
        # Xiaozhi is still speaking: send an abort message
        push_mqtt_msg({"type": "abort"})
        print("send abort message")
    if aes_opus_info['session_id'] is not None:
        # Send the "start listen" message
        msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "start", "mode": "manual"}
        print(f"send start listen message: {msg}")
        push_mqtt_msg(msg)
def listen_stop():
    global aes_opus_info, key_state
    key_state = "release"
    # Send the "stop listen" message
    if aes_opus_info['session_id'] is not None:
        msg = {"session_id": aes_opus_info['session_id'], "type": "listen", "state": "stop"}
        print(f"send stop listen message: {msg}")
        push_mqtt_msg(msg)
def run():
    global mqtt_info, mqttc
    # Fetch the MQTT settings and version info
    get_ota_version()
    # Create the MQTT client instance
    mqttc = mqtt.Client(callback_api_version=mqtt.CallbackAPIVersion.VERSION2, client_id=mqtt_info['client_id'])
    mqttc.username_pw_set(username=mqtt_info['username'], password=mqtt_info['password'])
    mqttc.tls_set(ca_certs=None, certfile=None, keyfile=None, cert_reqs=mqtt.ssl.CERT_REQUIRED,
                  tls_version=mqtt.ssl.PROTOCOL_TLS, ciphers=None)
    mqttc.on_connect = on_connect
    mqttc.on_message = on_message
    mqttc.connect(host=mqtt_info['endpoint'], port=8883)
    # Run the voice-detection loop in a background thread
    anjian_thread = threading.Thread(target=anjian)
    anjian_thread.start()
    mqttc.loop_forever()
speekstate = 0      # 1 once the TTS "stop" message has arrived (Xiaozhi finished speaking)
speekstoptime = 0   # time at which the last audio packet was played
def anjian():
    global vaddata, speekstate, speekstoptime
    bs = 0              # 1 while a listening session opened by the VAD is active
    last_voice_time = 0
    while True:
        if vaddata is not None:
            # Treat a frame as user speech only once playback has been over for 1.5 s
            if vad.is_speech(vaddata, 16000) and speekstate == 1 and time.time() - speekstoptime > 1.5:
                status_label.config(text="聆听中……")
                print(".", end="")
                bs = 1
                listen_start()
                last_voice_time = time.time()
            elif time.time() - last_voice_time > 1.5 and bs == 1:
                # 1.5 s without speech: stop listening
                bs = 0
                listen_stop()
            time.sleep(0.01)  # yield briefly instead of spinning on the same frame
        else:
            # No microphone data yet: wait, then open the first session
            time.sleep(2)
            bs = 1
            listen_start()
            last_voice_time = time.time()
if __name__ == "__main__":
    audio = pyaudio.PyAudio()
    run()
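For each outgoing UDP packet, send_audio in the script above derives a fresh nonce from the one the server sent: it keeps the first 4 hex characters, writes the payload length as 4 hex digits, keeps characters 8 to 24, and appends the sequence number as 8 hex digits. The sketch below isolates that string manipulation so it can be checked without audio hardware. `build_packet_nonce` is a hypothetical helper name, and the base nonce is the placeholder value from the script, not a live key.

```python
def build_packet_nonce(base_nonce_hex, payload_len, sequence):
    """Mirror the nonce construction in send_audio (hex string in, hex string out)."""
    if len(base_nonce_hex) != 32:
        raise ValueError("expected a 16-byte nonce as 32 hex characters")
    if not 0 <= payload_len <= 0xFFFF:
        raise ValueError("payload length must fit in 4 hex digits")
    return (base_nonce_hex[0:4]              # unchanged prefix
            + format(payload_len, "04x")     # encoded payload size
            + base_nonce_hex[8:24]           # unchanged middle part
            + format(sequence, "08x"))       # packet sequence number


base = "01000000ccba9720b4bc268100000000"
nonce = build_packet_nonce(base, payload_len=120, sequence=1)
print(nonce)  # 01000078ccba9720b4bc268100000001
```

The result is still 32 hex characters, so `bytes.fromhex(nonce)` yields the 16 bytes that the script passes to AES-CTR and prepends to the encrypted payload.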

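(The webrtcvad library's detection is only mediocre in my tests. I use mode 3, the most aggressive mode, which is the strictest about filtering out non-speech.)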
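One common way to soften unreliable per-frame decisions, which is not what the script above does, is to vote over the last few frames before starting and to require a stretch of silence before stopping. The sketch below is pure Python so it can be tried without a microphone; `SpeechGate` and its parameters are hypothetical, and in the script the `is_speech` flag would come from `vad.is_speech(frame, 16000)` on 30 ms frames (webrtcvad accepts 10, 20, or 30 ms frames of 16-bit mono PCM).

```python
from collections import deque


class SpeechGate:
    """Smooth per-frame VAD decisions with a sliding-window vote and a hangover."""

    def __init__(self, window=10, start_ratio=0.6, hangover_s=1.5):
        self.flags = deque(maxlen=window)  # most recent per-frame decisions
        self.start_ratio = start_ratio     # share of voiced frames needed to start
        self.hangover_s = hangover_s       # silence required before stopping
        self.active = False
        self.last_voice = 0.0

    def update(self, is_speech, now):
        """Feed one frame decision; return "start", "stop", or None."""
        self.flags.append(bool(is_speech))
        if is_speech:
            self.last_voice = now
        window_full = len(self.flags) == self.flags.maxlen
        voiced = sum(self.flags) / len(self.flags)
        if not self.active and window_full and voiced >= self.start_ratio:
            self.active = True
            return "start"
        if self.active and now - self.last_voice > self.hangover_s:
            self.active = False
            self.flags.clear()
            return "stop"
        return None


gate = SpeechGate(window=5, start_ratio=0.6, hangover_s=1.5)
decisions = [1, 0, 1, 1, 1] + [0] * 60          # a burst of speech, then silence
events = []
for k, flag in enumerate(decisions):
    event = gate.update(flag, now=k * 0.03)     # one frame every 30 ms
    if event:
        events.append(event)
print(events)
```

A "start" event would map to listen_start() and a "stop" event to listen_stop(). The window size and ratio are guesses to tune on the device, not measured values.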


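4. Use the UNIHIKER's touchscreen: tap an on-screen "button" to talk to Xiaozhi
UNIHIKER M10 Xiaozhi AI Chatbot, Figure 3


UNIHIKER M10 Xiaozhi AI Chatbot, Figure 4

Open-source repository: https://github.com/Huang-junsen/py-xiaozhi.git. The main changes are in "mqtt_client.py" and "gui.py".


Modified code files: see the attachment gui及mqtt_client.zip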
