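Last edited by 云天 on 2024-1-6 15:44
【Project Background】
In 2023, real-world deployment of large models accelerated, and text-to-image generation became one of the hottest application areas. Since Stable Diffusion appeared, text-to-image models have kept emerging in China and abroad, so many at once that it felt like a clash of titans. Each round of iteration has brought rapid gains in both image quality and generation speed. Text-to-image is one of the core technologies of AIGC and a touchstone for the capabilities of general-purpose large models. It places high demands on model algorithms, training platforms, and compute infrastructure.
【Project Design】
In this project, the author uses Mind+ and the UNIHIKER (行空板) board to try out Tencent Cloud's text-to-image (文生图) feature.
【Project Highlights】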
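1. iFLYTEK (讯飞) speech recognition is used to set both the painting style and the prompt.
2. The AI painting style can be chosen by voice. Each spoken style name maps to a Tencent Cloud style code, for example 水墨画 (ink wash), 油画一 (oil painting), 日系动漫 (Japanese anime), and 通用写实风格 (general realistic):
```python
FengGeZiDian = {"不限定风格":"000","水墨画":"101","概念艺术":"102","油画一":"103","油画二":"118","水彩画":"104","像素画":"105","厚涂风格":"106","插图":"107","剪纸风格":"108","印象派1":"109","印象派2":"119","古典肖像画":"111","黑白素描画":"112","赛博朋克":"113","科幻风格":"114","暗黑风格":"115","3D风格":"116","蒸汽波":"117","日系动漫":"201","怪兽风格":"202","唯美古风":"203","复古动漫":"204","游戏卡通手绘":"301","通用写实风格":"401"}
```
3. The input volume decides when recording starts and stops, so only actual speech is sent for recognition.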
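The decision between items 1 and 2 can be sketched in plain Python, without the board or the cloud SDK. It works like this: a recognized style name switches the style, and any other text becomes the prompt.

This is a minimal illustration, not the author's code:

- `handle_utterance`, `build_request_json`, `STYLES`, and `PUNCTUATION` are hypothetical names.
- `STYLES` is only a subset of FengGeZiDian.
- Stripping trailing punctuation is an added assumption. It guards against a recognizer that appends marks such as "。"; the post's code matches the recognized text against the dictionary exactly.
- The JSON body uses the same `Prompt`, `Styles`, and `ResultConfig.Resolution` fields as the post's TextToImage request.

```python
import json

# Hypothetical subset of the post's FengGeZiDian, for illustration only
STYLES = {"不限定风格": "000", "油画一": "103", "日系动漫": "201", "通用写实风格": "401"}

# Assumption: the recognizer may append Chinese or ASCII punctuation
PUNCTUATION = "。,!?、,.!? "


def handle_utterance(text, current_style, styles=STYLES):
    """Return (style_code, prompt).

    A recognized style name switches the style and yields no prompt;
    any other non-empty text is a prompt drawn in the current style.
    """
    cleaned = text.strip().rstrip(PUNCTUATION)
    if cleaned in styles:
        return styles[cleaned], None
    if not cleaned:
        return current_style, None  # nothing usable was recognized
    return current_style, cleaned


def build_request_json(prompt, style, resolution):
    """Serialize the same fields the post sends to TextToImage."""
    params = {"Prompt": prompt, "Styles": [style], "ResultConfig": {"Resolution": resolution}}
    return json.dumps(params, ensure_ascii=False)
```

For example, saying 日系动漫 switches to style 201 without drawing anything. A following utterance such as 春风又绿江南岸 then becomes the prompt for that style.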
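【Results】
The poem 《泊船瓜洲》 (Moored at Guazhou) contains the couplet 「春风又绿江南岸,明月何时照我还」 ("The spring wind greens the south bank of the river once more; when will the bright moon light my way home?"). It has voiced the homesickness of countless travelers far from home. Each line was tested with the oil painting (油画) style and the general realistic (通用写实风格) style. Because the phone photos turned out poorly, screenshots are shown as well.
Oil painting style: 春风又绿江南岸
Oil painting style: 明月何时照我还
General realistic style: 春风又绿江南岸
General realistic style: 明月何时照我还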
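【Programming: Volume-Triggered Recording】
A custom module, listening.py, uses the pyaudio, wave, and numpy libraries. Audio is read in 1024-sample chunks at 16 kHz. The peak sample value of each chunk is compared with a minimum volume threshold of 500, which you should tune to the background noise of your environment. Recording starts when the peak exceeds the threshold. It stops once the volume has stayed below the threshold for about 1.3 seconds, and the audio is saved as record.wav.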
```python
# listening.py
import wave

import numpy as np
import pyaudio


def listen():
    CHUNK = 1024                 # samples per read (64 ms at 16 kHz)
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    WAVE_OUTPUT_FILENAME = 'record.wav'
    mindb = 500                  # minimum volume: above it recording starts, below it counts as quiet
    delayTime = 1.3              # stop after about 1.3 s of continuous quiet
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("Started, listening")
    frames = []
    flag = False                 # True once speech is detected and recording has begun
    stat = True                  # keep reading while True
    stat2 = False                # True while the volume is below the threshold
    tempnum = 0                  # index of the current chunk (time counter)
    tempnum2 = 0                 # chunk index where the current quiet stretch began
    while stat:
        data = stream.read(CHUNK, exception_on_overflow=False)
        audio_data = np.frombuffer(data, dtype=np.short)
        temp = np.max(audio_data)  # peak amplitude of this chunk
        if temp > mindb and not flag:
            flag = True
            print("Recording started")
            tempnum2 = tempnum
        if flag:
            frames.append(data)
            if temp < mindb and not stat2:
                # Volume just dropped (or recording just began): mark this chunk
                stat2 = True
                tempnum2 = tempnum
                print("Quiet after loud (or at start), marking this point")
            if temp > mindb:
                # Loud again: reset the quiet marker
                stat2 = False
                tempnum2 = tempnum
            # delayTime * 15 converts seconds to chunks (16000 / 1024 is about 15.6 chunks/s)
            if tempnum > tempnum2 + delayTime * 15 and stat2:
                print("%.2f s passed, checking whether it is still quiet" % delayTime)
                if temp < mindb:
                    stat = False  # still quiet: stop recording
                    print("Quiet!")
                else:
                    stat2 = False
                    print("Loud!")
        print(str(temp) + " " + str(tempnum))
        tempnum += 1  # no timeout: wait until speech is detected and followed by silence
    print("Recording finished")
    stream.stop_stream()
    stream.close()
    sample_width = p.get_sample_size(FORMAT)
    p.terminate()
    with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
    return WAVE_OUTPUT_FILENAME
```
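The start/stop rule in listening.py can be restated as a pure function over per-chunk peak values. That makes it testable without a microphone. This is a simplified sketch, not a drop-in replacement:

- `chunks_for` and `find_recording_window` are hypothetical names.
- A peak exactly equal to the threshold counts as quiet here.

It also shows the timing arithmetic. One chunk holds 1024 / 16000 = 0.064 s of audio, so about 15.6 chunks make up a second; the post approximates this with `delayTime*15`. At 0.064 s per chunk, 1.3 s of silence rounds up to 21 chunks.

```python
import math

CHUNK = 1024                   # samples per read, as in listening.py
RATE = 16000                   # sample rate in Hz, as in listening.py
CHUNK_SECONDS = CHUNK / RATE   # 0.064 s of audio per chunk


def chunks_for(seconds, chunk_seconds=CHUNK_SECONDS):
    """Number of whole chunks needed to cover `seconds` of audio."""
    return math.ceil(seconds / chunk_seconds)


def find_recording_window(peaks, threshold=500, quiet_chunks=21):
    """Return (start, end) chunk indices to keep, or None if nothing was loud."""
    start = None
    quiet = 0
    for i, peak in enumerate(peaks):
        if start is None:
            if peak > threshold:
                start = i  # first loud chunk: recording begins here
            continue
        if peak > threshold:
            quiet = 0      # speech continues: reset the silence counter
        else:
            quiet += 1
            if quiet >= quiet_chunks:
                return start, i + 1  # enough continuous silence: stop
    return None if start is None else (start, len(peaks))
```

Given the list of chunks read from the stream, `frames[start:end]` is what would be written to the WAV file.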
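【Programming: Mind+ Block Coding】
1. In Mind+ Library Management, install the tencentcloud library.
2. Under Extensions > User Library, add the base64 and iFLYTEK speech (讯飞语音) libraries.
3. Initialize tencentcloud in code.
4. Initialize the style dictionary.
5. Import the custom volume-listening module, initialize speech recognition, and show the startup screen on the UNIHIKER.
6. Set the painting style.
7. Set the prompt.
8. Send the prompt and painting style to Tencent Cloud for AI painting, and show the result on the UNIHIKER screen. The program starts with style "000" (不限定风格, no specific style) until a style is set by voice.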
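Step 8 stretches the returned image to the screen with `cv2.resize(image, (240, 320))`. If the generated image has a different aspect ratio, it gets distorted.

Below is a minimal sketch of aspect-preserving fitting. It assumes a display 240 pixels wide and 320 tall, as in the post's resize call. `fit_to_screen` is a hypothetical helper, not part of any library.

```python
def fit_to_screen(img_w, img_h, screen_w=240, screen_h=320):
    """Return (new_w, new_h, x, y): the largest size with the image's aspect
    ratio that fits the screen, plus offsets that center it."""
    scale = min(screen_w / img_w, screen_h / img_h)
    new_w = int(img_w * scale)
    new_h = int(img_h * scale)
    x = (screen_w - new_w) // 2  # horizontal margin on each side
    y = (screen_h - new_h) // 2  # vertical margin on each side
    return new_w, new_h, x, y
```

How to use the result:

- Pass the size to `cv2.resize(image, (new_w, new_h))`. OpenCV takes `dsize` as (width, height).
- Pass the offsets to `u_gui.draw_image(image="img.png", x=x, y=y)`.

A 1024x768 result, for example, would be drawn 240 wide and 180 tall, starting 70 pixels from the top.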
【Python Code】
```python
# -*- coding: UTF-8 -*-
# MindPlus
# Python
import base64
import json
import time

import cv2
import numpy as np
from pinpong.board import Board
from pinpong.extension.unihiker import *  # provides the on-board buzzer
from unihiker import GUI
from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.aiart.v20221229 import aiart_client, models

import listening  # custom volume-triggered recorder (listening.py)
import xunfeiasr  # iFLYTEK speech recognition from the Mind+ user library


# Create the Tencent Cloud AI art (aiart) client
def AIPingTaiChuShiHua():
    global client
    # Fill in your own Tencent Cloud API keys; do not publish them
    SecretId = "**********"
    SecretKey = "**********"
    cred = credential.Credential(SecretId, SecretKey)
    httpProfile = HttpProfile()
    httpProfile.endpoint = "aiart.tencentcloudapi.com"
    clientProfile = ClientProfile()
    clientProfile.httpProfile = httpProfile
    client = aiart_client.AiartClient(cred, "ap-shanghai", clientProfile)


# Map spoken style names to Tencent style codes
def FengGeZiDianChuShiHua():
    global FengGeZiDian
    FengGeZiDian = {"不限定风格":"000","水墨画":"101","概念艺术":"102","油画一":"103","油画二":"118","水彩画":"104","像素画":"105","厚涂风格":"106","插图":"107","剪纸风格":"108","印象派1":"109","印象派2":"119","古典肖像画":"111","黑白素描画":"112","赛博朋克":"113","科幻风格":"114","暗黑风格":"115","3D风格":"116","蒸汽波":"117","日系动漫":"201","怪兽风格":"202","唯美古风":"203","复古动漫":"204","游戏卡通手绘":"301","通用写实风格":"401"}


u_gui = GUI()
Board().begin()
AIPingTaiChuShiHua()
FengGeZiDianChuShiHua()
# Startup screen
标签 = u_gui.draw_text(text="行空板", x=0, y=0, font_size=20, color="#0000FF")
识别显示 = u_gui.draw_text(text="行空板AI画", x=25, y=120, font_size=30, color="#0000FF")
xunfeiasr.xunfeiasr_set(APPID="**********", APISecret="***************************", APIKey="***************************")
ShiBieNaRong = 0  # recognized text; 0 until the first recognition
BiaoShi = 0       # 1 when the recognized text is a prompt to draw
FengGe = "000"    # current style code; "000" means no specific style
time.sleep(5)
识别显示.config(font_size=20)
识别显示.config(x=5)
识别显示.config(text="提示词或设置风格")  # "Say a prompt or a style"

while True:
    # Beep before each round except the first
    if ShiBieNaRong != 0:
        buzzer.play(buzzer.DADADADUM, buzzer.Once)
    listening.listen()  # blocks until speech is recorded to record.wav
    u_gui.clear()
    ShiBieNaRong = 0
    识别显示 = u_gui.draw_text(text="正在识别语音", x=15, y=120, font_size=25, color="#0000FF")  # "Recognizing speech"
    ShiBieNaRong = xunfeiasr.xunfeiasr(r"record.wav")
    if ShiBieNaRong in FengGeZiDian:
        # A style name was spoken: switch style, then ask for a prompt
        标签 = u_gui.draw_text(text="设置风格:", x=0, y=0, font_size=20, color="#FF0000")
        FengGe = FengGeZiDian[ShiBieNaRong]
        BiaoShi = 0
        识别显示.config(font_size=20)
        识别显示.config(text=ShiBieNaRong)
        time.sleep(2)
        标签.config(text="设置提示词:")
        识别显示.config(text="请说出提示词")  # "Please say a prompt"
    else:
        # Anything else is treated as the drawing prompt
        标签 = u_gui.draw_text(text="识别提示词结果:", x=0, y=0, font_size=20, color="#FF0000")
        BiaoShi = 1
        # Use a smaller font for long prompts so they fit on the screen
        if len(ShiBieNaRong) < 7:
            识别显示.config(font_size=20)
        else:
            识别显示.config(font_size=10)
        识别显示.config(text=ShiBieNaRong)
        time.sleep(2)
    if BiaoShi == 1:
        if not ShiBieNaRong:
            # Empty recognition result: show a warning and ask again
            识别显示.config(font_size=20)
            识别显示.config(text="识别内容为空")  # "Nothing recognized"
            识别显示.config(color="#FF0000")
            time.sleep(3)
            标签.config(text="设置提示词:")
            识别显示.config(color="#0000FF")
            识别显示.config(text="请说出提示词")
        else:
            标签.config(text="请等待:")  # "Please wait"
            识别显示.config(font_size=20)
            识别显示.config(text="AI绘画中...")  # "AI painting..."
            req = models.TextToImageRequest()
            # Assumes Resolution is width:height, so 768:1024 is 3:4 portrait like the 240x320 screen
            params = {"Prompt": ShiBieNaRong, "Styles": [FengGe], "ResultConfig": {"Resolution": "768:1024"}}
            req.from_json_string(json.dumps(params))
            try:
                resp = client.TextToImage(req)
            except TencentCloudSDKException as err:
                print(err)
                识别显示.config(text="AI绘画失败")  # "AI painting failed"
                continue
            # The result image arrives as a base64 string
            image_data = base64.b64decode(resp.ResultImage)
            np_array = np.frombuffer(image_data, np.uint8)
            image = cv2.imdecode(np_array, cv2.IMREAD_COLOR)
            image = cv2.resize(image, (240, 320))  # dsize is (width, height)
            cv2.imwrite("img.png", image)
            AI图 = u_gui.draw_image(image="img.png", x=0, y=0)
```
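`AIPingTaiChuShiHua` expects the SecretId and SecretKey to be written directly into the source, so they are exposed whenever the code is shared. Below is a minimal sketch of reading them from environment variables instead. `load_tencent_credentials` is a hypothetical helper. The variable names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY are an assumed convention; use whatever names your setup defines.

```python
import os


def load_tencent_credentials(env=None):
    """Read SecretId and SecretKey from environment variables instead of source code."""
    if env is None:
        env = os.environ
    # Assumed variable names; adjust to your own setup
    secret_id = env.get("TENCENTCLOUD_SECRET_ID", "")
    secret_key = env.get("TENCENTCLOUD_SECRET_KEY", "")
    if not secret_id or not secret_key:
        raise RuntimeError("TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY must be set")
    return secret_id, secret_key
```

Inside `AIPingTaiChuShiHua`, the credential would then be created with `cred = credential.Credential(*load_tencent_credentials())`.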
【Demo Video】