当前位置： > 投稿>正文

机器视觉技术基础知识，高级计算机视觉的手语识别

04-14 互联网未知投稿

关于【机器视觉技术基础知识】，今天犇犇小编给您分享一下，如果对您有所帮助别忘了关注本站哦。

内容导航：
1、机器视觉技术基础知识：高级计算机视觉的手语识别
2、机器视觉技术基础知识，你想了解的都在这里

1、机器视觉技术基础知识：高级计算机视觉的手语识别

文章来源：深度学习与计算机视觉
原文链接：https://mp.weixin.qq.com/s/5kNXV7w4B4S4kej0vLRoZg

机器视觉技术基础知识，高级计算机视觉的手语识别

手语是一种主要由听力困难或耳聋的人使用的交流方式。这种基于手势的语言可以让人们轻松地表达想法和想法，克服听力问题带来的障碍。

这种便捷的交流方式的一个主要问题是，全球绝大多数人缺乏语言知识。就像其他语言一样，学习手语需要花费大量时间和精力，这让人很沮丧，无法被更多的人学习。

然而，在机器学习和图像检测领域，这一问题的一个明显解决方案已经存在。实现预测模型技术来自动分类手语符号可以用于为Zoom会议等虚拟会议创建实时字幕。

这将大大增加听力障碍者获得此类服务的机会，因为它将与基于语音的字幕同步，为听力障碍者创建一个双向在线通信系统。

许多手语的大型训练数据集都可以在Kaggle上找到，Kaggle是一个流行的数据科学资源。该模型中使用的一个被称为“手语MNIST”，是一个公共领域，可免费使用的数据集，其中包含24个ASL字母中每一个的大约1000张图像的像素信息，不包括J和Z，因为它们是基于手势的符号。

https://www.kaggle.com/datasets/datamunge/sign-language-mnist

机器视觉技术基础知识，高级计算机视觉的手语识别

准备用于训练的数据的第一步是将数据集中的所有像素数据转换并整形为图像，以便算法可以读取这些数据。

import matplotlib.pyplot as pltimport seaborn as snsfrom keras.models import Sequentialfrom keras.layers import Dense, Conv2D , MaxPool2D , Flatten , Dropout , BatchNormalizationfrom keras.preprocessing.image import ImageDataGeneratorfrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import classification_report,confusion_matriximport pandas as pdtrain_df = pd.read_csv("sign_mnist_train.csv")test_df = pd.read_csv("sign_mnist_test.csv")y_train = train_df['label']y_test = test_df['label']del train_df['label']del test_df['label']from sklearn.preprocessing import LabelBinarizerlabel_binarizer = LabelBinarizer()y_train = label_binarizer.fit_transform(y_train)y_test = label_binarizer.fit_transform(y_test)x_train = train_df.valuesx_test = test_df.valuesx_train = x_train / 255x_test = x_test / 255x_train = x_train.reshape(-1,28,28,1)x_test = x_test.reshape(-1,28,28,1)

上面的代码从重塑所有MNIST训练图像文件开始，以便模型理解输入文件。除此之外，LabelBinarizer变量获取数据集中的类并将它们转换为二进制，这一过程大大加快了模型的训练。

下一步是创建数据生成器，以随机实现对数据的更改，增加训练示例的数量，并通过向不同实例添加噪声和变换使图像更真实。

datagen = ImageDataGenerator( featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False, samplewise_std_normalization=False, zca_whitening=False, rotation_range=10, zoom_range = 0.1, width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=False, vertical_flip=False)datagen.fit(x_train)

在处理图像之后，必须编译CNN模型以识别数据中使用的所有类别的信息，即24个不同的图像组。还必须将数据的标准化添加到数据中，以较少的图像平衡类。

model = Sequential()model.add(Conv2D(75 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (28,28,1)))model.add(BatchNormalization())model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))model.add(Conv2D(50 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))model.add(Dropout(0.2))model.add(BatchNormalization())model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))model.add(Conv2D(25 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))model.add(BatchNormalization())model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))model.add(Flatten())model.add(Dense(units = 512 , activation = 'relu'))model.add(Dropout(0.3))model.add(Dense(units = 24 , activation = 'softmax'))

请注意，通过添加变量（如Conv2D模型）初始化算法，并将其浓缩为24个特征。我们还使用批处理技术让CNN更有效地处理数据。

最后，定义损失函数和度量，并将模型与数据相匹配

model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy'])model.summary()history = model.fit(datagen.flow(x_train,y_train, batch_size = 128) ,epochs = 20 , validation_data = (x_test, y_test))model.save('smnist.h5')

这段代码有很多需要解包的地方。让我们分几节来看。

第1行：

model.compile函数接受许多参数，其中三个参数显示在代码中。优化器和损失参数与下一行中的epoch语句一起工作，通过逐步改变数据的计算方法，有效地减少模型中的错误量。

除此之外，要优化的度量标准是精度函数，它确保模型在设定的epoch数之后具有可达到的最大精度。

第4行：

这里运行的函数将设计的模型与第一位代码中开发的图像数据中的数据相匹配。它还定义了模型为提高图像检测的准确性所必须的时期或迭代次数。这里还调用了验证集，以向模型引入测试方面。该模型使用该数据计算精度。

第5行：

在代码位中的所有语句中，model.save函数可能是这段代码中最重要的部分，因为它可以在实现模型时节省数小时的时间。

机器视觉技术基础知识，高级计算机视觉的手语识别

开发的模型准确地检测和分类手语符号，训练准确率约为95%。

现在，使用两个流行的实时视频处理库，即Mediapipe和OpenCV，我们可以获取网络摄像头输入，并在实时视频流上运行我们之前开发的模型。

机器视觉技术基础知识，高级计算机视觉的手语识别

首先，我们需要导入程序所需的包。

import osos.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' import tensorflow as tfimport cv2import mediapipe as mpfrom keras.models import load_modelimport numpy as npimport time

开始时运行的OS命令只会阻止Mediapipe使用的Tensorflow库发出不必要的警告。这使程序提供的未来输出更加清晰易懂。

在我们启动代码的主while循环之前，我们需要首先定义一些变量，例如保存的模型和OpenCV相机上的信息。

model = load_model('smnist.h5')mphands = mp.solutions.handshands = mphands.Hands()mp_drawing = mp.solutions.drawing_utilscap = cv2.VideoCapture(0)_, frame = cap.read()h, w, c = frame.shapeanalysisframe = ''letterpred = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y']

这里设置的每个变量都分为四个类别之一。一开始的类别与我们在本文第一部分中训练的模型直接相关。

代码的第二和第三部分定义了运行和启动Mediapipe和OpenCV所需的变量。最终类别主要用于在检测到帧时分析帧，并创建用于图像模型提供的数据的交叉引用的字典。

该程序的下一部分是主while True循环，其中大部分程序都在该循环中运行。

while True: _, frame = cap.read() k = cv2.waitKey(1) if k%256 == 27: # ESC pressed print("Escape hit, closing...") break framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) result = hands.process(framergb) hand_landmarks = result.multi_hand_landmarks if hand_landmarks: for handLMs in hand_landmarks: x_max = 0 y_max = 0 x_min = w y_min = h for lm in handLMs.landmark: x, y = int(lm.x * w), int(lm.y * h) if x > x_max: x_max = x if x < x_min: x_min = x if y > y_max: y_max = y if y < y_min: y_min = y y_min -= 20 y_max += 20 x_min -= 20 x_max += 20 cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2) mp_drawing.draw_landmarks(frame, handLMs, mphands.HAND_CONNECTIONS) cv2.imshow("Frame", frame)cap.release()cv2.destroyAllWindows()

该程序的这一部分从你的相机获取输入，并使用我们导入的图像处理库将设备的输入显示到计算机。这部分代码专注于从相机获取一般信息，并在新窗口中简单地显示出来。然而，使用Mediapipe库，我们可以检测手的主要标志，如手指和手掌，并在手周围创建一个边界框。

机器视觉技术基础知识，高级计算机视觉的手语识别

边界框的概念是所有形式的图像分类和分析的关键组成部分。该框允许模型直接聚焦于功能所需的图像部分。如果没有这一点，算法会在错误的位置找到模式，并可能导致错误的结果。

例如，在训练过程中，缺少边界框可能会导致模型将诸如时钟或椅子等图像的特征与标签相关联。这可能会导致程序注意到图像中的时钟，并仅根据时钟存在的事实来决定显示什么手语字符。

机器视觉技术基础知识，高级计算机视觉的手语识别

快完成了！该程序的倒数第二部分是根据提示捕获单个帧，并将其裁剪到边界框的尺寸。

while True: _, frame = cap.read() k = cv2.waitKey(1) if k%256 == 27: # ESC pressed print("Escape hit, closing...") break elif k%256 == 32: # SPACE pressed # SPACE pressed analysisframe = frame showframe = analysisframe cv2.imshow("Frame", showframe) framergbanalysis = cv2.cvtColor(analysisframe, cv2.COLOR_BGR2RGB) resultanalysis = hands.process(framergbanalysis) hand_landmarksanalysis = resultanalysis.multi_hand_landmarks if hand_landmarksanalysis: for handLMsanalysis in hand_landmarksanalysis: x_max = 0 y_max = 0 x_min = w y_min = h for lmanalysis in handLMsanalysis.landmark: x, y = int(lmanalysis.x * w), int(lmanalysis.y * h) if x > x_max: x_max = x if x < x_min: x_min = x if y > y_max: y_max = y if y < y_min: y_min = y y_min -= 20 y_max += 20 x_min -= 20 x_max += 20 analysisframe = cv2.cvtColor(analysisframe, cv2.COLOR_BGR2GRAY) analysisframe = analysisframe[y_min:y_max, x_min:x_max] analysisframe = cv2.resize(analysisframe,(28,28)) nlist = [] rows,cols = analysisframe.shape for i in range(rows): for j in range(cols): k = analysisframe[i,j] nlist.append(k) datan = pd.DataFrame(nlist).T colname = [] for val in range(784): colname.append(val) datan.columns = colname pixeldata = datan.values pixeldata = pixeldata / 255 pixeldata = pixeldata.reshape(-1,28,28,1)

此代码看起来与程序的最后一部分非常相似。这主要是因为两个部分中涉及生成边界框的过程是相同的。

然而，在代码的这个分析部分，我们使用OpenCV中的图像重塑功能将图像调整到边界框的尺寸，而不是在其周围创建一个视觉对象。

此外，我们还使用NumPy和OpenCV修改图像，使其具有与模型所训练的图像相同的特征。

我们还使用panda使用保存的图像中的像素数据创建一个数据帧，因此我们可以用与创建模型相同的方式规范数据。

机器视觉技术基础知识，高级计算机视觉的手语识别

最后，我们需要在处理后的图像上运行训练后的模型，并处理信息输出。

prediction = model.predict(pixeldata)predarray = np.array(prediction[0])letter_prediction_dict = {letterpred[i]: predarray[i] for i in range(len(letterpred))}predarrayordered = sorted(predarray, reverse=True)high1 = predarrayordered[0]high2 = predarrayordered[1]high3 = predarrayordered[2]for key,value in letter_prediction_dict.items(): if value==high1: print("Predicted Character 1: ", key) print('Confidence 1: ', 100*value) elif value==high2: print("Predicted Character 2: ", key) print('Confidence 2: ', 100*value) elif value==high3: print("Predicted Character 3: ", key) print('Confidence 3: ', 100*value)time.sleep(5)

在代码的这一部分中有很多信息。我们将逐一剖析这部分代码。

前两条线描绘了手部图像是Keras的任何不同类别的预测概率。数据以2个张量的形式呈现，其中第一个张量包含概率信息。张量本质上是特征向量的集合，非常类似于数组。该模型产生的张量是一维的，允许它与线性代数库NumPy一起使用，以将信息解析成更为Python的形式。

从这里开始，我们使用变量letterpred下先前创建的类列表来创建一个字典，将张量的值与关键字进行匹配。这允许我们将每个字符的概率与其对应的类进行匹配。

在这一步之后，我们使用列表生成式对值从最高到最低进行排序。这样，我们就可以获取列表中的前几项，并将它们指定为与所示手语图像最接近的3个字符。

最后，我们使用for循环循环遍历字典中的所有键：值对，以将最高值与其对应的键相匹配，并输出每个字符的概率。

机器视觉技术基础知识，高级计算机视觉的手语识别

如图所示，该模型准确地预测了从相机中显示的角色。除了预测特征，该程序还显示了CNN Keras模型分类的可信度。

所开发的模型可以以各种方式实现，主要用途是用于视频通话（如Facetime）的字幕设备。要创建这样的应用程序，模型必须逐帧运行，预测显示的符号。

该程序允许通过使用Keras图像分析模型，从手语到英语进行简单易行的交流。这个项目的代码可以在GitHub档案中找到，链接如下：

https://github.com/mg343/Sign-Language-Detection

2、机器视觉技术基础知识，你想了解的都在这里

计算机视觉十全大补丸

发起人Jia-Bin Huang

研究方向： physically grounded visual synthesis and analysis.

书

计算机视觉：

Computer Vision: Models, Learning, and

Inference- Simon J. D. Prince 2012

Computer Vision: Theory and Application - Rick Szeliski 2010

Computer Vision: A Modern Approach (2nd edition) - David Forsyth and Jean Ponce 2011

Multiple View Geometry in Computer Vision - Richard Hartley and Andrew Zisserman 2004

Computer Vision - Linda G. Shapiro 2001

Vision Science: Photons to Phenomenology - Stephen E. Palmer 1999

Visual Object Recognition synthesis lecture - Kristen Grauman and Bastian Leibe 2011

Computer Vision for Visual Effects - Richard J. Radke, 2012

High dynamic range imaging: acquisition, display, and image-based lighting - Reinhard, E., Heidrich, W., Debevec, P., Pattanaik, S., Ward, G., Myszkowski, K 2010

Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics - Justin Solomon 2015

OpenCV Programming：

Learning OpenCV: Computer Vision with the OpenCV Library - Gary Bradski and Adrian Kaehler

Practical Python and OpenCV - Adrian Rosebrock

OpenCV Essentials - Oscar Deniz Suarez, Mª del Milagro

Fernandez Carrobles, Noelia Vallez Enano, Gloria Bueno Garcia, Ismael Serrano Gracia

机器学习：

Pattern Recognition and Machine Learning - Christopher M. Bishop 2007

Neural Networks for Pattern Recognition - Christopher M. Bishop 1995

Probabilistic Graphical Models: Principles and Techniques - Daphne Koller and Nir Friedman 2009

Pattern Classification - Peter E. Hart, David G. Stork, and Richard O. Duda 2000

Machine Learning - Tom M. Mitchell 1997

Gaussian processes for machine learning - Carl Edward Rasmussen and Christopher K. I. Williams 2005

Learning From Data- Yaser S. Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin 2012

Neural Networks and Deep Learning - Michael Nielsen 2014

Bayesian Reasoning and Machine Learning - David Barber, Cambridge University Press, 2012

基础：Linear Algebra and Its Applications - Gilbert Strang 1995

课程

计算机视觉：

EENG 512 / CSCI 512 - Computer Vision - William Hoff (Colorado School of Mines)

Visual Object and Activity Recognition - Alexei A. Efros and Trevor Darrell (UC Berkeley)

Computer Vision - Steve Seitz (University of Washington)

Visual Recognition - Kristen Grauman (UT Austin)

Language and Vision - Tamara Berg (UNC Chapel Hill)

Convolutional Neural Networks for Visual Recognition - Fei-Fei Li and Andrej Karpathy (Stanford University)

Computer Vision - Rob Fergus (NYU)

Computer Vision - Derek Hoiem (UIUC)

Computer Vision: Foundations and Applications - Kalanit Grill-Spector and Fei-Fei Li (Stanford University)

High-Level Vision: Behaviors, Neurons and Computational Models - Fei-Fei Li (Stanford University)

Advances in Computer Vision - Antonio Torralba and Bill Freeman (MIT)

Computer Vision - Bastian Leibe (RWTH Aachen University)

Computer Vision 2 - Bastian Leibe (RWTH Aachen University)

Computational Photography：

Image Manipulation and Computational Photography - Alexei A. Efros (UC Berkeley)

Computational Photography - Alexei A. Efros (CMU)

Computational Photography - Derek Hoiem (UIUC)

Computational Photography - James Hays (Brown University)

Digital & Computational Photography - Fredo Durand (MIT)

Computational Camera and Photography - Ramesh Raskar (MIT Media Lab)

Computational Photography - Irfan Essa (Georgia Tech)

Courses in Graphics - Stanford UniversityComputational Photography - Rob Fergus (NYU)

Introduction to Visual Computing - Kyros Kutulakos (University of Toronto)

Computational Photography - Kyros Kutulakos (University of Toronto)

Computer Vision for Visual Effects - Rich Radke (Rensselaer Polytechnic Institute)

Introduction to Image Processing - Rich Radke (Rensselaer Polytechnic Institute)

机器学习：

Machine Learning - Andrew Ng (Stanford University)

Learning from Data - Yaser S. Abu-Mostafa (Caltech)

Statistical Learning - Trevor Hastie and Rob Tibshirani (Stanford University)

Statistical Learning Theory and Applications - Tomaso Poggio, Lorenzo Rosasco, Carlo Ciliberto, Charlie Frogner, Georgios Evangelopoulos, Ben Deen (MIT)

Statistical Learning - Genevera Allen (Rice University)

Practical Machine Learning - Michael Jordan (UC Berkeley)

Course on Information Theory, Pattern Recognition, and Neural Networks - David MacKay (University of Cambridge)

Methods for Applied Statistics: Unsupervised Learning - Lester Mackey (Stanford)

Machine Learning - Andrew Zisserman (University of Oxford)

优化：

Convex Optimization I - Stephen Boyd (Stanford University)

Convex Optimization II - Stephen Boyd (Stanford University)

Convex Optimization - Stephen Boyd (Stanford University)

Optimization at MIT - (MIT)

Convex Optimization - Ryan Tibshirani (CMU)

未完待续

本文关键词：机器视觉技术及应用，机器视觉技术基础知识点，机器视觉基本知识，机器视觉技术介绍，机器视觉技术及应用实例详解。这就是关于《机器视觉技术基础知识，高级计算机视觉的手语识别》的所有内容，希望对您能有所帮助！更多的知识请继续关注《犇涌向乾》百科知识网站：http://www.029ztxx.com！