图片中的物体探测一直是比较热的课题,通过物体的检测可以对图片自动生成标签以及分类,可以很好的用在图片检索的领域。在做的毕业设计(也是论文)就涉及到这方面的。今天看一篇叫Class-Specific Hough Forests for Object Detection的state of the art,发现原来导师叫我用到的方法和这个差不多,只是增加了某些小trick用于对特定的数据集优化。
Hough Forests。 其实就是一个random forests 也就是随机森林,每条数据是一小块图片的description(patch) 树的每个节点对数据的分割通过information gain,努力把相同类的patch和相似的patch分到同一个枝头上去。每次通过数据集,随机采集部分features用于分割,每个这些features的分割点生成一个pool。然后对每个feature的分割点做一个信息量的检测,就像一个标准的random forests一样, 这里信息量的计算通过一下程式(这么说比较台湾腔 夯):
$$
Entropy({c_i}) = -c\cdot \log c -(1-c)\cdot \log(1-c)
$$
这里c就是patch是属于物体c的概率,1-c就是negative cases。o
简单吧,其实在原来的论文里还有一个东西,就是entropy只是衡量split的优差的其中一个feature,还有一个feature是offset也就是patch到中心点的距离,通过增加这个衡量点,相似距离的patch会在一起哦。测试的结果是,加上这个feature,precision会有很好的提高。
怎么计算这个feature,其实很简单,用Root mean square(RMS)把split后的patch到中心点的距离计算下,split之后小的RMS更优。到达叶子的数据,每个patch会有一次投票的机会给出中心点的位置。
训练之后,每个测试图片会被分成不同的patch,通过hough forests 到达叶子的patch生成不同中心点的位置(根据在那叶子里的中心点patch). 最后,通过投票,某个物体的中心点的票数会很高,就这样我们就能找到一个物体的中心点了。 如果一个patch是不是物体的一部分,生成的中心点是错误的,但是中心点的位置其实是随机的,也就是说如果有很多很多非物体patch生成的map是没有一个明显的中心点的。而如果有很多物体中某部位的patch,生成出来的中心点是有一个很大的偏向性的,往往会往某一个中心聚拢。
这里有一个问题就是怎么计算一个patch到中心点的位置。首先训练数据一定得是每张图片只有一个物体。每个patch计算自己中心到物体的中心并保存x轴和y轴的移动。
道理很简单,给我的代码。
opencv这个库让我又爱又恨啊,很好用,很多很复杂的算法只要一行代码就ok了,但是python binding里有很多bug。而且很多错误你不一行行代码拆下来是不会看到的。。。
首先,不想使用论文里的hough forest,因为感觉上在opencv里你得自己hack下,不值得。第二貌似没找到patch的descriptor,所以在自己的实验里,只用到了特征点的descriptors,像sift和surf。使用这个的有点就是,可以做到orientation insensitive和scale insensitive。 其实,论文里的方法是没有能做到scale insensitive的,因为距离是个relative的东西,论文里,通过使用不同的scale进行测试,获得最优的结果作为最终结果。 而view insensitive更没有解决方法,按他们的方法,只有通过增加不同view的训练数据来弥补吧。
但是!导师教给我的方法能避免这个问题。首先sift和surf本身是view insensitive的,而在计算距离的时候通过对特征点本身的大小对距离做个归一化,这样就能让系统获得对不同大小的物体的识别能力。
另外一个trick就是sift之类的特征点自己会带有gradient orientation,也就是梯度方向,这个信息是在计算距离的时候必须使用到的。因为相同descriptor的特征点可能会有不同的梯度方向,如果你只使用了图片原始的x和y轴保存距离的话,预测的中心点会偏离本来的实际位置。下面的图片就是个case:
我有一个圆圈,底部是白色的,如果是sift当特征点的检测的话,每个圆圈边上的点的特征值是一样的,而他们的梯度方向可能会不一样,如果不使用梯度方向的话,a作为训练的数据,而比作为测试数据,中心点预测的结果会在圆圈的外面(细线部分)。
如果要避免以上问题,只有通过rotation matrix 让每个特征点更具自己的梯度方向生成自己的x和y轴。 并且每个特征点的相对于中心点的距离要在新的x和y轴上进行计算。
如果需要从新轴返回到原始的x y轴只需要把angle乘以-1就可以了, 上面的angle是以radian为单位的。
假设上图是更具特征点的梯度方向做的x和y轴的转换,转换之后,可以发现点离圆圈的中心的距离不变,但是在x轴和y轴的移动发生了变化。
下面是测试方法的代码,包括训练和测试。数据集的话是使用ucic里的汽车数据,训练集里有positive cases 和negative cases,这里我只用到了positive cases。也没有是用到hough forest做训练,只用到了最近友邻。 每次选择最相近的友邻做出投票。在训练数据里,所有的车子中心点都在图的当中,因为每张图片里的车子几乎撑满整张图,而所有图片的尺寸都是40x100,在训练数据里,有两部分,一部分是没有scale变化的车子数据库,而另外一部分里车子的大小会有变化,用于测试方法对于scale变化的鲁棒性。
下面是python代码
import cv2
import math
import sys
import os
import numpy as np
kernel=np.array([[0,0,0,0,0,0,0,0.0002,0.0009,0.0014,0.0016,0.0014,0.0009,0.0002,0,0,0,0,0,0,0],[0,0,0,0,0,0.0005,0.0021,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0021,0.0005,0,0,0,0,0],[0,0,0,0.0000,0.0016,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0016,0.0000,0,0,0],[0,0,0.0000,0.0020,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0020,0.0000,0,0],[0,0,0.0016,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0016,0,0],[0,0.0005,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0005,0],[0,0.0021,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0021,0],[0.0002,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0002],[0.0009,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0009],[0.0014,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0014,],[0.0016,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0016],[0.0014,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0014],[0.0009,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0009],[0.0002,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0002],[0,0.0021,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0021,0],[0,0.0005,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0005,0,],[0,0,0.0016,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0016,0,0],[0,0,0.0000,0.0020,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0020,0.0000,0,0],[0,0,0,0.0000,0.0016,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0016,0.0000,0,0,0],[0,0,0,0,0,0.0005,0.0021,0.0031,0.0032,0.0032,0.0032,0.0032,0.0032,0.0031,0.0021,0.0005,0,0,0,0,0],[0,0,0,0,0,0,0,0.0002,0.0009,0.0014,0.0016,0.0014,0.0009,0.0002,0,0,0,0,0,0,0]])
FLANN_INDEX_KDTREE = 1 # bug: flann enums are missing
AUTOTUNED = 255
flann_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 4)
def get_key_points_from_img(img):
if isinstance(img,str):
img = cv2.imread(img)
im = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
surf = cv2.SURF(1000, 4, 2, True)
keypoints, descriptors = surf.detect(im, None, False)
l_blue = cv2.cv.RGB(200, 200, 250)
descriptors.shape = (-1, surf.descriptorSize())
# y x
center = (im.shape[0]/2,im.shape[1]/2)
attributes = []
for i in range(len(keypoints)):
kp = keypoints[i]
# (x y)
scale = kp.size
#cv2.circle(im, (int(kp.pt[0]), int(kp.pt[1])), int(kp.size/2), l_blue, 2, cv2.CV_AA)
gradient = math.radians(kp.angle)
#calculate the distances x and y in the new axis(rotated by the gradient)
d_x = center[1]-kp.pt[0]
d_y = center[0]-kp.pt[1]
tx = math.cos(gradient) * d_x - math.sin(gradient) * d_y
tx /= scale
ty = math.sin(gradient) * d_x + math.cos(gradient) * d_y
ty /= scale
# generate data for further experiments
attributes.append((kp.pt[0], kp.pt[1], scale, gradient, ty, tx, 0, 0, 0, 0, 1))
return (descriptors,attributes)
if __name__ == "__main__":
dir_path = "./CarData/TrainImages/"
files = os.listdir(dir_path)
#build train dataset
train_descriptors = []
train_attributes = []
for f in files:
if f.startswith("pos"):
(descriptors, attributes) = get_key_points_from_img(dir_path+f)
train_descriptors.extend(descriptors)
train_attributes.extend(attributes)
print len(train_descriptors)
train_descriptors = np.array(train_descriptors)
#build pnn model
flann_index = cv2.flann_Index(train_descriptors, flann_params)
dir_path = "./CarData/TrainImages/"
dir_path = "./CarData/TestImages/"
dir_path = "./CarData/TestImages_Scale/"
files = os.listdir(dir_path)
count1 = 0
count2 = 0
for f in files:
img = cv2.imread(dir_path+f)
(descriptors, attributes) = get_key_points_from_img(img)
im = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
mapa = np.zeros(im.shape, dtype=float)
indexes, dist = flann_index.knnSearch(descriptors, 15, params={})
for i in range(len(descriptors)):
for j in range(len(indexes[i])):
scale = attributes[i][2]
gradient = attributes[i][3]
d_y = train_attributes[indexes[i][j]][4]*scale
d_x = train_attributes[indexes[i][j]][5]*scale
tx = math.cos(-gradient) * d_x - math.sin(-gradient) * d_y
ty = math.sin(-gradient) * d_x + math.cos(-gradient) * d_y
predicted_center_x = int(round(attributes[i][0]+tx))
predicted_center_y = int(round(attributes[i][1]+ty))
if (predicted_center_x>=0) and (predicted_center_y>=0) and (predicted_center_x<mapa.shape[1]) and (predicted_center_y<mapa.shape[0]):
mapa[predicted_center_y][predicted_center_x] += 1
count1 += 1
else:
count2 += 1
#apply a matlab disk filter to the output map for visualization
mapa2 = cv2.filter2D(mapa,-1,kernel)*5
cv2.imshow("original",im)
cv2.imshow("image", mapa2)
cv2.waitKey()
一大坨kernel变量,其实就是matlab的disk filter: fspecial(‘disk’, 20) opencv里就没有合适的filter可以用来平滑图片的!。 这里使用到了flann用来做最近友邻的模型,由于数据不是特别大,所以这里也没什么优化。在训练的时候,会把每个特征点距离到中心点的距离计算一下,计算完会根据特征点的大小把距离缩放一下,而当在测试的时候也会根据每个特征点的大小把距离放大。这样就能达到scale insensitive的目的了。 每个特征点会根据友邻做投票,投票的结果会映射到一个和图片一样大小的map上面。 通过filter2d过滤之后,会出现一大坨模糊的东西,哪里最亮哪里的投票数就最多,是中心点的可能性也最大。
下面就是一些实验图片的结果。
貌似结果很好的,但是其实不是的,因为这里没用到false cases,会出现很多false positive,而且,由于是使用sift detector, 本身控制不了哪些特征点是需要的哪些是不需要的,在测试后的时候,如果测试图片的某个非物体部位的特征点很多的话(细节比较多),会产生很多假票, 像最后一张图的两根柱子。在下次的试验力,一定得吧那些false cases包含到训练库里,当测试数据碰到的一个false case的票,直接把权值归0,这样的话应该能对结果有很大的改观。还有就是这里,每张投票的权值都是1,可以根据友邻的距离设置投票的权值,近的投票全职大,这样结果应该会好一些。
opencv用起来真的没有matlab舒服,就连一个类似matlab的imregionalmax函数都没有,这个函数可以计算local maximum,这样的话,选取峰值最高的几个点作为预测结果会有很理想的结果。所以这套代码在这里也不能用来获得和真实中心点做比较。