Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation

1Tencent YouTu Lab,   2CATL
[Paper] [Supplementary Material] [Slides] [Poster] [Code]

Abstract


This paper focuses on few-shot object detection (FSOD) and instance segmentation (FSIS), which require a model to quickly adapt to novel classes with a few labeled instances. Existing methods severely suffer from biased classification because of the missing-label issue, which naturally exists in the few-shot scenario and which we are the first to formally identify. Our analysis suggests that the standard classification head of most FSOD or FSIS models needs to be decoupled to mitigate this bias. We therefore propose an embarrassingly simple but effective method that decouples the standard classifier into two heads. These two heads independently handle clear positive samples and the noisy negative samples caused by missing labels. In this way, the model can effectively learn novel classes while mitigating the effect of noisy negative samples. Without bells and whistles, and with no additional computation cost or parameters, our model consistently outperforms its baseline and the state of the art by a large margin on the PASCAL VOC and MS-COCO benchmarks for both FSOD and FSIS.

Main Results

Task1: Few-shot object detection

Task2: Few-shot instance segmentation

Discussion

Results visualization.

Visualization results of our method and the strong baseline (Mask-DeFRCN) on MS-COCO validation images under the gFSIS setting with K=10.


Bounding boxes and segmentation masks are visualized for scores larger than 0.6. The top two rows show cases where both our method and the baseline succeed, while the middle two rows show cases where our method succeeds but the baseline partly fails. The baseline tends to incorrectly recognize positive object regions as background due to biased classification. The bottom row shows failure cases, from left to right: small objects (e.g., the small boats and the person), coarse boundary segmentation (e.g., the surfer), occlusion (e.g., two bears detected as one), and misclassification of similar-looking objects (e.g., the shadow of a wine glass recognized as a wine glass, and the train detected as a bus).

Reducing biased classification.

The decoupling classifier helps mitigate biased classification, thus boosting FSOD and FSIS performance.
We compare mRecall and Recall of the proposed decoupling classifier (DC) and the standard classification head (CE) under the FSIS and gFSIS settings. The mean and standard deviation are computed over all 10 seeds for each shot.

Core Code in PyTorch

The proposed decoupling classifier is very simple (its core implementation is a single line of code, Eq. 8) but highly effective, e.g., 5.6+ AP50 improvement for 5-shot detection and 4.5+ AP50 improvement for 5-shot instance segmentation on the challenging MS-COCO benchmark.
import torch
import torch.nn.functional as F

def dc_loss(x, y, m):
	"""
	Compute the loss of the decoupling classifier.
	Returns a scalar Tensor for a single image.

	Args:
		x: predicted class scores in (-inf, +inf); size N x (1+C), where N is the
			number of region proposals of one image and C the number of foreground classes.
		y: ground-truth classification labels in [0, C]; size N, where [0, C-1]
			are the foreground classes and C is the background class.
		m: image-level label vector whose elements are 0 or 1; size 1 x (1+C).

	Returns:
		loss: scalar Tensor.
	"""

	N = x.shape[0]
	# background class index
	bg_label = x.shape[1] - 1

	# positive head: clear positive samples, standard softmax over all classes
	pos_ind = y != bg_label
	pos_logit = x[pos_ind, :]
	pos_score = F.softmax(pos_logit, dim=1)  # Eq. 4
	pos_loss = F.nll_loss(pos_score.log(), y[pos_ind], reduction="sum")  # Eq. 5

	# negative head: noisy negative samples, with logits masked by the
	# image-level label vector so missing classes are not treated as background
	neg_ind = y == bg_label
	neg_logit = x[neg_ind, :]
	neg_score = F.softmax(m.expand_as(neg_logit) * neg_logit, dim=1)  # Eq. 8
	neg_loss = F.nll_loss(neg_score.log(), y[neg_ind], reduction="sum")  # Eq. 9

	# total loss
	loss = (pos_loss + neg_loss) / N  # Eq. 6

	return loss
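As a quick sanity check, the loss can be exercised on toy tensors. The shapes and values below (N=4 proposals, C=3 foreground classes) are hypothetical, not from the paper; they also illustrate that with a fully labeled image (m all ones) the negative head reduces to the standard softmax:

```python
import torch
import torch.nn.functional as F

# dc_loss repeated here (condensed) so this sketch is self-contained
def dc_loss(x, y, m):
	N, bg_label = x.shape[0], x.shape[1] - 1
	pos_ind, neg_ind = y != bg_label, y == bg_label
	pos_score = F.softmax(x[pos_ind, :], dim=1)
	pos_loss = F.nll_loss(pos_score.log(), y[pos_ind], reduction="sum")
	neg_logit = x[neg_ind, :]
	neg_score = F.softmax(m.expand_as(neg_logit) * neg_logit, dim=1)
	neg_loss = F.nll_loss(neg_score.log(), y[neg_ind], reduction="sum")
	return (pos_loss + neg_loss) / N

# hypothetical toy setup: N=4 proposals, C=3 foreground classes (+ background)
torch.manual_seed(0)
x = torch.randn(4, 4)                 # predicted logits, N x (1+C)
y = torch.tensor([0, 2, 3, 3])        # two foreground and two background proposals
m = torch.tensor([[1., 0., 1., 1.]])  # class 1 is unlabeled (missing) in this image

loss = dc_loss(x, y, m)               # scalar loss with the missing class masked out

# with a fully labeled image (m all ones), the negative head becomes the
# standard softmax, so dc_loss equals the ordinary mean cross-entropy
full = dc_loss(x, y, torch.ones(1, 4))
standard = F.cross_entropy(x, y)
```

The equivalence in the last two lines makes the design choice concrete: the decoupling only changes the loss for negative proposals, and only for classes that are missing from the image-level labels.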

Citation

@inproceedings{gao2022dc,
	title={Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation},
	author={Gao, Bin-Bin and Chen, Xiaochen and Huang, Zhongyi and Nie, Congchong and Liu, Jun and Lai, Jinxiang and Jiang, Guannan and Wang, Xi and Wang, Chengjie},
	booktitle={Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)},
	pages={--},
	year={2022}
	}  

Contact


Please contact Bin-Bin Gao (email) for questions about the paper.