Template Matching with Synthetic Data using Detectron2

Mina · Oct 6, 2023

Improving Template Matching with AI: Using synthetic data to find two-dimensional images in different settings.

Template matching, in simple terms, is a computer vision technique that involves comparing a small reference image (template) to different regions of a larger image to find instances where the template closely matches a portion of the larger image.

When it comes to template matching, there are straightforward approaches such as OpenCV's cv2.matchTemplate function. However, these functions fall behind when the template appears at a different size, orientation, or perspective in the target images.
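For reference, the classical approach looks roughly like the minimal sketch below (the scene path is a placeholder). It works well when the logo appears in the target at the same scale and orientation as the template, which is exactly the limitation described above.

import cv2

# Classical baseline: slide the template over the image and score each
# position with normalized cross-correlation. "scene.jpg" is a placeholder path.
image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("logo_templates/logo_1.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

h, w = template.shape
print("best match score:", max_val, "at", max_loc)
# This breaks down as soon as the logo in the scene is scaled, rotated or skewed
# relative to the template.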

In this approach, the aim is to fine-tune a pretrained model (using Detectron2) to detect two-dimensional templates that vary in size, orientation, and placement.

To demonstrate the code, the Tesla icon is used as the template image. The Tesla icon is unique and simplistic in design, which leaves few distinctive visual features and makes it hard to capture with generic feature detectors such as AKAZE or ORB.

Here is a brief recap of the methodology:

  • Gather diverse stock images to act as backgrounds.
  • Randomly distort the template foregrounds and place them onto the collected backgrounds, then create COCO annotations from the synthetic data.
  • Train the Detectron2 model on the annotated data.
  • Test the outcome and diversify the test data according to the faulty cases.

1. Create synthetic images from the template images:

The main idea is to make the template image appear on as many different backgrounds, and in as many different ways, as possible.

The distortions I have used cover scaling, skew, and random rotation in both directions up to 30 degrees, since I do not expect the icon to appear upside down. It is also possible to add other optical distortions such as barrel, pincushion, and mustache distortion, depending on the template image and where you expect it to appear.

import random
from PIL import Image

def rotate_foreground(foreground):
    # Rotate by a random angle between -30 and 30 degrees
    angle_degrees = random.randint(-30, 30)
    return foreground.rotate(angle_degrees, resample=Image.BICUBIC, expand=True)

def scale_foreground(foreground):
    scale = random.uniform(0.5, 1)  # Pick something between .5 and 1
    new_size = (int(foreground.size[0] * scale), int(foreground.size[1] * scale))
    return foreground.resize(new_size, resample=Image.BICUBIC)

def skew_image(image):
    width, height = image.size
    skew_factor = random.uniform(-0.2, 0.2)  # Adjust the skew factor range as needed

    if skew_factor == 0:
        return image

    # Define the skew (shear) transformation matrix
    matrix = (1.0, skew_factor, 0.0, 0.0, 1.0, 0.0)

    # Apply the skew transformation
    skewed_image = image.transform(image.size, Image.AFFINE, matrix, Image.BICUBIC)

    return skewed_image
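
As an optional extra (not part of the pipeline above), a mild radial, barrel-style distortion can be approximated with OpenCV's remap; the helper below is my own sketch, and the strength value is an arbitrary assumption:

import numpy as np
import cv2
from PIL import Image

def radial_distort(foreground, strength=0.15):
    # Hypothetical helper (not from the original pipeline): warps the RGBA
    # foreground with a simple radial distortion before it is pasted.
    img = np.array(foreground.convert('RGBA'))
    h, w = img.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Normalize coordinates to [-1, 1] around the image centre
    xn = (xs - w / 2.0) / (w / 2.0)
    yn = (ys - h / 2.0) / (h / 2.0)
    r2 = xn ** 2 + yn ** 2
    factor = 1.0 + strength * r2
    map_x = (xn * factor * (w / 2.0) + w / 2.0).astype(np.float32)
    map_y = (yn * factor * (h / 2.0) + h / 2.0).astype(np.float32)
    warped = cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                       borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0, 0))
    return Image.fromarray(warped, 'RGBA')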

Randomly distort and place the template image, then create the COCO annotations:

import random
import numpy as np
from PIL import Image
from skimage import measure
from shapely.geometry import Polygon, MultiPolygon

def main_function(foreground_path, background_path, annotation_id, image_id):
    foreground, background = load_and_process_images(foreground_path, background_path)

    foreground = rotate_foreground(foreground)
    foreground = scale_foreground(foreground)
    foreground = skew_image(foreground)

    # Choose a random x,y position for the foreground
    max_xy_position = (background.size[0] - foreground.size[0], background.size[1] - foreground.size[1])
    assert max_xy_position[0] >= 0 and max_xy_position[1] >= 0, \
        'foreground {} is too big for the background {}'.format(foreground_path, background_path)
    paste_position = (random.randint(0, max_xy_position[0]), random.randint(0, max_xy_position[1]))

    # Create a new foreground image as large as the background and paste it on top
    new_foreground = Image.new('RGBA', background.size, color=(0, 0, 0, 0))
    new_foreground.paste(foreground, paste_position)

    # Extract the alpha channel from the foreground and paste it into a new image the size of the background
    alpha_mask = foreground.getchannel(3)
    new_alpha_mask = Image.new('L', background.size, color=0)
    new_alpha_mask.paste(alpha_mask, paste_position)
    composite = Image.composite(new_foreground, background, new_alpha_mask)

    # Grab the alpha pixels above a specified threshold
    alpha_threshold = 200
    mask_arr = np.array(np.greater(np.array(new_alpha_mask), alpha_threshold), dtype=np.uint8)
    hard_mask = Image.fromarray(np.uint8(mask_arr) * 255, 'L')

    # Trace the mask outline and turn it into COCO-style polygon segmentations
    contours = measure.find_contours(mask_arr, 0.5, positive_orientation='high')
    segmentations = []
    polygons = []
    for contour in contours:
        for i in range(len(contour)):
            row, col = contour[i]
            contour[i] = (col - 1, row - 1)

        # Make a polygon and simplify it
        poly = Polygon(contour)
        poly = poly.simplify(1.0, preserve_topology=False)
        polygons.append(poly)
        segmentation = np.array(poly.exterior.coords).ravel().tolist()
        segmentations.append(segmentation)

    # Create the annotation: bbox and area come from the combined polygons
    multi_poly = MultiPolygon(polygons)
    x, y, max_x, max_y = multi_poly.bounds
    width = max_x - x
    height = max_y - y
    bbox = (x, y, width, height)
    area = multi_poly.area

    annotation = {
        'segmentation': segmentations,
        'iscrowd': 0,
        'image_id': image_id,
        'category_id': 1,
        'id': annotation_id,
        'bbox': bbox,
        'area': area
    }

    return composite, hard_mask, bbox, annotation
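
The helper load_and_process_images is not shown in the post; a minimal sketch of what it might do is below, assuming the foreground keeps its alpha channel and the background is converted to the same mode for compositing:

from PIL import Image

def load_and_process_images(foreground_path, background_path):
    # Assumed implementation: the foreground (logo) keeps its alpha channel,
    # and the background is converted to RGBA so Image.composite can blend them.
    foreground = Image.open(foreground_path).convert('RGBA')
    background = Image.open(background_path).convert('RGBA')
    return foreground, background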

Use the main function to create the masks, images and annotations:

import os
import errno
import random

dataset_dir = './'
backgrounds_dir = os.path.join(dataset_dir, 'background_images/')
foregrounds_dir = os.path.join(dataset_dir, 'logo_templates/')
foregrounds = ['logo_templates/logo_1.png', 'logo_templates/logo_2.png', 'logo_templates/logo_3.png']
background_dir = 'background_images/'

# Create an output directory, plus the image and mask subdirectories
output_dir = os.path.join(dataset_dir, 'generated')
limit_data = 300  # Amount of images to be created

for directory in [output_dir, os.path.join(output_dir, 'images2'), os.path.join(output_dir, 'masks2')]:
    try:
        os.mkdir(directory)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise

# Create a list to keep track of images and mask annotations
all_annotations = []
images = []

i = 0
# Loop over all background images
for background_file in os.listdir(background_dir):
    if not background_file.endswith(".jpg"):
        continue

    background_path = os.path.join(background_dir, background_file)
    foreground_path = random.choice(foregrounds)
    composite, mask, bbox, annons = main_function(foreground_path, background_path, i, i)

    x = 'images2/image_{0:04d}.png'.format(i)
    composite_path = os.path.join(output_dir, x)
    composite.save(composite_path)

    mask_path = os.path.join(output_dir, 'masks2/mask_{0:04d}.png'.format(i))
    mask.save(mask_path)
    all_annotations.append(annons)

    wb, hb = composite.size
    image = {
        "file_name": x,
        "id": i,
        "width": wb,
        "height": hb
    }
    images.append(image)

    i = i + 1
    if i == limit_data:
        break
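
Before moving on to training, it can be worth sanity-checking a few samples by drawing the stored bounding box back onto a composite; a small sketch using the last generated sample (the output file name is arbitrary):

from PIL import ImageDraw

# Quick sanity check: draw the last generated [x, y, width, height] bbox on its composite
check = composite.copy()
draw = ImageDraw.Draw(check)
bx, by, bw2, bh2 = bbox
draw.rectangle([bx, by, bx + bw2, by + bh2], outline='red', width=3)
check.save(os.path.join(output_dir, 'bbox_check.png'))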

Here are some examples of synthetic data:

Stock photo of hills, Tesla icon placed randomly | image: unknown
Stock photo of a spider, Tesla icon placed randomly | image: unknown
Stock photo of the sea, Tesla icon placed randomly | image: unknown

Export the annotations as JSON:

import json
import numpy as np

class NpEncoder(json.JSONEncoder):
    # Convert numpy types to plain Python types so json.dump can serialize them
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

categories = [{
    "id": 1,
    "name": "Logo 1",
    "supercategory": "Logos"
}]

coco_file = {
    "annotations": all_annotations,
    "images": images,
    "categories": categories
}

with open("./generated/coco_annotations_.json", 'w') as f:
    json.dump(coco_file, f, cls=NpEncoder)

2. Train the model with the annotations:

Detectron2 is built by Facebook AI Research; more information can be found on its GitHub page: https://github.com/facebookresearch/detectron2

# Install Detectron2 (run once, e.g. in a notebook cell)
!git clone https://github.com/facebookresearch/detectron2 detectron2_repo
!pip install -e detectron2_repo

import os
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the synthetic dataset
register_coco_instances("tesla_detect", {}, "coco_annotations_.json", ".")
_metadata = MetadataCatalog.get("tesla_detect")
dataset_dicts = DatasetCatalog.get("tesla_detect")

# Create the config; all parameters can be tweaked according to the dataset at hand.
# Since the training is done on CPU, this step will take some time.
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("tesla_detect",)
cfg.DATASETS.TEST = ()  # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 300
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only one class: the template logo
cfg.MODEL.DEVICE = "cpu"  # For larger datasets, GPU training is advised

# Train the model
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

3. Get predictions on test data, followed by some examples of the outcome:
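
The predictor itself is not shown in the snippet below; a minimal sketch of how it could be built, reusing the training cfg and the weights written by the trainer (the score threshold is my own choice):

from detectron2.engine import DefaultPredictor

# Load the weights produced by the training run above and build a predictor
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # confidence threshold, tune as needed
predictor = DefaultPredictor(cfg)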

import cv2
import matplotlib.pyplot as plt
from detectron2.utils.visualizer import Visualizer

im_path = "test_data/14.jpg"
im = cv2.imread(im_path)
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
outputs = predictor(im)

v = Visualizer(im[:, :, ::-1],
               #metadata=_metadata,
               scale=2.0,
               #instance_mode=ColorMode.IMAGE_BW # remove the colors of unsegmented pixels
               )
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
plt.imshow(v.get_image()[:, :, ::-1])

# Inspect the raw predictions: bounding boxes and confidence scores
print(outputs["instances"].pred_boxes[0])
print(outputs["instances"].scores[0])
print(float(list(outputs["instances"].pred_boxes[0])[0][0]))

Here are some outcomes from unseen data:

Tesla icon on a wallet, identified by a bounding box | image: ebayimg.com
American car brands, Tesla icon identified by a bounding box | image: rent.go
Tesla logo, icon identified by a bounding box | image: dspncdn.com
Tesla headquarters, skewed Tesla icon identified by a bounding box | image: car-images

As the example below shows, the training data needs more skewed and rotated images in order to identify this type of image more accurately:

Tesla car, icon misidentified by the bounding box | image: enjpg.com

As this shows, multiple iterations may be necessary to create synthetic images of sufficient quality to fit the test cases well.

Data synthesis can also be improved by identifying regions of interest in the background images rather than placing templates at random, so that templates are placed in more of the positions where they could plausibly appear and confusions such as the one below are avoided:

Inside of a Tesla 3, one Tesla icon detected correctly | image: Tesla
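
One simple heuristic along those lines (my own sketch, not part of the pipeline above) is to sample several candidate paste positions and prefer the one that lands on a relatively uniform patch of the background, so the icon is not dropped onto visually busy areas:

import random
import numpy as np

def pick_paste_position(background, foreground, candidates=20):
    # Hypothetical placement heuristic: among random candidate positions,
    # choose the one whose underlying background patch has the lowest variance.
    bg = np.array(background.convert('L'), dtype=np.float32)
    fw, fh = foreground.size
    bw, bh = background.size
    best_pos, best_var = (0, 0), float('inf')
    for _ in range(candidates):
        x = random.randint(0, bw - fw)
        y = random.randint(0, bh - fh)
        patch = bg[y:y + fh, x:x + fw]
        if patch.var() < best_var:
            best_var, best_pos = patch.var(), (x, y)
    return best_pos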

All in all, although this approach is not suitable for every image detection case, it can be useful for detecting two-dimensional target images in photographs, advertisements, and documents without manually annotating training data.

References

@misc{wu2019detectron2,
  author = {Yuxin Wu and Alexander Kirillov and Francisco Massa and Wan-Yen Lo and Ross Girshick},
  title = {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year = {2019}
}


Mina

Coming from a graphic design background, I am a Computer Engineer currently working as a Data Scientist/DataOps Engineer.