The Operations You Will Use in Every Project
In the last post we covered how images are stored as NumPy arrays and what that means for how we work with them. Now we can start actually doing things with those arrays: resizing, cropping, flipping, rotating, drawing on top of images, and blurring. These are not the flashy parts of computer vision, but they are the ones you reach for constantly. Preprocessing a dataset, preparing input for a model, annotating results: all of it comes back to these basics.
None of this is complicated, but there are a few details worth knowing so you do not run into surprises later.
Resizing
Resizing is probably the single most common operation in any computer vision pipeline. Models expect fixed input sizes, thumbnails need specific dimensions, and sometimes you just need to scale something down to make it faster to process. OpenCV handles it with cv2.resize():
```python
import cv2

image = cv2.imread("photo.jpg")
resized = cv2.resize(image, (320, 240))

print(resized.shape)
# Output: (240, 320, 3)
```
Notice the argument order: cv2.resize() takes the target size as (width, height), but image.shape returns (height, width, channels). That swap is consistent throughout OpenCV and it catches people off guard at least once. Width first when you set a size, height first when you read a shape.
If you want to resize by a scale factor instead of an absolute size, you can use the fx and fy parameters:
```python
half = cv2.resize(image, (0, 0), fx=0.5, fy=0.5)
```
Passing (0, 0) as the target size tells OpenCV to ignore it and use the scale factors instead. This halves both dimensions. You can also use different values for fx and fy if you want to stretch or squash the image, though that is rarely what you want.

Cropping
Cropping does not have its own function in OpenCV because it does not need one. Since the image is a NumPy array, you just slice it:
```python
cropped = image[50:200, 100:400]
```
The format is image[y1:y2, x1:x2], where y1 and y2 are the top and bottom row, and x1 and x2 are the left and right column. So the example above cuts a region starting 50 pixels from the top, ending at row 200, between columns 100 and 400.
One thing to watch out for: the result is a view into the original array, not a copy. If you modify cropped, you will also modify image. If that is not what you want, add a .copy():
```python
cropped = image[50:200, 100:400].copy()
```
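The view-versus-copy difference is easy to check for yourself. The sketch below uses a synthetic black image and plain NumPy. The crop_box helper at the end is hypothetical, added only to show how a box given as (x, y, width, height) maps onto the slice:

```python
import numpy as np

# Synthetic 300x500 BGR image, all black
image = np.zeros((300, 500, 3), dtype=np.uint8)

# Slicing returns a view: writing to it also changes the original
view = image[50:200, 100:400]
view[:] = 255
print(np.shares_memory(view, image))    # True
print(image[50, 100])                   # this pixel is now white

# .copy() gives an independent array
image2 = np.zeros((300, 500, 3), dtype=np.uint8)
crop = image2[50:200, 100:400].copy()
crop[:] = 255
print(image2[50, 100])                  # the original stays black


def crop_box(img, x, y, w, h):
    """Hypothetical helper: crop a box given as (x, y, width, height)."""
    return img[y:y + h, x:x + w].copy()


print(crop_box(image2, 100, 50, 300, 150).shape)
# Output: (150, 300, 3)
```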

Flipping
Flipping is straightforward with cv2.flip(). You pass the image and a flip code: 0 flips vertically, 1 flips horizontally, and -1 flips both at once:
```python
flipped_h = cv2.flip(image, 1)    # horizontal
flipped_v = cv2.flip(image, 0)    # vertical
flipped_hv = cv2.flip(image, -1)  # both
```
Horizontal flipping in particular shows up a lot in data augmentation. Randomly flipping training images is a cheap way to effectively double your dataset and make a model less sensitive to left-right orientation.

Rotating
Rotating an image takes a couple more steps than the other operations. OpenCV uses an affine transformation for this, which sounds fancier than it is. You first build a rotation matrix with cv2.getRotationMatrix2D(), then apply it with cv2.warpAffine():
```python
h, w = image.shape[:2]
center = (w // 2, h // 2)

matrix = cv2.getRotationMatrix2D(center, 45, 1.0)
rotated = cv2.warpAffine(image, matrix, (w, h))
```
getRotationMatrix2D takes three arguments: the center point to rotate around, the angle in degrees (positive values rotate counter-clockwise), and a scale factor where 1.0 keeps the original size. The last argument in warpAffine is the output size, here kept the same as the original.
Worth knowing: for most angles, the corners of the rotated image fall outside the original canvas and get cut off, and this happens with square images too. For most use cases keeping the same size is fine, but if you need the full rotated image without clipping, you will need to calculate the size of the new bounding box, use it as the output size, and shift the rotation matrix so the result stays centered.

Drawing on Images
OpenCV has a set of drawing functions that write directly onto the image array. They are useful for visualizing results, marking detections, or just annotating what your algorithm found. All of them modify the image in place, so work on a copy if you want to keep the original.
Drawing a rectangle, the most common one by far since bounding boxes are everywhere in computer vision:
```python
cv2.rectangle(image, (100, 50), (300, 200), (0, 255, 0), 2)
```
The arguments are: the image, top-left corner, bottom-right corner, color in BGR, and line thickness. Passing -1 as thickness fills the rectangle instead of just drawing the outline.
Drawing a circle:
```python
cv2.circle(image, (200, 150), 50, (255, 0, 0), 2)
```
After the image, the arguments are: center point, radius, color, and thickness. Same pattern as the rectangle, same -1 trick for a filled circle.
And putting text on an image:
```python
cv2.putText(image, "Hello", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
```
The arguments are: image, text string, bottom-left position of the text, font, font scale, color, and thickness. FONT_HERSHEY_SIMPLEX is the most commonly used font, clean and readable at most sizes.

Blurring
Blurring is one of those operations that seems cosmetic but is actually used heavily in real pipelines. The reason is noise. Real-world images, especially from cameras or scanned documents, carry high-frequency noise that can interfere with edge detection, thresholding, and other algorithms that are sensitive to sharp pixel transitions. Applying a blur before those steps smooths out the noise and makes the results more stable.
The most common choice is Gaussian blur, which applies a weighted average to each pixel using a bell-shaped kernel. Pixels closer to the center contribute more, pixels further away contribute less. The result is a smooth, natural-looking blur:
```python
blurred = cv2.GaussianBlur(image, (15, 15), 0)
```
The second argument is the kernel size as (width, height), which controls how much area around each pixel is included in the average. Both values must be positive and odd, so (3, 3), (5, 5), (15, 15), and so on. Larger kernels produce stronger blur. The third argument is sigmaX, the standard deviation of the Gaussian in the x direction. Passing 0 tells OpenCV to calculate it automatically from the kernel size, which is usually fine.
If you ever see a detection algorithm producing jittery or inconsistent results on input that looks clean, a small Gaussian blur applied beforehand is often one of the first things worth trying.

Wrapping Up
None of these operations are complex on their own, but that is kind of the point. The real skill is knowing which one to reach for and when. You will rarely use just one in isolation. A typical preprocessing step might resize the image, crop a region of interest, convert to grayscale, and apply a blur, all before the actual algorithm even runs.
Get comfortable with these. They will show up in pretty much everything from here on.
See you in the next one.




