© 2026 Furkanul Islam

Computer Vision Projects That Changed My Perspective: Real-World Applications

Explore real-world computer vision applications from defect detection to medical imaging. Learn practical insights on YOLO, U-Net, model deployment, and lessons from production CV systems.

Computer vision has been one of the most fascinating areas of my AI journey. The ability to teach machines to “see” and interpret visual information feels like magic, even after years of working with it. In this post, I’ll share three computer vision projects that profoundly shaped my understanding, along with the technical details and lessons learned.

Project 1: Real-Time Defect Detection for Manufacturing

The Challenge

A manufacturing client needed to automatically detect surface defects on metal components moving along a production line. The requirements were demanding:

  • Speed: Process 60+ components per minute
  • Accuracy: 95%+ defect detection rate
  • False positives: Less than 2% (to avoid unnecessary manual inspections)
  • Defect types: Scratches, dents, pits, cracks (varying sizes from 0.5mm to 10mm)
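
A quick back-of-the-envelope check is worth doing before any model selection: the throughput requirement fixes a hard per-part time budget. A minimal sketch (the ~50ms figure is the YOLOv5 inference latency the system targets; the function name is mine):

```python
def per_part_budget_ms(parts_per_minute: float) -> float:
    """Hard time budget per component, in milliseconds."""
    return 60_000.0 / parts_per_minute

budget = per_part_budget_ms(60)   # 1000.0 ms per part
headroom = budget - 50            # ~950 ms left after ~50 ms of inference
print(budget, headroom)
```

At 60 parts/minute the inference step consumes only ~5% of the budget; capture, preprocessing, and actuating the reject mechanism get the rest.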

The Solution Architecture

┌──────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│  Industrial  │ -> │   Image      │ -> │   YOLOv5    │ -> │   PLC        │
│  Camera      │    │   Capture    │    │   Inference │    │   Reject     │
│  (5MP)       │    │   System     │    │   (50ms)    │    │   Actuator   │
└──────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
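
Tying the four stages together is mostly glue code. A minimal sketch of the control loop, with `capture`, `infer`, and `reject` as stand-ins for the camera driver, model, and PLC interface (all hypothetical names):

```python
def run_line(capture, infer, reject, confidence_threshold=0.5):
    """Consume frames, run detection, trigger the reject actuator."""
    rejected = 0
    for frame in capture():
        detections = infer(frame)
        if any(d['confidence'] >= confidence_threshold for d in detections):
            reject()              # signal the PLC to divert the part
            rejected += 1
    return rejected

# Stubbed stages for illustration:
frames = [{'id': 1}, {'id': 2}]
fake_infer = lambda f: [{'confidence': 0.9}] if f['id'] == 2 else []
print(run_line(lambda: iter(frames), fake_infer, lambda: None))  # 1
```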

Data Collection and Annotation

import cv2
import numpy as np
from pathlib import Path
import xml.etree.ElementTree as ET

class DefectDataCollector:
    """Collect and annotate defect images from production line."""

    def __init__(self, image_dir, annotation_dir):
        self.image_dir = Path(image_dir)
        self.annotation_dir = Path(annotation_dir)

    def capture_from_camera(self, camera_id=0, num_samples=1000):
        """Capture images from industrial camera."""
        cap = cv2.VideoCapture(camera_id)

        # Configure camera for optimal capture
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, 2592)  # 5MP
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1944)
        cap.set(cv2.CAP_PROP_EXPOSURE, -6)  # Fixed exposure
        cap.set(cv2.CAP_PROP_GAIN, 10)

        for i in range(num_samples):
            ret, frame = cap.read()
            if ret:
                # Apply preprocessing for better defect visibility
                enhanced = self._enhance_image(frame)
                cv2.imwrite(str(self.image_dir / f"defect_{i:04d}.jpg"), enhanced)

        cap.release()

    def _enhance_image(self, image):
        """Enhance image for better defect detection."""
        # Convert to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # Apply CLAHE (Contrast Limited Adaptive Histogram Equalization)
        clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
        enhanced = clahe.apply(gray)

        # Denoise
        denoised = cv2.fastNlMeansDenoising(enhanced, None, 10, 7, 21)

        return cv2.cvtColor(denoised, cv2.COLOR_GRAY2BGR)

    def convert_to_yolo_format(self, xml_file, img_width, img_height):
        """Convert Pascal VOC annotations to YOLO format."""
        tree = ET.parse(xml_file)
        root = tree.getroot()

        annotations = []
        for obj in root.findall('object'):
            class_name = obj.find('name').text
            bbox = obj.find('bndbox')

            # Convert to YOLO format (x_center, y_center, width, height) normalized
            xmin = float(bbox.find('xmin').text)
            ymin = float(bbox.find('ymin').text)
            xmax = float(bbox.find('xmax').text)
            ymax = float(bbox.find('ymax').text)

            x_center = (xmin + xmax) / 2 / img_width
            y_center = (ymin + ymax) / 2 / img_height
            width = (xmax - xmin) / img_width
            height = (ymax - ymin) / img_height

            class_id = self._class_name_to_id(class_name)
            annotations.append(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")

        return annotations

    def _class_name_to_id(self, name):
        class_map = {
            'scratch': 0,
            'dent': 1,
            'pit': 2,
            'crack': 3
        }
        return class_map.get(name, -1)
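
The VOC-to-YOLO conversion is easy to get subtly wrong, so it's worth sanity-checking the arithmetic in isolation. A standalone restatement of the math inside `convert_to_yolo_format`:

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Pixel corners -> normalized (x_center, y_center, width, height)."""
    return ((xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h)

# A 100x50 box centered in a 400x200 image:
print(voc_to_yolo(150, 75, 250, 125, 400, 200))  # (0.5, 0.5, 0.25, 0.25)
```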

Model Training with YOLOv5

import cv2
import torch
import yaml
from pathlib import Path

class DefectDetector:
    """Train and run inference with YOLOv5 for defect detection."""

    def __init__(self, model_size='m', img_size=640):
        # model_size is the YOLOv5 suffix: n, s, m, l, or x
        self.img_size = img_size
        self.model_size = model_size
        self.model = None

    def create_dataset_yaml(self, dataset_path, classes):
        """Create YAML configuration for YOLOv5 training."""
        config = {
            'path': str(dataset_path),
            'train': 'images/train',
            'val': 'images/val',
            'test': 'images/test',
            'nc': len(classes),
            'names': classes
        }

        yaml_file = Path(dataset_path) / 'dataset.yaml'
        with open(yaml_file, 'w') as f:
            yaml.dump(config, f, default_flow_style=None)

        return yaml_file

    def train(self, dataset_path, epochs=100, batch_size=16):
        """Train YOLOv5 model."""
        classes = ['scratch', 'dent', 'pit', 'crack']
        yaml_path = self.create_dataset_yaml(dataset_path, classes)

        # Load YOLOv5 model
        self.model = torch.hub.load(
            'ultralytics/yolov5',
            f'yolov5{self.model_size}',  # Options: n, s, m, l, x
            pretrained=True
        )

        # Training configuration
        train_params = {
            'data': str(yaml_path),
            'epochs': epochs,
            'batch_size': batch_size,
            'imgsz': self.img_size,
            'workers': 8,
            'optimizer': 'AdamW',
            'lr0': 0.01,
            'lrf': 0.1,
            'warmup_epochs': 5,
            'warmup_momentum': 0.8,
            'box': 0.05,  # Box loss gain
            'cls': 0.5,   # Class loss gain
            'cls_pw': 1.0,
            'obj': 1.0,
            'obj_pw': 1.0,
            'iou_t': 0.2,
            'anchor_t': 4.0,
            'fl_gamma': 0.0,
            'hsv_h': 0.015,
            'hsv_s': 0.7,
            'hsv_v': 0.4,
            'degrees': 0.0,
            'translate': 0.1,
            'scale': 0.5,
            'shear': 0.0,
            'perspective': 0.0,
            'flipud': 0.0,
            'fliplr': 0.5,
            'mosaic': 1.0,
            'mixup': 0.0,
        }

        # Train
        results = self.model.train(**train_params)

        # Save best model
        self.model.save('defect_detection_best.pt')

        return results

    def inference(self, image_path, confidence_threshold=0.5):
        """Run inference on single image."""
        if self.model is None:
            # Load custom-trained weights via the hub 'custom' entry point
            self.model = torch.hub.load(
                'ultralytics/yolov5', 'custom', path='defect_detection_best.pt'
            )

        # Run inference
        results = self.model(image_path)

        # Filter by confidence
        detections = results.pandas().xyxy[0]
        high_conf_detections = detections[detections['confidence'] > confidence_threshold]

        return high_conf_detections

    def inference_on_video(self, video_path, output_path):
        """Run inference on video stream."""
        cap = cv2.VideoCapture(video_path)

        # Get video properties
        fps = cap.get(cv2.CAP_PROP_FPS)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

        # Video writer
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

        frame_count = 0
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Run inference
            results = self.model(frame)

            # Draw detections
            annotated_frame = results.render()[0]

            # Write output
            out.write(annotated_frame)
            frame_count += 1

            if frame_count % 100 == 0:
                print(f"Processed {frame_count} frames")

        cap.release()
        out.release()

Deployment with TensorRT Optimization

import torch
import tensorrt as trt
import numpy as np

class TensorRTDeploy:
    """Deploy optimized model with TensorRT."""

    def __init__(self, onnx_path, engine_path):
        self.onnx_path = onnx_path
        self.engine_path = engine_path
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.engine = None
        self.context = None

    def build_engine(self, max_batch_size=8, fp16_mode=True):
        """Build TensorRT engine from ONNX."""
        builder = trt.Builder(self.logger)
        network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, self.logger)

        # Parse ONNX model
        with open(self.onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for error in range(parser.num_errors):
                    print(f"TensorRT Error: {parser.get_error(error)}")
                return None

        # Configure builder (explicit-batch networks size batches from the
        # network definition, so the deprecated max_batch_size is not needed)
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30  # 1GB

        if fp16_mode:
            config.set_flag(trt.BuilderFlag.FP16)

        # Build engine (build_serialized_network already returns serialized bytes)
        engine_bytes = builder.build_serialized_network(network, config)

        # Save engine
        with open(self.engine_path, 'wb') as f:
            f.write(engine_bytes)

        return engine_bytes

    def load_engine(self):
        """Load serialized TensorRT engine."""
        with open(self.engine_path, 'rb') as f:
            runtime = trt.Runtime(self.logger)
            self.engine = runtime.deserialize_cuda_engine(f.read())
            self.context = self.engine.create_execution_context()

    def infer(self, input_tensor):
        """Run TensorRT inference, using torch tensors as GPU buffers."""
        # Allocate one device buffer per binding; TensorRT wants raw pointers
        buffers = []
        for i in range(self.engine.num_bindings):
            shape = tuple(self.context.get_binding_shape(i))
            buffers.append(torch.empty(shape, dtype=torch.float32, device='cuda'))
        bindings = [buf.data_ptr() for buf in buffers]

        # Transfer input to the first (input) binding
        buffers[0].copy_(torch.from_numpy(input_tensor))

        # Run inference on a dedicated CUDA stream and wait for completion
        stream = torch.cuda.Stream()
        self.context.execute_async_v2(
            bindings=bindings,
            stream_handle=stream.cuda_stream
        )
        stream.synchronize()

        # Copy the (last) output binding back to host
        return buffers[-1].cpu().numpy()

Results and Impact

| Metric | Before (Manual) | After (Automated) |
|---|---|---|
| Detection rate | ~85% | 96.3% |
| False positives | ~8% | 1.4% |
| Processing speed | 20/min | 65/min |
| Manual inspection needed | 100% | 5% (edge cases only) |
| Cost savings | - | $180K/year |

Key Learnings

  1. Data quality matters more than model size: After experimenting with YOLOv5n (nano) through YOLOv5x (extra large), the medium model with high-quality annotated data outperformed the extra large model with noisy data by 8%.

  2. Lighting is critical: We spent 2 weeks optimizing the lighting setup before collecting any training data. This single decision reduced false positives by 60%.

  3. Edge cases need special handling: We implemented a secondary classifier for borderline cases (confidence 0.3-0.5), routing them to manual inspection. This hybrid approach maintained accuracy while minimizing false rejections.
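
The borderline routing in learning 3 reduces to a small decision function. A sketch using the confidence bands from above (the function name and return labels are my own):

```python
def route_detection(confidence: float) -> str:
    """Route a detection by confidence band (0.3-0.5 is borderline)."""
    if confidence >= 0.5:
        return "auto_reject"     # model is confident: divert the part
    if confidence >= 0.3:
        return "manual_review"   # borderline: send to an inspector
    return "pass"                # treated as defect-free

print(route_detection(0.42))  # manual_review
```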

Project 2: Medical Image Segmentation for Radiology

The Challenge

Collaborating with healthcare professionals, we tackled the problem of automatically segmenting lung nodules in CT scans for early cancer detection:

  • Input: 3D CT scan volumes (512×512×~300 slices)
  • Output: Precise segmentation masks for each nodule
  • Constraints: High recall (>98%) - missing a nodule is worse than false positives
  • Challenge: Extreme class imbalance (nodules are <0.1% of total volume)
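
To see how extreme that imbalance is: with nodules at 0.1% of voxels, an "all background" prediction is already 99.9% accurate. One naive counter is weighting positives by the negative-to-positive ratio (the losses used here take a different route, shown next):

```python
def neg_to_pos_ratio(positive_fraction: float) -> float:
    """Weight that balances positives against the overwhelming background."""
    return (1 - positive_fraction) / positive_fraction

print(round(neg_to_pos_ratio(0.001)))  # 999
```

A weight of ~999 on positive voxels makes plain BCE unstable in practice, which is part of why focal and Dice losses are preferred for this problem.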

The U-Net Architecture

import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleConv(nn.Module):
    """Double convolution block for U-Net."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.double_conv(x)

class Down(nn.Module):
    """Downsampling with maxpool."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.maxpool_conv = nn.Sequential(
            nn.MaxPool3d(2),
            DoubleConv(in_channels, out_channels)
        )

    def forward(self, x):
        return self.maxpool_conv(x)

class Up(nn.Module):
    """Upsampling with transpose convolution."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose3d(
            in_channels, in_channels // 2,
            kernel_size=2, stride=2
        )
        self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)

        # Handle size mismatch between the upsampled and skip tensors
        diff_d = x2.size(2) - x1.size(2)
        diff_h = x2.size(3) - x1.size(3)
        diff_w = x2.size(4) - x1.size(4)

        # F.pad takes ints ordered (w_left, w_right, h_top, h_bottom, d_front, d_back)
        x1 = F.pad(x1, [
            diff_w // 2, diff_w - diff_w // 2,
            diff_h // 2, diff_h - diff_h // 2,
            diff_d // 2, diff_d - diff_d // 2
        ])

        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)

class MedicalUNet(nn.Module):
    """3D U-Net for medical image segmentation."""

    def __init__(self, in_channels=1, out_channels=1, features=[32, 64, 128, 256]):
        super().__init__()

        self.initial_conv = DoubleConv(in_channels, features[0])

        self.down1 = Down(features[0], features[1])
        self.down2 = Down(features[1], features[2])
        self.down3 = Down(features[2], features[3])
        self.down4 = Down(features[3], features[3])

        self.up1 = Up(features[3] * 2, features[2])
        self.up2 = Up(features[2] * 2, features[1])
        self.up3 = Up(features[1] * 2, features[0])
        self.up4 = Up(features[0] * 2, features[0])

        self.final_conv = nn.Conv3d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        # Encoder
        x1 = self.initial_conv(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)

        # Decoder with skip connections
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)

        return self.final_conv(x)
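
A practical note on input sizes: each of the four Down stages halves the spatial dims, so volumes whose dims are multiples of 2^4 = 16 avoid the padding fix-up in Up entirely. A quick check:

```python
def needs_padding(size: int, n_downsamples: int = 4) -> bool:
    """True if a spatial dim won't survive n halvings without rounding."""
    return size % (2 ** n_downsamples) != 0

print(needs_padding(128))  # False: 128 -> 64 -> 32 -> 16 -> 8 cleanly
print(needs_padding(300))  # True: 300 is not a multiple of 16
```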

Handling Class Imbalance with Focal Loss

import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal Loss for handling class imbalance."""

    def __init__(self, alpha=0.25, gamma=2.0, reduction='mean'):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, inputs, targets):
        # Apply sigmoid
        p = torch.sigmoid(inputs)

        # Calculate binary cross entropy
        bce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')

        # Calculate focal weight
        p_t = p * targets + (1 - p) * (1 - targets)
        focal_weight = self.alpha * (1 - p_t) ** self.gamma

        # Apply focal weight
        focal_loss = focal_weight * bce_loss

        if self.reduction == 'mean':
            return focal_loss.mean()
        elif self.reduction == 'sum':
            return focal_loss.sum()
        return focal_loss

class DiceLoss(nn.Module):
    """Dice Loss for segmentation."""

    def __init__(self, smooth=1.0):
        super().__init__()
        self.smooth = smooth

    def forward(self, inputs, targets):
        # Apply sigmoid
        p = torch.sigmoid(inputs)

        # Flatten tensors
        p = p.view(-1)
        targets = targets.view(-1)

        # Calculate Dice coefficient
        intersection = (p * targets).sum()
        dice = (2. * intersection + self.smooth) / (p.sum() + targets.sum() + self.smooth)

        return 1 - dice

class CombinedLoss(nn.Module):
    """Combine Focal Loss and Dice Loss."""

    def __init__(self, focal_weight=0.5, dice_weight=0.5):
        super().__init__()
        self.focal = FocalLoss(alpha=0.25, gamma=2.0)
        self.dice = DiceLoss()
        self.focal_weight = focal_weight
        self.dice_weight = dice_weight

    def forward(self, inputs, targets):
        focal_loss = self.focal(inputs, targets)
        dice_loss = self.dice(inputs, targets)

        return self.focal_weight * focal_loss + self.dice_weight * dice_loss
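
For intuition about what Dice rewards, here is the same coefficient computed on a tiny flat example in plain Python (mirroring the tensor math above):

```python
def dice_coefficient(pred, target, smooth=1.0):
    """Dice on flat 0/1 lists; 1.0 means perfect overlap."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

# 3 true positives, 1 false positive, 1 false negative:
pred   = [1, 1, 1, 1, 0, 0]
target = [1, 1, 1, 0, 1, 0]
print(round(dice_coefficient(pred, target), 3))  # 0.778
```

Note that the true negatives never enter the formula, which is exactly why Dice stays informative under heavy class imbalance.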

Training Pipeline

import torch
from torch.utils.data import Dataset, DataLoader
import nibabel as nib
from pathlib import Path
import numpy as np

class LungCTDataset(Dataset):
    """Dataset for lung CT segmentation."""

    def __init__(self, ct_dir, mask_dir, augment=False):
        self.ct_paths = list(Path(ct_dir).glob('*.nii.gz'))
        self.mask_dir = Path(mask_dir)
        self.augment = augment

    def __len__(self):
        return len(self.ct_paths)

    def __getitem__(self, idx):
        # Load CT scan
        ct_path = self.ct_paths[idx]
        ct = nib.load(ct_path).get_fdata()

        # Load corresponding mask
        mask_path = self.mask_dir / ct_path.name
        mask = nib.load(mask_path).get_fdata()

        # Normalize CT values (Hounsfield units)
        ct = np.clip(ct, -1000, 400)  # Lung window
        ct = (ct - (-1000)) / (400 - (-1000))  # Normalize to [0, 1]

        # Convert to torch tensors
        ct = torch.FloatTensor(ct).unsqueeze(0)  # Add channel dimension
        mask = torch.FloatTensor(mask).unsqueeze(0)

        # Data augmentation
        if self.augment and np.random.random() > 0.5:
            ct, mask = self._augment(ct, mask)

        return ct, mask

    def _augment(self, ct, mask):
        """Apply random augmentations."""
        # Random rotation
        if np.random.random() > 0.5:
            angle = np.random.uniform(-15, 15)
            ct = self._rotate(ct, angle)
            mask = self._rotate(mask, angle)

        # Random flip
        if np.random.random() > 0.5:
            ct = torch.flip(ct, dims=[3])
            mask = torch.flip(mask, dims=[3])

        # Random zoom
        if np.random.random() > 0.5:
            scale = np.random.uniform(0.9, 1.1)
            ct = self._zoom(ct, scale)
            mask = self._zoom(mask, scale)

        return ct, mask

    def _rotate(self, tensor, angle):
        # Implementation omitted for brevity
        pass

    def _zoom(self, tensor, scale):
        # Implementation omitted for brevity
        pass

# Training loop
def train_unet(model, train_loader, val_loader, epochs=100):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    criterion = CombinedLoss(focal_weight=0.5, dice_weight=0.5)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    best_dice = 0.0

    for epoch in range(epochs):
        # Training
        model.train()
        train_loss = 0.0

        for ct_batch, mask_batch in train_loader:
            ct_batch = ct_batch.to(device)
            mask_batch = mask_batch.to(device)

            optimizer.zero_grad()
            outputs = model(ct_batch)
            loss = criterion(outputs, mask_batch)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        # Validation
        model.eval()
        val_loss = 0.0
        val_dice = 0.0

        with torch.no_grad():
            for ct_batch, mask_batch in val_loader:
                ct_batch = ct_batch.to(device)
                mask_batch = mask_batch.to(device)

                outputs = model(ct_batch)
                loss = criterion(outputs, mask_batch)

                val_loss += loss.item()

                # Calculate Dice coefficient
                preds = torch.sigmoid(outputs) > 0.5
                dice = (2 * (preds * mask_batch).sum() + 1e-6) / (preds.sum() + mask_batch.sum() + 1e-6)
                val_dice += dice.item()

        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)
        avg_val_dice = val_dice / len(val_loader)

        print(f"Epoch {epoch+1}/{epochs}:")
        print(f"  Train Loss: {avg_train_loss:.4f}")
        print(f"  Val Loss: {avg_val_loss:.4f}")
        print(f"  Val Dice: {avg_val_dice:.4f}")

        # Save best model
        if avg_val_dice > best_dice:
            best_dice = avg_val_dice
            torch.save(model.state_dict(), 'best_lung_unet.pth')
            print(f"  New best model! Dice: {best_dice:.4f}")

        scheduler.step()

    return best_dice

Results

| Metric | Value |
|---|---|
| Dice Coefficient | 0.87 |
| Sensitivity (Recall) | 98.2% |
| Specificity | 94.5% |
| False Positive Rate | 5.5% |
| Inference Time | 2.3 seconds/volume |

Key Learnings

  1. Focal Loss is essential for imbalanced data: Standard cross-entropy resulted in the model predicting everything as background (99.9% accuracy, 0% useful). Focal Loss with gamma=2.0 focused training on hard examples.

  2. 3D context matters: We compared 2D U-Net (slice-by-slice) vs 3D U-Net. The 3D version improved Dice score by 12% because it captures nodule shape across slices.

  3. Clinical validation is different from test metrics: A model with 85% Dice but consistent behavior was preferred over a model with 87% Dice but unpredictable edge cases.

Project 3: Real-Time Object Detection for Warehouse Robotics

The Challenge

For an autonomous warehouse robot project, we needed to detect and localize packages of various sizes in real-time:

  • Latency: <30ms per frame (30+ FPS) for safe navigation
  • Platform: NVIDIA Jetson Xavier (edge device)
  • Objects: Boxes, pallets, people, forklifts (15 classes)
  • Conditions: Variable lighting, occlusions, motion blur

Solution: Optimized YOLOv8

from ultralytics import YOLO
import torch

class WarehouseDetector:
    """Optimized object detection for warehouse robotics."""

    def __init__(self, model_path='yolov8m.pt'):
        # Load pretrained model
        self.model = YOLO(model_path)

        # Optimize for deployment
        self._optimize_for_edge()

    def _optimize_for_edge(self):
        """Apply optimizations for edge deployment."""
        # Fuse Conv + BatchNorm layers
        self.model.fuse()

        # Convert to TensorRT (if available); export writes an .engine file
        # that must be reloaded to actually run through TensorRT
        try:
            engine_path = self.model.export(format='engine', device=0, half=True)
            self.model = YOLO(engine_path)
        except Exception as e:
            print(f"TensorRT export failed: {e}")

    def train_custom(self, data_path, epochs=50, imgsz=640):
        """Fine-tune on warehouse data."""
        results = self.model.train(
            data=data_path,
            epochs=epochs,
            imgsz=imgsz,
            batch=16,
            device=0,
            workers=8,
            optimizer='SGD',
            lr0=0.01,
            lrf=0.01,
            momentum=0.937,
            weight_decay=0.0005,
            warmup_epochs=3.0,
            warmup_momentum=0.8,
            box=7.5,
            cls=0.5,
            dfl=1.5,
            close_mosaic=10,
            amp=True,  # Mixed precision
        )
        return results

    def detect(self, frame):
        """Run detection on single frame."""
        results = self.model(frame, verbose=False)

        # Parse results
        detections = []
        for box in results[0].boxes:
            detections.append({
                'class': int(box.cls),
                'class_name': self.model.names[int(box.cls)],
                'confidence': float(box.conf),
                'bbox': box.xyxy[0].cpu().numpy().tolist()
            })

        return detections

    def detect_stream(self, video_source=0):
        """Process video stream."""
        import cv2

        cap = cv2.VideoCapture(video_source)

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Run detection
            results = self.model(frame, verbose=False)

            # Annotate frame
            annotated = results[0].plot()

            cv2.imshow('Warehouse Detection', annotated)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()

Integration with Robot Navigation

class RobotNavigation:
    """Integrate vision with robot navigation."""

    def __init__(self, detector):
        self.detector = detector
        self.safety_distance = 2.0  # meters
        self.stop_distance = 0.5  # meters

    def process_frame_for_navigation(self, frame, depth_map):
        """Process frame and generate navigation commands."""
        # Run detection
        detections = self.detector.detect(frame)

        # Classify objects by risk level
        critical_classes = ['person', 'forklift']
        obstacle_classes = ['box', 'pallet', 'cart']

        commands = {
            'stop': False,
            'slow_down': False,
            'steer_angle': 0.0,
            'max_speed': 1.0
        }

        for det in detections:
            # Estimate distance from depth map
            bbox = det['bbox']
            center_x = int((bbox[0] + bbox[2]) / 2)
            center_y = int((bbox[1] + bbox[3]) / 2)

            distance = depth_map[center_y, center_x]

            # Check for critical obstacles
            if det['class_name'] in critical_classes:
                if distance < self.stop_distance:
                    commands['stop'] = True
                elif distance < self.safety_distance:
                    commands['slow_down'] = True
                    commands['max_speed'] = 0.3

            # Calculate steering adjustment
            elif det['class_name'] in obstacle_classes:
                if distance < self.safety_distance:
                    # Steer away from obstacle
                    image_center = frame.shape[1] / 2
                    object_center = (bbox[0] + bbox[2]) / 2

                    if object_center < image_center:
                        commands['steer_angle'] = 0.3  # Steer right
                    else:
                        commands['steer_angle'] = -0.3  # Steer left

        return commands

Results

| Metric | Value |
|---|---|
| mAP@50 | 94.2% |
| mAP@50-95 | 87.5% |
| Inference Time (Jetson) | 18ms |
| FPS | 55 |
| Collision Avoidance | 99.7% success |

Key Learnings

  1. Edge deployment requires trade-offs: We tested YOLOv8n (nano) through YOLOv8x. The medium model offered the best balance of accuracy and speed on Jetson Xavier.

  2. Data augmentation is critical: Warehouse environments have challenging lighting. Augmentations (brightness, contrast, motion blur) improved robustness significantly.

  3. Multi-sensor fusion: Vision alone wasn’t enough. Combining camera with depth sensors and LiDAR reduced false positives by 40%.
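
Learning 2 mentions motion blur; the augmentation itself is just a directional averaging filter. A toy 1-D version for intuition (real pipelines would use OpenCV or Albumentations):

```python
def motion_blur_row(row, k=3):
    """Horizontal motion blur: average each pixel over a k-wide window."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - k // 2): i + k // 2 + 1]
        out.append(sum(window) / len(window))
    return out

# A single bright pixel smears across its neighbours:
print(motion_blur_row([0, 0, 255, 0, 0]))  # [0.0, 85.0, 85.0, 85.0, 0.0]
```

Training on blurred copies like this teaches the detector that a smeared box edge is still a box edge.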

Overall Lessons from Computer Vision Projects

Data Quality > Model Complexity

This can’t be overstated. I’ve seen teams spend weeks tuning hyperparameters on a model trained with mediocre data, when better annotations would have yielded a 10x improvement.

# My data quality checklist
def assess_data_quality(dataset_path):
    # The check_* / sample_* helpers are project-specific; each returns
    # a dict with a normalized 'score' in [0, 1]
    checks = {
        'annotation_consistency': check_annotation_overlap(dataset_path),
        'class_balance': check_class_distribution(dataset_path),
        'image_quality': check_image_sharpness(dataset_path),
        'label_accuracy': sample_and_verify_annotations(dataset_path),
    }

    for check, result in checks.items():
        if result['score'] < 0.8:
            print(f"⚠️  {check}: {result['score']:.2f} - Needs attention")
        else:
            print(f"✅ {check}: {result['score']:.2f}")

Deployment Considerations Early

Always design with deployment in mind:

  • What’s the latency budget?
  • What hardware will run inference?
  • What’s the acceptable false positive/negative rate?
  • How will the model be updated?
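
The latency-budget question is cheap to answer empirically. A small harness I'd reach for (names are mine; pass any inference callable and a representative frame):

```python
import time

def latency_percentiles_ms(infer, frame, warmup=3, runs=20):
    """Return (p50, p95) latency of an inference callable, in ms."""
    for _ in range(warmup):
        infer(frame)          # warm caches / JIT / GPU kernels
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(frame)
        times.append((time.perf_counter() - t0) * 1000)
    times.sort()
    return times[len(times) // 2], times[min(int(runs * 0.95), runs - 1)]

p50, p95 = latency_percentiles_ms(lambda f: sum(f), list(range(1000)))
print(p50 <= p95)  # True
```

Reporting p95 rather than the mean matters on edge hardware, where thermal throttling makes tail latency the number that actually breaks real-time budgets.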

The “Last Mile” Problem

Getting from 90% to 98% accuracy often requires:

  • Custom post-processing logic
  • Ensemble methods
  • Human-in-the-loop for edge cases
  • Continuous monitoring and retraining

Conclusion

Computer vision continues to evolve rapidly. What I learned on these projects:

  1. Start simple: Baseline with pretrained models before custom architectures
  2. Invest in data: Annotation quality determines your ceiling
  3. Understand your constraints: Latency, hardware, and business requirements shape the solution
  4. Plan for iteration: Models degrade; build pipelines for continuous improvement

The projects I’ve shared here represent thousands of hours of experimentation, failure, and learning. But the fundamentals remain consistent: good data, appropriate architecture, and thoughtful deployment.


Questions about computer vision projects? Reach out through the contact page or connect on LinkedIn.

MD Furkanul Islam

Data Engineer & AI/ML Specialist

9+ years building intelligent data systems at scale. Passionate about bridging the gap between data engineering, AI, and robotics.