Master the 3D Reconstruction Process: A Step-by-Step Guide

Learn the complete 3D reconstruction pipeline from feature extraction to dense matching. Master photogrammetry with Python code examples and open-source tools.

The 3D Reconstruction journey from 2D photographs to 3D models follows a structured path. 

This path consists of distinct steps that build upon each other to transform flat images into spatial information. 

Understanding this pipeline is crucial for anyone looking to create high-quality 3D reconstructions.

Let me explain…

Most people think 3D reconstruction means:

  • Taking random photos around an object
  • Pressing a button in expensive software
  • Waiting for magic to happen
  • Getting perfect results every time
  • Skipping the fundamentals

No thanks.

The most successful 3D reconstructions I have seen are built on three core principles:

  • They use pipelines that work with fewer images but position them better.
  • They make sure users spend less time processing but achieve cleaner results.
  • They permit troubleshooting faster because users know exactly where to look.

Therefore, this hints at a nice lesson:

Your 3D models can only be as good as your understanding of how they’re created.

Looking at this from a scientific perspective is really key.

Let us dive right into it!

🦊 If you are new to my (3D) writing world, welcome! We are going on an exciting adventure that will allow you to master an essential 3D Python skill.

Once the scene is laid out, we embark on the Python journey. Everything is provided, including resources at the end. You will see Tips (🦚Notes and 🌱Growing) to help you get the most out of this article. Thanks to the 3D Geodata Academy for supporting the endeavor. This article is inspired by a small section of Module 1 of the 3D Reconstructor OS Course.

The Complete 3D Reconstruction Workflow

Let me highlight the 3D Reconstruction pipeline with Photogrammetry. The process follows a logical sequence of steps, as illustrated below.

What is important to note is that each step builds upon the previous one, so the quality of each stage directly impacts the final result. Keep this in mind!

🦊 Understanding the entire process is crucial for troubleshooting workflows due to its sequential nature.

With that in mind, let’s detail each step, focusing on both the theory and practical implementation.

Natural Feature Extraction: Finding the Distinctive Points

Natural feature extraction is the foundation of the photogrammetry process. It identifies distinctive points in images that can be reliably located across multiple photographs.

These points serve as anchors that tie different views together.

🌱 When working with low-texture objects, consider adding temporary markers or texture patterns to improve feature extraction results.

Common feature extraction algorithms include:

Algorithm | Strengths | Weaknesses | Best For
SIFT | Scale and rotation invariant | Computationally expensive | High-quality, general-purpose reconstruction
SURF | Faster than SIFT | Less accurate than SIFT | Quick prototyping
ORB | Very fast, no patent restrictions | Less robust to viewpoint changes | Real-time applications

Let’s implement a simple feature extraction using OpenCV:

#%% SECTION 1: Natural Feature Extraction
import cv2
import numpy as np
import matplotlib.pyplot as plt

def extract_features(image_path, feature_method='sift', max_features=2000):
    """
    Extract features from an image using different methods.
    """

    # Read the image in color and convert to grayscale
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Could not read image at {image_path}")
    
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    # Initialize feature detector based on method
    if feature_method.lower() == 'sift':
        detector = cv2.SIFT_create(nfeatures=max_features)
    elif feature_method.lower() == 'surf':
        # Note: SURF is patented and may not be available in all OpenCV distributions
        detector = cv2.xfeatures2d.SURF_create(400)  # Adjust threshold as needed
    elif feature_method.lower() == 'orb':
        detector = cv2.ORB_create(nfeatures=max_features)
    else:
        raise ValueError(f"Unsupported feature method: {feature_method}")
    
    # Detect and compute keypoints and descriptors
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    
    # Create visualization
    img_with_features = cv2.drawKeypoints(
        img, keypoints, None, 
        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
    )
    
    print(f"Extracted {len(keypoints)} {feature_method.upper()} features")
    
    return keypoints, descriptors, img_with_features

image_path = "sample_image.jpg"  # Replace with your image path

# Extract features with different methods
kp_sift, desc_sift, vis_sift = extract_features(image_path, 'sift')
kp_orb, desc_orb, vis_orb = extract_features(image_path, 'orb')

What I do here is run through an image and hunt for distinctive patterns that stand out from their surroundings.

These patterns create mathematical “signatures” called descriptors that remain recognizable even when viewed from different angles or distances. 

Think of them as unique fingerprints that can be matched across multiple photographs.

The visualization step reveals exactly what the algorithm finds important in your image.

# Display results
plt.figure(figsize=(12, 6))
    
plt.subplot(1, 2, 1)
plt.title(f'SIFT Features ({len(kp_sift)})')
plt.imshow(cv2.cvtColor(vis_sift, cv2.COLOR_BGR2RGB))
plt.axis('off')
    
plt.subplot(1, 2, 2)
plt.title(f'ORB Features ({len(kp_orb)})')
plt.imshow(cv2.cvtColor(vis_orb, cv2.COLOR_BGR2RGB))
plt.axis('off')
    
plt.tight_layout()
plt.show()

Notice how corners, edges, and textured areas attract more keypoints, while smooth or uniform regions remain largely ignored.

This visual feedback is invaluable for understanding why some objects reconstruct better than others.

🦥 Geeky Note: The max_features parameter is critical. Setting it too high can dramatically slow processing and capture noise, while setting it too low might miss important details. For most objects, 2000-5000 features provide a good balance, but I’ll push it to 10,000+ for highly detailed architectural reconstructions.

Feature Matching: Connecting Images Together

Once features are extracted, the next step is to find correspondences between images. This process identifies which points in different images represent the same physical point in the real world. Feature matching creates the connections needed to determine camera positions.

I’ve seen countless attempts fail because the algorithm couldn’t reliably connect the same points across different images.

The ratio test is the silent hero that weeds out ambiguous matches before they poison your reconstruction.

#%% SECTION 2: Feature Matching
import cv2
import numpy as np
import matplotlib.pyplot as plt

def match_features(descriptors1, descriptors2, method='flann', ratio_thresh=0.75):
    """
    Match features between two images using different methods.
    """

    # Convert descriptors to appropriate type if needed
    if descriptors1 is None or descriptors2 is None:
        return []
    
    if method.lower() == 'flann':
        # FLANN parameters
        if descriptors1.dtype != np.float32:
            descriptors1 = np.float32(descriptors1)
        if descriptors2.dtype != np.float32:
            descriptors2 = np.float32(descriptors2)
            
        FLANN_INDEX_KDTREE = 1
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
        search_params = dict(checks=50)  # Higher values = more accurate but slower
        
        flann = cv2.FlannBasedMatcher(index_params, search_params)
        matches = flann.knnMatch(descriptors1, descriptors2, k=2)
    else:  # Brute Force
        # For ORB descriptors
        if descriptors1.dtype == np.uint8:
            bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
        else:  # For SIFT and SURF descriptors
            bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
        
        matches = bf.knnMatch(descriptors1, descriptors2, k=2)
    
    # Apply Lowe's ratio test
    good_matches = []
    for match in matches:
        if len(match) == 2:  # Sometimes fewer than 2 matches are returned
            m, n = match
            if m.distance < ratio_thresh * n.distance:
                good_matches.append(m)
    
    return good_matches

def visualize_matches(img1, kp1, img2, kp2, matches, max_display=100):
    """
    Create a visualization of feature matches between two images.
    """

    # Limit the number of matches to display
    matches_to_draw = matches[:min(max_display, len(matches))]
    
    # Create match visualization
    match_img = cv2.drawMatches(
        img1, kp1, img2, kp2, matches_to_draw, None,
        flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
    )
    
    return match_img

# Load two images
img1_path = "image1.jpg"  # Replace with your image paths
img2_path = "image2.jpg"
    
# Extract features using SIFT (or your preferred method)
kp1, desc1, _ = extract_features(img1_path, 'sift')
kp2, desc2, _ = extract_features(img2_path, 'sift')
    
# Match features
good_matches = match_features(desc1, desc2, method='flann')
    
print(f"Found {len(good_matches)} good matches")

The matching process works by comparing feature descriptors between two images, measuring their mathematical similarity. For each feature in the first image, we find its two closest matches in the second image and assess their relative distances. 

If the closest match is significantly better than the second-best (as controlled by the ratio threshold), we consider it reliable.

# Visualize matches
img1 = cv2.imread(img1_path)
img2 = cv2.imread(img2_path)
match_visualization = visualize_matches(img1, kp1, img2, kp2, good_matches)
    
plt.figure(figsize=(12, 8))
plt.imshow(cv2.cvtColor(match_visualization, cv2.COLOR_BGR2RGB))
plt.title(f"Feature Matches: {len(good_matches)}")
plt.axis('off')
plt.tight_layout()
plt.show()

Visualizing these matches reveals the spatial relationships between your images.

Good matches form a consistent pattern that reflects the transform between viewpoints, while outliers appear as random connections. 

This pattern provides immediate feedback on image quality and camera positioning—clustered, consistent matches suggest good reconstruction potential.

🦥 Geeky Note: The ratio_thresh parameter (0.75) is Lowe’s original recommendation and works well in most situations. Lower values (0.6-0.7) produce fewer but more reliable matches, which is preferable for scenes with repetitive patterns. Higher values (0.8-0.9) yield more matches but increase the risk of outliers contaminating your reconstruction.

Beautiful. Now, let us move to the main stage: the Structure from Motion node.

Structure From Motion: Placing Cameras in Space

Structure from Motion (SfM) reconstructs both the 3D scene structure and camera motion from the 2D image correspondences. This process determines where each photo was taken from and creates an initial sparse point cloud of the scene.

Key steps in SfM include:

  1. Estimating the fundamental or essential matrix between image pairs
  2. Recovering camera poses (position and orientation)
  3. Triangulating 3D points from 2D correspondences
  4. Building a track graph to connect observations across multiple images

The essential matrix encodes the geometric relationship between two camera viewpoints, revealing how they’re positioned relative to each other in space.

This mathematical relationship is the foundation for reconstructing both the camera positions and the 3D structure they observed.

#%% SECTION 3: Structure from Motion
import cv2
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def estimate_pose(kp1, kp2, matches, K, method=cv2.RANSAC, prob=0.999, threshold=1.0):
    """
    Estimate the relative pose between two cameras using matched features.
    """

    # Extract matched points
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    
    # Estimate essential matrix
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method, prob, threshold)
    
    # Recover pose from essential matrix
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    
    inlier_matches = [matches[i] for i in range(len(matches)) if mask[i] > 0]
    print(f"Estimated pose with {np.sum(mask)} inliers out of {len(matches)} matches")
    
    return R, t, mask, inlier_matches

def triangulate_points(kp1, kp2, matches, K, R1, t1, R2, t2):
    """
    Triangulate 3D points from two views.
    """

    # Extract matched points
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    
    # Create projection matrices
    P1 = np.dot(K, np.hstack((R1, t1)))
    P2 = np.dot(K, np.hstack((R2, t2)))
    
    # Triangulate points
    points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    
    # Convert to 3D points
    points_3d = points_4d[:3] / points_4d[3]
    
    return points_3d.T

def visualize_points_and_cameras(points_3d, R1, t1, R2, t2):
    """
    Visualize 3D points and camera positions.
    """

    fig = plt.figure(figsize=(10, 8))
    ax = fig.add_subplot(111, projection='3d')
    
    # Plot points
    ax.scatter(points_3d[:, 0], points_3d[:, 1], points_3d[:, 2], c='b', s=1)
    
    # Helper function to create camera visualization
    def plot_camera(R, t, color):
        # Camera center
        center = -R.T @ t
        ax.scatter(center[0], center[1], center[2], c=color, s=100, marker='o')
        
        # Camera axes (showing orientation)
        axes_length = 0.5  # Scale to make it visible
        for i, c in zip(range(3), ['r', 'g', 'b']):
            axis = R.T[:, i] * axes_length
            ax.quiver(center[0], center[1], center[2], 
                      axis[0], axis[1], axis[2], 
                      color=c, arrow_length_ratio=0.1)
    
    # Plot cameras
    plot_camera(R1, t1, 'red')
    plot_camera(R2, t2, 'green')
    
    ax.set_title('3D Reconstruction: Points and Cameras')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    
    # Try to make axes equal
    max_range = np.max([
        np.max(points_3d[:, 0]) - np.min(points_3d[:, 0]),
        np.max(points_3d[:, 1]) - np.min(points_3d[:, 1]),
        np.max(points_3d[:, 2]) - np.min(points_3d[:, 2])
    ])
    
    mid_x = (np.max(points_3d[:, 0]) + np.min(points_3d[:, 0])) * 0.5
    mid_y = (np.max(points_3d[:, 1]) + np.min(points_3d[:, 1])) * 0.5
    mid_z = (np.max(points_3d[:, 2]) + np.min(points_3d[:, 2])) * 0.5
    
    ax.set_xlim(mid_x - max_range * 0.5, mid_x + max_range * 0.5)
    ax.set_ylim(mid_y - max_range * 0.5, mid_y + max_range * 0.5)
    ax.set_zlim(mid_z - max_range * 0.5, mid_z + max_range * 0.5)
    
    plt.tight_layout()
    plt.show()

🦥 Geeky Note: The RANSAC threshold parameter (threshold=1.0) determines how strict we are about geometric consistency. I’ve found that 0.5-1.0 works well for controlled environments, but increasing to 1.5-2.0 helps with outdoor scenes where wind might cause slight camera movements. The probability parameter (prob=0.999) ensures high confidence but increases computation time; 0.95 is sufficient for prototyping.

The essential matrix estimation uses matched feature points and the camera’s internal parameters to calculate the geometric relationship between images.

This relationship is then decomposed to extract rotation and translation information – essentially determining where each photo was taken from in 3D space. The accuracy of this step directly affects everything that follows.


# This is a simplified example - in practice you would use images and matches
# from the previous steps
    
# Example camera intrinsic matrix (replace with your calibrated values)
K = np.array([
        [1000, 0, 320],
        [0, 1000, 240],
        [0, 0, 1]
])
    
# For first camera, we use identity rotation and zero translation
R1 = np.eye(3)
t1 = np.zeros((3, 1))
    
# Load images, extract features, and match as in previous sections
img1_path = "image1.jpg"  # Replace with your image paths
img2_path = "image2.jpg"
    
img1 = cv2.imread(img1_path)
img2 = cv2.imread(img2_path)
    
kp1, desc1, _ = extract_features(img1_path, 'sift')
kp2, desc2, _ = extract_features(img2_path, 'sift')
    
matches = match_features(desc1, desc2, method='flann')
    
# Estimate pose of second camera relative to first
R2, t2, mask, inliers = estimate_pose(kp1, kp2, matches, K)
    
# Triangulate points
points_3d = triangulate_points(kp1, kp2, inliers, K, R1, t1, R2, t2)

Once camera positions are established, triangulation projects rays from matched points in multiple images to determine where they intersect in 3D space.

# Visualize the result
visualize_points_and_cameras(points_3d, R1, t1, R2, t2)

These intersections form the initial sparse point cloud, providing the skeleton upon which dense reconstruction will later build. The visualization shows both the reconstructed points and the camera positions, helping you understand the spatial relationships in your dataset.

🌱 SfM works best with a good network of overlapping images. Aim for at least 60% overlap between adjacent images for reliable reconstruction.

Bundle Adjustment: Optimizing for Accuracy

There is an extra optimization stage that happens within the Structure from Motion “compute node”. 

This is called bundle adjustment.

It is a refinement step that jointly optimizes camera parameters and 3D point positions. What that means is that it minimizes the reprojection error, i.e., the difference between observed image points and the projections of their corresponding 3D points.

Does this make sense to you? Essentially, this optimization is valuable because it:

  • Improves the accuracy of the reconstruction
  • Corrects for accumulated drift
  • Ensures global consistency of the model

At this stage, this should be enough to get a good intuition of how it works.
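
To make the reprojection-error idea concrete, here is a minimal sketch of the objective that bundle adjustment minimizes, for the simple two-view case built in the previous section. It assumes the K, R2, t2, points_3d and inlier 2D observations from the Structure from Motion code above, and it pulls in scipy.optimize.least_squares as an extra dependency. This is only an illustration of the objective; production pipelines such as COLMAP use far more robust, sparse solvers.

#%% Bundle adjustment intuition: minimizing reprojection error (toy two-view sketch)
# Assumptions: K, R2, t2 and points_3d come from the SfM section above, and
# pts1 / pts2 are the 2D observations of those points in image 1 and image 2.
import cv2
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_points, K, pts1, pts2):
    """Residuals between observed 2D points and reprojected 3D points."""
    # Camera 1 is fixed at the origin; we refine camera 2's pose and the 3D points.
    K = np.float64(K)  # make sure the intrinsics are floating point for OpenCV
    rvec2 = params[0:3]
    t2 = params[3:6]
    X = params[6:].reshape(n_points, 3)

    proj1, _ = cv2.projectPoints(X, np.zeros(3), np.zeros(3), K, None)
    proj2, _ = cv2.projectPoints(X, rvec2, t2, K, None)

    res1 = (proj1.reshape(-1, 2) - pts1).ravel()
    res2 = (proj2.reshape(-1, 2) - pts2).ravel()
    return np.concatenate([res1, res2])

# Hypothetical usage with the outputs of the previous section:
# pts1 = np.float32([kp1[m.queryIdx].pt for m in inliers])
# pts2 = np.float32([kp2[m.trainIdx].pt for m in inliers])
# rvec2_init, _ = cv2.Rodrigues(R2)
# x0 = np.hstack([rvec2_init.ravel(), t2.ravel(), points_3d.ravel()])
# result = least_squares(reprojection_residuals, x0,
#                        args=(len(points_3d), K, pts1, pts2))
# refined_points = result.x[6:].reshape(-1, 3)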

🌱 In larger projects, incremental bundle adjustment (optimizing after adding each new camera) can improve both speed and stability compared to global adjustment at the end.

Dense Matching: Creating Detailed Reconstructions

After establishing camera positions and sparse points, the final step is dense matching to create a detailed representation of the scene. 

Dense matching uses the known camera parameters to match many more points between images, resulting in a complete point cloud.

Common approaches include:

  • Multi-View Stereo (MVS)
  • Patch-based Multi-View Stereo (PMVS)
  • Semi-Global Matching (SGM)
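
To give a feel for what dense matching computes, here is a minimal sketch using OpenCV's StereoSGBM implementation of Semi-Global Matching on a single rectified stereo pair. The filenames ("left.jpg", "right.jpg") and all parameter values are placeholders to adapt to your data; full Multi-View Stereo pipelines generalize this two-view idea across all the calibrated views recovered by SfM.

# Minimal Semi-Global Matching sketch with OpenCV (two rectified views only).
# "left.jpg" and "right.jpg" are hypothetical, already-rectified images.
import cv2
import numpy as np

left = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)
if left is None or right is None:
    raise ValueError("Could not read the rectified stereo pair")

block_size = 5
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,          # must be a multiple of 16
    blockSize=block_size,
    P1=8 * block_size ** 2,      # smoothness penalties (common heuristic for grayscale)
    P2=32 * block_size ** 2,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# OpenCV returns fixed-point disparities scaled by 16
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# With a known focal length f (in pixels) and baseline B (in meters),
# depth = f * B / disparity for every pixel with a valid (positive) disparity.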

Putting It All Together: Practical Tools

The theoretical pipeline is implemented in several open-source and commercial software packages. Each offers different features and capabilities:

Tool | Strengths | Use Case | Pricing
COLMAP | Highly accurate, customizable | Research, precise reconstructions | Free, open-source
OpenMVG | Modular, extensive documentation | Education, integration with custom pipelines | Free, open-source
Meshroom | User-friendly, node-based interface | Artists, beginners | Free, open-source
RealityCapture | Extremely fast, high-quality results | Professional, large-scale projects | Commercial

These tools package the various pipeline steps described above into a more user-friendly interface, but understanding the underlying processes is still essential for troubleshooting and optimization.

Automating the reconstruction pipeline saves countless hours of manual work.

The real productivity boost comes from scripting the entire process end-to-end, from raw photos to dense point cloud.

COLMAP’s command-line interface makes this automation possible, even for complex reconstruction tasks.

#%% SECTION 4: Complete Pipeline Automation with COLMAP
import os
import subprocess
import glob
import numpy as np

def run_colmap_pipeline(image_folder, output_folder, colmap_path="colmap"):
    """
    Run the complete COLMAP pipeline from feature extraction to dense reconstruction.
    """

    # Create output directories if they don't exist
    sparse_folder = os.path.join(output_folder, "sparse")
    dense_folder = os.path.join(output_folder, "dense")
    database_path = os.path.join(output_folder, "database.db")
    
    os.makedirs(output_folder, exist_ok=True)
    os.makedirs(sparse_folder, exist_ok=True)
    os.makedirs(dense_folder, exist_ok=True)
    
    # Step 1: Feature extraction
    print("Step 1: Feature extraction")
    feature_cmd = [
        colmap_path, "feature_extractor",
        "--database_path", database_path,
        "--image_path", image_folder,
        "--ImageReader.camera_model", "SIMPLE_RADIAL",
        "--ImageReader.single_camera", "1",
        "--SiftExtraction.use_gpu", "1"
    ]
    
    try:
        subprocess.run(feature_cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Feature extraction failed: {e}")
        return False
    
    # Step 2: Match features
    print("Step 2: Feature matching")
    match_cmd = [
        colmap_path, "exhaustive_matcher",
        "--database_path", database_path,
        "--SiftMatching.use_gpu", "1"
    ]
    
    try:
        subprocess.run(match_cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Feature matching failed: {e}")
        return False
    
    # Step 3: Sparse reconstruction (Structure from Motion)
    print("Step 3: Sparse reconstruction")
    sfm_cmd = [
        colmap_path, "mapper",
        "--database_path", database_path,
        "--image_path", image_folder,
        "--output_path", sparse_folder
    ]
    
    try:
        subprocess.run(sfm_cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Sparse reconstruction failed: {e}")
        return False
    
    # Find the largest sparse model
    sparse_models = glob.glob(os.path.join(sparse_folder, "*/"))
    if not sparse_models:
        print("No sparse models found")
        return False
    
    # Sort by model size (using number of images as proxy)
    largest_model = 0
    max_images = 0
    for i, model_dir in enumerate(sparse_models):
        images_txt = os.path.join(model_dir, "images.txt")
        if os.path.exists(images_txt):
            with open(images_txt, 'r') as f:
                num_images = sum(1 for line in f if line.strip() and not line.startswith("#"))
                num_images = num_images // 2  # Each image has 2 lines
                if num_images > max_images:
                    max_images = num_images
                    largest_model = i
    
    selected_model = os.path.join(sparse_folder, str(largest_model))
    print(f"Selected model {largest_model} with {max_images} images")
    
    # Step 4: Image undistortion
    print("Step 4: Image undistortion")
    undistort_cmd = [
        colmap_path, "image_undistorter",
        "--image_path", image_folder,
        "--input_path", selected_model,
        "--output_path", dense_folder,
        "--output_type", "COLMAP"
    ]
    
    try:
        subprocess.run(undistort_cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Image undistortion failed: {e}")
        return False
    
    # Step 5: Dense reconstruction (Multi-View Stereo)
    print("Step 5: Dense reconstruction")
    mvs_cmd = [
        colmap_path, "patch_match_stereo",
        "--workspace_path", dense_folder,
        "--workspace_format", "COLMAP",
        "--PatchMatchStereo.geom_consistency", "true"
    ]
    
    try:
        subprocess.run(mvs_cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Dense reconstruction failed: {e}")
        return False
    
    # Step 6: Stereo fusion
    print("Step 6: Stereo fusion")
    fusion_cmd = [
        colmap_path, "stereo_fusion",
        "--workspace_path", dense_folder,
        "--workspace_format", "COLMAP",
        "--input_type", "geometric",
        "--output_path", os.path.join(dense_folder, "fused.ply")
    ]
    
    try:
        subprocess.run(fusion_cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Stereo fusion failed: {e}")
        return False
    
    print("Pipeline completed successfully!")
    return True

The script orchestrates a series of COLMAP operations that would normally require manual intervention at each stage. It handles the progression from feature extraction through matching, sparse reconstruction, and finally dense reconstruction – maintaining the correct data flow between steps. This automation becomes invaluable when processing multiple datasets or when iteratively refining reconstruction parameters.

# Replace with your image and output folder paths
image_folder = "path/to/images"
output_folder = "path/to/output"
    
# Path to COLMAP executable (may be just "colmap" if it's in your PATH)
colmap_path = "colmap"
    
run_colmap_pipeline(image_folder, output_folder, colmap_path)

One key aspect is the automatic selection of the largest reconstructed model. In challenging datasets, COLMAP sometimes creates multiple disconnected reconstructions rather than a single cohesive model. 

The script intelligently identifies and continues with the most complete reconstruction, using image count as a proxy for model quality and completeness.

🦥 Geeky Note: The --SiftExtraction.use_gpu and --SiftMatching.use_gpu flags enable GPU acceleration, speeding up processing by 5-10x. For dense reconstruction, the --PatchMatchStereo.geom_consistency true parameter significantly improves quality by enforcing consistency across multiple views, at the cost of longer processing time.

The Power of Understanding the Pipeline

Understanding the full reconstruction pipeline gives you control over your 3D modeling process. When you encounter issues, knowing which stage might be causing problems allows you to target your troubleshooting efforts effectively.

As illustrated, common issues and their sources include:

  1. Missing or incorrect camera poses: Feature extraction and matching problems
  2. Incomplete reconstruction: Insufficient image overlap
  3. Noisy point clouds: Poor bundle adjustment or camera calibration
  4. Failed reconstruction: Problematic images (motion blur, poor lighting)

The ability to diagnose these issues comes from a deep understanding of how each pipeline component works and interacts with others.
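
As a small illustration of this kind of diagnosis, the sketch below reuses the extract_features and match_features functions defined earlier to build a pairwise match-count matrix; image pairs with very few good matches usually point to insufficient overlap, motion blur, or poor lighting. The folder path and the 50-match threshold are assumptions to adjust for your dataset.

# Diagnostic sketch: pairwise match counts across a dataset (assumes the
# extract_features and match_features functions defined earlier in this article).
import glob
import numpy as np

image_paths = sorted(glob.glob("path/to/images/*.jpg"))  # replace with your folder

# Keep only the keypoints and descriptors for each image
features = [extract_features(p, 'sift')[:2] for p in image_paths]

n = len(image_paths)
match_counts = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        good = match_features(features[i][1], features[j][1], method='flann')
        match_counts[i, j] = match_counts[j, i] = len(good)

# Image pairs with suspiciously few matches are the first place to look
weak_pairs = [(image_paths[i], image_paths[j])
              for i in range(n) for j in range(i + 1, n)
              if match_counts[i, j] < 50]
print(f"{len(weak_pairs)} image pairs with fewer than 50 good matches")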

Next Steps: Practice and Automation

Now that you understand the pipeline, it’s time to put it into practice. Experiment with the provided code examples and try automating the process for your own datasets.

Start with small, well-controlled scenes and gradually tackle more complex environments as you gain confidence.

Remember that the quality of your input images dramatically affects the final result. Take time to capture high-quality photographs with good overlap, consistent lighting, and minimal motion blur.

🌱 Consider starting a small personal project to reconstruct an object you own. Document your process, including the issues you encounter and how you solve them – this practical experience is invaluable.

References and useful resources

I compiled some interesting software, tools, and extended documentation on useful algorithms for you:

Software and Tools

  • COLMAP – Free, open-source 3D reconstruction software
  • OpenMVG – Open Multiple View Geometry library
  • Meshroom – Free node-based photogrammetry software
  • RealityCapture – Commercial high-performance photogrammetry software
  • Agisoft Metashape – Commercial photogrammetry and 3D modeling software
  • OpenCV – Computer vision library with feature detection implementations
  • 3DF Zephyr – Photogrammetry software for 3D reconstruction
  • Python – Programming language ideal for 3D reconstruction automation

Algorithms

About the author

Florent Poux, Ph.D. is a Scientific and Course Director focused on educating engineers on leveraging AI and 3D Data Science. He leads research teams and teaches 3D Computer Vision at various universities. His current aim is to ensure humans are correctly equipped with the knowledge and skills to tackle 3D challenges for impactful innovations.

Resources

  1. 🏆Awards: Jack Dangermond Award
  2. 📕Book: 3D Data Science with Python
  3. 📜Research: 3D Smart Point Cloud (Thesis)
  4. 🎓Courses: 3D Geodata Academy Catalog
  5. 💻Code: Florent’s Github Repository
  6. 💌3D Tech Digest: Weekly Newsletter

How To Generate GIFs from 3D Models with Python

Complete Tutorial to Automate 3D Data Visualization. Use Python to convert point clouds and 3D models into GIFs & MP4s for easy sharing and collaboration.

As a data scientist, you know that effectively communicating your insights is as important as the insights themselves.

But how do you communicate over 3D data?

I can bet most of us have been there: you spend days, weeks, maybe even months meticulously collecting and processing 3D data. Then comes the moment to share your findings, whether it’s with clients, colleagues, or the broader scientific community. You throw together a few static screenshots, but they just don’t capture the essence of your work. The subtle details, the spatial relationships, the sheer scale of the data—it all gets lost in translation.

Comparing 3D Data Communication Methods. © F. Poux

Or maybe you’ve tried using specialized 3D visualization software. But when your client uses it, they struggle with clunky interfaces, steep learning curves, and restrictive licensing.

What should be a smooth, intuitive process becomes a frustrating exercise in technical acrobatics. It’s an all-too-common scenario: the brilliance of your 3D data is trapped behind a wall of technical barriers.

This highlights a common issue: the need to create shareable content that can be opened by anyone, i.e., that does not demand specific 3D data science skills.

Think about it: what is the most used way to share visual information? Images.

But how can we convey the 3D information from a simple 2D image?

Well, let us use “first principle thinking”: let us create shareable content by stacking multiple 2D views, such as GIFs or MP4s, from raw point clouds.

The bread of magic to generate GIF and MP4. © F. Poux

This process is critical for presentations, reports, and general communication. But generating GIFs and MP4s from 3D data can be complex and time-consuming. I’ve often found myself wrestling with the challenge of quickly generating rotating GIF or MP4 files from a 3D point cloud, a task that seemed simple enough but often spiraled into a time-consuming ordeal. 

Current workflows might lack efficiency and ease of use, and a streamlined process can save time and improve data presentation.

Let me share a solution that involves leveraging Python and specific libraries to automate the creation of GIFs and MP4s from point clouds (or any 3D dataset such as a mesh or a CAD model).

Think about it. You’ve spent hours meticulously collecting and processing this 3D data. Now, you need to present it in a compelling way for a presentation or a report. But how can we be sure it can be integrated into a SaaS solution where it is triggered on upload? You try to create a dynamic visualization to showcase a critical feature or insight, and yet you’re stuck manually capturing frames and stitching them together. How can we automate this process to seamlessly integrate it into your existing systems?

An example of a GIF generated with the methodology. © F. Poux

If you are new to my (3D) writing world, welcome! We are going on an exciting adventure that will allow you to master an essential 3D Python skill. Before diving, I like to establish a clear scenario, the mission brief.

Once the scene is laid out, we embark on the Python journey. Everything is given. You will see Tips (🦚Notes and 🌱Growing) to help you get the most out of this article. Thanks to the 3D Geodata Academy for supporting the endeavor.

The Mission 🎯

You are working for a new engineering firm, “Geospatial Dynamics,” which wants to showcase its cutting-edge LiDAR scanning services. Instead of sending clients static point cloud images, you propose to use a new tool, which is a Python script, to generate dynamic rotating GIFs of project sites.

After doing some market research, you found that this can immediately elevate their proposals, resulting in a 20% higher project approval rate. That’s the power of visual storytelling.

The three stages of the mission towards an increased project approval rate. © F. Poux

On top of that, you can even imagine a more compelling scenario, where “GeoSpatial Dynamics” processes point clouds at scale and then generates MP4 videos that are sent to potential clients. This way, you lower churn and make the brand more memorable.

With that in mind, we can start designing a robust framework to answer our mission’s goal.

The Framework

I remember a project where I had to show a detailed architectural scan to a group of investors. The usual still images just could not capture the fine details. I desperately needed a way to create a rotating GIF to convey the full scope of the design. That is why I’m excited to introduce this Cloud2Gif Python solution. With this, you’ll be able to easily generate shareable visualizations for presentations, reports, and communication.

The framework I propose is straightforward yet effective. It takes raw 3D data, processes it using Python and the PyVista library, generates a series of frames, and stitches them together to create a GIF or MP4 video. The high-level workflow includes:

The various stages of the framework in this article. © F. Poux

1. Loading the 3D data (mesh with texture).

2. Loading a 3D Point Cloud

3. Setting up the visualization environment.

4. Generating a GIF

 4.1. Defining a camera orbit path around the data.

 4.2. Rendering frames from different viewpoints along the path.

 4.3. Encoding the frames into a GIF

5. Generating an orbital MP4

6. Creating a Function

7. Testing with multiple datasets

This streamlined process allows for easy customization and integration into existing workflows. The key advantage here is the simplicity of the approach. By leveraging the basic principles of 3D data rendering, a very efficient and self-contained script can be put together and deployed on any system as long as Python is installed.

This makes it compatible with various edge computing solutions and allows for easy integration with sensor-heavy systems. The goal is to generate a GIF and an MP4 from a 3D data set. The process is simple, requiring a 3D data set, a bit of magic (the code), and the output as GIF and MP4 files.

The growth of the solution as we move along the major stages. © F. Poux

Now, what are the tools and libraries that we will need for this endeavor?

1. Setup Guide: The Libraries, Tools and Data

© F. Poux

For this project, we primarily use the following two Python libraries:

  • NumPy: The cornerstone of numerical computing in Python. Without it, I would have to deal with every vertex (point) in a very inefficient way. NumPy Official Website
  • pyvista: A high-level interface to the Visualization Toolkit (VTK). PyVista enables me to easily visualize and interact with 3D data. It handles rendering, camera control, and exporting frames. PyVista Official Website
PyVista and Numpy libraries for 3D Data. © F. Poux

These libraries provide all the necessary tools to handle data processing, visualization, and output generation. This set of libraries was carefully chosen so that a minimal amount of external dependencies is present, which improves sustainability and makes it easily deployable on any system.

Let me share the details of the environment as well as the data preparation setup.

Quick Environment Setup Guide

Let me provide very brief details on how to set up your environment.

Step 1: Install Miniconda

Four simple steps to get a working Miniconda version:

How to install Anaconda for 3D Coding. © F. Poux

Step 2: Create a new environment

You can run the following code in your terminal

conda create -n pyvista_env python=3.10
conda activate pyvista_env

Step 3: Install required packages

For this, you can leverage pip as follows:

pip install numpy
pip install pyvista

Step 4: Test the installation

If you want to test your installation, type python in your terminal and run the following lines:

import numpy as np
import pyvista as pv
print(f"PyVista version: {pv.__version__}")

This should return the PyVista version. Do not forget to exit Python from your terminal afterward (type exit()).

🦚 Note: Here are some common issues and workarounds:

  • If PyVista doesn’t show a 3D window: pip install vtk
  • If environment activation fails: Restart the terminal
  • If data loading fails: Check file format compatibility (PLY, LAS, LAZ supported)

Beautiful, at this stage, your environment is ready. Now, let me share some quick ways to get your hands on 3D datasets.

Data Preparation for 3D Visualization

At the end of the article, I share with you the datasets as well as the code. However, in order to ensure you are fully independent, here are three reliable sources I regularly use to get my hands on point cloud data:

The LiDAR Data Download Process. © F. Poux

The USGS 3DEP LiDAR Point Cloud Downloads

OpenTopography

ETH Zurich’s PCD Repository

For quick testing, you can also use PyVista’s built-in example data:

# Load sample data
from pyvista import examples
terrain = examples.download_crater_topo()
terrain.plot()

🦚 Note: Remember to always check the data license and attribution requirements when using public datasets.

Finally, to ensure a complete setup, below is a typical expected folder structure:

project_folder/
├── environment.yml
├── data/
│   └── pointcloud.ply
└── scripts/
    └── gifmaker.py

Beautiful, we can now jump right onto the first stage: loading and visualizing textured mesh data.

2. Loading and Visualizing Textured Mesh Data

One first critical step is properly loading and rendering 3D data. In my research laboratory, I have found that PyVista provides an excellent foundation for handling complex 3D visualization tasks. 

© F. Poux

Here’s how you can approach this fundamental step:

import numpy as np
import pyvista as pv

mesh = pv.examples.load_globe()
texture = pv.examples.load_globe_texture()

pl = pv.Plotter()
pl.add_mesh(mesh, texture=texture, smooth_shading=True)
pl.show()

This code snippet loads a textured globe mesh, but the principles apply to any textured 3D model.

The earth rendered as a sphere with PyVista. © F. Poux

Let me say a bit about the smooth_shading parameter. It’s a small setting that renders the surfaces as more continuous (as opposed to faceted), which, in the case of spherical objects, improves the visual impact.

Now, this is just a starter for 3D mesh data. This means that we deal with surfaces that join points together. But what if we want to work solely with point-based representations? 

In that scenario, we have to consider shifting our data processing approach to propose solutions to the unique visual challenges attached to point cloud datasets.

3. Point Cloud Data Integration

Point cloud visualization demands extra attention to detail. In particular, adjusting the point density and the way we represent points on the screen has a noticeable impact. 

© F. Poux

Let us use a PLY file for testing (see the end of the article for resources). 

The example PLY point cloud data with PyVista. © F. Poux

You can load a point cloud with pv.read and create scalar fields for better visualization (such as a scalar field based on the height or the extent around the center of the point cloud).

In my work with LiDAR datasets, I’ve developed a simple, systematic approach to point cloud loading and initial visualization:

cloud = pv.read('street_sample.ply')
scalars = np.linalg.norm(cloud.points - cloud.center, axis=1)

pl = pv.Plotter()
pl.add_mesh(cloud)
pl.show()

The scalar computation here is particularly important. By calculating the distance from each point to the cloud’s center, we create a basis for color-coding that helps convey depth and structure in our visualizations. This becomes especially valuable when dealing with large-scale point clouds where spatial relationships might not be immediately apparent.

Moving from basic visualization to creating engaging animations requires careful consideration of the visualization environment. Let’s explore how to optimize these settings for the best possible results.

4. Optimizing the Visualization Environment

The visual impact of our animations heavily depends on the visualization environment settings. 

© F. Poux

Through extensive testing, I’ve identified key parameters that consistently produce professional-quality results:

pl = pv.Plotter(off_screen=False)
pl.add_mesh(
   cloud,
   style='points',
   render_points_as_spheres=True,
   emissive=False,
   color='#fff7c2',
   scalars=scalars,
   opacity=1,
   point_size=8.0,
   show_scalar_bar=False
   )

pl.add_text('test', color='b')
pl.background_color = 'k'
pl.enable_eye_dome_lighting()
pl.show()

As you can see, the plotter is initialized with off_screen=False to render directly to the screen. The point cloud is then added to the plotter with the specified styling. The style='points' parameter ensures that the point cloud is rendered as individual points. The scalars=scalars argument uses the previously computed scalar field for coloring, while point_size sets the size of the points and opacity adjusts the transparency. A base color is also set.

🦚 Note: In my experience, rendering points as spheres significantly improves depth perception in the final generated animation. You can also combine this with the eye_dome_lighting feature. This algorithm adds another layer of depth cues through a screen-space, depth-based shading, which makes the structure of point clouds more apparent.

You can play around with the various parameters until you obtain a rendering that is satisfying for your applications. Then, I propose that we move to creating the animated GIFs.

A GIF of the point cloud. © F. Poux

5. Creating Animated GIFs

At this stage, our aim is to generate a series of renderings by varying the viewpoint from which we generate these. 

© F. Poux

This means that we need to design a sound camera path from which we can render frames. 

To generate our GIF, we must first create an orbiting path for the camera around the point cloud. Then, we can sample the path at regular intervals and capture frames from different viewpoints. 

These frames can then be used to create the GIF. Here are the steps:

The 4 stages in the animated GIF generation. © F. Poux

  1. I change to off-screen rendering
  2. I take the cloud length parameter to set the camera
  3. I create a path
  4. I create a loop that takes each point of this path

Which translates into the following:

pl = pv.Plotter(off_screen=True, image_scale=2)
pl.add_mesh(
   cloud,
   style='points',
   render_points_as_spheres=True,
   emissive=False,
   color='#fff7c2',
   scalars=scalars,
   opacity=1,
   point_size=5.0,
   show_scalar_bar=False
   )

pl.background_color = 'k'
pl.enable_eye_dome_lighting()
pl.show(auto_close=False)

viewup = [0, 0, 1]

path = pl.generate_orbital_path(n_points=40, shift=cloud.length, viewup=viewup, factor=3.0)
pl.open_gif("orbit_cloud_2.gif")
pl.orbit_on_path(path, write_frames=True, viewup=viewup)
pl.close()

As you can see, an orbital path is created around the point cloud using pl.generate_orbital_path(). The path’s size is driven by cloud.length (through the shift and factor arguments), the center is set to the center of the point cloud, and the viewup vector is set to [0, 0, 1], so the camera orbits around the vertical axis.

From there, we can enter a loop to generate individual frames for the GIF (the camera’s focal point is set to the center of the point cloud).

The image_scale parameter deserves special attention—it determines the resolution of our output. 

I’ve found that a value of 2 provides a good balance between the perceived quality and the file size. Also, the viewup vector is crucial for maintaining proper orientation throughout the animation. You can experiment with its value if you want a rotation following a non-horizontal plane.

This results in a GIF that you can use to communicate very easily. 

Another synthetic point cloud generated GIF. © F. Poux

But we can push one extra stage: creating an MP4 video. This can be useful if you want to obtain higher-quality animations with smaller file sizes as compared to GIFs (which are not as compressed).

6. High-Quality MP4 Video Generation

The generation of an MP4 video follows the exact same principles as we used to generate our GIF. 

© F. Poux

Therefore, let me get straight to the point. To generate an MP4 file from any point cloud, we can reason in four stages:

© F. Poux
  • Gather your configurations over the parameters that best suit you.
  • Create an orbital path the same way you did with GIFs
  • Instead of using the open_gif function, let us use open_movie to write a “movie” type file.
  • We orbit on the path and write the frames, similarly to our GIF method.

🦚 Note: Don’t forget to use your proper configuration in the definition of the path.

This is what the end result looks like with code:

pl = pv.Plotter(off_screen=True, image_scale=1)
pl.add_mesh(
   cloud,
   style='points_gaussian',
   render_points_as_spheres=True,
   emissive=True,
   color='#fff7c2',
   scalars=scalars,
   opacity=0.15,
   point_size=5.0,
   show_scalar_bar=False
   )

pl.background_color = 'k'
pl.show(auto_close=False)

viewup = [0.2, 0.2, 1]

path = pl.generate_orbital_path(n_points=40, shift=cloud.length, viewup=viewup, factor=3.0)
pl.open_movie("orbit_cloud.mp4")
pl.orbit_on_path(path, write_frames=True)
pl.close()

Notice the use of points_gaussian style and adjusted opacity—these settings provide interesting visual quality in video format, particularly for dense point clouds.

And now, what about streamlining the process?

7. Streamlining the Process with a Custom Function

© F. Poux

To make this process more efficient and reproducible, I’ve developed a function that encapsulates all these steps:

def cloudgify(input_path):
   cloud = pv.read(input_path)
   scalars = np.linalg.norm(cloud.points - cloud.center, axis=1)
   pl = pv.Plotter(off_screen=True, image_scale=1)
   pl.add_mesh(
       cloud,
        style='points',
       render_points_as_spheres=True,
       emissive=False,
       color='#fff7c2',
       scalars=scalars,
       opacity=0.65,
       point_size=5.0,
       show_scalar_bar=False
       )

   pl.background_color = 'k'
   pl.enable_eye_dome_lighting()
   pl.show(auto_close=False)

   viewup = [0, 0, 1]

   path = pl.generate_orbital_path(n_points=40, shift=cloud.length, viewup=viewup, factor=3.0)
  
   pl.open_gif(input_path.split('.')[0]+'.gif')
   pl.orbit_on_path(path, write_frames=True, viewup=viewup)
   pl.close()
  
   path = pl.generate_orbital_path(n_points=100, shift=cloud.length, viewup=viewup, factor=3.0)
   pl.open_movie(input_path.split('.')[0]+'.mp4')
   pl.orbit_on_path(path, write_frames=True)
   pl.close()
  
   return

🦚 Note: This function standardizes our visualization process while maintaining flexibility through its parameters. It incorporates several optimizations I’ve developed through extensive testing. Note the different n_points values for GIF (40) and MP4 (100)—this balances file size and smoothness appropriately for each format. The automatic filename generation split(‘.’)[0] ensures consistent output naming.

And what better than to test our new creation on multiple datasets?

8. Batch Processing Multiple Datasets

© F. Poux

Finally, we can apply our function to multiple datasets:

dataset_paths= ["lixel_indoor.ply", "NAAVIS_EXTERIOR.ply", "pcd_synthetic.ply", "the_adas_lidar.ply"]

for pcd in dataset_paths:
   cloudgify(pcd)

This approach can be remarkably efficient when processing large datasets made of several files. Indeed, if your parametrization is sound, you can maintain consistent 3D visualization across all outputs.

🌱 Growing: I am a big fan of 0% supervision to create 100% automatic systems. This means that if you want to push the experiments even more, I suggest investigating ways to automatically infer the parameters based on the data, i.e., data-driven heuristics. Here is an example of a paper I wrote a couple of years down the line that focuses on such an approach for unsupervised segmentation (Automation in Construction, 2022)
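
To illustrate what such a data-driven heuristic could look like, here is a minimal sketch (my own assumption, not part of the original workflow) that derives the point_size from the cloud’s average nearest-neighbor spacing, so dense clouds get smaller points and sparse clouds get larger ones. The screen_scale constant and the 2.0-12.0 clamp are arbitrary tuning values.

# A minimal, assumed heuristic: infer point_size from average point spacing.
import numpy as np

def estimate_point_size(cloud, screen_scale=2000.0):
    """Map the cloud's mean nearest-neighbor spacing to a pixel point size."""
    pts = np.asarray(cloud.points)
    # Brute-force nearest neighbors on a small random subsample to stay dependency-free
    idx = np.random.choice(len(pts), size=min(len(pts), 2000), replace=False)
    sample = pts[idx]
    d = np.linalg.norm(sample[:, None, :] - sample[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    mean_spacing = d.min(axis=1).mean()
    # Normalize by the cloud's bounding-box diagonal and clamp to a sensible range
    return float(np.clip(screen_scale * mean_spacing / cloud.length, 2.0, 12.0))

# Hypothetical usage inside cloudgify(): point_size=estimate_point_size(cloud)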

A Little Discussion 

Alright, you know my tendency to push innovation. While relatively simple, this Cloud2Gif solution has direct applications that can help you propose better experiences. Three of them come to mind, which I leverage on a weekly basis:

© F. Poux
  • Interactive Data Profiling and Exploration: By generating GIFs of complex simulation results, I can profile my results at scale very quickly. Indeed, the qualitative analysis is thus a matter of slicing a sheet filled with metadata and GIFs to check if the results are on par with my metrics. This is very handy
  • Educational Materials: I often use this script to generate engaging visuals for my online courses and tutorials, enhancing the learning experience for the professionals and students that go through it. This is especially true now that most material is found online, where we can leverage the capacity of browsers to play animations.
  • Real-time Monitoring Systems: I worked on integrating this script into a real-time monitoring system to generate visual alerts based on sensor data. This is especially relevant for sensor-heavy systems, where it can be difficult to extract meaning from the point cloud representation manually. Especially when conceiving 3D Capture Systems, leveraging SLAM or other techniques, it can be helpful to get a feedback loop in real-time to ensure a cohesive registration.

However, when we consider the broader research landscape and the pressing needs of the 3D data community, the real value proposition of this approach becomes evident. Scientific research is increasingly interdisciplinary, and communication is key. We need tools that enable researchers from diverse backgrounds to understand and share complex 3D data easily.

The Cloud2Gif script is self-contained and requires minimal external dependencies. This makes it ideally suited for deployment on resource-constrained edge devices. And this may be the top application that I worked on, leveraging such a straightforward approach.

As a little digression, I saw the positive impact of the script in two scenarios. First, I designed an environmental monitoring system for diseases in farmland crops. This was a 3D project, and I could include the generation of visual alerts (with an MP4 file) based on the real-time LiDAR sensor data. A great project!

In another context, I wanted to provide visual feedback to on-site technicians using a SLAM-equipped system for mapping purposes. I integrated the process to generate a GIF every 30 seconds that showed the current state of data registration. It was a great way to ensure consistent data capture. This actually allowed us to reconstruct complex environments with better consistency in managing our data drift.

Conclusion

Today, I walked through a simple yet powerful Python script to transform 3D data into dynamic GIFs and MP4 videos. This script, combined with libraries like NumPy and PyVista, allows us to create engaging visuals for various applications, from presentations to research and educational materials.

The key here is accessibility: the script is easily deployable and customizable, providing an immediate way of transforming complex data into an accessible format. This Cloud2Gif script is an excellent piece for your application if you need to share, assess, or get quick visual feedback within data acquisition situations.

What is next?

Well, if you feel up for a challenge, you can create a simple web application that allows users to upload point clouds, trigger the video generation process, and download the resulting GIF or MP4 file. 

You can build this with a lightweight framework such as Flask, and then deploy the application on Amazon Web Services so that it is scalable and easily accessible to anyone, with minimal maintenance.
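
As a starting point, here is a minimal Flask sketch (my own assumption, not a production design) that wires an upload endpoint to the cloudgify function defined earlier and returns the generated MP4. The /convert route, the pointcloud form field, and the uploads folder are all placeholder names, and no authentication or filename sanitization is included.

# Minimal, hypothetical Flask wrapper around the cloudgify() function above.
# No security hardening (auth, filename sanitization, size limits) is included.
import os
from flask import Flask, request, send_file

app = Flask(__name__)
UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route("/convert", methods=["POST"])
def convert():
    uploaded = request.files["pointcloud"]   # form field name is an assumption
    input_path = os.path.join(UPLOAD_DIR, uploaded.filename)
    uploaded.save(input_path)
    cloudgify(input_path)                    # writes .gif and .mp4 next to the upload
    return send_file(input_path.split('.')[0] + ".mp4", as_attachment=True)

if __name__ == "__main__":
    app.run(debug=True)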

These are skills that you develop through the Segmentor OS Program at the 3D Geodata Academy.

About the author

Florent Poux, Ph.D. is a Scientific and Course Director focused on educating engineers on leveraging AI and 3D Data Science. He leads research teams and teaches 3D Computer Vision at various universities. His current aim is to ensure humans are correctly equipped with the knowledge and skills to tackle 3D challenges for impactful innovations.

Resources

  1. 🏆Awards: Jack Dangermond Award
  2. 📕Book: 3D Data Science with Python
  3. 📜Research: 3D Smart Point Cloud (Thesis)
  4. 🎓Courses: 3D Geodata Academy Catalog
  5. 💻Code: Florent’s Github Repository
  6. 💌3D Tech Digest: Weekly Newsletter
