SynDRA
Synthetic Dataset for Railway Applications
About

Railways have always played a crucial role in global transportation. The major players have started investing in Automatic Train Operation, which requires many autonomous capabilities, such as accurate localization, obstacle detection, and track discrimination. Recent advances in deep learning are promising for this field, but data acquisition and systematic testing on railways are expensive and dangerous, and will never capture the rarest cases. Furthermore, only a few datasets are publicly available for railway applications.
To fill these gaps, we used our custom simulation framework to generate SynDRA, a synthetic dataset for semantic segmentation of railway environments.
SynDRA
SynDRA is a synthetic dataset for railway applications. It currently includes 80 sequences captured in 4 different environments under varying weather and lighting conditions. The sensors are limited to a stereo RGB camera setup for now, and the ground truth labels include only semantic segmentation. However, we will soon extend the dataset to additional environments, sensors (LiDAR and radar), and ground truth labels (object detection, depth masks). Our custom simulation framework is flexible enough to accommodate a wide range of needs, so feel free to reach out!
You can download the entire dataset, which includes 80 sequences from stereo camera systems with related semantic annotations, or you can download individual scenarios, each containing 10 different sequences.
MEGA Download Links for SynDRA v0, as presented in the WACV 2025 paper:
- SynDRA: The whole dataset, ~1.8TB of data.
- Utils: Sensor Specs, and Python Scripts: from .bin files to png.
SynDRA was accepted for publication at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025).
If you use SynDRA in your research, please cite the following paper:
@inproceedings{d2025syndra,
  title={SynDRA: Synthetic Dataset for Railway Applications},
  author={D’Amico, Gianluca and Nesti, Federico and Rossolini, Giulio and Marinoni, Mauro and Sabina, Salvatore and Buttazzo, Giorgio},
  booktitle={2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  pages={3437--3446},
  year={2025},
  organization={IEEE}
}
A new version of the dataset with bug fixes (such as corrected lighting and semantic labels) will be available soon. Please report any errors or bugs you encounter. You can contact us at gianluca.damico@santannapisa.it. Additionally, a benchmark table showcasing the results of semantic segmentation models on our dataset will be coming soon.
The Dataset
Semantic Segmentation Example
Different adverse conditions and seasonal changes




Sensor Specs - SynDRA
Stereo RGB Cameras
- Resolution: 1920×1080
- FOV: 90°
- Baseline: 0.6 m
- Height from rails: 3.5 m
- Frame Rate: 10 FPS
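The stereo specs above are enough to derive the quantities needed for depth from disparity. The following is a minimal sketch, assuming the 90° FOV is horizontal and that the cameras follow an ideal, rectified pinhole model; neither assumption is stated in the specs, and the function names are illustrative rather than part of the SynDRA utilities.

```python
import math

def focal_length_px(width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from image width and horizontal FOV."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres of a point in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Values taken from the spec list; the FOV is ASSUMED to be horizontal.
WIDTH_PX, HFOV_DEG, BASELINE_M = 1920, 90.0, 0.6

f_px = focal_length_px(WIDTH_PX, HFOV_DEG)
print(f"focal length: {f_px:.1f} px")
for d in (1.0, 10.0, 50.0):
    z = depth_from_disparity(d, f_px, BASELINE_M)
    print(f"disparity {d:5.1f} px -> depth {z:7.1f} m")
```

Under these assumptions the focal length is 960 px, so a disparity of 10 px corresponds to a depth of 57.6 m. If the sensor specification file shipped with the dataset lists intrinsics, those values should be preferred over this derivation.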
Annotations
- Pixel-level semantic segmentation
- Per-frame camera poses
Dataset Structure - SynDRA
The dataset is organized as follows:
SynDRA/
├── Scenario_0/
│   ├── HV/
│   │   ├── Sunny/
│   │   │   ├── Afternoon/
│   │   │   │   ├── RGBCamera_0/
│   │   │   │   │   ├── Bin_folder/
│   │   │   │   │   │   ├── Sce0_sun_aft_000001.bin
│   │   │   │   │   │   ├── Sce0_sun_aft_000002.bin
│   │   │   │   │   │   └── ...
│   │   │   │   │   ├── Poses/
│   │   │   │   │   └── Times/
│   │   │   │   ├── RGBCamera_1/
│   │   │   │   ├── SSCamera_0/
│   │   │   │   └── SSCamera_1/
│   │   │   ├── Evening/
│   │   │   └── Morning/
│   ├── LV/
│   │   ├── Foggy/
│   │   │   ├── Afternoon/
│   │   │   │   ├── RGBCamera_0/
│   │   │   │   │   ├── Bin_folder/
│   │   │   │   │   │   ├── Sce0_fog_000001.bin
│   │   │   │   │   │   ├── Sce0_fog_000002.bin
│   │   │   │   │   │   └── ...
│   │   │   │   │   ├── Poses/
│   │   │   │   │   └── Times/
│   │   │   │   ├── RGBCamera_1/
│   │   │   │   ├── SSCamera_0/
│   │   │   │   └── SSCamera_1/
│   │   ├── Rainy/
│   │   │   ├── Afternoon/
│   │   │   │   ├── RGBCamera_0/
│   │   │   │   │   ├── Bin_folder/
│   │   │   │   │   │   ├── Sce0_rai_000001.bin
│   │   │   │   │   │   ├── Sce0_rai_000002.bin
│   │   │   │   │   │   └── ...
│   │   │   │   │   ├── Poses/
│   │   │   │   │   └── Times/
│   │   │   │   ├── RGBCamera_1/
│   │   │   │   ├── SSCamera_0/
│   │   │   │   └── SSCamera_1/
├── Utils/
│   ├── RGB_bin_to_png.py
│   ├── SS_bin_to_png.py
│   └── Sensor_Specs.txt
└── README.md
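Given the directory layout above, a training pipeline typically needs to pair each RGB frame with its semantic segmentation frame. The sketch below walks the tree and matches frames by index. It assumes that the `SSCamera_*` folders contain a `Bin_folder/` like the `RGBCamera_*` folders do, and that the trailing six-digit number in each file name identifies the frame; both are inferred from the listing, not confirmed. The helper names are hypothetical and not part of the dataset's `Utils/` scripts, and decoding the `.bin` contents is left to the provided conversion scripts.

```python
from pathlib import Path
import re

# Trailing six-digit frame counter, e.g. "Sce0_sun_aft_000001.bin" -> 1 (assumed convention).
FRAME_RE = re.compile(r"_(\d{6})\.bin$")

def index_frames(sensor_dir: Path) -> dict:
    """Map frame index -> .bin path for one sensor folder."""
    frames = {}
    for path in sorted((sensor_dir / "Bin_folder").glob("*.bin")):
        match = FRAME_RE.search(path.name)
        if match:
            frames[int(match.group(1))] = path
    return frames

def paired_frames(sequence_dir: Path, cam: int = 0):
    """Yield (index, rgb_path, ss_path) for frames present in both sensors."""
    rgb = index_frames(sequence_dir / f"RGBCamera_{cam}")
    ss = index_frames(sequence_dir / f"SSCamera_{cam}")
    for idx in sorted(rgb.keys() & ss.keys()):
        yield idx, rgb[idx], ss[idx]

def find_sequences(root: Path) -> list:
    """Treat every directory that directly contains RGBCamera_0 as one sequence."""
    return sorted(p.parent for p in root.rglob("RGBCamera_0") if p.is_dir())

# Usage (the path is a placeholder for wherever the dataset was extracted):
# for seq in find_sequences(Path("SynDRA")):
#     for idx, rgb_path, ss_path in paired_frames(seq):
#         ...
```

Locating sequences by the presence of `RGBCamera_0` avoids hard-coding the scenario, visibility, weather, and time-of-day levels, so the same walk tolerates folders that are missing one of those levels.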
SynDRA-BBox
SynDRA-BBox is an extension of the original SynDRA dataset, specifically designed for 2D/3D object detection in railway environments. While the original SynDRA dataset focuses on image-level data captured from simulated railway scenarios, SynDRA-BBox introduces precise 2D and 3D bounding box annotations for a diverse set of railway objects, including trains, pedestrians, and natural obstacles.
Key features:
- Fully annotated 3D point clouds using simulated LiDAR data, aligned with RGB images.
- 2D and 3D bounding boxes for major object categories relevant to railway operation and safety.
- Maintains the visual diversity and domain fidelity of the original SynDRA environments, including urban, rural, and station scenes.
The dataset is intended for benchmarking tasks such as:
- 2D/3D object detection
- Sensor fusion (RGB + LiDAR)
- Semi-supervised and unsupervised domain adaptation
- Synthetic-to-real transfer learning
You can download the full dataset, which includes 72 sequences plus 2 bonus sequences from multi-sensor systems with corresponding annotations, or opt to download individual scenarios. Each scenario consists of 8 distinct sequences, with each sequence focused on a specific object of interest: car, truck, bus, crossing pedestrian, parallel pedestrian, tree with leaves, tree without leaves, and rocks. These objects either cross the tracks, run parallel to the railway, or lie on the rails in the path of the train.
To ensure visual diversity, every sequence within a scenario features a different combination of textures, randomly selected from a common pool of 10, for the ballast, sleepers, terrain, and crossing platform. These textures are consistent across all scenarios but are uniquely sampled for each sequence.
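The per-sequence texture sampling described above can be illustrated with a short sketch. It is not the authors' generator: it assumes a pool of 10 candidate textures per surface (the text could also be read as one shared pool of 10), identifies textures by index, and uses hypothetical names throughout.

```python
import random

SURFACES = ("ballast", "sleepers", "terrain", "crossing_platform")
POOL_SIZE = 10  # ASSUMED: 10 candidate textures per surface

def sample_texture_combos(n_sequences: int, seed=None) -> list:
    """Draw n_sequences distinct texture combinations, one texture index per surface."""
    if n_sequences > POOL_SIZE ** len(SURFACES):
        raise ValueError("not enough distinct combinations")
    rng = random.Random(seed)
    seen, combos = set(), []
    while len(combos) < n_sequences:
        combo = tuple(rng.randrange(POOL_SIZE) for _ in SURFACES)
        if combo not in seen:  # reject repeats so every sequence gets a different combination
            seen.add(combo)
            combos.append(dict(zip(SURFACES, combo)))
    return combos

# One scenario with 8 sequences, as in SynDRA-BBox.
for seq_id, combo in enumerate(sample_texture_combos(8, seed=0)):
    print(seq_id, combo)
```

Sampling from the same pool for every scenario while rejecting repeats within a scenario matches the description: textures are shared across scenarios, but no two sequences of one scenario use the same combination.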
We use 5 different car models and various pedestrian models sourced from Epic Games' City Sample. The bus model remains the same across all sequences. Conversely, trees and rocks are uniquely varied across scenarios, with no repetitions. Additionally, each sequence includes dynamic elements such as moving vehicles and pedestrians on nearby roads, which are placed in different positions per sequence. The placement of natural objects like forest elements along the railway is also randomized for each sequence to further enrich the environmental variability.
DRIVE Download Links for SynDRA-BBox:
- SynDRA-BBox: The whole dataset, ~1.8TB of data.
- Utils: Sensor Specs, and Python Scripts: from .bin files to png.
The paper related to SynDRA-BBox is currently under revision.
The Dataset
2D/3D Semantic Segmentation and Bounding Boxes Example


Scenario Examples


Sensor Specs - SynDRA-BBox
RGB Camera 0
- Resolution: 2464×1600
- FOV: 30°
- Height from rails: 3.5 m
- Frame Rate: 10 FPS
Stereo RGB Cameras 1/2
- Resolution: 2464×1600
- FOV: 90°
- Baseline: 0.6 m
- Height from rails: 3.5 m
- Frame Rate: 10 FPS
NotRepLidar_0 (Tele-15)
- Non-repeating pattern
- Range: 500 m
- FOV: 15°
- Height: 2.5 m
- Points/scan: 4800
- Rate: 10 Hz
BeamStackLidar_0 (HDL-64E)
- 64-beam scanning pattern
- Range: 120 m
- Horizontal FOV: 180°
- Vertical FOV: 26.8°
- Height: 2.5 m
- Rate: 10 Hz
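A few derived quantities make the two LiDAR spec lists easier to compare. The sketch below is plain arithmetic on the numbers above. It assumes the 64 beams are evenly spaced over the vertical FOV and that the 15° FOV of the non-repeating sensor can be treated as a flat angular extent; the simulated sensors may differ, and the function names are illustrative.

```python
import math

def beam_spacing_deg(vertical_fov_deg: float, n_beams: int) -> float:
    """Angular gap between adjacent beams, ASSUMING uniform spacing."""
    return vertical_fov_deg / (n_beams - 1)

def footprint_width_m(fov_deg: float, range_m: float) -> float:
    """Width covered by the FOV at a given range (valid for FOV < 180 degrees)."""
    return 2.0 * range_m * math.tan(math.radians(fov_deg) / 2.0)

def points_per_second(points_per_scan: int, rate_hz: float) -> float:
    """Point throughput of a sensor."""
    return points_per_scan * rate_hz

# BeamStackLidar_0 (HDL-64E): 64 beams over a 26.8 degree vertical FOV.
print(f"beam spacing: {beam_spacing_deg(26.8, 64):.3f} deg")

# NotRepLidar_0 (Tele-15): 15 degree FOV, 500 m range, 4800 points per scan at 10 Hz.
print(f"footprint at 500 m: {footprint_width_m(15.0, 500.0):.1f} m")
print(f"throughput: {points_per_second(4800, 10):.0f} points/s")
```

The comparison shows the trade-off between the two sensors: the narrow non-repeating pattern concentrates its points far ahead of the train, while the 64-beam scanner covers a wide horizontal field at shorter range.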
Annotations
- 2D/3D bounding boxes
- Semantic segmentation (pixel and point-level)
- Depth (up to 650 m)
- Calibration and poses
Dataset Structure - SynDRA-BBox
The dataset is organized as follows:
SynDRA-BBox/
├── Scenario_0/
│   ├── Bus_0/
│   │   ├── Sunny/
│   │   │   ├── DepthCamera_0/
│   │   │   │   ├── Bin_folder/
│   │   │   │   │   ├── SceBus_0_sun_mor_000000.bin
│   │   │   │   │   ├── SceBus_0_sun_mor_000001.bin
│   │   │   │   │   └── ...
│   │   │   │   ├── Poses/
│   │   │   │   └── Times/
│   │   │   ├── DepthCamera_1/
│   │   │   ├── DepthCamera_2/
│   │   │   ├── RGBCamera_0/
│   │   │   ├── RGBCamera_1/
│   │   │   ├── RGBCamera_2/
│   │   │   ├── SSCamera_0/
│   │   │   ├── SSCamera_1/
│   │   │   ├── SSCamera_2/
│   │   │   ├── NotRepLidar_0/
│   │   │   └── BeamStackLidar_0/
│   │   ├── CameraBBoxes.txt
│   │   ├── LidarBBoxes_BS.txt
│   │   └── LidarBBoxes_NR.txt
│   ├── Car_0/
│   ├── Crossing_Pedestrian_0/
│   ├── Parallel_Pedestrian_0/
│   ├── Rock_0/
│   ├── Tree_noLeaf_0/
│   ├── Tree_Leaf_0/
│   └── Truck_0/
├── Scenario_1/
├── Scenario_2/
├── Scenario_3/
├── Scenario_4/
├── Scenario_5/
├── Scenario_6/
├── Scenario_7/
├── Scenario_8/
├── Scenario_9/
│   ├── Crossing_Pedestrian_9/
│   └── Parallel_Pedestrian_9/
├── Utils/
│   ├── Helpers/
│   │   ├── bbox_utils.py
│   │   ├── image_utils.py
│   │   └── ...
│   ├── 2dbbox_gen.py
│   ├── 3dbbox_gen.py
│   ├── README.md
│   ├── requirements.txt
│   └── Sensor_Specs.txt
└── README.md
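As a starting point for working with this layout, the sketch below builds an index of the object sequences and reports which of the three bounding box files are present in each. It relies only on the folder and file names shown in the listing. The assumption that every sequence folder is named `<Object>_<scenario number>` is inferred from that listing, the helper names are hypothetical, and the internal format of the `.txt` annotation files is not parsed because it is not documented here.

```python
from collections import Counter
from pathlib import Path

ANNOTATION_FILES = ("CameraBBoxes.txt", "LidarBBoxes_BS.txt", "LidarBBoxes_NR.txt")

def split_sequence_name(name: str):
    """'Crossing_Pedestrian_0' -> ('Crossing_Pedestrian', 0); None if there is no numeric suffix."""
    label, _, suffix = name.rpartition("_")
    if not label or not suffix.isdigit():
        return None
    return label, int(suffix)

def list_sequences(root: Path) -> list:
    """Collect one record per object sequence under every Scenario_* folder."""
    records = []
    for scenario_dir in sorted(root.glob("Scenario_*")):
        for seq_dir in sorted(p for p in scenario_dir.iterdir() if p.is_dir()):
            parsed = split_sequence_name(seq_dir.name)
            if parsed is None:
                continue  # skip folders that do not follow the assumed naming
            label, scenario_id = parsed
            missing = [f for f in ANNOTATION_FILES if not (seq_dir / f).is_file()]
            records.append({"path": seq_dir, "object": label,
                            "scenario": scenario_id, "missing": missing})
    return records

def count_by_object(records: list) -> Counter:
    """Number of sequences per object class."""
    return Counter(r["object"] for r in records)

# Usage (the path is a placeholder for wherever the dataset was extracted):
# records = list_sequences(Path("SynDRA-BBox"))
# print(count_by_object(records))
```

Recording missing annotation files instead of raising an error keeps the index usable on partial downloads, for example when only a few scenarios have been fetched.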
Media
Research
If you are interested in synthetic data generation for railway environments, we are always open to new collaborations. Check out our previous work:
- D’Amico et al. “SynDRA: Synthetic Dataset for Railway Applications.” Accepted at WACV 2025: IEEE/CVF Winter Conference on Applications of Computer Vision, February 2025, Tucson, Arizona.
- D’Amico et al. “A Comparative Analysis of Visual Odometry in Virtual and Real-World Railways Environments.” Accepted at RAILWAYS 2024: The Sixth International Conference on Railways Technology, September 2024, Prague.
- D’Amico et al. “TrainSim: A railway simulation framework for LiDAR and camera dataset generation.” IEEE Transactions on Intelligent Transportation Systems (2023).
- D’Amico et al. “Graphic Simulation Framework of Railway Scenarios for LiDAR Dataset Generation.” RAILWAYS 2022: The Fifth International Conference on Railways Technology, August 2022, Montpellier.