Academic Jobs - Home of Higher Ed

SIMADL Dataset: How Staffordshire University Researchers Are Powering Smarter Homes


Advancing Smart Home Technology Through Innovative Dataset Creation

Researchers at Staffordshire University have developed a valuable resource that is helping shape the future of intelligent living environments. The SIMADL dataset, formally known as the Simulated Activities of Daily Living Dataset, offers synthetic data designed specifically for training and testing machine learning models in smart home settings. This work stands out because it addresses a common challenge in the field: the scarcity of large, labeled, and privacy-friendly datasets for studying how people interact with their homes on a daily basis.

Activities of Daily Living, often abbreviated as ADLs, refer to routine tasks such as cooking, sleeping, watching television, or moving between rooms. In smart homes equipped with sensors, recognizing these activities accurately can support applications ranging from energy efficiency to health monitoring for older adults. The SIMADL dataset provides two distinct collections: one optimized for classifying normal ADLs and another focused on detecting unusual or anomalous behaviors. Both were generated using an open-source tool called OpenSHS, the Open Smart Home Simulator, which creates realistic 3D environments and sensor readings without requiring real-world installations.

The Role of Simulation in Modern Research

Creating realistic datasets for smart home research traditionally involves installing sensors in actual residences and recruiting participants, a process that raises privacy concerns, incurs significant costs, and creates logistical hurdles. Simulation tools like OpenSHS overcome these barriers by allowing researchers to model homes, place virtual sensors on doors, lights, beds, and carpets, and have simulated residents perform scripted or free-form activities. The resulting data captures binary sensor states (on or off) across dozens of channels, producing structured files ready for machine learning pipelines.
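As a rough illustration of how a file of binary sensor states might be read into a machine learning pipeline, the sketch below parses a few rows with Python's standard library. The column names (timestamp, bedroom_light, bed_pressure, main_door, activity) and the sample rows are hypothetical and chosen only for illustration; the actual SIMADL files define their own headers.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative only,
# not copied from the real SIMADL files.
SAMPLE = """timestamp,bedroom_light,bed_pressure,main_door,activity
2016-04-01 08:00:00,0,1,0,sleep
2016-04-01 08:00:01,1,0,0,personal
2016-04-01 08:00:02,1,0,1,leave
"""


def load_rows(text, label_column="activity", time_column="timestamp"):
    """Return (sensor_names, timestamps, feature_vectors, labels) from CSV text.

    Every column other than the timestamp and label is treated as a
    binary sensor and converted to an int (0 = off, 1 = on).
    """
    reader = csv.DictReader(io.StringIO(text))
    sensor_names = [name for name in reader.fieldnames
                    if name not in (label_column, time_column)]
    timestamps, features, labels = [], [], []
    for row in reader:
        values = [int(row[name]) for name in sensor_names]
        # Reject anything that is not a clean on/off reading.
        if any(value not in (0, 1) for value in values):
            raise ValueError(f"non-binary sensor value at {row[time_column]}")
        timestamps.append(row[time_column])
        features.append(values)
        labels.append(row[label_column])
    return sensor_names, timestamps, features, labels


sensor_names, timestamps, features, labels = load_rows(SAMPLE)
print(sensor_names)
print(features)
```

Keeping the sensor readings as plain lists of 0s and 1s means the same rows can be handed to most classifiers without further encoding.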

Staffordshire University’s team leveraged this approach to generate 84 files in total: 42 for standard classification tasks and 42 for anomaly detection scenarios. Each file contains time-stamped sensor activations that reflect typical daily routines. Because the data is fully synthetic, it can be shared openly without compromising individual privacy, making it an attractive option for academic and industry teams worldwide.

Key Features and Structure of the SIMADL Dataset

The dataset emphasizes practicality for machine learning practitioners. Sensor data is presented in a straightforward tabular format with 29 columns representing different binary sensors. Researchers can split the files into training and testing sets, commonly using a 60/40 ratio, to evaluate model performance. The classification portion covers multiple ADLs performed by seven virtual participants, while the anomaly detection portion introduces deliberate deviations from normal patterns.
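A 60/40 split of this kind can be sketched in a few lines of Python. The file names below are hypothetical placeholders, and splitting at the file level (rather than by row or by participant) is an assumption made for illustration, not necessarily the protocol used in the original study.

```python
import random


def split_files(file_names, train_fraction=0.6, seed=42):
    """Shuffle file names reproducibly and split them into train/test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(file_names)
    # A dedicated Random instance keeps the split repeatable
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical names standing in for the 42 classification files.
files = [f"classification_{i:02d}.csv" for i in range(1, 43)]
train_files, test_files = split_files(files)
print(len(train_files), len(test_files))
```

With 42 files, this yields 25 training files and 17 test files. Fixing the seed makes the split reproducible across runs.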

One strength lies in its flexibility. Users can explore how different sensor placements or activity sequences affect recognition accuracy. The accompanying documentation explains the simulation parameters, enabling others to extend or replicate the work. This level of transparency supports reproducibility, a cornerstone of credible scientific progress in artificial intelligence and data science.

Contributions from Staffordshire University Researchers

The project reflects strong collaboration within the university’s computing and digital innovation communities. Talal Alshammari and Nasser Alshammari, both pursuing doctoral studies at the time, played central roles in data generation and analysis. Mohamed Sedky and Chris Howard contributed expertise in smart environments and machine learning evaluation. Their combined efforts produced not only the dataset but also a companion study examining various classification algorithms on the new resource.

Staffordshire University has a growing reputation for applied research in digital technologies and health innovation. The SIMADL work aligns with broader institutional priorities around simulation-based education and real-world problem solving. By releasing the dataset openly, the team has extended the university’s impact beyond traditional publications, inviting global participation in refining smart home intelligence.


Applications in Healthcare and Independent Living

One of the most promising uses involves supporting older adults who wish to remain independent at home. Machine learning models trained on datasets like SIMADL can learn to recognize when a resident’s routine changes, perhaps indicating a fall, illness, or forgetfulness, and trigger appropriate alerts to caregivers or family members. The anomaly detection files are particularly relevant here, as they help algorithms distinguish normal variations from genuine concerns.
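To make the idea of a routine change concrete, here is a minimal sketch that flags days whose sensor activity departs sharply from a baseline period. It is a simple z-score check, not the method used in the SIMADL study, and the daily counts, the seven-day baseline, and the threshold of 3.0 are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_unusual_days(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days whose activation count deviates from the baseline.

    The first `baseline_days` entries define normal behaviour; later days are
    flagged when their z-score against that baseline exceeds `threshold`.
    """
    baseline = daily_counts[:baseline_days]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline days")
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for day, count in enumerate(daily_counts[baseline_days:], start=baseline_days):
        if sigma == 0:
            # Constant baseline: any different value counts as unusual.
            is_unusual = count != mu
        else:
            is_unusual = abs(count - mu) / sigma > threshold
        if is_unusual:
            flagged.append(day)
    return flagged


# Hypothetical counts of bed-sensor activations per day.
counts = [10, 11, 9, 10, 12, 10, 11, 10, 25, 11]
print(flag_unusual_days(counts))  # prints [8]
```

Only the day with 25 activations is flagged; the ordinary day-to-day variation stays well inside the threshold, which is the distinction between normal variation and genuine concern that an alerting system needs to draw.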

Beyond healthcare, the dataset supports energy management systems that adjust lighting and heating based on occupancy patterns. It also aids in developing more intuitive voice assistants and automation routines. Because the data is synthetic, developers can test edge cases that would be difficult or unethical to capture in real homes, accelerating safe innovation.

How the Dataset Supports Machine Learning Advancement

Training robust models requires diverse, high-quality examples. SIMADL supplies exactly that for the smart home domain. Researchers have already used it to benchmark algorithms ranging from traditional methods like support vector machines and k-nearest neighbors to more advanced approaches such as convolutional neural networks combined with long short-term memory networks. Reported accuracies on the dataset often exceed 90 percent for certain tasks, demonstrating its utility as a benchmark.
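For readers new to these methods, the following sketch shows how a k-nearest neighbors classifier could operate on binary sensor readings. The four sensors, the activity labels, and the use of Hamming distance are assumptions made for illustration; the benchmark studies may have used different features, distance measures, and library implementations.

```python
from collections import Counter


def hamming(a, b):
    """Number of sensors whose state differs between two readings."""
    if len(a) != len(b):
        raise ValueError("readings must have the same number of sensors")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Predict an activity label by majority vote among the k nearest readings."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 4-sensor readings: [bed, bedroom_light, oven, tv]
train_x = [
    [1, 0, 0, 0], [1, 0, 0, 0],
    [0, 1, 1, 0], [0, 0, 1, 0],
    [0, 1, 0, 1], [0, 0, 0, 1],
]
train_y = ["sleep", "sleep", "cook", "cook", "watch_tv", "watch_tv"]

print(knn_predict(train_x, train_y, [0, 1, 1, 0]))  # prints cook
```

Hamming distance is a natural fit for on/off sensor vectors because it simply counts how many sensors disagree between two readings.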

The binary sensor format keeps computational requirements modest, allowing even resource-constrained teams to experiment. At the same time, the volume of files supports rigorous cross-validation and hyperparameter tuning. This balance of accessibility and depth makes SIMADL a practical starting point for students, early-career researchers, and established labs exploring activity recognition.

Broader Impact on Higher Education and Research Communities

Releasing open datasets like SIMADL strengthens the global research ecosystem. Universities benefit when students and faculty can access ready-to-use resources instead of spending months building their own. The work also highlights the value of interdisciplinary approaches, blending computer science, engineering, and health sciences.

Staffordshire University’s example encourages other institutions to prioritize open science practices. When researchers share both tools and data, progress accelerates. Students gain hands-on experience with real-world problems, while industry partners can prototype solutions more quickly. The dataset has already appeared in multiple follow-on studies examining spatio-temporal behavior prediction and hybrid classification techniques.

Future Directions and Opportunities for Collaboration

As smart home technology evolves, datasets must keep pace. Future extensions of SIMADL could incorporate additional sensor types, multi-resident scenarios, or integration with wearable devices. The open nature of the underlying simulator invites community contributions, potentially leading to expanded libraries of activities or improved realism.

Academic programs in data science, artificial intelligence, and digital health can incorporate the dataset into coursework and capstone projects. This hands-on exposure prepares graduates for careers in emerging fields where understanding human behavior through sensor data is essential. Partnerships between universities and technology companies may further refine these resources for commercial applications.


Accessing and Using the SIMADL Dataset

The full dataset and related files are hosted on GitHub under the openshs organization. Researchers can download the classification and anomaly detection collections directly, along with documentation describing the simulation setup. The original paper detailing the creation process provides additional context for proper use and citation.

Those interested in the simulator itself can explore the OpenSHS project, which remains available for generating custom datasets tailored to specific research questions. This combination of ready-made data and extensible tools lowers barriers for new entrants while supporting advanced experimentation by experienced teams.

Conclusion: A Foundation for Smarter, More Supportive Homes

The SIMADL dataset represents a meaningful step forward in making smart home intelligence more accessible and reliable. By providing high-quality synthetic data focused on everyday activities and anomalies, the Staffordshire University team has created a resource that benefits researchers, educators, and ultimately the people who will live in these intelligent environments. As adoption of smart technologies grows, contributions like this will play an increasingly important role in ensuring systems are both effective and respectful of privacy.

Institutions committed to impactful research continue to demonstrate how focused academic efforts can yield tools with lasting value across healthcare, energy, and automation sectors. The ongoing availability of SIMADL ensures that progress in activity recognition and anomaly detection remains grounded in solid, shareable foundations.

Prof. Evelyn Thorpe

Contributing Writer

Promoting sustainability and environmental science in higher education news.


Frequently Asked Questions

What is the SIMADL dataset?

The SIMADL dataset, or Simulated Activities of Daily Living Dataset, is an open collection of synthetic sensor data designed for training machine learning models to recognize routine household activities and detect unusual behaviors in smart home settings.

Who created the SIMADL dataset?

Researchers Talal Alshammari, Nasser Alshammari, Mohamed Sedky, and Chris Howard at Staffordshire University developed the dataset and published their work in 2018.

How was the SIMADL data generated?

The data was created using OpenSHS, an open-source 3D smart home simulator that models sensor activations without the need for real-world installations or participant recruitment.

What are the main uses of the SIMADL dataset?

It supports classification of normal daily activities and detection of anomalies, with applications in elderly care, energy management, and development of intelligent home automation systems.

Is the SIMADL dataset free to use?

Yes, the dataset is openly available for download through public repositories, promoting reproducibility and collaboration in academic and industry research.

How does SIMADL benefit higher education?

It provides students and faculty with ready-to-use data for courses and projects in machine learning, data science, and digital health, reducing the time needed to build custom datasets.

What sensors are included in the SIMADL dataset?

The dataset features 29 binary sensors tracking states such as doors, lights, beds, and carpets within a simulated smart home environment.

Has the SIMADL dataset been used in other studies?

Yes, it has been cited in subsequent research on activity recognition, spatio-temporal behavior prediction, and hybrid machine learning models for smart environments.

Where can researchers download the SIMADL dataset?

The files are hosted on GitHub under the openshs organization, alongside documentation and related simulation tools.

What makes SIMADL different from real-world smart home datasets?

Being fully synthetic, it avoids privacy issues, allows controlled generation of edge cases, and can be scaled efficiently while maintaining realistic sensor patterns.