Shape the Future of Data Storage Research: StoreHub Community Survey
The StoreHub initiative is developing a storage-centric, publicly accessible research
cluster designed to meet the diverse needs of the data storage community. To ensure we're
building the infrastructure that best supports your research, we're inviting you to share
your insights through a brief survey. Your feedback will directly shape the design
and resource planning of this new cluster.
Take the survey by February 1st to make your voice heard!
StoreHub
StoreHub is a collaborative platform that advances data storage research by providing specialized infrastructure tailored to the needs of storage researchers. It brings together experts who handle large volumes of data, study I/O performance, and develop innovative storage solutions, making it a vital resource for the community.
Project Motivation
Large-scale applications in the scientific computing, Big Data, and AI communities present unique data storage requirements that existing solutions struggle to address. Modern storage systems are evolving rapidly, producing heterogeneous storage resources in which data movement is complex and often dominates overall performance.
- Infrastructure isolation is essential for accurately measuring the impact of individual subsystems.
- Customized and flexible hardware compositions are needed to emulate various machine models.
- Hardware heterogeneity is vital for assessing the impact of research across diverse hardware designs.
- Programmable hardware is transforming modern software solutions.
Project Summary
Our goal is to establish, nurture, and sustain a vibrant community centered on data storage research. We intend to support this community with StoreHub, an adaptable infrastructure equipped with experimental hardware and cutting-edge software. Our objectives include:
- Flexible storage hardware composition
- Ease of use and deployment
- Responsive support system
- Training opportunities
Project Significance
The significance of StoreHub is three-fold:
- CISE (Computer and Information Science and Engineering) researchers gain access to premier infrastructure for data storage research.
- CISE researchers are united into a research community that collectively addresses the I/O bottleneck.
- Researchers gain early access to prototype devices from vendors, enhancing creativity and research output.
Envisioned Research Infrastructure
Hardware Composition
- Node Composition: A large number of nodes that prioritize storage capabilities over CPU density.
- Storage Media: PMEM, NVMe SSDs, SATA SSDs, and HDDs in RAID configurations.
- Hardware Diversity: A mix of CPUs and GPUs spanning various technologies.
- Networking: High-speed Ethernet and InfiniBand interconnects.
- Modern Protocols: Support for DDR5, PCIe 5.0, and NVMe-oF, plus concept devices for innovative research.
User Services
- Software Management: Flexible package management.
- Resource Management: A cluster resource manager that provides isolation and supports programmable devices.
- Debugging and Telemetry: Comprehensive tools for measuring code efficiency and hardware utilization.
- Storage Flexibility: Ability to mount/unmount storage devices based on user requirements.
Research Areas Enabled
- Advanced Data Buffering
- Real-time Data Streaming
- I/O Convergence
- Storage Stack Development
- Tuning I/O for Deep Learning
- Storage Configuration and Resource Provisioning
- I/O Characterization and Instrumentation
Sponsor
Thanks to the National Science Foundation (NSF) for supporting StoreHub under award CIRC-2346504.