Clamor
Abstract
We propose Clamor, a functional cluster computing framework that adds support for fine-grained, transparent access to global variables for distributed, data-parallel tasks. Clamor targets workloads that perform sparse accesses and updates within the bulk synchronous parallel execution model, a setting where the standard technique of broadcasting global variables is highly inefficient. Clamor implements a novel dynamic replication mechanism to enable efficient on-the-fly access to popular data regions, and tracks fine-grained dependencies to retain the lineage-based fault tolerance model of systems like Spark. Clamor can integrate with existing Rust and C++ libraries to transparently distribute programs on the cluster. We show that Clamor is competitive with Spark on simple functional workloads and can improve performance significantly compared to custom systems on workloads that sparsely access large global variables: from 5x for sparse logistic regression to over 100x on distributed geospatial queries.
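To illustrate why broadcasting is costly for sparse workloads, the following is a minimal sketch that compares the data shipped when a full weight vector is broadcast to every worker against the data shipped when each worker fetches only the weights its partition reads. It is not Clamor's API: the names `SparseRow`, `touched_indices`, `broadcast_bytes`, and `sparse_fetch_bytes` are hypothetical, and the accounting is simplified (it assumes 8-byte weights and ignores the cost of sending indices, replication, and protocol overhead).

```rust
use std::collections::BTreeSet;

/// One training example in sparse form: (feature index, value) pairs.
/// Hypothetical type for illustration only.
type SparseRow = Vec<(usize, f64)>;

/// Size of one weight entry in bytes (assumes f64 weights).
const WEIGHT_BYTES: usize = std::mem::size_of::<f64>();

/// Distinct weight indices that a worker's partition actually reads.
fn touched_indices(partition: &[SparseRow]) -> BTreeSet<usize> {
    partition
        .iter()
        .flat_map(|row| row.iter().map(|&(idx, _)| idx))
        .collect()
}

/// Bytes shipped if the full weight vector is broadcast to every worker.
fn broadcast_bytes(num_weights: usize, num_workers: usize) -> usize {
    num_weights * WEIGHT_BYTES * num_workers
}

/// Bytes shipped if each worker fetches only the weights it touches.
/// Simplification: counts weight payloads only, not the indices themselves.
fn sparse_fetch_bytes(partitions: &[Vec<SparseRow>]) -> usize {
    partitions
        .iter()
        .map(|p| touched_indices(p).len() * WEIGHT_BYTES)
        .sum()
}
```

Under these assumptions, two workers sharing a one-million-entry weight vector would receive 16 MB in total under broadcast, while a pair of partitions that together touch three distinct indices would need only 24 bytes of weight data. The gap grows with the size of the global variable and the sparsity of access, which is the regime the abstract describes.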