0 citations0 references

On random sampling over joins

ACM SIGMOD Record1999Vol. 28(2), pp. 263–274

Citations Over TimeTop 10% of 1999 papers

Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

Abstract

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those known earlier. We present experimental evaluation of our techniques on Microsoft's SQL Server 7.0.

Related Papers

Efficien t Processing Distributed Joins with Bloomfilter using MapReduce y(2013)
→ Towards multi-way join aware optimizer in SAP HANA(2020)3 cited
→ The Design of the Efficient Theta-Join in Map-Reduce Environment(2016)2 cited
USING SEMI-JOINS TO REALIZE JOINS AND MIXED-SEMI(?)JOINS(1987)
→ The Copies, Joins, Join Sketches and Plates(2021)