000 02762 a2200229 4500
008 210217b ||||| |||| 00| 0 eng d
020 _a9781108477444
082 _a005.7
_bCOR
100 _aCormode, Graham
245 _aSmall summaries for big data
260 _aCambridge:
_bCambridge University Press,
_c2020.
300 _avii, 270 p.: ill. ;
_bhb,
_c24 cm.
365 _aGBP
_b42.99
504 _aIncludes bibliographical references and index.
520 _aThe massive volume of data generated in modern applications can overwhelm our ability to conveniently transmit, store, and index it. For many scenarios, building a compact summary of a dataset that is vastly smaller enables flexibility and efficiency in a range of queries over the data, in exchange for some approximation. This comprehensive introduction to data summarization, aimed at practitioners and students, showcases the algorithms, their behavior, and the mathematical underpinnings of their operation. The coverage starts with simple sums and approximate counts, building to more advanced probabilistic structures such as the Bloom filter, distinct value summaries, sketches, and quantile summaries. Summaries are described for specific types of data, such as geometric data, graphs, and vectors and matrices. The authors offer detailed descriptions of, and pseudocode for, key algorithms that have been incorporated in systems from companies such as Google, Apple, Microsoft, Netflix, and Twitter. Graham Cormode is Professor of Computer Science at the University of Warwick, doing research in data management, privacy, and big data analysis. Previously he was a principal member of technical staff at AT&T Labs-Research. His work has attracted more than 14,000 citations, has appeared in more than 100 conference papers and 40 journal papers, and has been awarded 30 US patents. Cormode is the corecipient of the 2017 Adams Prize for Mathematics for his work on statistical analysis of big data. He has edited two books on applications of algorithms and coauthored a third. Ke Yi is a professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He obtained his PhD from Duke University. His research spans theoretical computer science and database systems. He has received the SIGMOD Best Paper Award (2016), a SIGMOD Best Demonstration Award (2015), and a Google Faculty Research Award (2010). He currently serves as an associate editor of ACM Transactions on Database Systems, and previously served for IEEE Transactions on Knowledge and Data Engineering.
650 _aComputer programming
650 _aBig Data
650 _aComputer science
650 _aData Science
700 _aYi, Ke
942 _2ddc
_cTD
999 _c54371
_d54371