Akil, B., Zhou, Y., & Rohm, U. (2017). On the usability of Hadoop MapReduce, Apache Spark & Apache Flink for data science. In IEEE International Conference on Big Data (Big Data) (pp. 303–310), Boston, MA.
Apache Hive. (2018). Design. Retrieved from https://cwiki.apache.org/confluence/display/Hive/Design
Apache MADlib. (2018). Apache MADlib. Retrieved from https://cwiki.apache.org/confluence/display/MADLIB/Apache+MADlib, Accessed October
Apache Presto Overview. (2018). Apache Presto ‐ quick guide. Retrieved from https://www.tutorialspoint.com/apache_presto/apache_presto_quick_guide.htm
Armbrust, M., Xin, R., Lian, C., Huai, Y., Liu, D., Bradley, J., … Zaharia, M. (2015). Spark SQL: Relational data processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15) (pp. 1383–1394), Melbourne, Australia, May 31–June 4. New York, NY: ACM. https://doi.org/10.1145/2723372.2742797
Bobade, V. B. (2016). Survey paper on Big Data and Hadoop. International Research Journal of Engineering and Technology, 03(01), 861–863.
Boncz, P., Neumann, T., & Erling, O. (2014). TPC‐H analyzed: Hidden messages and lessons learned from an influential benchmark. Lecture Notes in Computer Science, 8391 LNCS, 61–76.
Chang, L., Wang, Z., Ma, T., Jian, L., Ma, L., Goldshuv, A., … Bhandarkar, M. (2014). HAWQ: A massively parallel processing SQL engine in Hadoop. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD ’14) (pp. 1223–1234), Snowbird, UT, June 22–27. New York, NY: ACM Press.
Chen, Y., Qin, X., Bian, H., Chen, J., Dong, Z., Du, X., … Zhang, H. (2014). A study of SQL‐on‐Hadoop systems. Lecture Notes in Computer Science, 8807, 154–166.
Floratou, A., Minhas, U. F., & Ozcan, F. (2014). SQL‐on‐Hadoop: Full circle back to shared‐nothing database architectures. Proceedings of the VLDB Endowment, 7(12), 1295–1306.
Gates, A. (2014). Stinger next: Enterprise SQL at Hadoop scale with Apache Hive. Retrieved from https://hortonworks.com/blog/stinger-next-enterprise-sql-hadoop-scale-apache-hive/
Geoinsyssoft. (2016). Hive file formats examples. Retrieved from http://geoinsyssoft.com/hive-file-format-examples/
Ghat, D., Rorke, D., & Kumar, D. (2016). New SQL benchmarks: Apache Impala (incubating) uniquely delivers analytic database performance. [Online]. Retrieved from https://blog.cloudera.com/blog/2016/02/new-sql-benchmarks-apache-impala-incubating-2-3-uniquely-delivers-analytic-database-performance/
Gounaris, A., & Torres, J. (2018). A methodology for Spark parameter tuning. Big Data Research, 11, 22–32.
Grover, A., Gholap, J., Janeja, V. P., Yesha, Y., Chintalapati, R., Marwaha, H., & Modi, K. (2015). SQL‐like big data environments: Case study in clinical trial analytics. In IEEE International Conference on Big Data (IEEE BigData 2015) (pp. 2680–2689), Santa Clara, CA, October 29–November 1.
Hortonworks. (2018). Apache Hive overview. Retrieved from https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/hive-overview/content/hive-apache-hive-3-architecturural-overview.html
Impala Frequently Asked Questions. (2018). Impala guide. Retrieved from https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_faq.html
Impala Overview. (2018a). Learn Impala. Retrieved from https://www.tutorialspoint.com/impala/impala_overview.htm
Impala Overview. (2018b). Cloudera Introduction. Retrieved from https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_intro.html
Jethro. (2016). Hadoop Hive and 11 SQL‐on‐Hadoop alternatives. [Online]. Retrieved from https://jethro.io/hadoop-hive
Kornacker, M., Behm, A., Bittorf, V., Bobrovytsky, T., Ching, C., Choi, A., Erickson, J., … Yoder, M. (2015). Impala: A modern, open‐source SQL engine for Hadoop. CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, January 4–7, 2015.
Landset, S., Khoshgoftaar, T. M., Richter, A. N., & Hasanin, T. (2015). A survey of open source tools for machine learning with big data in the Hadoop ecosystem. Journal of Big Data, 2(1), 24.
Laskowski, J. (2017). Mastering Apache Spark 2. Retrieved from https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-architecture.html
Li, Q., Chen, Y., Wang, J., Chen, Y., & Chen, H. (2016). Web media and stock markets: A survey and future directions from a big data perspective. IEEE Transactions on Knowledge and Data Engineering, 30, 381–399.
Li, X., & Zhou, W. (2015). Performance comparison of Hive, Impala and Spark SQL. 7th International Conference on Intelligent Human‐Machine Systems and Cybernetics, Hangzhou (pp. 418–423).
MapR. (2016). SQL on Hadoop details. [Online]. Retrieved from https://mapr.com/why-hadoop/sql-hadoop/sql-hadoop-details/
McDonald, C. (2015). Apache Drill architecture: The ultimate guide. [Online]. Retrieved from https://www.mapr.com/blog/apache-drill-architecture-ultimate-guide
Mehta, P., Dorkenwald, S., Zhao, D., Kaftan, T., Cheung, A., Balazinska, M., … AlSayyad, Y. (2017). Comparative evaluation of big‐data systems on scientific image analytics workloads. Proceedings of the VLDB Endowment, 10(11), 1226–1237.
Morgan, T. P. (2013). EMC morphs Hadoop elephant into SQL database HAWQ. [Online]. Retrieved from http://www.theregister.co.uk/2013/02/25/emc_pivotal_hd_hadoop_HAWQ_database/
Osipov, D. (2015). Why we use Presto not Hive for interactive Hadoop queries, Shazam data engineer. Retrieved from https://www.computing.co.uk/ctg/analysis/2405737/why-we-use-presto-not-hive-for-interactive-hadoop-queries-at-shazam
Owl, C. (2015a). The SQL on Hadoop landscape: An overview (Part I). [Online]. Retrieved from http://cleverowl.uk/2015/11/19/the-sql-on-hadoop-landscape-an-overview-part-i/
Owl, C. (2015b). The SQL on Hadoop landscape: An overview (Part II). [Online]. Retrieved from http://cleverowl.uk/2015/12/25/the-sql-on-hadoop-landscape-an-overview-part-ii/
Pirzadeh, P., Carey, M., & Westmann, T. (2017). A performance study of big data analytics platforms. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 2911–2920). Boston, MA: IEEE. https://doi.org/10.1109/BigData.2017.8258260
Prasad, B. R., & Agarwal, S. (2016). Comparative study of Big Data computing and storage tools: A review. International Journal of Database Theory and Application, 9(1), 45–66.
Presto. (2018). Presto: Distributed SQL query engine for big data. Retrieved from https://prestodb.io
Ramakrishnan, R., Sridharan, B., Douceur, J., Kasturi, P., Krishnamachari‐Sampath, B., Krishnamoorthy, K., … Venkatesan, M. (2017). Azure data lake store: A hyperscale distributed file service for big data analytics. In Proceedings of the International Conference on Management of Data (SIGMOD ’17) (pp. 51–63), Chicago, IL, May 14–19. New York, NY: ACM.
Rodrigues, M., Santos, M. Y., & Bernardino, J. (2017). Describing and comparing big data querying tools. Recent Advances in Information Systems and Technologies, 569, 115–124.
Sakr, S. (2014). A brief comparative perspective on SQL access for Hadoop. Retrieved from https://www.ibm.com/developerworks/library/ba-compare-sql-access-hadoop/index.html
Santos, M. Y., Costa, C., Galvão, J., Andrade, C., Martinho, B., Lima, F., & Costa, E. (2017). Evaluating SQL‐on‐Hadoop for big data warehousing on not‐so‐good hardware. In Proceedings of the 21st International Database Engineering & Applications Symposium (IDEAS 2017) (pp. 242–252). New York, NY: ACM.
Silva, Y. N., Almeida, I., & Queiroz, M. (2016). SQL: From traditional databases to big data. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education (SIGCSE ’16) (p. 418). New York, NY: ACM Press. https://doi.org/10.1145/2839509.2844560
Szegedi, I. (2014). Pivotal Hadoop distribution and HAWQ realtime query engine. [Online]. Retrieved from https://dzone.com/articles/pivotal-hadoop-distribution
The Apache Software Foundation. (2018). Apache Drill architecture. Retrieved from https://drill.apache.org/docs/architecture-introduction/
TPC‐DS. (2018). Transaction Processing Performance Council (TPC). TPC Benchmark DS standard specification. Retrieved from http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.10.0.pdf
TPC‐H. (2017). Transaction Processing Performance Council (TPC). TPC Benchmark H (Decision Support) standard specification. Retrieved from http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.3.pdf
Vaidya, P., & Lee, J. (2008). Characterization of TPC‐H queries for a column‐oriented database on a dual‐core AMD Athlon processor. In Proceedings of the 17th ACM Conference on Information and Knowledge Management ‐ CIKM ’08 (pp. 1411–1412). Napa Valley, CA: ACM Press.