Showing posts from 2016

Spark Window Functions for DataFrames and SQL


Introduced in Spark 1.4, window functions improved the expressiveness of Spark DataFrames and Spark SQL. With window functions, you can easily calculate a moving average or cumulative sum, or reference a value in a previous row of a table. They let you do many common calculations with DataFrames without having to resort to RDD manipulation.
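To make those three calculations concrete, here is a minimal plain-Python sketch of what each one produces per row. It is not Spark code: the function names and the sample revenue list are hypothetical, and the rows are assumed to be already sorted (for example by date). In Spark, these would roughly correspond to averaging over a frame of one row before and after the current row, summing over a frame from the first row up to the current row, and the lag function.

```python
# Plain-Python sketch of window-function semantics (NOT Spark code).
# Assumption: `values` is already ordered, e.g. by date.

def moving_average(values, before=1, after=1):
    """For each row, average the row itself plus up to `before` preceding
    and `after` following rows (a sliding frame clipped at the edges)."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - before): i + after + 1]
        result.append(sum(frame) / len(frame))
    return result


def cumulative_sum(values):
    """For each row, sum every row from the first one up to the current one."""
    result, running = [], 0
    for v in values:
        running += v
        result.append(running)
    return result


def lag(values, offset=1, default=None):
    """For each row, return the value `offset` rows earlier, or `default`
    when there is no such row."""
    return [values[i - offset] if i >= offset else default
            for i in range(len(values))]


revenue = [10, 20, 30, 40]  # hypothetical sample data
print(moving_average(revenue))  # [15.0, 20.0, 30.0, 35.0]
print(cumulative_sum(revenue))  # [10, 30, 60, 100]
print(lag(revenue))             # [None, 10, 20, 30]
```

Each function returns exactly one value per input row, which is the property that separates window functions from ordinary aggregates.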
Aggregates, UDFs vs. Window functions
Window functions are complementary to existing DataFrame operations: aggregates, such as sum and avg, and UDFs. To review, aggregates calculate one result, such as a sum or average, for each group of rows, whereas UDFs calculate one result for each row based only on the data in that row. In contrast, window functions calculate one result for each row based on a window of rows. For example, in a moving average, you calculate for each row the average of the rows surrou…
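The difference in output shape is easiest to see side by side. The following plain-Python sketch (again not Spark code; the rows, column names, threshold, and helper functions are all made up for illustration) computes an aggregate, a UDF-style value, and a window-style value over the same three rows.

```python
# Plain-Python sketch (NOT Spark code) contrasting aggregates, UDFs and
# window functions. All data and names here are hypothetical.
from collections import defaultdict

rows = [
    {"category": "phone", "product": "A", "revenue": 6000},
    {"category": "phone", "product": "B", "revenue": 4000},
    {"category": "tablet", "product": "C", "revenue": 3000},
]


def aggregate_total(rows):
    """Aggregate: one result (a sum) per group of rows."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["category"]] += r["revenue"]
    return dict(totals)


def udf_is_large(rows, threshold=5000):
    """UDF-style: one result per row, using only that row's own data."""
    return [r["revenue"] >= threshold for r in rows]


def window_gap_to_max(rows):
    """Window-style: one result per row, computed from a window of rows
    (here, every row that shares the current row's category)."""
    result = []
    for r in rows:
        window = [o["revenue"] for o in rows if o["category"] == r["category"]]
        result.append(max(window) - r["revenue"])
    return result


print(aggregate_total(rows))    # {'phone': 10000, 'tablet': 3000}
print(udf_is_large(rows))       # [True, False, False]
print(window_gap_to_max(rows))  # [0, 2000, 0]
```

The aggregate collapses three rows into two results, while the UDF and window versions return one value per row; only the window version looks beyond the current row. In Spark, the window version would roughly correspond to taking the max of revenue over a window partitioned by category.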

[Python] Using % and .format() for great good!


Python has had awesome string formatters for many years, but the documentation on them is far too theoretical and technical. With this site (PyFormat), we try to show you the most common use cases covered by the old and new style string formatting APIs with practical examples.
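As a quick illustration of the two APIs, here is basic positional formatting done both ways. The strings are arbitrary examples chosen for this sketch, not taken from the site.

```python
# Basic positional formatting: old style (%) vs. new style (.format()).
old = '%s %s' % ('one', 'two')
new = '{} {}'.format('one', 'two')
print(old)  # one two
print(new)  # one two

# New style can also refer to arguments by index, so they can be reordered.
reordered = '{1} {0}'.format('one', 'two')
print(reordered)  # two one
```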
Unless otherwise stated, all examples work with Python 2.7, 3.2, 3.3, and 3.4 without requiring any additional libraries or monkey-patching.
Further details about these two formatting methods can be found in the official Python documentation: old style and new style.
If you want to contribute more examples, feel free to create a pull request on GitHub!
Table of Contents: Basic formatting, Value conversion, Padding and aligning strings, Truncating long strings, Combining truncating and padding, Numbers, Padding numbers, Signed numbers, Named placeholders, Getitem and Getattr, Datetime, Custom objects
Basic formatting
Simple positional formatting is probably the most c…
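A few of the topics from the table of contents, sketched in both styles. The sample values and variable names are arbitrary choices for this sketch, and each old/new pair is expected to produce the same string.

```python
# Padding and aligning strings (width 10).
right_old = '%10s' % 'test'          # '      test'
right_new = '{:>10}'.format('test')
left_old = '%-10s' % 'test'          # 'test      '
left_new = '{:10}'.format('test')    # strings are left-aligned by default

# Truncating long strings to 5 characters.
trunc_old = '%.5s' % 'xylophone'     # 'xylop'
trunc_new = '{:.5}'.format('xylophone')

# Combining truncating and padding: truncate to 5, pad to width 10.
both_old = '%-10.5s' % 'xylophone'   # 'xylop     '
both_new = '{:10.5}'.format('xylophone')

# Padding numbers to width 4 with spaces or with zeros.
num_old = '%4d' % 42                 # '  42'
num_new = '{:4d}'.format(42)
zero_old = '%04d' % 42               # '0042'
zero_new = '{:04d}'.format(42)

# Named placeholders filled from a dict.
data = {'first': 'Ada', 'last': 'Lovelace'}
named_old = '%(first)s %(last)s' % data   # 'Ada Lovelace'
named_new = '{first} {last}'.format(**data)
```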