Jan 11, 2024
The recommendation is to use SQL syntax via spark.sql() rather than the native DataFrame API because, for many use cases, it is more expressive and concise. SQL is also more widely adopted than PySpark syntax, so data engineering novices typically find it easier to write queries with spark.sql() than with native PySpark methods.