Which is the better choice of language for Spark: Scala or Python?
There is no straightforward answer to this question; it depends on your preferences. It is worth mentioning up front that my preference is Python over Scala. I do like coding in Scala, but I prefer Python for a few reasons.
Comfort level: I am much more comfortable in Python than in Scala, as I have been coding in Python for a long time. There are always opinions about speed and performance. It is well known that compiled languages are generally faster than interpreted ones, so I will not spend time and words on that discussion.
I have given more importance to productivity. My team has stronger Python skills than Scala skills, so I went with the logic “even if your code runs 5% slower, writing it is ten times faster because you are not spending time figuring out how a new language works”. Here I am very clear about my preference. Another good reason to choose Python is that it has many rich data science libraries and data visualisation tools, which Scala lacks as of now.
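To make the productivity logic above concrete, here is a minimal sketch of the arithmetic in Python. Only the "5% slower" and "ten times faster" ratios come from the reasoning above; the hour figures and the function names are hypothetical, chosen purely for illustration.

```python
def total_hours(dev_hours, run_hours_per_job, runs):
    """Total cost of a job: time to write it once plus time to run it `runs` times."""
    return dev_hours + run_hours_per_job * runs


# Hypothetical figures, for illustration only.
UNFAMILIAR_DEV_HOURS = 40.0                      # writing the job in a language the team is still learning
FAMILIAR_DEV_HOURS = UNFAMILIAR_DEV_HOURS / 10   # "ten times faster" to write
BASE_RUN_HOURS = 1.0                             # runtime per execution in the faster language
SLOWER_RUN_HOURS = BASE_RUN_HOURS * 1.05         # "5% slower" at runtime


def break_even_runs():
    """Number of runs after which the runtime penalty has used up the development time saved."""
    saved_dev = UNFAMILIAR_DEV_HOURS - FAMILIAR_DEV_HOURS
    extra_per_run = SLOWER_RUN_HOURS - BASE_RUN_HOURS
    return saved_dev / extra_per_run


if __name__ == "__main__":
    print(f"Break-even after about {break_even_runs():.0f} runs")
```

With these assumed numbers the familiar language stays ahead for roughly 720 runs. A job that runs only a handful of times favours the language you write fastest, while a job that runs constantly will eventually feel the runtime penalty.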
At the same time, if you need to learn either of them from scratch, I would suggest going with Scala, as Spark is written in Scala. Scala therefore helps you understand the internal workings of Spark and, if you need to, change them to suit your requirements.
Final verdict: if your requirement is simple or moderately complex analysis using Spark, then Python is the right choice. But if your requirement is to develop a production system, then go with Scala.