Efficient similarity algorithm now in Apache Spark, thanks to Twitter - The Databricks Blog