In recent years, social media influencers (mainly on Instagram) have gained immense popularity as more and more brands recognize their potential for marketing products and services. The challenge is to recommend the right influencers for a particular marketing campaign. Companies may be interested in micro-influencers in a certain industry or region, and they also want to understand which content resonates with a particular influencer's audience.
Once they have found the right influencers, companies may want to launch a marketing campaign with them and monitor its effectiveness. At Socialbakers, we started tackling this problem in November 2018 with a pilot that tested whether we could source Instagram business profiles and detect relevant attributes from them. We processed a large amount of semi-structured data (1 TB) and attempted to estimate the demographics, geography, and interests of each influencer.
Next, we built a smart search algorithm that takes these attributes into account, along with various metrics. Finally, we designed a machine-learning-based recommender that links content with influencers based on their audiences; this content serves as inspiration for the campaign manager. The data exploration and prototyping were done in Databricks (pyspark). The final, optimized ETL pipeline, which processes the data and persists the results in S3 and Elasticsearch, was also built in Databricks. The content recommender uses NLP and other ML approaches.
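The abstract does not describe how the content recommender works internally. Purely as an illustration, and not as Socialbakers' actual method, one simple way to link content with an influencer's audience is to represent the audience's interests as weighted keywords and rank candidate posts by cosine similarity to that profile. The function names, the keyword weights, and the naive tokenizer below are all hypothetical assumptions.

```python
import math
from collections import Counter

# Hypothetical sketch only: NOT the recommender described in this abstract.


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile or caption matches nothing
    return dot / (norm_a * norm_b)


def tokens(text: str) -> Counter:
    """Naive tokenizer (assumption): lowercase, split on whitespace, strip '#' and punctuation."""
    words = (w.strip("#.,!?").lower() for w in text.split())
    return Counter(w for w in words if w)


def recommend_content(audience_interests: dict, posts: dict, k: int = 3) -> list:
    """Return the top-k (post_id, score) pairs whose captions best match the audience profile."""
    profile = Counter(audience_interests)
    scored = [(post_id, cosine(profile, tokens(caption))) for post_id, caption in posts.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    audience = {"hiking": 0.6, "travel": 0.3, "coffee": 0.1}  # hypothetical interest weights
    posts = {
        "p1": "Sunrise hiking in the Alps #hiking #travel",
        "p2": "New espresso machine review #coffee",
        "p3": "Quarterly earnings call recap",
    }
    for post_id, score in recommend_content(audience, posts, k=3):
        print(post_id, round(score, 3))
```

With an audience profile weighted toward hiking and travel, the hiking post ranks first, the coffee review second, and the unrelated post scores 0. In practice, the bag-of-words vectors would likely be replaced by richer NLP features; the sketch only shows the ranking idea.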
We had no prior experience with Apache Spark, so we had to adopt the technology during the project itself. In this paper, we would like to discuss two aspects of the influencers project. First, the final influencer recommendation solution, for which we used Databricks both for innovative research and for large-scale data engineering, including ML. Second, the challenges we faced while deploying Apache Spark from scratch and onboarding our teams to the new platform.
Session hashtag: #SAISExp15
Petr has 10 years of experience delivering analytics, data engineering, and data science projects at various levels of seniority, from developer to leader of multiple teams and projects. Currently, he runs the research and data engineering teams at Socialbakers. He also initiated and leads the company's Apache Spark deployment. In his free time, he enjoys traveling and hiking.
Jan has 10 years of experience in programming, data processing, and research. He is a founding member of the research team at Socialbakers and is responsible for the design, research, and development of core product features that exploit big data analysis and machine learning techniques. Currently, he is involved in the Apache Spark deployment for research and innovation purposes at Socialbakers.