Mike Dias

Data Engineer, Atlassian

Mike Dias is a data engineer at Atlassian, making the move from Sao Paulo to Sydney to manage the data ingestion pipeline for Atlassian’s on-premise products. Prior to Atlassian, Mike built real-time streaming pipelines for an e-commerce retailer. He is an extremely fit long-distance runner, cooks a mean barbecue and loves to explore Sydney.



Building Understanding Out of Incomplete and Biased Datasets using Machine Learning and Databricks
Summit 2020

At Atlassian, product analytics exists to help our teams build better products by capturing and describing in-product behaviour. Within our on-premise products, only a subset of customers choose to send us anonymised event data, so the dataset we work with is incomplete and biased. In this setting, even a question as simple as 'what percentage of customers use feature X' becomes a non-trivial estimation task. The task becomes more complex still when a metric is subadditive. Estimating the distinct users of a product feature is one example: a user who uses the feature on multiple (and possibly unknown) instances should be counted only once, and our methodology needs to account for this. In this talk, we'll dive into our estimation methods and the adjustments we make for various metrics, providing an accessible guide to operating in this environment. We'll also discuss how we democratised these estimation methods, allowing any stakeholder who can write a query to immediately access our models and create accurate and consistent estimates.
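To make the "percentage of customers using feature X" problem concrete, here is a minimal sketch of one common correction for a biased opt-in sample, post-stratification. This is an illustrative assumption and not necessarily the method presented in the talk. The strata (customer size tiers), the function names and all the numbers are hypothetical.

```python
from collections import defaultdict


def naive_rate(sample):
    """Share of feature users among customers who send data, with no correction."""
    sample = list(sample)
    return sum(1 for _, uses in sample if uses) / len(sample)


def poststratified_rate(population_counts, sample):
    """Reweight per-stratum usage rates by known population sizes.

    population_counts: {stratum: number of customers in the full customer base}
    sample: iterable of (stratum, uses_feature) for customers who send data

    Assumption: within a stratum, customers who send data behave like those
    who do not. Strata with no sampled customers are left out of the estimate.
    """
    sampled = defaultdict(int)
    users = defaultdict(int)
    for stratum, uses in sample:
        sampled[stratum] += 1
        users[stratum] += 1 if uses else 0

    covered = sum(population_counts[s] for s in sampled)
    weighted_users = sum(
        population_counts[s] * users[s] / sampled[s] for s in sampled
    )
    return weighted_users / covered


# Hypothetical data: large customers opt in far more often than small ones.
population = {"small": 900, "large": 100}
sample = (
    [("small", True)] * 2 + [("small", False)] * 8
    + [("large", True)] * 30 + [("large", False)] * 10
)

naive = naive_rate(sample)                          # dominated by large customers
adjusted = poststratified_rate(population, sample)  # reweighted to the population
```

In this made-up example the naive rate is 0.64, while the reweighted estimate is 0.255, because large customers are over-represented among those who send data. A subadditive metric such as distinct users cannot be corrected this way, since per-stratum counts cannot simply be scaled and summed without double-counting users.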