Maureen is Chief Data Scientist at Reonomy, a property intelligence company that is transforming the world's largest asset class: commercial real estate. For about 20 years, Maureen has run algorithms and simulations on terabytes of data, including location, click, image, streaming, and public data. Maureen drove technological advancements resulting in 500% year-over-year B2B contract growth at Enigma, a data-as-a-service company, delivered models anticipating human behavior at Axon Vibe, and researched interactions between dark matter and baryons at Rutgers. Maureen holds a Ph.D. in Computational Astrophysics from Columbia University, where she simulated the cosmological evolution of galaxies.
Our machine learning algorithms are at the heart of our ability to deliver products at Reonomy. Our unique data asset is a knowledge graph that connects information on all commercial properties in the United States to the companies and people that own and work in those properties. This graph is built with models that perform the entity resolution defining the vertex types and the attributes on the vertices, and that create multiple edge types in the graph. Other similar data assets focus on a significantly smaller subset of properties and/or are manually constructed. The volume of the data, as well as the required quality of the connections, demanded best-in-class tools, computational power, and a carefully chosen technical stack. It also gives us an exciting opportunity to build something that is not yet widespread enough for there to be well-known formulas for how to build the data asset and construct deliverables. Having your models define the shape of the data asset used to build all the products for the company makes every choice critical, especially when you are a growing startup supporting about a dozen different models. I'll walk through examples of critical code design choices, cluster configuration choices, and algorithm choices that were necessary to successfully build the graph components. You'll walk away with key points to consider when implementing production-quality models embedded in high-volume data pipelines, as well as a logical framework for building knowledge graphs able to support, for example, a diverse set of property intelligence products.
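To make the graph shape concrete, here is a minimal, hypothetical sketch of a typed property graph like the one described: vertices typed as properties, companies, or people, each carrying attributes, connected by typed edges such as "owns" or "works_at". The class, vertex types, and edge names are illustrative assumptions, not Reonomy's actual schema; in practice the vertices and edges would come from entity-resolution model output.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy typed graph: vertex types, vertex attributes, and typed edges.
    Illustrative only; not Reonomy's actual schema or implementation."""

    def __init__(self):
        self.vertices = {}              # vertex id -> (vertex_type, attributes)
        self.edges = defaultdict(list)  # source id -> [(edge_type, target id)]

    def add_vertex(self, vid, vtype, **attrs):
        self.vertices[vid] = (vtype, attrs)

    def add_edge(self, src, edge_type, dst):
        # In a real pipeline, entity-resolution models decide which
        # resolved entities are connected, and by what edge type.
        self.edges[src].append((edge_type, dst))

    def neighbors(self, vid, edge_type=None):
        return [dst for etype, dst in self.edges[vid]
                if edge_type is None or etype == edge_type]

# Hypothetical usage: one property, its owning company, one employee.
g = KnowledgeGraph()
g.add_vertex("prop-1", "property", address="123 Main St")
g.add_vertex("co-1", "company", name="Acme Holdings")
g.add_vertex("person-1", "person", name="J. Doe")
g.add_edge("co-1", "owns", "prop-1")
g.add_edge("person-1", "works_at", "co-1")
```

Even in this toy form, the design point carries over: because products query the graph through its vertex and edge types, any change a model makes to those types ripples through every downstream deliverable.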