In partnership with Tamr – Machine learning (ML) is a much-discussed topic, and in many instances enthusiasm for the subject outpaces its practical implementation. Since Tamr began in 2012 as a research project at MIT CSAIL (the Computer Science & Artificial Intelligence Laboratory) and other institutions, our goal has been to apply ML to the challenge of unifying data at scale. We believe this is a pervasive problem for large enterprises, and one that mainstream data integration tools have not effectively addressed.
We begin in Section 2 with a brief introduction to ML to get everybody on the same page; we call it “ML for dummies”. In Section 3 we describe why ML is of interest in data analysis in general. Section 4 turns to the places in the Tamr product where ML is used. Finally, in Section 5 we discuss Tamr scalability issues in an ML context.