Samantha is an Associate Data Science Manager, based in Chicago. Since joining Nielsen in 2013, she has worked on developing statistical methods for local and national television products. Samantha is currently focused on integrating return path data (RPD) from cable and satellite providers into the Nielsen Local TV ratings, an initiative that is accelerating exciting changes within the company.
Big data, machine learning, the cloud – oh, my! These are buzzwords that data nerds like me are really excited about, but also a little tired of hearing so often. Any good data scientist can tell you that big data doesn’t necessarily mean better data; machine learning isn’t always the best approach to solve a problem; and the cloud won’t overcome all obstacles. But when these three concepts converge in the right way, they can be a powerful way to solve key business challenges. That’s exactly the case for return path data (RPD), an initiative I’m working on for Nielsen’s Local TV audience measurement business.
RPD is TV tuning data that comes from set-top boxes in cable and satellite subscribers’ homes. It includes which programs subscribers watch, when they watch them and where the subscriber households are located. RPD can cover a digital cable or satellite provider’s entire subscriber base, so it’s a much larger data source than Nielsen’s TV panel. That doesn’t mean, however, that the panel is any less important; in fact, it has become more important than ever. The Nielsen TV panel is geographically and demographically representative of the U.S. and provides an important truth source for calibrating RPD. The panel data remains the crucial foundation of Nielsen’s TV ratings, and the large volume of RPD complements it. While integrating RPD with our TV panel data has many advantages for our clients, it also presents some interesting challenges.
Nielsen recently announced that RPD will replace paper TV diaries and supplement our metered TV panel for local audience measurement. This is a big but beneficial change for everyone, especially our clients. Incorporating RPD increases the stability of the TV ratings and reduces the number of zero ratings. RPD will also allow us to deliver continuous electronic measurement in diary markets, which are currently measured only a few times per year. Including RPD in the local TV ratings gives Nielsen a competitive advantage in the marketplace because we’re able to combine our gold standard measurement panel with big data and reap the benefits of both.
Of course, any worthwhile project comes with challenges, and this is where my team experiences the confluence of big data, machine learning and the cloud. RPD is messy, so thoroughly exploring and “cleaning” it is crucial. The data is incomplete, so my team uses a variety of statistical and machine learning techniques to fill in the missing information. And the data is huge, so we’ve moved to distributed computing on the cloud to make all of this possible.
The RPD initiative has accelerated changes in our team’s technology, skills and work culture, which has been incredibly motivating. Not only are we using the cloud, but we’re also adopting open source tools that are widely used in the data science industry. We now have a larger variety of tools and techniques at our disposal, which is essential for rapidly developing and deploying solutions. Naturally, this transformation is expanding our technical and quantitative skills. This work has also promoted greater collaboration both within my team and with other departments, and we’ve become more agile in responding to change.
Building a big solution with many moving parts means we can’t work in isolation, and we have to be flexible. It’s been inspiring to see the dedication of my teammates and colleagues within the department as they rise to these challenges. I’m proud to say that my team has been the “guinea pigs” at the forefront of many of these efforts.