Podcast White Paper

By Jeff Kent, May 2016


Who is listening to your programs?
Are advertisers potentially buying server pings, database crawls and fractured downloads? This is a simple question, but a difficult problem for the radio broadcast industry to solve. How downloads and listeners are counted, and how long a program is judged to have been listened to, remain somewhat subjective in the industry today.

The traditional intelligence/analytic tools are cumbersome to use.  It takes an inordinate amount of time to configure, collect and extract data from multiple and disparate sources, making error resolution or meaningful analytics difficult.

Broadcasters need a simple way to collect digital intelligence such as
o    Top programs
o    Number of impressions
o    How long users are connected
o    Number of unique listeners using specific audio/video resources
o    MB/second
o    Traffic/hour
o    Bytes per URL
o    Bandwidth consumed
o    And more

In today’s environment, organizations understand the need to provide better user experiences by analyzing interactions and engagement across multiple channels.  However, using traditional tools makes it almost impossible to access ALL data in real-time, and as new platforms are constantly implemented, these challenges are compounded.  

This paper outlines how American Public Media/Minnesota Public Radio and NPR were able to improve decision making after implementing a big data solution customized to the requirements of the digital media industry. APM and NPR can now access information accurately and in real time, which lets them make better decisions for programming, underwriting, and business intelligence.


American Public Media (APM) is the largest station-based public radio organization in the U.S., combining multi-regional station operations, national content creation and distribution in one organization. The station operations include 49 public radio stations and 41 translators in the Upper Midwest and California. APM’s portfolio of more than 20 nationally distributed programs includes such public radio staples as the BBC World Service, A Prairie Home Companion®, Marketplace®, The Splendid Table®, Performance Today®, On Being™ and special reports produced by its national documentary unit, American RadioWorks.

Until recently, podcast producers have had to trust the download metrics provided to them by third-party companies: CDNs, ad servers, Podtracker, iTunes, Stitcher, iHeart, etc. It’s clear that those numbers don’t always make sense. Some show inflated download counts and others feel oppressively low. So who do you trust, and how are those numbers derived?
In most cases the producer sees a report or a dashboard generated from a ‘black box’ closed system, and it is assumed that ‘a download is a download’. In reality, downloads are a complicated and messy beast. Third-party systems are constantly crawling servers and creating false impressions that need to be filtered. A single file download can start and stop numerous times, resulting in a fractured and amplified download count. Also, a podcast can now be either downloaded or streamed, and the two produce very different data events.
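To make the filtering problem concrete, here is a minimal Python sketch of one way to turn raw request records into a download count. It is not the method used by any vendor named in this paper. The record fields (ip, user_agent, url, start, end), the short list of bot markers, and the 90% completeness threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical markers for automated clients; a real list would be far longer.
BOT_MARKERS = ("bot", "crawler", "spider", "monitor")


def is_bot(user_agent):
    """Return True when the user agent looks like an automated client."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_downloads(requests, file_sizes, min_fraction=0.9):
    """Count one download per (client, file) that fetched most of the file.

    requests: iterable of dicts with keys ip, user_agent, url, start, end
              (start/end are inclusive byte offsets served by the request).
    file_sizes: dict mapping url -> file size in bytes.
    min_fraction: assumed share of the file that must be fetched to count.
    """
    ranges = defaultdict(list)
    for r in requests:
        if is_bot(r["user_agent"]):
            continue  # drop crawler and monitoring traffic
        key = (r["ip"], r["user_agent"], r["url"])
        ranges[key].append((r["start"], r["end"]))

    counts = defaultdict(int)
    for (_ip, _ua, url), spans in ranges.items():
        covered = 0
        last_end = -1
        # Merge overlapping ranges so re-requested bytes are not counted twice.
        for start, end in sorted(spans):
            start = max(start, last_end + 1)
            if end >= start:
                covered += end - start + 1
                last_end = end
        if covered >= min_fraction * file_sizes[url]:
            counts[url] += 1
    return dict(counts)
```

With this approach, a client that fetches a 1,000-byte file in three overlapping pieces is counted once, a crawler is not counted at all, and a client that fetches only the first 100 bytes is not counted as a download.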

Why is it important to have accurate listener metrics for podcasts, downloads, streaming or mobile apps?

Advertisers have to trust the data handed to them by producers and digital sales teams when buying podcast ad impressions. 50,000 impressions should equal 50,000 downloads. Unfortunately for the advertisers, they are potentially buying server pings, database crawls, and fractured downloads. A unified download count standard would give advertisers a trusted, validated number that guarantees they are actually buying a downloaded file. They could then use this standard to hold other podcast producers accountable and start to level the playing field.


There are several alternatives for solving this issue, each with its own advantages and disadvantages. The following is a comparison between Open Source, Commercial off-the-shelf, and Big Data.


Open source

Open source software is generally free software that is available for anyone to use. Open source developers allow the source code of their software to be publicly available for anyone.  Other developers are free and welcome to add to the code. 


What are the advantages of open source software?

  • The software license is generally free.
  • The software is continually evolving in real time as developers add to it and modify it.
  • In general, using open source software also means you are not locked in to using a particular vendor’s system.
  • You can modify and adapt open source software for your own business requirements.

Any disadvantages?

  • Because Open Source is developed by a community, with no requirement to create a commercial product that will sell and generate money, the software can tend to evolve more in line with developers’ wishes than the needs of the end user.
  • Most members of the Open Source community are highly technical individuals, and less attention is paid to developing the user interface, so it tends to be less “user-friendly” and harder to use.
  • There is less support available for technical and compatibility issues, since it is left to the community of users to respond to and fix problems, and the community is not compensated for support.
  • The software may be free but there are hidden costs such as additional hardware requirements and support costs.
  • Although having an open system means that there are many people identifying bugs and fixing them, it also means that malicious users can potentially view it and exploit any vulnerabilities.

Most Open Source solutions need other tools in order for the solution to be fully functional such as agents (collectors or forwarders), configuration, incident and alerting tools.

Storage requirements also differ. An Open Source solution can require 600% of the original file size, while the better COTS solutions require about 50% of the original file size. In other words, the Open Source infrastructure must hold 500% more than the raw data itself, which means more capable and more expensive servers and storage.
See Figure 1 comparing hardware requirements of Open Source vs. Commercial off-the-shelf.
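As a worked example of the storage comparison, the short Python sketch below applies the 600% and 50% figures quoted above to a hypothetical 100 GB of raw log data. The 100 GB volume and the function name are assumptions for illustration only.

```python
def storage_needed_gb(raw_gb, percent_of_original):
    """Storage required when a tool needs a given percentage of the raw data size."""
    return raw_gb * percent_of_original / 100


raw_gb = 100  # hypothetical volume of raw log data
open_source_gb = storage_needed_gb(raw_gb, 600)  # 600% figure from the text
cots_gb = storage_needed_gb(raw_gb, 50)          # 50% figure from the text

# prints: 600.0 50.0 12.0
print(open_source_gb, cots_gb, open_source_gb / cots_gb)
```

Under these figures, the Open Source deployment would need 600 GB against 50 GB for the COTS tool, a twelvefold difference in storage for the same raw data.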

Commercial-off-the-shelf (COTS)  

There are several commercial-off-the-shelf packages available that provide a complete solution.  

What are the advantages of COTS software?

Worldwide trends indicate a swift increase in preference for Commercial-off-the-shelf (COTS) software over custom-made software solutions. This trend results from the following advantages of COTS software:

  • Time used to purchase software is much shorter than time spent developing the software.
  • Fewer resources in terms of human capital, office space and money are required when purchasing Commercial-off-the-shelf (COTS) software compared to developing customized software.
  • Commercial-off-the-shelf software has a greater chance of incorporating industry-set standards.
  • Commercial-off-the-shelf software has less dependence on platform since it supports components across different environments.
  • Reliability of the system is greater since commercial software is generally tested for a larger variety of use. 

Any disadvantages?

On the other hand the following drawbacks of using COTS would need to be considered: 

  • Vendors may cease their support or may go out of business
  • There may be a need for customizing the software to fully fit individual business functionalities and this may end up being expensive and time consuming
  • In cases where the software is licensed periodically or for a set number of users, COTS may be expensive in the long run

Several COTS products offer the collection of log and event data with pre-built correlation rules, alerts and reports that are easy to use. The challenge with lower-end solutions is that they may not index important data sources such as Wowza, Icecast and Adswizz that are specific to the broadcasting industry. The data must first be normalized before the tool can analyze it, and normalization may lose data and structure it in preconceived ways. Reports are limited to the pre-set reports, and there are no packages or reports specific to the broadcasting industry.


Splunk is a Commercial-off-the-shelf tool that is configurable to index and provide reporting for unusual source types such as Wowza, Icecast, Adswizz or other data types used in broadcasting.

What are the advantages of Splunk software?

  • All-in-one solution that includes the agents or forwarders, indexing, searching, reporting, alerting and executive dashboards.
  • More efficient data management, so hardware costs are lower and performance is faster.
  • Splunk Apps for quick time to value

Any disadvantages?

  • License costs
  • More configurable, but additional knowledge or training is required to use its full power

Major broadcasters such as NPR, American Public Media and NY Public Radio currently use Splunk for collecting podcast and other listener metrics such as downloads, streaming and mobile app usage. One of the main reasons is Splunk’s ability to index broadcasting data sources such as Wowza, Icecast, Adswizz, Triton and more. Splunk also conforms to the latest Public Broadcasting Consortium’s Best Practices.

The price for Splunk is based on the number of gigabytes indexed from the log files, whether they come from the CDN or from in-house apps. The license cost is reasonable because those files are generally small, under a gigabyte. The price increases when indexing security log files, since those files can be much larger.

There is an out-of-the-box Splunk App specifically designed to collect and report on Podcast and Listener metrics.  This eliminates the additional knowledge and time required to produce reports and dashboards since the Splunk App is a complete turnkey solution and conforms to the Public Radio Consortium guidelines.

According to Alex Gitelzon, Systems Administrator at APM, Splunk helps his company by allowing it to know who is getting which podcasts, where they are getting them from, which ads people like, who is listening, and which state or region people are listening from. APM can then target people more effectively online.

“So we look at it and know: this podcast was downloaded this many times by these people, this app downloaded half the podcast for some reason, this app downloaded the entire podcast,” said Gitelzon. This helps APM evaluate exactly how many downloads are actually happening and which apps are most successful. “Splunk makes it very easy to find and then do something about it,” noted Gitelzon.

The Splunk dashboard can be customized to report on any type of metric, such as:


•    Streams played
•    Unique clients
•    Total hours streamed
•    Top 20 files
•    Stream metrics
•    Min/max/median/avg
•    URLs streamed
•    MB/second
•    Traffic/hour
•    Bytes per URL
•    Number of impressions
•    Bandwidth consumed
•    And more
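Several of the metrics listed above are simple aggregations over stream events. The Python sketch below illustrates how they could be derived. It is not Splunk's implementation or query language, and the event fields (client, url, seconds, bytes) are assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median


def summarize(events, top_n=20):
    """Summarize a non-empty list of stream events.

    Each event is a dict with the assumed keys client, url, seconds, bytes.
    """
    durations = [e["seconds"] for e in events]
    bytes_per_url = Counter()
    plays_per_url = Counter()
    for e in events:
        bytes_per_url[e["url"]] += e["bytes"]
        plays_per_url[e["url"]] += 1
    return {
        "streams_played": len(events),
        "unique_clients": len({e["client"] for e in events}),
        "total_hours_streamed": sum(durations) / 3600,
        "top_files": plays_per_url.most_common(top_n),
        "bytes_per_url": dict(bytes_per_url),
        "bandwidth_consumed_bytes": sum(bytes_per_url.values()),
        # Stream duration in seconds: (min, max, median, average).
        "duration_min_max_median_avg": (
            min(durations), max(durations), median(durations), mean(durations)
        ),
    }
```

Note that "unique clients" here is only as good as the client identifier in the logs. How a listener is identified (IP address, user agent, app ID) is a separate decision that this sketch does not make.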





NPR Experience
Splunk enables NPR to accurately and quickly track and analyze important social media and audio/video downloads and streaming content. This new level of digital intelligence has allowed NPR, for the first time, to estimate how many unique listeners are using audio and video resources and which mobile platforms are most popular. 

NPR uses Splunk to analyze more than 50 million audio and video events each month and report on the results to C-level NPR management and member station managers.

Splunk helps NPR:

  • Determine the total number of “listeners” for online content, as well as totals for individual programs—showing where promotional campaigns are working and the relative success of new programming.
  • Identify the most popular mobile platforms (e.g., iPhone, iPad, Android), which enables limited development resources to be targeted for greatest benefit.
  • Track podcast downloads for 300+ partner stations to determine accurate apportionment of sponsorship funds.
  • Provide cost-effective and accurate song royalty tracking and calculation to accommodate downloads and streaming broadcasts, saving the time of 3-4 staff and supporting continued programming expansion.


Podcast producers and broadcasters can now trust the download metrics from a tool like Splunk, since the data comes directly from the source in its raw format. The reports and dashboards are accurate and are no longer “black boxes” with data that could have been summarized or filtered. A download or stream is counted as a download or stream and is not corrupted by server pings, database crawls or fractured downloads.

Advertisers can be confident that the podcast ad impressions they buy are actual impressions.

The additional benefits to the company include:

  • Increase productivity by replacing manual cutting and pasting of reports with real-time automated reports and dashboards.
  • Save bandwidth costs by alerting on and eliminating rogue devices that are consuming expensive CDN services.
  • Accelerate API performance and eliminate the need for costly hardware upgrades.
  • Optimize resource allocation by monitoring popular programming and devices.
  • Improve and simplify royalty accounting accuracy and flexibility.
  • Combine multiple sources such as Icecast, Wowza, Triton and Adswizz to a single report and dashboard to enhance usefulness and impact.
  • Return on Investment (ROI) in less than 30 days
  • The reports are accurate and conform to the Public Radio Consortium Best Practices Guidelines.
  • Splunk is a Big Data platform, so the investment can also be used for security, IT operations, Internet of Things, marketing and business analytics.