Diving into Robocall Content with SnorCall

Sathvik Prasad, Trevor Dunlap, Alexander Ross, and Bradley Reaves

Proceedings of the USENIX Security Symposium, 2023

Applies Snorkel-based weak-supervision labeling to 232,723 robocall transcripts collected over 23 months, producing first estimates of how prevalent different scam topics are and identifying infrastructure shared between campaigns.

Abstract

Unsolicited bulk telephone calls — termed “robocalls” — nearly outnumber legitimate calls, overwhelming telephone users. While the vast majority of these calls are illegal, they are also ephemeral. Although telephone service providers, regulators, and researchers have ready access to call metadata, they do not have tools to investigate call content at the vast scale required. This paper presents SnorCall, a framework that scalably and efficiently extracts content from robocalls. SnorCall leverages the Snorkel framework that allows a domain expert to write simple labeling functions to classify text with high accuracy. We apply SnorCall to a corpus of transcripts covering 232,723 robocalls collected over a 23-month period. Among many other findings, SnorCall enables us to obtain first estimates on how prevalent different scam and legitimate robocall topics are, determine which organizations are referenced in these calls, estimate the average amounts solicited in scam calls, identify shared infrastructure between campaigns, and monitor the rise and fall of election-related political calls. As a result, we demonstrate how regulators, carriers, anti-robocall product vendors, and researchers can use SnorCall to obtain powerful and accurate analyses of robocall content and trends that can lead to better defenses.

Citation (IEEE)

S. Prasad, T. Dunlap, A. Ross, and B. Reaves, “Diving into Robocall Content with SnorCall,” in Proceedings of the USENIX Security Symposium, 2023.

BibTeX

@inproceedings{pdrr23,
  abstract = {Unsolicited bulk telephone calls --- termed "robocalls" --- nearly outnumber legitimate calls, overwhelming telephone users. While the vast majority of these calls are illegal, they are also ephemeral. Although telephone service providers, regulators, and researchers have ready access to call metadata, they do not have tools to investigate call content at the vast scale required. This paper presents SnorCall, a framework that scalably and efficiently extracts content from robocalls. SnorCall leverages the Snorkel framework that allows a domain expert to write simple labeling functions to classify text with high accuracy. We apply SnorCall to a corpus of transcripts covering 232,723 robocalls collected over a 23-month period. Among many other findings, SnorCall enables us to obtain first estimates on how prevalent different scam and legitimate robocall topics are, determine which organizations are referenced in these calls, estimate the average amounts solicited in scam calls, identify shared infrastructure between campaigns, and monitor the rise and fall of election-related political calls. As a result, we demonstrate how regulators, carriers, anti-robocall product vendors, and researchers can use SnorCall to obtain powerful and accurate analyses of robocall content and trends that can lead to better defenses.},
  author = {Sathvik Prasad and Trevor Dunlap and Alexander Ross and Bradley Reaves},
  booktitle = {Proceedings of the {USENIX} Security Symposium},
  date = {2023-08-09},
  title = {Diving into Robocall Content with {SnorCall}},
}