Aditya Parameswaran

I am an assistant professor of Computer Science at the University of Illinois (UIUC). My research interests are broadly in simplifying and improving data analytics, i.e., helping users make better use of their data.

My work involves building real data analytics systems with principled foundations, designing algorithms (with formal guarantees) for the systems, as well as mining data obtained from such systems.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC). He spent the 2013-14 year visiting MIT CSAIL and Microsoft Research New England, after completing his Ph.D. at Stanford University, advised by Prof. Hector Garcia-Molina. He is broadly interested in data analytics, with research results in human computation, visual analytics, information extraction and integration, and recommender systems.

Aditya is a recipient of the Arthur Samuel award for the best dissertation in CS at Stanford (2014), the SIGMOD Jim Gray dissertation award (2014), the SIGKDD dissertation award runner up (2014), a Google Faculty Research Award (2015), the Key Scientific Challenges Award from Yahoo! Research (2010), three best-of-conference citations (VLDB 2010, KDD 2012 and ICDE 2014), the Terry Groswith graduate fellowship at Stanford (2007), and the Gold Medal in Computer Science at IIT Bombay (2007). His research group is supported with funding from the NIH, the NSF, and Google.


  • October 1, 2015: We just heard word that NIH has funded our BD2K commons supplement. Looking forward to working with folks at UChicago to improve data publication workflows!
  • September 15, 2015: Student awesomeness: my student Silu Huang won the 3M foundation fellowship, while Tarique Siddiqui won the Siebel Foundation fellowship.
  • September 1, 2015: Thanks to the NSF, we now have funding to support research and development on DataHub via a Medium IIS grant with MIT and UMD! Link to the project page here.
  • August 1, 2015: The full SeeDB paper has been accepted at VLDB 2016 in India!
  • July 1, 2015: Our JellyBean paper on using humans to count objects in images will appear at HCOMP 2015!
  • June 9, 2015: Release of a new preprint on calibrating the output of confidence estimates from classification algorithms, using classical learning theory tools. This is work driven by my awesome student Yihan Gao.
  • June 6, 2015: Our DataHub query language proposal was accepted at TaPP, a focused provenance workshop.
  • June 1, 2015: Final tally for VLDB 2015 -- three papers and three demos on a variety of topics:
    • papers: crowds, visualizations, and versioning;
    • demos: data exploration, Excel-meets-databases, and collaborative data analytics.
  • May 27, 2015: Our paper on versioning principles was accepted at VLDB'15 without any revisions!
  • May 15, 2015: Undergraduate research news: Andrew Kuznetsov, a freshman working in our group, won the ISUR undergraduate research prize, and Andrew, with two other freshmen (Andrew Thieck and Radhir Kothuri), won the third prize in the Illinois Engineering Open House competition for their crowdsourcing tool.
  • May 12, 2015: Our paper on debiasing was accepted at KDD 2015!
  • April 9, 2015: Our first release of a new project, titled Data-Spread, with my esteemed colleague Kevin Chang and student Mangesh Bendre. Data-Spread is a tool that unifies databases and spreadsheets. You have to see it to believe it!
    • Here is a YouTube video showing Data-Spread in action.
    • Here is our demo paper on Data-Spread.
  • March 10, 2015: Four more new preprints in the last month! These were:
    • our paper on SeeDB for query driven automatic visualization generation;
    • our JellyBean paper on counting objects in images; turns out we can do way better than humans or computer vision algorithms!
    • our paper on debiasing of batches; crowdsourcing practitioners often use batching to save costs, but this can lead to non-independence: we deal with this issue.
    • our versioning theory paper; to build a solid foundation for our DataHub project, we explored how to trade off storage and retrieval costs.
  • February 9, 2015: Our paper on exploiting correlations to avoid expensive predicate evaluations was accepted at SIGMOD 2015!
  • February 12, 2015: Many thanks to Google for their support via a Google Faculty Research Award! Excited to be building the next generation visualization toolkit.
  • December 10, 2014: Three new preprints in the last month! These were:
    • smart drill-down, our tool for zooming into portions of a dataset quickly;
    • our paper on globally optimal crowdsourcing quality management; and
    • our paper on gathering data using the crowd, exploiting a hierarchy and MABs.
  • November 10, 2014: Three new paper acceptances in the last month!
  • October 10, 2014: Thrilled to be a part of the new NIH BD2K (Big Data 2 Knowledge) center for revolutionizing genomic data analysis. Thank you, NIH, for the support!
  • September 2, 2014: We can finally talk about our exciting new project, titled DataHub (i.e., GitHub for Data), on collaborative data science and version management. The ambitious goal is to eliminate the pain points of data book-keeping while doing collaborative data science.
  • September 1, 2014: Our paper on pricing for crowdsourcing tasks has been accepted for presentation at VLDB 2015! The paper studies a simple but important problem: if you have a batch of tasks and a deadline, how should you vary the price to meet the deadline?
  • August 25, 2014: Pleasantly surprised to be selected as the KDD dissertation award runner-up, having already been given the SIGMOD dissertation award! Feel truly lucky to have two communities - SIGMOD and KDD - supporting my work!
  • August 24, 2014: Had a blast being a keynote speaker at KDD IDEA 2014 - a big thank you to the organizers for inviting me! If this year was any indication, IDEA is going to flourish as a workshop for many years!
  • August 20, 2014: Our paper on optimally learning maximum-likelihood worker accuracies has been accepted as a work-in-progress paper for HCOMP 2014! The paper tackles the problem of worker quality estimation in a way EM-based algorithms cannot - by providing optimality guarantees.
  • August 15, 2014: Started at Illinois; exciting times ahead!

Synergistic Activities

I am currently serving on or have served on the Program Committees of: VLDB 2013-14-15, KDD 2015, SIGMOD 2014-15, WSDM 2015, WWW 2014, SOCC 2014, HCOMP 2014, ICDE 2014, and EDBT 2014.

Visual Analytics

Automatically recommending visualizations or visual summaries on very large volumes of data

Interactive Analytics

Interactive querying of large datasets while keeping track of versions, possibly at a slight cost in the accuracy of query results

Crowd-Powered Analytics

Using crowdsourcing to process and make sense of large volumes of data

Information Extraction

Extracting information from the web, integrating it with existing information, and surfacing this information to users

Recommendation Systems

Building scalable recommendation systems that take into account contextual information

Recent Releases

Selected Projects



zenvisage is a tool for effortlessly visualizing insights from very large data sets. It automates finding the right visualization for a query, significantly simplifying the laborious task of identifying appropriate visualizations.



DataSpread is a tool that marries the best of databases and spreadsheets.


DataHub: Collaborative Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets.


DataSift: A Crowd-Powered Search Engine

DataSift is a crowd-powered search engine that is useful for long or complex queries that traditional search engines have trouble with, or with queries that contain rich media, such as images or videos.


Crowd Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error.


NeedleTail: A System for Browsing

NeedleTail is a system tuned towards returning a small number (a "screenful") of query results very quickly on extremely large datasets.