Workshop
Fri Jul 23 08:00 AM -- 05:20 PM (PDT)
Machine Learning for Data: Automated Creation, Privacy, Bias
Zhiting Hu · Li Erran Li · Willie Neiswanger · Benedikt Boecking · Yi Xu · Belinda Zeng






As the use of machine learning (ML) becomes ubiquitous, there is growing appreciation of the role that data plays in building successful ML solutions. Classical ML research has focused primarily on learning algorithms and their guarantees. Recent progress has shown that data plays an increasingly central role in creating ML solutions: massive text corpora are used to train powerful language models, (semi-)automatic engineering of weak supervision data enables applications in few-label settings, and various data augmentation and manipulation techniques yield performance boosts on many real-world tasks. At the same time, data is one of the main sources of security, privacy, and bias issues when ML solutions are deployed in the real world. This workshop will focus on the new perspective of machine learning for data, specifically how ML techniques can be used to facilitate and automate a range of data operations (e.g., ML-assisted labeling, synthesis, selection, and augmentation), and on the associated challenges of quality, security, privacy, and fairness, for which ML techniques can also enable solutions.

Opening Remarks (Opening)
Invited Talk: David Alvarez-Melis. Comparing, Transforming, and Optimizing Datasets with Optimal Transport. (Invited Talk)
Invited Talk: Lora Aroyo (Invited Talk)
Spotlight: SNoB: Social Norm Bias of “Fair” Algorithms (Spotlight)
Spotlight: CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training (Spotlight)
Invited Talk: Eric P. Xing. A Data-Centric View for Composable Natural Language Processing. (Invited Talk)
Invited Talk: Kamalika Chaudhuri (Invited Talk)
Poster Session
Invited Talk: Hoifung Poon. Task-Specific Self-Supervised Learning for Precision Medicine. (Invited Talk)
Invited Talk: Dawn Song. Towards building a responsible data economy. (Invited Talk)
Spotlight: An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises (Spotlight)
Invited Talk: Alex Ratner. Programmatic weak supervision for data-centric AI. (Invited Talk)
Invited Talk: Kumar Chellapilla. Machine Learning with Humans-in-the-loop (HITL) (Invited Talk)
Panel Discussion with Hoifung Poon, Kamalika Chaudhuri, Paroma Varma, and Kumar Chellapilla (Panel Discussion)
AutoMixup: Learning mix-up policies with Reinforcement Learning (Poster)
BRR: Preserving Privacy of Text Data Efficiently on Device (Poster)
Measuring Fairness in Generative Models (Poster)
CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training (Poster)
Towards Principled Disentanglement for Domain Generalization (Poster)
MetaDataset: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts (Poster)
Deep Causal Inequalities: Demand Estimation in Differentiated Products Markets (Poster)
Benchmarking Differential Privacy and Federated Learning for BERT Models (Poster)
Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods (Poster)
An Efficient DP-SGD Mechanism for Large Scale NLP Models (Poster)
DP-SGD vs PATE: Which Has Less Disparate Impact on Model Accuracy? (Poster)
Data Considerations in Graph Representation Learning for Supply Chain Networks (Poster)
Bayesian Regression from Multiple Sources of Weak Supervision (Poster)
A Standardized Data Collection Toolkit for Model Benchmarking (Poster)
SNoB: Social Norm Bias of “Fair” Algorithms (Poster)
Deep AutoAugment (Poster)
An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises (Poster)
Adversarial Stacked Auto-Encoders for Fair Representation Learning (Poster)
Model Mis-specification and Algorithmic Bias (Poster)
Regularization and False Alarms Quantification: Towards an Approach to Assess the Economic Value of Machine Learning (Poster)