Workshop
Machine Learning for Data: Automated Creation, Privacy, Bias
Zhiting Hu · Li Erran Li · Willie Neiswanger · Benedikt Boecking · Yi Xu · Belinda Zeng
Fri 23 Jul, 8 a.m. PDT
As the use of machine learning (ML) becomes ubiquitous, there is a growing understanding and appreciation of the role that data plays in building successful ML solutions. Classical ML research has focused primarily on learning algorithms and their guarantees. Recent progress, however, has shown that data plays an increasingly central role in creating ML solutions. Examples include the massive text corpora used to train powerful language models, the (semi-)automatic engineering of weak supervision data that enables applications in few-label settings, and various data augmentation and manipulation techniques that boost performance on many real-world tasks. At the same time, data is one of the main sources of security, privacy, and bias issues when ML solutions are deployed in the real world. This workshop will focus on the new perspective of machine learning for data: specifically, how ML techniques can be used to facilitate and automate a range of data operations (e.g., ML-assisted labeling, synthesis, selection, and augmentation), and the associated challenges of quality, security, privacy, and fairness, for which ML techniques can also enable solutions.
Schedule
Fri 8:00 a.m. - 8:10 a.m. | Opening Remarks
Fri 8:10 a.m. - 8:50 a.m. | Invited Talk: David Alvarez-Melis. Comparing, Transforming, and Optimizing Datasets with Optimal Transport.
Fri 8:50 a.m. - 9:30 a.m. | Invited Talk: Lora Aroyo
Fri 9:30 a.m. - 9:45 a.m. | Spotlight: SNoB: Social Norm Bias of “Fair” Algorithms | Myra Cheng
Fri 9:45 a.m. - 10:00 a.m. | Spotlight: CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training | Hari Prasanna Das
Fri 10:20 a.m. - 11:00 a.m. | Invited Talk: Eric P. Xing. A Data-Centric View for Composable Natural Language Processing.
Fri 11:00 a.m. - 11:40 a.m. | Invited Talk: Kamalika Chaudhuri
Fri 11:40 a.m. - 12:30 p.m. | Poster Session
Fri 1:30 p.m. - 2:10 p.m. | Invited Talk: Hoifung Poon. Task-Specific Self-Supervised Learning for Precision Medicine.
Fri 2:10 p.m. - 2:50 p.m. | Invited Talk: Dawn Song. Towards building a responsible data economy.
Fri 2:50 p.m. - 3:05 p.m. | Spotlight: An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises | Mayana Wanderley Pereira
Fri 3:20 p.m. - 4:00 p.m. | Invited Talk: Alex Ratner. Programmatic weak supervision for data-centric AI.
Fri 4:00 p.m. - 4:40 p.m. | Invited Talk: Kumar Chellapilla. Machine Learning with Humans-in-the-loop (HITL)
Fri 4:40 p.m. - 5:20 p.m. | Panel Discussion with Hoifung Poon, Kamalika Chaudhuri, Paroma Varma, and Kumar Chellapilla
Poster: MetaDataset: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts | Weixin Liang · James Zou
Poster: Towards Principled Disentanglement for Domain Generalization | Hanlin Zhang · Yi-Fan Zhang · Weiyang Liu · Adrian Weller · Bernhard Schölkopf · Eric Xing
Poster: An Efficient DP-SGD Mechanism for Large Scale NLP Models | Christophe Dupuy · Radhika Arava · Rahul Gupta · Anna Rumshisky
Poster: CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training | Hari Prasanna Das · Ryan Tran · Japjot Singh · Yu Wen Lin · Costas J. Spanos
Poster: Measuring Fairness in Generative Models | Christopher Teo · Ngai-Man Cheung
Poster: BRR: Preserving Privacy of Text Data Efficiently on Device | Ricardo Silva Carvalho · Theodore Vasiloudis · Oluwaseyi Feyisetan
Poster: AutoMixup: Learning mix-up policies with Reinforcement Learning | Long Luu · Zeyi Huang · Haohan Wang
Poster: Deep Causal Inequalities: Demand Estimation in Differentiated Products Markets | Edvard Bakhitov · Amandeep Singh · Jiding Zhang
Poster: Regularization and False Alarms Quantification: Towards an Approach to Assess the Economic Value of Machine Learning | Nima Safaei · Pooria Assadi
Poster: Model Mis-specification and Algorithmic Bias | Yangfan Liang · Peter Zhang
Poster: Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods | Terrance Liu · Giuseppe Vietri · Steven Wu
Poster: Adversarial Stacked Auto-Encoders for Fair Representation Learning | Patrik Joslin Kenfack · Adil Khan · Rasheed Hussain
Poster: An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises | Mayana Wanderley Pereira · Rahul Dodhia · Juan Lavista Ferres
Poster: Deep AutoAugment | Yu Zheng · Zhi Zhang · Shen Yan · Mi Zhang
Poster: SNoB: Social Norm Bias of “Fair” Algorithms | Myra Cheng · Maria De-Arteaga · Lester Mackey · Adam Tauman Kalai
Poster: A Standardized Data Collection Toolkit for Model Benchmarking | Avanika Narayan · Piero Molino · Karan Goel · Christopher Re
Poster: Bayesian Regression from Multiple Sources of Weak Supervision | Putra Manggala · Holger Hoos · Eric Nalisnick
Poster: Data Considerations in Graph Representation Learning for Supply Chain Networks | Edward Kosasih · Ryan-Rhys Griffiths · Alexandra Brintrup · Ajmal Aziz
Poster: DP-SGD vs PATE: Which Has Less Disparate Impact on Model Accuracy? | Archit Uniyal · Rakshit Naidu · Sasikanth Kotti · Patrik Joslin Kenfack · Sahib Singh · FatemehSadat Mireshghallah
Poster: Benchmarking Differential Privacy and Federated Learning for BERT Models | Priyam Basu · Rakshit Naidu · Zumrut Muftuoglu · Sahib Singh · FatemehSadat Mireshghallah