Workshop
DMLR Workshop: Data-centric Machine Learning Research
Ce Zhang · Praveen Paritosh · Newsha Ardalani · Nezihe Merve Gürel · William Gaviria Rojas · Yang Liu · Rotem Dror · Manil Maskey · Lilith Bat-Leah · Tzu-Sheng Kuo · Luis Oala · Max Bartolo · Ludwig Schmidt · Alicia Parrish · Daniel Kondermann · Najoung Kim
Ballroom C
Sat 29 Jul, noon PDT
This is the third edition of a highly successful series of workshops focused on data-centric AI, following the Data-Centric AI workshop at NeurIPS 2021 and the DataPerf workshop at ICML 2022. Data, and operations over data (e.g., cleaning, debugging, curation), have fueled the success of machine learning for decades. While the ML community has historically focused primarily on model development, the quality of data used for ML training and evaluation has recently attracted intensive interest from the community, as shown by the creation of the NeurIPS Datasets and Benchmarks track, several data-centric AI benchmarks (e.g., DataPerf), and the flourishing of data consortiums such as LAION. The goal of this workshop is to facilitate discussion of these important topics in what we call Data-centric Machine Learning Research, which includes not only datasets and benchmarks but also tooling and governance, as well as fundamental research on topics such as data quality and data acquisition for dataset creation and optimization.
Schedule
Sat 12:00 p.m. - 12:05 p.m.
|
Introduction and Opening
(
Opening Remarks
)
>
SlidesLive Video |
Praveen Paritosh 🔗 |
Sat 12:05 p.m. - 12:40 p.m.
|
Keynote 1: Andrew Ng (Landing AI)
(
Keynote
)
>
SlidesLive Video |
Andrew Ng 🔗 |
Sat 12:40 p.m. - 1:10 p.m.
|
Data-centric Ecosystem: Croissant and DataPerf - Peter Mattson (Google & MLCommons)
(
Talk
)
>
SlidesLive Video |
Peter Mattson · Praveen Paritosh 🔗 |
Sat 1:10 p.m. - 1:25 p.m.
|
Coffee break / networking break link | 🔗 |
Sat 1:25 p.m. - 2:00 p.m.
|
Keynote 2: Mihaela van der Schaar (University of Cambridge) - Reality-Centric AI
(
Keynote
)
>
SlidesLive Video |
Mihaela van der Schaar 🔗 |
Sat 2:00 p.m. - 2:30 p.m.
|
Invited Talk 2: Olga Russakovsky (Princeton University)
(
Talk
)
>
SlidesLive Video |
Olga Russakovsky · Vikram V Ramaswamy 🔗 |
Sat 2:30 p.m. - 3:00 p.m.
|
Invited Talk 3: Masashi Sugiyama (RIKEN & UTokyo) - Data distribution shift
(
Talk
)
>
SlidesLive Video |
Masashi Sugiyama 🔗 |
Sat 3:00 p.m. - 4:00 p.m.
|
Lunch Break / networking break link | 🔗 |
Sat 4:00 p.m. - 4:35 p.m.
|
Keynote 3: Isabelle Guyon (Google Brain) - Towards Data-Centric AutoML
(
Keynote
)
>
SlidesLive Video |
Isabelle Guyon 🔗 |
Sat 4:35 p.m. - 5:05 p.m.
|
Invited Talk 1: Dina Machuve (DevData Analytics) - Data for Agriculture
(
Talk
)
>
SlidesLive Video |
Dina Machuve 🔗 |
Sat 5:05 p.m. - 5:20 p.m.
|
Announcement and open discussion on DMLR (Selected members of DMLR Advisory Board)
(
Discussion Panel
)
>
SlidesLive Video |
Ce Zhang 🔗 |
Sat 5:20 p.m. - 6:15 p.m.
|
Panel Discussion
(
Discussion Panel
)
>
SlidesLive Video |
Megan Ansdell · Nathan Lambert · Ludwig Schmidt · Praveen Paritosh · Sang Michael Xie 🔗 |
Sat 6:15 p.m. - 6:30 p.m.
|
Coffee break / networking break link | 🔗 |
Sat 6:30 p.m. - 7:30 p.m.
|
Poster Session 1
(
Poster Session - In Person
)
>
|
🔗 |
Sat 7:30 p.m. - 8:00 p.m.
|
Poster Session 2 (Virtual) ( Poster Session - Virtual ) > link | 🔗 |
-
|
Training on Thin Air: Improve Image Classification with Generated Data
(
Poster
)
>
|
Yongchao Zhou · Hshmat Sahak · Jimmy Ba 🔗 |
-
|
DMOps: Data Management Operations and Recipes
(
Poster
)
>
|
Eujeong Choi · Chanjun Park 🔗 |
-
|
Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps)
(
Poster
)
>
|
Damrin Kim · NamHyeok Kim · Chanjun Park · Harksoo Kim 🔗 |
-
|
To Aggregate or Not? Learning with Separate Noisy Labels
(
Poster
)
>
|
Jiaheng Wei · Zhaowei Zhu · Tianyi Luo · Ehsan Amid · Abhishek Kumar · Yang Liu 🔗 |
-
|
On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training
(
Poster
)
>
|
Jieyu Zhang · Bohan Wang · zhengyu hu · Pang Wei Koh · Alex Ratner 🔗 |
-
|
Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios
(
Poster
)
>
|
NamHyeok Kim · Chanjun Park 🔗 |
-
|
Algorithm Selection for Deep Active Learning with Imbalanced Datasets
(
Poster
)
>
|
Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak 🔗 |
-
|
How to Improve Imitation Learning Performance with Sub-optimal Supplementary Data?
(
Poster
)
>
|
Ziniu Li · Tian Xu · Zeyu Qin · Yang Yu · Zhiquan Luo 🔗 |
-
|
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
(
Poster
)
>
|
Sang Michael Xie · Hieu Pham · Xuanyi Dong · Nan Du · Hanxiao Liu · Yifeng Lu · Percy Liang · Quoc Le · Tengyu Ma · Adams Wei Yu 🔗 |
-
|
How to Cope with Gradual Data Drift?
(
Poster
)
>
|
Rasool Fakoor · Jonas Mueller · Zachary Lipton · Pratik Chaudhari · Alex Smola 🔗 |
-
|
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
(
Poster
)
>
|
Chanjun Park · Seonmin Koo · Seolhwa Lee · Jaehyung Seo · Sugyeong Eo · Hyeonseok Moon · HEUISEOK LIM 🔗 |
-
|
Programmable Synthetic Tabular Data Generation
(
Poster
)
>
|
Mark Vero · Mislav Balunovic · Martin Vechev 🔗 |
-
|
Unitail: A Benchmark for Detecting, Reading, and Matching in Retail Scene
(
Poster
)
>
|
Fangyi Chen · Han Zhang · Hao Chen · Kai Hu · Jiachen Dou · zaiwang li · Chenchen Zhu · Marios Savvides 🔗 |
-
|
Understanding Unfairness via Training Concept Influence
(
Poster
)
>
|
Yuanshun Yao · Yang Liu 🔗 |
-
|
Promises and Pitfalls of Threshold-based Auto-labeling
(
Poster
)
>
|
Harit Vishwakarma · Heguang Lin · Frederic Sala · Ramya Korlakai Vinayak 🔗 |
-
|
Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors
(
Poster
)
>
|
Jesse Cummings · Jonas Mueller · Elías Snorrason 🔗 |
-
|
Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
(
Poster
)
>
|
Seungjun Lee · Hyeonseok Moon · Chanjun Park · HEUISEOK LIM 🔗 |
-
|
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
(
Poster
)
>
|
Jinyi Liu · Yi Ma · Jianye Hao · Yujing Hu · Yan Zheng · Tangjie Lv · Changjie Fan 🔗 |
-
|
CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training
(
Poster
)
>
|
A. Feder Cooper · Wentao Guo · Duc Khiem Pham · Tiancheng Yuan · Charlie Ruan · Yucheng Lu · Chris De Sa 🔗 |
-
|
Data-Centric Defense: Shaping Loss Landscape with Augmentations to Counter Model Inversion
(
Poster
)
>
|
Si Chen · Feiyang Kang · Nikhil Abhyankar · Ming Jin · Ruoxi Jia 🔗 |
-
|
Probing Heterogeneous Pretraining Datasets with Small Curated Datasets
(
Poster
)
>
|
Gregory Yauney · Emily Reif · David Mimno 🔗 |
-
|
Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation
(
Poster
)
>
|
Joshua Vendrow · Saachi Jain · Logan Engstrom · Aleksander Madry 🔗 |
-
|
EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost
(
Poster
)
>
|
Jaeseung Heo · Seungbeom Lee · Sungsoo Ahn · Dongwoo Kim 🔗 |
-
|
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
(
Poster
)
>
|
Seonmin Koo · Chanjun Park · Jinsung Kim · Jaehyung Seo · Sugyeong Eo · Hyeonseok Moon · HEUISEOK LIM 🔗 |
-
|
Contrastive clustering of tabular data
(
Poster
)
>
|
Piotr Przemielewski · Witold Wydmański · Marek Śmieja 🔗 |
-
|
Investigating minimizing the training set fill distance in machine learning regression
(
Poster
)
>
|
Paolo Climaco · Jochen Garcke 🔗 |
-
|
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
(
Poster
)
>
|
Patrik Okanovic · Roger Waleffe · Vasileios Mageirakos · Konstantinos Nikolakakis · Amin Karbasi · Dionysios Kalogerias · Nezihe Merve Gürel · Theodoros Rekatsinas 🔗 |
-
|
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
(
Poster
)
>
|
Nathan Vaska · Victoria Helus 🔗 |
-
|
Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks
(
Poster
)
>
|
Natalie Abreu · Nathan Vaska · Victoria Helus 🔗 |
-
|
Fair Machine Unlearning: Data Removal while Mitigating Disparities
(
Poster
)
>
|
Alex Oesterling · Jiaqi Ma · Flavio Calmon · Hima Lakkaraju 🔗 |
-
|
TMARS: Improving Visual Representations by Circumventing Text Feature Learning
(
Poster
)
>
|
Pratyush Maini · Sachin Goyal · Zachary Lipton · Zico Kolter · Aditi Raghunathan 🔗 |
-
|
Do Machine Learning Models Learn Statistical Rules Inferred from Data?
(
Poster
)
>
|
Aaditya Naik · Yinjun Wu · Mayur Naik · Eric Wong 🔗 |
-
|
Predicting Article Time Periods with Text2Time: A Transformer-based Approach
(
Poster
)
>
|
KARTHICK GUNASEKARAN 🔗 |
-
|
Knowledge Graph-Augmented Korean Generative Commonsense Reasoning
(
Poster
)
>
|
Dahyun Jung · Jaehyung Seo · Jaewook Lee · Chanjun Park · HEUISEOK LIM 🔗 |
-
|
Accelerating Batch Active Learning Using Continual Learning Techniques
(
Poster
)
>
|
Gantavya Bhatt · Arnav M Das · Rui Yang · Vianne Gao · Jeff Bilmes 🔗 |
-
|
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
(
Poster
)
>
|
Liangchen Luo · Lei Shu · Jayakumar Hoskere · Yun Zhu · Canoee Liu · Simon Tong · Jindong Chen · Lei Meng 🔗 |
-
|
Data Similarity is Not Enough to Explain Language Model Performance
(
Poster
)
>
|
Gregory Yauney · Emily Reif · David Mimno 🔗 |
-
|
Enhancing Time Series Forecasting Models under Concept Drift by Data-centric Online Ensembling
(
Poster
)
>
|
Yi-Fan Zhang · Qingsong Wen · Xue Wang · Weiqi Chen · Liang Sun · Zhang Zhang · Liang Wang · Rong Jin · Tieniu Tan 🔗 |
-
|
A Privacy-Friendly Approach to Data Valuation
(
Poster
)
>
|
Jiachen Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal 🔗 |
-
|
Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources
(
Poster
)
>
|
Feiyang Kang · Hoang Anh Just · Anit Kumar Sahu · Ruoxi Jia 🔗 |
-
|
Improve Model Inference Cost with Image Gridding
(
Poster
)
>
|
Shreyas Krishnaswamy · Lisa Dunlap · Lingjiao Chen · Matei Zaharia · James Zou · Joseph Gonzalez 🔗 |
-
|
THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech
(
Poster
)
>
|
Saad Almohaimeed · Saleh Almohaimeed · Ashfaq Ali Shafin · Bogdan Carbunar · Ladislau Boloni 🔗 |
-
|
On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets
(
Poster
)
>
|
Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel 🔗 |
-
|
Partial Label Learning meets Active Learning: Enhancing Annotation Efficiency through Binary Questioning
(
Poster
)
>
|
Shivangana Rawat · Chaitanya Devaguptapu · Vineeth Balasubramanian 🔗 |
-
|
Towards an Efficient Algorithm for Time Series Forecasting with Anomalies
(
Poster
)
>
|
Hao Cheng · Qingsong Wen · Yang Liu · Liang Sun 🔗 |
-
|
Towards Declarative Systems for Data-Centric Machine Learning
(
Poster
)
>
|
Stefan Grafberger · Bojan Karlaš · Paul Groth · Sebastian Schelter 🔗 |
-
|
Data Banzhaf: A Robust Data Valuation Framework for Machine Learning
(
Poster
)
>
|
Jiachen Wang · Ruoxi Jia 🔗 |
-
|
No Imputation without Representation
(
Poster
)
>
|
Oliver Lenz · Daniel Peralta 🔗 |
-
|
L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models
(
Poster
)
>
|
Aabha Pingle · Aditya Vyawahare · Isha Joshi · Rahul Tangsali · Raviraj Joshi 🔗 |
-
|
Point Cloud Classification with ModelNet40: What is left?
(
Poster
)
>
|
Jarne Van den Herrewegen · Tom Tourwé · Francis Wyffels 🔗 |
-
|
Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?
(
Poster
)
>
|
Megan Richards · Diane Bouchacourt · Mark Ibrahim · Polina Kirichenko 🔗 |
-
|
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
(
Poster
)
>
|
Julian Bitterwolf · Maximilian Müller · Matthias Hein 🔗 |
-
|
Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana
(
Poster
)
>
|
Darlington Akogo · Issah Samori · Cyril Akafia · Harriet Fiagbor · Andrews Kangah · Donald Donald · Kwabena Fuachie · Luis Oala 🔗 |
-
|
On the Reproducibility of Data Valuation under Learning Stochasticity
(
Poster
)
>
|
Jiachen Wang · Feiyang Kang · Chiyuan Zhang · Ruoxi Jia · Prateek Mittal 🔗 |
-
|
On the Usefulness of Synthetic Tabular Data Generation
(
Poster
)
>
|
Dionysis Manousakas · Sergul Aydore 🔗 |
-
|
Bayesian Optimisation Against Climate Change: Applications and Benchmarks
(
Poster
)
>
|
Sigrid Passano Hellan · Chris Lucas · Nigel Goddard 🔗 |
-
|
Suboptimal Data Can Bottleneck Scaling
(
Poster
)
>
|
Jacob Buckman · Kshitij Gupta · Ethan Caballero · Rishabh Agarwal · Marc Bellemare 🔗 |
-
|
Speech Wikimedia: A 77 Language Multilingual Speech Dataset
(
Poster
)
>
|
Rafael Mosquera Gómez · Julian Eusse · Juan Ciro · Daniel Galvez · Ryan Hileman · Kurt Bollacker · David Kanter 🔗 |
-
|
Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
(
Poster
)
>
|
Siddharth Joshi · Baharan Mirzasoleiman 🔗 |
-
|
Active learning for time instant classification
(
Poster
)
>
|
Nauman Ahad · Namrata Nadagouda · Eva Dyer · Mark Davenport 🔗 |
-
|
Prediction without Preclusion: Recourse Verification with Reachable Sets
(
Poster
)
>
|
Avni Kothari · Berk Ustun · Lily Weng · Bogdan Kulynych 🔗 |
-
|
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
(
Poster
)
>
|
Mayee Chen · Nicholas Roberts · Kush Bhatia · Jue Wang · Ce Zhang · Frederic Sala · Christopher Ré 🔗 |
-
|
Birds of an Odd Feather: Guaranteed Out-of-Distribution (OOD) Novel Category Detection
(
Poster
)
>
|
Yoav Wald · Suchi Saria 🔗 |
-
|
Mobile Internet Quality Estimation using Self-Tuning Kernel Regression
(
Poster
)
>
|
Hanyang Jiang · Yao Xie · Ellen Zegura · Elizabeth Belding · Shaowu Yuchi 🔗 |
-
|
Estimating label quality and errors in semantic segmentation data via any model
(
Poster
)
>
|
Vedang Lad · Jonas Mueller 🔗 |
-
|
STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Maps
(
Poster
)
>
|
Ammar Sherif · Abubakar Abid · Mustafa Elattar · Mohamed ElHelw 🔗 |
-
|
Detecting Errors in Numerical Data via any Regression Model
(
Poster
)
>
|
Hang Zhou · Jonas Mueller · Mayank Kumar · Jane-Ling Wang · Jing Lei 🔗 |
-
|
ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data
(
Poster
)
>
|
Ulyana Tkachenko · Aditya Thyagarajan · Jonas Mueller 🔗 |
-
|
Characterizing Risk Regimes for Safe Deployment of Deep Regression Models
(
Poster
)
>
|
Jayaraman J. Thiagarajan · Vivek Narayanaswamy · Puja Trivedi · Rushil Anirudh 🔗 |
-
|
Offline Reinforcement Learning with Imbalanced Datasets
(
Poster
)
>
|
Li Jiang · Sijie Cheng · Jielin Qiu · Victor Chan · Ding Zhao 🔗 |
-
|
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data
(
Poster
)
>
|
Alycia Lee · Brando Miranda · Sanmi Koyejo 🔗 |
-
|
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
(
Poster
)
>
|
Yue Yu · Yuchen Zhuang · Jieyu Zhang · Yu Meng · Alex Ratner · Ranjay Krishna · Jiaming Shen · Chao Zhang 🔗 |
-
|
Is Pre-training Truly Better Than Meta-Learning?
(
Poster
)
>
|
Brando Miranda · Patrick Yu · Saumya Goyal · Yu-Xiong Wang · Sanmi Koyejo 🔗 |
-
|
Characterizing the Impacts of Semi-supervised Learning for Weak Supervision
(
Poster
)
>
|
Jeffrey Li · Jieyu Zhang · Ludwig Schmidt · Alex Ratner 🔗 |
-
|
A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification
(
Poster
)
>
|
Min Du · Nesime Tatbul · Brian Rivers · Akhilesh Kumar Gupta · Lucas Hu · Wei Wang · Ryan Marcus · Shengtian Zhou · Insup Lee · Justin Gottschlich 🔗 |
-
|
Learning pipeline-invariant representation for robust brain phenotype prediction
(
Poster
)
>
|
Xinhui Li · Alex Fedorov · Mrinal Mathur · Anees Abrol · Gregory Kiar · Sergey Plis · Vince Calhoun 🔗 |
-
|
Improving multimodal datasets with image captioning
(
Poster
)
>
|
Thao Nguyen · Gabriel Ilharco · Sewoong Oh · Ludwig Schmidt 🔗 |
-
|
Adaptive Aggregated Drift Detector
(
Poster
)
>
|
Beverly Quon · Jean-Luc Gaudiot 🔗 |
-
|
On Estimating the Epistemic Uncertainty of Graph Neural Networks using Stochastic Centering
(
Poster
)
>
|
Puja Trivedi · Mark Heimann · Rushil Anirudh · Danai Koutra · Jayaraman J. Thiagarajan 🔗 |
-
|
Identifying Implicit Social Biases in Vision-Language Models
(
Poster
)
>
|
Kimia Hamidieh · Haoran Zhang · Thomas Hartvigsen · Marzyeh Ghassemi 🔗 |
-
|
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
(
Poster
)
>
|
Amro Abbas · Daniel Simig · Surya Ganguli · Ari Morcos · Kushal Tirumala 🔗 |
-
|
LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
(
Poster
)
>
|
Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak 🔗 |
-
|
Internet Explorer: Targeted Representation Learning on the Open Web
(
Poster
)
>
|
Alexander Li · Ellis Brown · Alexei Efros · Deepak Pathak 🔗 |
-
|
Graphtester: Exploring Theoretical Boundaries of GNNs on Graph Datasets
(
Poster
)
>
|
M. Eren Akbiyik · Florian Grötschla · Beni Egressy · Roger Wattenhofer 🔗 |
-
|
Early Experiments in Scalable Dataset Selection for Self-Supervised Learning in Geospatial Imagery Models
(
Poster
)
>
|
Muhammed Razzak · Anthony Ortiz · Caleb Robinson 🔗 |
-
|
Uncovering Neural Scaling Law in Molecular Representation Learning
(
Poster
)
>
|
Dingshuo Chen · Yanqiao Zhu · Jieyu Zhang · Yuanqi Du · Zhixun Li · Qiang Liu · Shu Wu · Liang Wang 🔗 |
-
|
MultiLegalPile: A 689GB Multilingual Legal Corpus
(
Poster
)
>
|
Joel Niklaus · Veton Matoshi · Matthias Stürmer · Ilias Chalkidis · Daniel Ho 🔗 |
-
|
On Memorization and Privacy risks of Sharpness Aware Minimization
(
Poster
)
>
|
Young In Kim · Pratiksha Agrawal · Johannes Royset · RAJIV KHANNA 🔗 |
-
|
Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?
(
Poster
)
>
|
Luísa Shimabucoro · Timothy Hospedales · Henry Gouk 🔗 |
-
|
Can Expert Demonstration Guarantee Offline Performance in Sparse Reward Environment?
(
Poster
)
>
|
Jeyeon Eo · Dongsu Lee · Minhae Kwon 🔗 |
-
|
The Matrix Reloaded: A Counterfactual Perspective on Bias in Machine Learning
(
Poster
)
>
|
Andre Carreiro · Mariana Pinto · Pedro Madeira · Alberto Lopez · Hugo Gamboa 🔗 |
-
|
D4: Document Deduplication and Diversification
(
Poster
)
>
|
Kushal Tirumala · Daniel Simig · Armen Aghajanyan · Ari Morcos 🔗 |
-
|
On Data Quality and Speed of Training: Bad Data Slows Training
(
Poster
)
>
|
Newsha Ardalani · Mostafa Elhoushi · Carole-Jean Wu 🔗 |
-
|
Decoupled Graph Label Denoising for Robust Semi-Supervised Node Classification
(
Poster
)
>
|
Kaize Ding · Yancheng Wang · Huan Liu 🔗 |
-
|
Ensemble Fractional Imputation for Incomplete Categorical Data with a Graphical Model
(
Poster
)
>
|
Yonghyun Kwon · Jae-kwang Kim 🔗 |
-
|
Put on your detective hat: What's wrong in this video?
(
Poster
)
>
|
Rohith Peddi · Shivvrat Arya · Bharath Challa · Likhitha Pallapothula · Akshay Vyas · Qifan Zhang · Jikai Wang · Vasundhara Komaragiri · Eric Ragan · Nicholas Ruozzi · Yu Xiang · Vibhav Gogate |
-
|
Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
(
Poster
)
>
|
Yongchan Kwon · James Zou 🔗 |
-
|
Regularizing Neural Networks with Meta-Learning Generative Models
(
Poster
)
>
|
Shin'ya Yamaguchi · Daiki Chijiwa · Sekitoshi Kanai · Atsutoshi Kumagai · Hisashi Kashima 🔗 |
-
|
Taming Small-sample Bias in Low-budget Active Learning
(
Poster
)
>
|
Linxin Song · Jieyu Zhang · Xiaotian Lu · Tianyi Zhou 🔗 |
-
|
PhysicsCAP: Natural Scene Understanding By Semantic Segmentation, CLIP And Physical Models Through Refined and Enriched Captions
(
Poster
)
>
|
Hidetomo Sakaino 🔗 |
-
|
Training with Low-Label-Quality Data: Rank Pruning and Multi-Review
(
Poster
)
>
|
Yue Xing · Ashutosh Pandey · David Yan · Fei Wu · Michael Fronda · Pamela Bhattacharya 🔗 |
-
|
DataCI: A Platform for Data-Centric AI on Streaming Data
(
Poster
)
>
|
Huaizheng Zhang · Liao Chang · Yuanming Li 🔗 |
-
|
Participatory Personalization in Classification
(
Poster
)
>
|
Hailey Joren · Chirag Nagpal · Katherine Heller · Berk Ustun 🔗 |
-
|
Making Scalable Meta Learning Practical
(
Poster
)
>
|
Sang Keun Choe · Sanket Vaibhav Mehta · Hwijeen Ahn · Willie Neiswanger · Pengtao Xie · Emma Strubell · Eric Xing 🔗 |
-
|
Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning
(
Poster
)
>
|
Guozheng Ma · Haoyu Wang · Lu Li · Zilin Wang · Zhen Wang · Li Shen · Xueqian Wang · Dacheng Tao 🔗 |
-
|
Data Integration for Driver Telematics with Selection Biases
(
Poster
)
>
|
Hashan Peiris · Himchan Jeong · Jae-kwang Kim 🔗 |
-
|
Self-supervised Autoencoder for Correlation-Preserving in Tabular GANs
(
Poster
)
>
|
Siddarth Ramesh · Surgan Jandial · Gauri Gupta · Piyush Gupta · Balaji Krishnamurthy 🔗 |
-
|
Why Do Self-Supervised Models Transfer? On Data Augmentation and Feature Properties
(
Poster
)
>
|
Linus Ericsson · Henry Gouk · Timothy Hospedales 🔗 |
-
|
Principlism Guided Responsible Data Curation
(
Poster
)
>
|
Jerone Andrews · Dora Zhao · William Thong · Apostolos Modas · Orestis Papakyriakopoulos · Alice Xiang 🔗 |