Workshop
DMLR Workshop: Data-centric Machine Learning Research
Ce Zhang · Praveen Paritosh · Newsha Ardalani · Nezihe Merve Gürel · William Gaviria Rojas · Yang Liu · Rotem Dror · Manil Maskey · Lilith Bat-Leah · Tzu-Sheng Kuo · Luis Oala · Max Bartolo · Ludwig Schmidt · Alicia Parrish · Daniel Kondermann · Najoung Kim
Ballroom C
Sat 29 Jul, noon PDT
This is the third edition of a highly successful workshop series focused on data-centric AI, following the Data-Centric AI workshop at NeurIPS 2021 and the DataPerf workshop at ICML 2022. Data, and operations over data (e.g., cleaning, debugging, curation), have been fueling the success of machine learning for decades. While the ML community has historically focused primarily on model development, the quality of data used for ML training and evaluation has recently attracted intense interest, as shown by the creation of the NeurIPS Datasets and Benchmarks track, several data-centric AI benchmarks (e.g., DataPerf), and the flourishing of data consortiums such as LAION. The goal of this workshop is to facilitate discussion of these important topics in what we call Data-centric Machine Learning Research, which includes not only datasets and benchmarks, but also tooling and governance, as well as fundamental research on topics such as data quality and data acquisition for dataset creation and optimization.
Schedule
Sat 12:00 p.m. - 12:05 p.m.
|
Introduction and Opening
(
Opening Remarks
)
>
SlidesLive Video |
Praveen Paritosh |
Sat 12:05 p.m. - 12:40 p.m.
|
Keynote 1: Andrew Ng (Landing AI)
(
Keynote
)
>
SlidesLive Video |
Andrew Ng |
Sat 12:40 p.m. - 1:10 p.m.
|
Data-centric Ecosystem: Croissant and DataPerf - Peter Mattson (Google & MLCommons)
(
Talk
)
>
SlidesLive Video |
Peter Mattson · Praveen Paritosh |
Sat 1:10 p.m. - 1:25 p.m.
|
Coffee break / networking break |
Sat 1:25 p.m. - 2:00 p.m.
|
Keynote 2: Mihaela van der Schaar (University of Cambridge) - Reality-Centric AI
(
Keynote
)
>
SlidesLive Video |
Mihaela van der Schaar |
Sat 2:00 p.m. - 2:30 p.m.
|
Invited Talk 2: Olga Russakovsky (Princeton University)
(
Talk
)
>
SlidesLive Video |
Olga Russakovsky · Vikram V Ramaswamy |
Sat 2:30 p.m. - 3:00 p.m.
|
Invited Talk 3: Masashi Sugiyama (RIKEN & UTokyo) - Data distribution shift
(
Talk
)
>
SlidesLive Video |
Masashi Sugiyama |
Sat 3:00 p.m. - 4:00 p.m.
|
Lunch Break / networking break |
Sat 4:00 p.m. - 4:35 p.m.
|
Keynote 3: Isabelle Guyon (Google Brain) - Towards Data-Centric AutoML
(
Keynote
)
>
SlidesLive Video |
Isabelle Guyon |
Sat 4:35 p.m. - 5:05 p.m.
|
Invited Talk 1: Dina Machuve (DevData Analytics) - Data for Agriculture
(
Talk
)
>
SlidesLive Video |
Dina Machuve |
Sat 5:05 p.m. - 5:20 p.m.
|
Announcement and open discussion on DMLR (Selected members of DMLR Advisory Board)
(
Discussion Panel
)
>
SlidesLive Video |
Ce Zhang |
Sat 5:20 p.m. - 6:15 p.m.
|
Panel Discussion
(
Discussion Panel
)
>
SlidesLive Video |
Megan Ansdell · Nathan Lambert · Ludwig Schmidt · Praveen Paritosh · Sang Michael Xie |
Sat 6:15 p.m. - 6:30 p.m.
|
Coffee break / networking break |
Sat 6:30 p.m. - 7:30 p.m.
|
Poster Session 1
(
Poster Session - In Person
)
>
|
|
Sat 7:30 p.m. - 8:00 p.m.
|
Poster Session 2 (Virtual) ( Poster Session - Virtual ) > |
-
|
Training on Thin Air: Improve Image Classification with Generated Data
(
Poster
)
>
|
Yongchao Zhou · Hshmat Sahak · Jimmy Ba |
-
|
DMOps: Data Management Operations and Recipes
(
Poster
)
>
|
Eujeong Choi · Chanjun Park |
-
|
Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps)
(
Poster
)
>
|
Damrin Kim · NamHyeok Kim · Chanjun Park · Harksoo Kim |
-
|
To Aggregate or Not? Learning with Separate Noisy Labels
(
Poster
)
>
|
Jiaheng Wei · Zhaowei Zhu · Tianyi Luo · Ehsan Amid · Abhishek Kumar · Yang Liu |
-
|
On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training
(
Poster
)
>
|
Jieyu Zhang · Bohan Wang · Zhengyu Hu · Pang Wei Koh · Alex Ratner |
-
|
Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios
(
Poster
)
>
|
NamHyeok Kim · Chanjun Park |
-
|
Algorithm Selection for Deep Active Learning with Imbalanced Datasets
(
Poster
)
>
|
Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak |
-
|
How to Improve Imitation Learning Performance with Sub-optimal Supplementary Data?
(
Poster
)
>
|
Ziniu Li · Tian Xu · Zeyu Qin · Yang Yu · Zhiquan Luo |
-
|
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
(
Poster
)
>
|
Sang Michael Xie · Hieu Pham · Xuanyi Dong · Nan Du · Hanxiao Liu · Yifeng Lu · Percy Liang · Quoc Le · Tengyu Ma · Adams Wei Yu |
-
|
How to Cope with Gradual Data Drift?
(
Poster
)
>
|
Rasool Fakoor · Jonas Mueller · Zachary Lipton · Pratik Chaudhari · Alex Smola |
-
|
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
(
Poster
)
>
|
Chanjun Park · Seonmin Koo · Seolhwa Lee · Jaehyung Seo · Sugyeong Eo · Hyeonseok Moon · Heuiseok Lim |
-
|
Programmable Synthetic Tabular Data Generation
(
Poster
)
>
|
Mark Vero · Mislav Balunovic · Martin Vechev |
-
|
Unitail: A Benchmark for Detecting, Reading, and Matching in Retail Scene
(
Poster
)
>
|
Fangyi Chen · Han Zhang · Hao Chen · Kai Hu · Jiachen Dou · Zaiwang Li · Chenchen Zhu · Marios Savvides |
-
|
Understanding Unfairness via Training Concept Influence
(
Poster
)
>
|
Yuanshun Yao · Yang Liu |
-
|
Promises and Pitfalls of Threshold-based Auto-labeling
(
Poster
)
>
|
Harit Vishwakarma · Heguang Lin · Frederic Sala · Ramya Korlakai Vinayak |
-
|
Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors
(
Poster
)
>
|
Jesse Cummings · Jonas Mueller · Elías Snorrason |
-
|
Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
(
Poster
)
>
|
Seungjun Lee · Hyeonseok Moon · Chanjun Park · Heuiseok Lim |
-
|
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
(
Poster
)
>
|
Jinyi Liu · Yi Ma · Jianye Hao · Yujing Hu · Yan Zheng · Tangjie Lv · Changjie Fan |
-
|
CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training
(
Poster
)
>
|
A. Feder Cooper · Wentao Guo · Duc Khiem Pham · Tiancheng Yuan · Charlie Ruan · Yucheng Lu · Chris De Sa |
-
|
Data-Centric Defense: Shaping Loss Landscape with Augmentations to Counter Model Inversion
(
Poster
)
>
|
Si Chen · Feiyang Kang · Nikhil Abhyankar · Ming Jin · Ruoxi Jia |
-
|
Probing Heterogeneous Pretraining Datasets with Small Curated Datasets
(
Poster
)
>
|
Gregory Yauney · Emily Reif · David Mimno |
-
|
Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation
(
Poster
)
>
|
Joshua Vendrow · Saachi Jain · Logan Engstrom · Aleksander Madry |
-
|
EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost
(
Poster
)
>
|
Jaeseung Heo · Seungbeom Lee · Sungsoo Ahn · Dongwoo Kim |
-
|
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
(
Poster
)
>
|
Seonmin Koo · Chanjun Park · Jinsung Kim · Jaehyung Seo · Sugyeong Eo · Hyeonseok Moon · Heuiseok Lim |
-
|
Contrastive clustering of tabular data
(
Poster
)
>
|
Piotr Przemielewski · Witold Wydmański · Marek Śmieja |
-
|
Investigating minimizing the training set fill distance in machine learning regression
(
Poster
)
>
|
Paolo Climaco · Jochen Garcke |
-
|
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
(
Poster
)
>
|
Patrik Okanovic · Roger Waleffe · Vasileios Mageirakos · Konstantinos Nikolakakis · Amin Karbasi · Dionysios Kalogerias · Nezihe Merve Gürel · Theodoros Rekatsinas |
-
|
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
(
Poster
)
>
|
Nathan Vaska · Victoria Helus |
-
|
Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks
(
Poster
)
>
|
Natalie Abreu · Nathan Vaska · Victoria Helus |
-
|
Fair Machine Unlearning: Data Removal while Mitigating Disparities
(
Poster
)
>
|
Alex Oesterling · Jiaqi Ma · Flavio Calmon · Hima Lakkaraju |
-
|
TMARS: Improving Visual Representations by Circumventing Text Feature Learning
(
Poster
)
>
|
Pratyush Maini · Sachin Goyal · Zachary Lipton · Zico Kolter · Aditi Raghunathan |
-
|
Do Machine Learning Models Learn Statistical Rules Inferred from Data?
(
Poster
)
>
|
Aaditya Naik · Yinjun Wu · Mayur Naik · Eric Wong |
-
|
Predicting Article Time Periods with Text2Time: A Transformer-based Approach
(
Poster
)
>
|
Karthick Gunasekaran |
-
|
Knowledge Graph-Augmented Korean Generative Commonsense Reasoning
(
Poster
)
>
|
Dahyun Jung · Jaehyung Seo · Jaewook Lee · Chanjun Park · Heuiseok Lim |
-
|
Accelerating Batch Active Learning Using Continual Learning Techniques
(
Poster
)
>
|
Gantavya Bhatt · Arnav M Das · Rui Yang · Vianne Gao · Jeff Bilmes |
-
|
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
(
Poster
)
>
|
Liangchen Luo · Lei Shu · Jayakumar Hoskere · Yun Zhu · Canoee Liu · Simon Tong · Jindong Chen · Lei Meng |
-
|
Data Similarity is Not Enough to Explain Language Model Performance
(
Poster
)
>
|
Gregory Yauney · Emily Reif · David Mimno |
-
|
Enhancing Time Series Forecasting Models under Concept Drift by Data-centric Online Ensembling
(
Poster
)
>
|
Yi-Fan Zhang · Qingsong Wen · Xue Wang · Weiqi Chen · Liang Sun · Zhang Zhang · Liang Wang · Rong Jin · Tieniu Tan |
-
|
A Privacy-Friendly Approach to Data Valuation
(
Poster
)
>
|
Jiachen Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal |
-
|
Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources
(
Poster
)
>
|
Feiyang Kang · Hoang Anh Just · Anit Kumar Sahu · Ruoxi Jia |
-
|
Improve Model Inference Cost with Image Gridding
(
Poster
)
>
|
Shreyas Krishnaswamy · Lisa Dunlap · Lingjiao Chen · Matei Zaharia · James Zou · Joseph Gonzalez |
-
|
THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech
(
Poster
)
>
|
Saad Almohaimeed · Saleh Almohaimeed · Ashfaq Ali Shafin · Bogdan Carbunar · Ladislau Boloni |
-
|
On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets
(
Poster
)
>
|
Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel |
-
|
Partial Label Learning meets Active Learning: Enhancing Annotation Efficiency through Binary Questioning
(
Poster
)
>
|
Shivangana Rawat · Chaitanya Devaguptapu · Vineeth Balasubramanian |
-
|
Towards an Efficient Algorithm for Time Series Forecasting with Anomalies
(
Poster
)
>
|
Hao Cheng · Qingsong Wen · Yang Liu · Liang Sun |
-
|
Towards Declarative Systems for Data-Centric Machine Learning
(
Poster
)
>
|
Stefan Grafberger · Bojan Karlaš · Paul Groth · Sebastian Schelter |
-
|
Data Banzhaf: A Robust Data Valuation Framework for Machine Learning
(
Poster
)
>
|
Jiachen Wang · Ruoxi Jia |
-
|
No Imputation without Representation
(
Poster
)
>
|
Oliver Lenz · Daniel Peralta |
-
|
L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models
(
Poster
)
>
|
Aabha Pingle · Aditya Vyawahare · Isha Joshi · Rahul Tangsali · Raviraj Joshi |
-
|
Point Cloud Classification with ModelNet40: What is left?
(
Poster
)
>
|
Jarne Van den Herrewegen · Tom Tourwé · Francis Wyffels |
-
|
Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?
(
Poster
)
>
|
Megan Richards · Diane Bouchacourt · Mark Ibrahim · Polina Kirichenko |
-
|
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
(
Poster
)
>
|
Julian Bitterwolf · Maximilian Müller · Matthias Hein |
-
|
Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana
(
Poster
)
>
|
Darlington Akogo · Issah Samori · Cyril Akafia · Harriet Fiagbor · Andrews Kangah · Donald Donald · Kwabena Fuachie · Luis Oala |
-
|
On the Reproducibility of Data Valuation under Learning Stochasticity
(
Poster
)
>
|
Jiachen Wang · Feiyang Kang · Chiyuan Zhang · Ruoxi Jia · Prateek Mittal |
-
|
On the Usefulness of Synthetic Tabular Data Generation
(
Poster
)
>
|
Dionysis Manousakas · Sergul Aydore |
-
|
Bayesian Optimisation Against Climate Change: Applications and Benchmarks
(
Poster
)
>
|
Sigrid Passano Hellan · Chris Lucas · Nigel Goddard |
-
|
Suboptimal Data Can Bottleneck Scaling
(
Poster
)
>
|
Jacob Buckman · Kshitij Gupta · Ethan Caballero · Rishabh Agarwal · Marc Bellemare |
-
|
Speech Wikimedia: A 77 Language Multilingual Speech Dataset
(
Poster
)
>
|
Rafael Mosquera Gómez · Julian Eusse · Juan Ciro · Daniel Galvez · Ryan Hileman · Kurt Bollacker · David Kanter |
-
|
Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
(
Poster
)
>
|
Siddharth Joshi · Baharan Mirzasoleiman |
-
|
Active learning for time instant classification
(
Poster
)
>
|
Nauman Ahad · Namrata Nadagouda · Eva Dyer · Mark Davenport |
-
|
Prediction without Preclusion: Recourse Verification with Reachable Sets
(
Poster
)
>
|
Avni Kothari · Berk Ustun · Lily Weng · Bogdan Kulynych |
-
|
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
(
Poster
)
>
|
Mayee Chen · Nicholas Roberts · Kush Bhatia · Jue Wang · Ce Zhang · Frederic Sala · Christopher Ré |
-
|
Birds of an Odd Feather: Guaranteed Out-of-Distribution (OOD) Novel Category Detection
(
Poster
)
>
|
Yoav Wald · Suchi Saria |
-
|
Mobile Internet Quality Estimation using Self-Tuning Kernel Regression
(
Poster
)
>
|
Hanyang Jiang · Yao Xie · Ellen Zegura · Elizabeth Belding · Shaowu Yuchi |
-
|
Estimating label quality and errors in semantic segmentation data via any model
(
Poster
)
>
|
Vedang Lad · Jonas Mueller |
-
|
STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Maps
(
Poster
)
>
|
Ammar Sherif · Abubakar Abid · Mustafa Elattar · Mohamed ElHelw |
-
|
Detecting Errors in Numerical Data via any Regression Model
(
Poster
)
>
|
Hang Zhou · Jonas Mueller · Mayank Kumar · Jane-Ling Wang · Jing Lei |
-
|
ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data
(
Poster
)
>
|
Ulyana Tkachenko · Aditya Thyagarajan · Jonas Mueller |
-
|
Characterizing Risk Regimes for Safe Deployment of Deep Regression Models
(
Poster
)
>
|
Jayaraman J. Thiagarajan · Vivek Narayanaswamy · Puja Trivedi · Rushil Anirudh |
-
|
Offline Reinforcement Learning with Imbalanced Datasets
(
Poster
)
>
|
Li Jiang · Sijie Cheng · Jielin Qiu · Victor Chan · Ding Zhao |
-
|
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data
(
Poster
)
>
|
Alycia Lee · Brando Miranda · Sanmi Koyejo |
-
|
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
(
Poster
)
>
|
Yue Yu · Yuchen Zhuang · Jieyu Zhang · Yu Meng · Alex Ratner · Ranjay Krishna · Jiaming Shen · Chao Zhang |
-
|
Is Pre-training Truly Better Than Meta-Learning?
(
Poster
)
>
|
Brando Miranda · Patrick Yu · Saumya Goyal · Yu-Xiong Wang · Sanmi Koyejo |
-
|
Characterizing the Impacts of Semi-supervised Learning for Weak Supervision
(
Poster
)
>
|
Jeffrey Li · Jieyu Zhang · Ludwig Schmidt · Alex Ratner |
-
|
A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification
(
Poster
)
>
|
Min Du · Nesime Tatbul · Brian Rivers · Akhilesh Kumar Gupta · Lucas Hu · Wei Wang · Ryan Marcus · Shengtian Zhou · Insup Lee · Justin Gottschlich |
-
|
Learning pipeline-invariant representation for robust brain phenotype prediction
(
Poster
)
>
|
Xinhui Li · Alex Fedorov · Mrinal Mathur · Anees Abrol · Gregory Kiar · Sergey Plis · Vince Calhoun |
-
|
Improving multimodal datasets with image captioning
(
Poster
)
>
|
Thao Nguyen · Gabriel Ilharco · Sewoong Oh · Ludwig Schmidt |
-
|
Adaptive Aggregated Drift Detector
(
Poster
)
>
|
Beverly Quon · Jean-Luc Gaudiot |
-
|
On Estimating the Epistemic Uncertainty of Graph Neural Networks using Stochastic Centering
(
Poster
)
>
|
Puja Trivedi · Mark Heimann · Rushil Anirudh · Danai Koutra · Jayaraman J. Thiagarajan |
-
|
Identifying Implicit Social Biases in Vision-Language Models
(
Poster
)
>
|
Kimia Hamidieh · Haoran Zhang · Thomas Hartvigsen · Marzyeh Ghassemi |
-
|
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
(
Poster
)
>
|
Amro Abbas · Daniel Simig · Surya Ganguli · Ari Morcos · Kushal Tirumala |
-
|
LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
(
Poster
)
>
|
Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak |
-
|
Internet Explorer: Targeted Representation Learning on the Open Web
(
Poster
)
>
|
Alexander Li · Ellis Brown · Alexei Efros · Deepak Pathak |
-
|
Graphtester: Exploring Theoretical Boundaries of GNNs on Graph Datasets
(
Poster
)
>
|
M. Eren Akbiyik · Florian Grötschla · Beni Egressy · Roger Wattenhofer |
-
|
Early Experiments in Scalable Dataset Selection for Self-Supervised Learning in Geospatial Imagery Models
(
Poster
)
>
|
Muhammed Razzak · Anthony Ortiz · Caleb Robinson |
-
|
Uncovering Neural Scaling Law in Molecular Representation Learning
(
Poster
)
>
|
Dingshuo Chen · Yanqiao Zhu · Jieyu Zhang · Yuanqi Du · Zhixun Li · Qiang Liu · Shu Wu · Liang Wang |
-
|
MultiLegalPile: A 689GB Multilingual Legal Corpus
(
Poster
)
>
|
Joel Niklaus · Veton Matoshi · Matthias Stürmer · Ilias Chalkidis · Daniel Ho |
-
|
On Memorization and Privacy risks of Sharpness Aware Minimization
(
Poster
)
>
|
Young In Kim · Pratiksha Agrawal · Johannes Royset · Rajiv Khanna |
-
|
Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?
(
Poster
)
>
|
Luísa Shimabucoro · Timothy Hospedales · Henry Gouk |
-
|
Can Expert Demonstration Guarantee Offline Performance in Sparse Reward Environment?
(
Poster
)
>
|
Jeyeon Eo · Dongsu Lee · Minhae Kwon |
-
|
The Matrix Reloaded: A Counterfactual Perspective on Bias in Machine Learning
(
Poster
)
>
|
Andre Carreiro · Mariana Pinto · Pedro Madeira · Alberto Lopez · Hugo Gamboa |
-
|
D4: Document Deduplication and Diversification
(
Poster
)
>
|
Kushal Tirumala · Daniel Simig · Armen Aghajanyan · Ari Morcos |
-
|
On Data Quality and Speed of Training: Bad Data Slows Training
(
Poster
)
>
|
Newsha Ardalani · Mostafa Elhoushi · Carole-Jean Wu |
-
|
Decoupled Graph Label Denoising for Robust Semi-Supervised Node Classification
(
Poster
)
>
|
Kaize Ding · Yancheng Wang · Huan Liu |
-
|
Ensemble Fractional Imputation for Incomplete Categorical Data with a Graphical Model
(
Poster
)
>
|
Yonghyun Kwon · Jae-kwang Kim |
-
|
Put on your detective hat: What's wrong in this video?
(
Poster
)
>
|
Rohith Peddi · Shivvrat Arya · Bharath Challa · Likhitha Pallapothula · Akshay Vyas · Qifan Zhang · Jikai Wang · Vasundhara Komaragiri · Eric Ragan · Nicholas Ruozzi · Yu Xiang · Vibhav Gogate |
-
|
Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
(
Poster
)
>
|
Yongchan Kwon · James Zou |
-
|
Regularizing Neural Networks with Meta-Learning Generative Models
(
Poster
)
>
|
Shin'ya Yamaguchi · Daiki Chijiwa · Sekitoshi Kanai · Atsutoshi Kumagai · Hisashi Kashima |
-
|
Taming Small-sample Bias in Low-budget Active Learning
(
Poster
)
>
|
Linxin Song · Jieyu Zhang · Xiaotian Lu · Tianyi Zhou |
-
|
PhysicsCAP: Natural Scene Understanding By Semantic Segmentation, CLIP And Physical Models Through Refined and Enriched Captions
(
Poster
)
>
|
Hidetomo Sakaino |
-
|
Training with Low-Label-Quality Data: Rank Pruning and Multi-Review
(
Poster
)
>
|
Yue Xing · Ashutosh Pandey · David Yan · Fei Wu · Michael Fronda · Pamela Bhattacharya |
-
|
DataCI: A Platform for Data-Centric AI on Streaming Data
(
Poster
)
>
|
Huaizheng Zhang · Liao Chang · Yuanming Li |
-
|
Participatory Personalization in Classification
(
Poster
)
>
|
Hailey Joren · Chirag Nagpal · Katherine Heller · Berk Ustun |
-
|
Making Scalable Meta Learning Practical
(
Poster
)
>
|
Sang Keun Choe · Sanket Vaibhav Mehta · Hwijeen Ahn · Willie Neiswanger · Pengtao Xie · Emma Strubell · Eric Xing |
-
|
Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning
(
Poster
)
>
|
Guozheng Ma · Haoyu Wang · Lu Li · Zilin Wang · Zhen Wang · Li Shen · Xueqian Wang · Dacheng Tao |
-
|
Data Integration for Driver Telematics with Selection Biases
(
Poster
)
>
|
Hashan Peiris · Himchan Jeong · Jae-kwang Kim |
-
|
Self-supervised Autoencoder for Correlation-Preserving in Tabular GANs
(
Poster
)
>
|
Siddarth Ramesh · Surgan Jandial · Gauri Gupta · Piyush Gupta · Balaji Krishnamurthy |
-
|
Why Do Self-Supervised Models Transfer? On Data Augmentation and Feature Properties
(
Poster
)
>
|
Linus Ericsson · Henry Gouk · Timothy Hospedales |
-
|
Principlism Guided Responsible Data Curation
(
Poster
)
>
|
Jerone Andrews · Dora Zhao · William Thong · Apostolos Modas · Orestis Papakyriakopoulos · Alice Xiang |