Paper ID: 56 Title: Metadata-conscious anonymous messaging Review #1 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): The paper tackles the problem of how to obfuscate the source of a message by engineering the time in which the message is forwarded in an anonymous social network such that an adversary that uses statistical analysis cannot infer the source from the times in which the message is forwarded. The authors provide Clarity - Justification: The paper is difficult to follow, especially if one does not read the appendix and Fanti et al., 2015. The authors often skip details, which makes it difficult to follow the technical details. Significance - Justification: The problem is scientifically interesting and it has been barely studied. The authors provide a sound theoretical analysis with novel ideas. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The authors tackle a very interesting problem that has been barely studied to date. Their analysis is sound, however, I have some suggestions that could improve the authors' paper: - The authors discuss two adversarial models. However, a spy can only receive a % of the message times, from a random set of nodes, different each time (e.g., twitter provides a % of the tweets for free). This model has been used in a recent paper for inferring the source of partially observed cascades: Farajtabar et al., AISTATS 2015 (http://arxiv.org/abs/1501.06582), which the authors did not cite. - The section that describes adaptive diffusion on a line in Section 2 is difficult to follow and I believe several details are omitted. For example, in Fanti et al., it seems that adaptive diffusion also requires the actual diffusion probability to depend on the distance between the source and the virtual source, however, the authors do not talk about diffusion probabilities. - The authors argue that in standard diffisuion, the distance between spies will be large. Social networks are known to have small diameter (small distance between nodes), and therefore, it is unclear whether the delay distribution can be approximated by a Gaussian. ===== Review #2 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): This paper is heavily based on recent prior work (Fanti et al., 2015) on obfuscating randomized rumor spreading on graphs. The contributions of this paper are more precise analyses of protocols proposed in this previous work, including an analysis of maximum likelihood procedures for very regular network structures. The flavor of the paper is an analysis of stochastic processes on graphs. The author(s) claim to investigate "the fundamental question of how to intervene in the propagation of anonymous messages in order to make it difficult to find the source" but this is under a very particular (random) spy model, and the notion of "fundamental question" seems a bit overstated. Note: this is a supplemental review to the regular reviewers. I have neither seen reviewer nor author comments and am providing some limited comments with regards to the contributions of the paper, its thematic fit for ICML, and some additional feedback. In particular, I was not able to evaluate the supplementary material, given the short time frame for the review of this paper. Clarity - Justification: The paper feels somewhat unreadable without knowledge of Fanti et al. (2015). Despite that, the motivation and mathematics are relatively clear. I felt that the paper spent a bit too much time on marketing and justification (almost 2 pages) including discussions of the "ecosystem" of P2P systems which felt more at home in a venue like SIGCOMM or INFOCOM or SIGMETRICS (where the work of Fanti et al. appeared). As such, the paper felt as though it were a repurposed paper previously submitted elsewhere, rather than an attempt to engage the machine learning community (writ large) with a new problem framework. Significance - Justification: I believe this is a nice analysis of a protocol for a rather regular model for spreading on graphs. As such it belongs to the sphere of estimation problems for random processes on graphs. With respect to that literature, this is a nice contribution. I am less confident that it would be of interest more broadly without the authors making some more effort to indicate where data-driven approaches could benefit from these theoretical insights. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The authors prove several statements. 1) An upper bound on the ML estimate of the source node for diffusion processes on a line graph. 2) A lower bound on the detection probability for any spreading protocol. 3) A lower bound on the detection probability for an estimator for diffusion on regular trees. 4) A (tight) characterization of the ML estimate for randomly placed spies on a d-regular tree. My main question regarding this work is to what degree it is "in-scope" for ICML. This is a subjective statement, and one which I imagine the authors would resist. I am aware that this might seem like defending the turf of ICML. However, unless the committee feels that the authors would make a significant effort to rewrite the paper to engage more of the ICML attendees, I think that this paper would find marginal interest. In particular, I have a hard time identifying previous papers at ICML on epidemic spread and related phenomena. This in itself is not a problem -- the ICML community should be open to new problems and new models. However, to do that the authors would have to offer more of a bridge. If this paper directly raised significant new questions for other learning problems or opened up new interesting directions for existing learning frameworks, I might be more sanguine about it. Instead, as written, the authors give a slight nod towards other extensions/applications. As it is, I think that it is an interesting result that could find a better home elsewhere. In the end, accepting this paper to the program feels like a matter of taste: the results are correct and relatively clear, but the level of interest of conference attendees is less clear. ===== Review #3 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): This paper discusses message passing protocols when anonymity is of interest and there are 'spy' nodes that are trying to determing the message source. They consider two diffusion protocols: standard information diffusion and adaptive information diffusion. They derive an information theoretic lower bound in proposition 3.1 for any protocol for regular trees. Theorem 1 derives the probability of localization by the maximum likelihood estimator. The experiments demonstrate a drastic improvement due to adaptive message passing. Clarity - Justification: It is not exactly clear what the contribution of the theorems are or how to evaluate them. Significance - Justification: The problem of anonymous message passing is well motivated. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The interpretation of the adaptive diffusion via Polya urn model is a nice addition to the paper, and makes the method clear. The discussion after the statement of theorem 1 is helpful also. One issue is that theorem 1 is not information theoretic, but rather specific to ML estimation. It is not clear if ML estimation is optimal in this case. ===== Review #4 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): The paper proposes a anonymous messaging method in which the writer of a message cannot be estimated accurately from meta information such as timestamps. The paper shows that adaptive diffusion protocol performs than standard diffusion method for regular trees. Clarity - Justification: The paper is well-written and well-organized. However, the problem addressed in the paper is very specific and more illustration/visualization of the basic protocol would be helpful to a general audience. Significance - Justification: Anonymous messaging is arguably an important practical problem, and the paper presents a concrete work toward this goal. The question is its relevance to the general machine learning audience, as there is no learning component in the problem itself (except for underlying probabilistic assumptions and ML estimation.) The paper is likely to receive much more attention from networking and security research community than the machine learning community. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The reviewer is not an expert in this problem and is unable to provide a detailed review. ===== Review #5 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): The goal of anonymous messaging is that messages can propagate on a network without revealing their source. In their paper the authors discuss a realistic model for an adversary which is able to observe when and from where a message reaches spy nodes in the network. Adaptive diffusion has been developed hide the location of the source, but meta-data (control packages) used in some variants of the algorithm and available to spy nodes reveals it. The authors show that the tree protocol of adaptive diffusion is robust against spy nodes in the case of networks with a large degree. Clarity - Justification: The paper includes a full description of the tree protocol for adaptive diffusion and the inference algorithm used by the adversary, which are needed to reproduce the experiments. Results from both analytic calculations and numerical experiments are presented in a clear way. However, results regarding the scaling of the success probability for different variants are spread out over the whole paper and difficult to compare. Significance - Justification: While the authors do not propose a new algorithm for hiding the source of a message spreading on a network, they discuss a scenario, which is very realistic and thus relevant for anonymous messaging. Here they show that an algorithm designed against a different adversary works in this case, too, which is a new result. Additionally, several bounds for success probabilities and their proofs are provided. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The scenario and the results discussed in the paper are very interesting, but their presentation could be improved. The scaling laws for the success probability should be discussed in one place, e.g. a conclusion. And the extensive discussion of the line graph in the paper could be swapped with the section for regular trees in the supplementary material. =====