1-First, the title should be changed. No analysis of experimental data is performed, so the authors are not working with "reality". Conflating simulation with data is ultimately detrimental to experimental science.
2-The authors cite their reference  several times throughout the paper when discussing theoretical issues of quark versus gluon discrimination. These citations appear in the second and third paragraphs of the introduction and immediately before section 2.2. I am not sure why this reference is the catch-all for theory issues with quark/gluon tagging: reference  consists of two papers published in 2018, while many of these issues were identified years or even decades earlier. I urge the authors to provide more representative references in place of . For example, the first reference of  mentions the serious theoretical challenges of quark/gluon discrimination, but these were already presented in a Les Houches report, which is their reference . Further, the second reference of  mentions infrared and collinear safety, but this issue has been known for decades.
3-I'm somewhat confused by the setup in the first paragraph of the introduction regarding "kinematic observables". The authors write that there is a change from measuring kinematic observables on jets to measuring the particle four-vectors directly with machine learning. While I understand what the authors are trying to say here, a particle's momentum four-vector is the fundamental kinematic observable. The authors should clarify these statements and distinguish more precisely between what they mean by "kinematic observable" and what they propose in this paper.
4-At the beginning of section 2, the authors make a couple of imprecise statements. In the first sentence of section 2, they write that "quarks and gluons are poorly defined in perturbative QCD". More precisely, the flavor of a quark or gluon jet is ambiguous in perturbative QCD and must be fixed by an explicit definition, as there is no preferred one. In the second paragraph of section 2, the authors write that "pile up could be dealt with by using standard techniques." I understand that this article is likely to be read only by other subject experts, but the authors could provide a few references to some "standard techniques" as context for researchers outside of the field. The authors should make these changes.
5-Finally, the authors should be careful with the interpretation and implications of their 6-variable BDT that is compared to LoLa. The authors demonstrate that each of the 6 observables is individually a good quark versus gluon discriminant, but this in no way means that their combination is an "optimal" discriminant. It could be that the observables all draw on the same information in the jet, in which case combining them in a BDT would not improve performance. One is only guaranteed to have an optimal tagger if the observables fed to the machine form a complete basis on phase space. Individual particle four-vectors of course do this, which is why LoLa performs so well. The authors should therefore add caveats to their construction of, and comparison to, the 6-observable BDT. Locations in the draft where qualification should be added include the discussions on page 5 and page 8.
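To make the concern in point 5 concrete, here is a minimal toy sketch, not a statement about the authors' actual observables or BDT. Everything in it is an assumption made for illustration: the Gaussian toy observables `a`, `b` and `c`, the sample size, and the use of a two-feature Fisher linear discriminant as a stand-in for a BDT. Observable `b` is a noisier copy of `a` (redundant information), while `c` is statistically independent of `a` (complementary information). All three discriminate well individually, but only the complementary pair gains from being combined.

```python
import random

def auc(sig, bkg):
    """Probability that a random signal score exceeds a random background score."""
    labelled = sorted([(s, 1) for s in sig] + [(b, 0) for b in bkg])
    rank_sum = sum(r for r, (_, is_sig) in enumerate(labelled, start=1) if is_sig)
    n_s, n_b = len(sig), len(bkg)
    return (rank_sum - n_s * (n_s + 1) / 2) / (n_s * n_b)

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def fisher_auc(s1, s2, b1, b2):
    """AUC of a two-feature Fisher linear discriminant (stand-in for a BDT)."""
    d1, d2 = mean(s1) - mean(b1), mean(s2) - mean(b2)
    # Summed within-class covariance matrix [[c11, c12], [c12, c22]].
    c11 = cov(s1, s1) + cov(b1, b1)
    c12 = cov(s1, s2) + cov(b1, b2)
    c22 = cov(s2, s2) + cov(b2, b2)
    det = c11 * c22 - c12 * c12
    # Fisher weights w = C^{-1} (mean_sig - mean_bkg), via the 2x2 inverse.
    w1 = (c22 * d1 - c12 * d2) / det
    w2 = (c11 * d2 - c12 * d1) / det
    score_s = [w1 * x + w2 * y for x, y in zip(s1, s2)]
    score_b = [w1 * x + w2 * y for x, y in zip(b1, b2)]
    return auc(score_s, score_b)

random.seed(0)
n = 20000  # toy sample size per class (assumption)

# Observable a: unit-width Gaussians separated by one unit.
a_s = [random.gauss(+0.5, 1.0) for _ in range(n)]
a_b = [random.gauss(-0.5, 1.0) for _ in range(n)]
# Observable b: a plus class-independent noise, so it carries no new information.
b_s = [x + random.gauss(0.0, 0.5) for x in a_s]
b_b = [x + random.gauss(0.0, 0.5) for x in a_b]
# Observable c: same marginal distributions as a, but drawn independently of a.
c_s = [random.gauss(+0.5, 1.0) for _ in range(n)]
c_b = [random.gauss(-0.5, 1.0) for _ in range(n)]

auc_a = auc(a_s, a_b)
auc_b = auc(b_s, b_b)
auc_ab = fisher_auc(a_s, b_s, a_b, b_b)  # redundant pair
auc_ac = fisher_auc(a_s, c_s, a_b, c_b)  # complementary pair

print(f"a alone: {auc_a:.3f}   b alone: {auc_b:.3f}")
print(f"a + b (redundant):     {auc_ab:.3f}")
print(f"a + c (complementary): {auc_ac:.3f}")
```

In this toy, the combined AUC for the redundant pair should be statistically indistinguishable from that of `a` alone, whereas the complementary pair improves noticeably. The same logic is why good individual performance of the 6 observables does not by itself establish that their BDT combination is close to optimal.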