@小二 #17
I skimmed the original paper: https://bmcmedethics.biomedcentral.com/articles/10.1186/s12910-019-0406-6
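Again, I'm not an empiricist myself. I've taken the relevant courses but don't do cutting-edge research, so experts in macro or econometrics are welcome to offer more professional opinions…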
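First, the COTRS 2017 data, which are aggregated annual figures. A quadratic fit to 7 data points reaching R-squared > 0.999 is very high, though on its own that isn't impossible. The paper's discussion of this point is very good: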
The R-squared statistic can be difficult to interpret correctly. It is problematic to apply a formal statistical significance test to this data to determine whether it is too close to a quadratic curve, primarily because the behavior of structural growth in an industry, especially one involving as many contingencies as voluntary organ transplantation, is difficult to quantify. For this reason the same measures of closeness of fit were calculated for comparable data for 50 other countries from the Global Observatory on Donation and Transplantation (GODT) database.
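To make the "closeness of fit" measure concrete, here is a minimal pure-Python sketch, assuming an ordinary least-squares quadratic fit and the usual R-squared = 1 − SS_res/SS_tot. It is not taken from the paper, and the two series are made-up illustrations, not the COTRS figures. It shows why "distance from 1", i.e. 100·(1 − R-squared), is an easier scale to read than R-squared itself once everything is above 0.99.

```python
"""Sketch: how close a quadratic fit gets to R^2 = 1 (hypothetical data only)."""


def solve_linear(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares coefficients (c0, c1, c2) of y ~ c0 + c1*x + c2*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of y*x^k
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve_linear(normal, t)


def r_squared(xs, ys):
    """Coefficient of determination of the quadratic fit."""
    c0, c1, c2 = fit_quadratic(xs, ys)
    preds = [c0 + c1 * x + c2 * x * x for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical series: 7 yearly points indexed 0..6 (made up, NOT the COTRS data).
# Using 0..6 instead of calendar years keeps the normal equations well-conditioned.
xs = list(range(7))
smooth = [2 + 3 * x + 0.5 * x * x for x in xs]  # an exact quadratic
noise = [0.4, -0.6, 0.3, 0.8, -0.5, 0.2, -0.7]  # arbitrary small perturbations
noisy = [y + e for y, e in zip(smooth, noise)]

for name, ys in [("smooth", smooth), ("noisy", noisy)]:
    r2 = r_squared(xs, ys)
    print(f"{name}: R^2 = {r2:.6f}, distance from 1 = {100 * (1 - r2):.4f}%")
```

With only 7 points and 3 fitted parameters, any smoothly growing series will score a high R-squared anyway, which is why the paper compares against other countries' data instead of reading the number in isolation.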
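In the Double 11 (Singles' Day) example, I could use Tmall's data as a benchmark to judge the accuracy of the Double 11 figures (after all, those data have to pass a third-party audit to meet a listed company's disclosure requirements). But this paper can't find any genuine, credible benchmark data for China, so it used other countries' data as the benchmark instead, and found that China's organ transplant data fit far too neatly…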
It was found that when fitted to quadratic equations, every other country was between one and two orders of magnitude further away from the perfect R-squared of 1 compared to China. Of all other countries, the closest R-squared was 1.30% away from a perfect 1, with others ranging down to 99.9% away from 1; China’s three values ranged between .112% to .0478% away from 1. It was further discovered that the mean squared errors of China did not conform to the pattern exhibited by all other countries.
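The quoted percentages can be sanity-checked against the "one to two orders of magnitude" claim with a quick calculation. The inputs are only the numbers in the excerpt above; comparing China against just the closest other country is my own simplification, not the paper's procedure.

```python
import math

# Percent distance from a perfect R^2 of 1, as quoted in the paper excerpt.
closest_other_country = 1.30  # closest of the other countries
china_values = [0.112, 0.0478]  # the two ends of China's quoted range

for v in china_values:
    ratio = closest_other_country / v  # how many times further from 1 the other country is
    print(f"China {v}% vs closest other {closest_other_country}%: "
          f"{ratio:.1f}x, i.e. {math.log10(ratio):.2f} orders of magnitude")
```

Even against the closest other country the gap works out to roughly 12x to 27x, i.e. a bit over one order of magnitude; every other country is further away still.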
Of course, the author later ran some robustness tests on the COTRS 2017 data as well. In my view, the COTRS 2017 data really do look somewhat suspicious.
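The author then also picked up some Central Red Cross data, which are finer-grained short-term figures with higher variability. The author discusses 5 unusual data points and attributes them to manipulation. That's possible, but manipulation is really hard to prove, and there could be other explanations (for example, data routinely missed during the year being found and corrected in the year-end audit). Personally I have doubts about the author's discussion of the Central Red Cross data (at the very least, it needs other benchmarks to compare against).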
The author's final conclusions:
The first is that the unusual and anomalous features in the data are due to deliberate human intervention. We believe this is the only plausible explanation for the qualities identified in the COTRS, and central and provincial Red Cross data, which include mirroring of quadratic formulae, stubborn adherence to arbitrary ratios, anomalies that abrogate the mathematical integrity of data series, unsubstantiated growth patterns, and other irregularities. It is difficult to imagine how such data from three sources could have come to possess these qualities if not for deliberate, ongoing and imperfect human intervention.
The second is that this intervention could not have been piecemeal or without forethought.
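I agree with the first, since the mathematical and statistical analysis looks plausible. The second conclusion, though, feels like an assumption; it isn't the data speaking.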
--
@笑翻江山 #19
There are still plenty of counterfeits on Taobao; Tmall has fewer…
@rrrr #20
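As long as you compare against a genuine, unfalsified benchmark, fake data are fairly easy to spot… Both from the data and from common sense (a listed company's figures must pass a third-party audit, so faking them is very costly, which means Tmall's annual figures should be genuine; given that, the Double 11 figures should be genuine too), I believe Double 11 really wasn't faked…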
@Merlin #21
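The "sorghum harvest" reference is really obscure. When I first read it I couldn't make sense of it at all, and only got it after finding other people's explanations… Then I looked up the related news reports and felt sick for a long while…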