We tried our best to recover the latest threads and posts from the corrupted RocksDB files, but without success.
We then manually scraped every snapshot between 10/13 and 10/24 from the Google web cache. The HTML pages were then parsed into JSON documents using a 500-line Python script.
The hardest part of this work was calculating absolute, precise dates and times from relative, imprecise time representations (“9分钟前”, i.e. "9 minutes ago"; “4小时前”, i.e. "4 hours ago"), using implicit constraints (e.g. posts with lower PIDs were created earlier than posts with higher PIDs) and a random walk algorithm.
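A minimal sketch of how the PID ordering constraint can narrow down such relative times. This is not the actual recovery script, and it swaps the random walk for a simpler deterministic interval-tightening pass. The function names (bounds, estimate_times), the input tuple layout, and the assumption that the site rounds ages down (so “4小时前” means an age between 4 and 5 hours) are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Relative-time suffixes mapped to timedelta keyword names.
UNITS = {"分钟前": "minutes", "小时前": "hours"}


def bounds(snapshot_time, relative):
    """Return (earliest, latest) creation time implied by a label like '9分钟前'."""
    match = re.fullmatch(r"(\d+)(分钟前|小时前)", relative)
    if match is None:
        raise ValueError(f"unrecognized relative time: {relative!r}")
    n, unit = int(match.group(1)), UNITS[match.group(2)]
    # Assumption: ages are rounded down, so "n units ago" means an age in [n, n+1).
    latest = snapshot_time - timedelta(**{unit: n})
    earliest = snapshot_time - timedelta(**{unit: n + 1})
    return earliest, latest


def estimate_times(observations):
    """observations: iterable of (pid, snapshot_time, relative_label).

    Returns {pid: estimated datetime}, non-decreasing in PID.
    """
    intervals = {}
    for pid, snap, label in observations:
        lo, hi = bounds(snap, label)
        if pid in intervals:
            # The same post seen in several snapshots: intersect the intervals.
            old_lo, old_hi = intervals[pid]
            lo, hi = max(lo, old_lo), min(hi, old_hi)
        intervals[pid] = (lo, hi)

    pids = sorted(intervals)
    los = [intervals[p][0] for p in pids]
    his = [intervals[p][1] for p in pids]

    # Forward pass: a post is no earlier than any lower-PID post's lower bound.
    for i in range(1, len(pids)):
        los[i] = max(los[i], los[i - 1])
    # Backward pass: a post is no later than any higher-PID post's upper bound.
    for i in range(len(pids) - 2, -1, -1):
        his[i] = min(his[i], his[i + 1])

    estimates = {}
    previous = None
    for pid, lo, hi in zip(pids, los, his):
        if lo > hi:
            raise ValueError(f"inconsistent constraints for PID {pid}")
        guess = lo + (hi - lo) / 2  # midpoint of the tightened interval
        if previous is not None and guess < previous:
            guess = previous  # keep estimates ordered by PID
        estimates[pid] = guess
        previous = guess
    return estimates
```

For example, if PID 100 is labeled “3小时前” in a snapshot taken at 12:00 and PID 101 is labeled “4小时前” in a snapshot taken at 12:30, the two one-hour windows (08:00 to 09:00 and 07:30 to 08:30) both shrink to 08:00 to 08:30 once the ordering is enforced.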
Despite all the effort (~30h spent on this problem), we still lost a significant portion of our users' work. If your threads/posts are still missing and you happen to have a snapshot/backup of the page, please let us know.
Beijing Wudaokou Computing Technology LLC