It’s no secret that CPod’s content is inadequately curated, but I’d suggest that you not let the perfect become the enemy of the good. Even if CPod were to hire a team to transcribe the audio of its thousands of lessons and then mine the transcripts for HSK vocabulary frequency, the library still wouldn’t be organized by grammatical features. Doing that sounds more like a cutting-edge AI project with programming input from linguists. Also, methodically working through the CPod library based on HSK content runs contrary to CPod’s founding concept of being “top-down.” Nor would listeners’ known vocabulary necessarily follow HSK levels. Unless someone went from knowing zero Chinese to enrolling in cram courses for HSK 1, HSK 2, etc., it seems unlikely that anyone’s vocabulary would precisely correspond to HSK levels, so it may be an exaggeration to say that your proposed project would allow CPod to organize lessons “VERY precisely.” As a practical matter, I would suggest:
- If you want to select lessons where the level labels (newbie, elementary, etc.) really mean something, stick to the ones where John Pasden was there and actively involved in organizing the lessons.
- If you want transcripts, quite a few were created collaboratively under the user group “Transcripts with Tal.” The users who made the transcripts also benefited by adding a new dimension to their learning experience.
- Use CPod for what it is: a place to learn Chinese based firstly on subject matter that interests you, at a level that at least approximates or is only slightly above your current ability.
I hope my criticism does not seem harsh. Your idea is an interesting theoretical use of information technology, but it seems to me that it would be neither practical to produce nor as useful to as wide a user base as you suggest. I hope I have understood your meaning fully; I did not get the reference to Lingq.