Get JBP's lecture transcripts and inject them into their appropriate forum topics


(Benjamin Lupton) #1

It would be fantastic for search and SEO, as well as for study, to get the transcripts/captions of Peterson’s lectures into this forum. We could then jump to the specific part of a video where he says something, and search the videos by what he says.

YouTube does have an official download UI for the timecoded captions, but it is only available to the channel owner, which is not us.

They also seem to have an API, which I have not yet tried:

https://developers.google.com/youtube/v3/docs/captions/download

However, searching GitHub, there do seem to be several tools:

And it would be easy enough to build a headless browser script to scrape them from the page in real time. It could even become a SaaS.

Additional resources:


(Nick Redmark) #2

http://search.jordanbpeterson.com/


(Benjamin Lupton) #3

Pity they don’t make the data public. Jordan would be able to go a lot further if he embraced open source. Especially as those transcriptions he is making searchable were submitted by the community. He should at least give them access too.


(Nick Redmark) #4

Perhaps you will find more info here (I don’t have access to reddit right now):


(Benjamin Lupton) #5

Seems he is using one of the scraping tools above, but is not going into details as he wanted it to be a private commercial venture, which I guess he then sold to Peterson.


(Benjamin Lupton) #6

Great, seems the v3 API doesn’t require you to be the channel owner:

https://developers.google.com/youtube/v3/docs/captions/list#usage

/**
 * API response
 */
{
  "kind": "youtube#captionListResponse",
  "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/E_2T0GwW9dmWmtzjw7RwvYg5_1o\"",
  "items": [
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/6cGn6jKHpY3WcGkOVfk6GeanGro\"",
      "id": "ymvDzem3dRLeGFRG4tgivCl20PIRXLQ8kDKPRdAP6Sg=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-09-08T05:27:56.760Z",
        "trackKind": "ASR",
        "language": "en",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/QCcw_9DBqRz4-yxMFFQlDZBWZn0\"",
      "id": "MpHP7NxwDIYqCDMsT329zrHkKqXzmNkm",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-03-05T00:31:29.320Z",
        "trackKind": "standard",
        "language": "cs",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/1EwbwicP0kBtg46fLasU7EKyX2U\"",
      "id": "MpHP7NxwDIYhuvON-RXqEFnsHxzyiTRS",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-17T21:40:57.897Z",
        "trackKind": "standard",
        "language": "el",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/aKaTWucjFqav4Oz2k1DpHSRM9SI\"",
      "id": "MpHP7NxwDIZ_HlG7HDXPSqzj9FR9zS3u",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-10-15T15:02:07.930Z",
        "trackKind": "standard",
        "language": "en",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/7hOnRg7GPYVK2yWHOCr63XyjAZU\"",
      "id": "vvV5Whe6EHaHRmbz__y-vvK6M_OdZLsod0ASc9D0v_c=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-06-22T18:19:13.117Z",
        "trackKind": "standard",
        "language": "es-ES",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/Ig06awSpVMFnowORt8rFx83XFXo\"",
      "id": "MpHP7NxwDIZ48u-oIxyQ-CmWZMRXHcw4",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-10-02T02:33:15.650Z",
        "trackKind": "standard",
        "language": "es",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/5_Lxz46WT3XGgagjG-lxsGCLk44\"",
      "id": "MpHP7NxwDIbE8GEQxiz4CwN7Gf50RT28",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-12-13T21:40:28.478Z",
        "trackKind": "standard",
        "language": "fr",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/UwswSanYmZTb0NRwVTqs1jxJBIg\"",
      "id": "MpHP7NxwDIYgSdM-HPqKmH4f3Zq-peRn",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-03-30T17:50:12.543Z",
        "trackKind": "standard",
        "language": "hr",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/upGN5ZyF0ECVGG3g8Hxvw-QnVBM\"",
      "id": "MpHP7NxwDIbgoQmVp1GQNyudzjaJ26r9",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-09-05T11:46:10.482Z",
        "trackKind": "standard",
        "language": "pl",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/CbZhgPSn6JU14MPDuJShcyJ730Q\"",
      "id": "MpHP7NxwDIb2Y8vixR1OO4p48lXTDnPc",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-02-02T14:38:52.507Z",
        "trackKind": "standard",
        "language": "pt",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/2XCRWLTJG-rKKJbVSSYdDxv-1-c\"",
      "id": "MpHP7NxwDIa6PTKl4TPWbASgIXcBu9E6",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-01-17T22:05:54.769Z",
        "trackKind": "standard",
        "language": "ro",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/-jBcs6MgqQmuvBBX2HN24-DZhaA\"",
      "id": "MpHP7NxwDIYJv0RmOObEaqoH04xA_wFv",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-06-22T18:12:17.678Z",
        "trackKind": "standard",
        "language": "ru",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/Z5oA9p3nspajEtYwsQiMdURLcAQ\"",
      "id": "MpHP7NxwDIZ4v9LGWyx57aI_5dDJE4PW",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-02-02T16:33:07.953Z",
        "trackKind": "standard",
        "language": "sk",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/7gERfTFZ3A_zWsNoxjwWR0av6sA\"",
      "id": "vvV5Whe6EHa0DAKipDDdkPkrr4ZSddXPVnkcKay08Uw=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:44:58.222Z",
        "trackKind": "standard",
        "language": "zh-CN",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/U4yE0JUgIqizBOLphon5Nesn04o\"",
      "id": "ymvDzem3dRKZpT_d37M1p6xf1Tg3qHiHPJqYKiuthB0=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:43:14.956Z",
        "trackKind": "standard",
        "language": "zh-Hans",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/8WCIS0L0YFb5rjHjKl3nJ9JTcu8\"",
      "id": "ymvDzem3dRKZpT_d37M1p4pCmiUawPGXTGPhHKptcqs=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:44:04.105Z",
        "trackKind": "standard",
        "language": "zh-Hant",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/pi-IEqGB5LzcVayleM3r4Wk5tRY\"",
      "id": "vvV5Whe6EHa0DAKipDDdkGLYOz5FfWKP85Yxb1FotZM=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:46:26.935Z",
        "trackKind": "standard",
        "language": "zh-TW",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/pQOm04hpGNBuUInDwPcutKHkWSg\"",
      "id": "MpHP7NxwDIbnl_FaO1FLwBXxSoy9Fh3U",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:47:53.879Z",
        "trackKind": "standard",
        "language": "zh",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    }
  ]
}
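
A response like the one above lists several tracks per video, including an auto-generated (`ASR`) English track and a `standard` English one. As a minimal sketch, assuming we prefer the human-made track when both exist, the selection could look like this. The function name `pick_caption_track` and the trimmed `sample` data are made up for illustration; only the field names come from the response shown.

```python
from typing import Optional


def pick_caption_track(response: dict, language: str = "en") -> Optional[dict]:
    """Return the preferred caption track for `language`, or None if absent."""
    candidates = [
        item
        for item in response.get("items", [])
        if item.get("snippet", {}).get("language") == language
    ]
    # False sorts before True, so non-ASR (human-made) tracks come first.
    candidates.sort(key=lambda item: item["snippet"].get("trackKind") == "ASR")
    return candidates[0] if candidates else None


# Hypothetical, trimmed-down stand-in for a captions.list response.
sample = {
    "items": [
        {"id": "asr-track", "snippet": {"language": "en", "trackKind": "ASR"}},
        {"id": "cs-track", "snippet": {"language": "cs", "trackKind": "standard"}},
        {"id": "std-track", "snippet": {"language": "en", "trackKind": "standard"}},
    ]
}

track = pick_caption_track(sample)
print(track["id"] if track else "no track found")
```

The `id` of the chosen track is what the captions.download call takes as its parameter.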

https://developers.google.com/apis-explorer/#p/youtube/v3/youtube.captions.list?part=id&videoId=f-wWBGo6a2w&_h=3&

https://developers.google.com/apis-explorer/#p/youtube/v3/youtube.captions.download?id=ymvDzem3dRLeGFRG4tgivCl20PIRXLQ8kDKPRdAP6Sg%3D&_h=4&


And it seems youtube-dl has us covered already:

> youtube-dl --skip-download --write-sub --all-subs f-wWBGo6a2w
[youtube] f-wWBGo6a2w: Downloading webpage
[youtube] f-wWBGo6a2w: Downloading video info webpage
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.hr.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.el.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.fr.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.en.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.pt.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.ru.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-Hans.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-TW.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-Hant.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.sk.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-CN.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.pl.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.cs.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.ro.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.es-ES.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.es.vtt
> cat "Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.en.vtt" | less

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:09.040
[CLASSICAL MUSIC]

00:00:09.340 --> 00:00:29.020
[APPLAUSE AND CHEERS]

00:00:29.100 --> 00:00:32.300
Well, thank you all very much for coming to this.

00:00:32.300 --> 00:00:37.900
It's really shocking to me that you don't have anything better to do on a Tuesday night. [AUDIENCE LAUGHTER]

00:00:38.800 --> 00:00:40.960
No, but seriously, though, it is.

00:00:40.960 --> 00:00:52.240
I mean, it's very strange in some sense that there's so many of you here to listen to a sequence of lectures on the psychological significance of the Biblical stories.

00:00:52.320 --> 00:01:02.120
It's something I've wanted to do for a long time, but it still does surprise me that there's a ready audience for it.

00:01:03.220 --> 00:01:07.440
So that's good, so we'll see how it goes.

00:01:08.920 --> 00:01:11.280
I'll start with this because this is the right question.

00:01:11.280 --> 00:01:13.720
The right question is why bother doing this.

00:01:13.720 --> 00:01:16.380
And I don't mean why should I bother doing it.
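
To get from a `.vtt` file like this to "jump to the part where he says X", the cues need to be parsed into start times and text. Here is a rough sketch, assuming the simple layout youtube-dl wrote above (a timing line followed by text lines, cues separated by blank lines). It is not a full WebVTT parser, and `parse_vtt` and `timestamp_link` are names invented for this example.

```python
import re
from typing import List, Tuple

# Matches the start time of a cue timing line such as
# "00:00:29.100 --> 00:00:32.300".
CUE_TIMING = re.compile(r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3}) --> ")


def parse_vtt(text: str) -> List[Tuple[float, str]]:
    """Return (start_seconds, cue_text) pairs from a simple WebVTT file."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                h, m, s, ms = (int(g) for g in match.groups())
                start = h * 3600 + m * 60 + s + ms / 1000
                cue_text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, cue_text))
                break  # header blocks without a timing line are skipped
    return cues


def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube link that starts playback at the given second."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


sample = """WEBVTT
Kind: captions
Language: en

00:00:29.100 --> 00:00:32.300
Well, thank you all very much for coming to this.

00:01:08.920 --> 00:01:11.280
I'll start with this because this is the right question.
"""

for start, cue_text in parse_vtt(sample):
    print(timestamp_link("f-wWBGo6a2w", start), cue_text)
```

Searching the transcript then becomes a matter of filtering the cue text and linking to the matching start time.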

(Benjamin Lupton) #7

Now the next step will be injecting it all into Discourse.

API Docs:

https://docs.discourse.org

Create:

https://docs.discourse.org/#tag/Topics%2Fpaths%2F~1posts.json%2Fpost

https://docs.discourse.org/#tag/Posts%2Fpaths%2F~1posts.json%2Fpost

Update:

https://docs.discourse.org/#tag/Topics%2Fpaths%2F~1t~1{slug}~1{id}.json%2Fput

https://docs.discourse.org/#tag/Posts%2Fpaths%2F~1posts~1{id}%2Fput

Search:

https://docs.discourse.org/#tag/Search

https://docs.discourse.org/#tag/Categories%2Fpaths%2F~1c~1{id}.json%2Fget
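
As a sketch of what the "Create" call could look like for adding a transcript as a reply to an existing topic: this only builds the request for `POST /posts.json` and does not send it. The `Api-Key` / `Api-Username` headers follow the current API docs and are an assumption to check against the Discourse version in use. The forum URL, key, username and topic id below are placeholders, and `build_reply_request` is a made-up helper name.

```python
import json
import urllib.request


def build_reply_request(
    base_url: str, api_key: str, api_username: str, topic_id: int, raw: str
) -> urllib.request.Request:
    """Build (but do not send) a request that adds a reply to a topic."""
    payload = json.dumps({"topic_id": topic_id, "raw": raw}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/posts.json",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header-based authentication; verify for your install.
            "Api-Key": api_key,
            "Api-Username": api_username,
        },
    )


# Placeholder values, purely for illustration.
request = build_reply_request(
    "https://forum.example.com/", "secret", "system", 42, "Transcript goes here"
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

Creating a new topic instead of a reply uses the same endpoint with a `title` (and optionally a `category`) in place of `topic_id`.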


(Benjamin Lupton) #8

So I just went to add the English transcript to Bible 1, and got this:

Body is limited to 64000 characters; you entered 211434.

I went to raise the max_post_length setting, but the maximum it allows is 99000:

max_post_length: Value must be between 0 and 99000.

So not sure how to proceed.
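
One option, purely as a sketch and not necessarily what this thread settles on, would be to split the transcript across several posts that each stay under the limit. The version below assumes the transcript has blank lines between paragraphs or cues to break on; `split_into_posts` is a hypothetical helper, and 64000 is just the limit from the error message above.

```python
from typing import List


def split_into_posts(text: str, limit: int = 64000) -> List[str]:
    """Split text into chunks of at most `limit` characters, on blank lines."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is hard-split.
        while len(paragraph) > limit:
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Synthetic stand-in for a long transcript: 200 paragraphs of 999 characters.
transcript = "\n\n".join(["x" * 999] * 200)
posts = split_into_posts(transcript)
print(len(posts), [len(post) for post in posts])
```

Each chunk could then be sent as its own reply in the topic, in order.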


(Benjamin Lupton) #9

Okay, solution here is: