medium-2-md: Convert Medium posts to markdown with front matter

I recently migrated this blog off Medium for several reasons. I have covered those reasons in detail in a separate post; this post is about one of the technical problems with the migration itself.

The new home for my blog is Hugo, a static site generator written in Go that is amazingly fast. Like many other static site generators, it uses markdown content files.

The first thing that came to my mind before migrating my blog from Medium to Hugo was the export and import of content (posts). Medium supports export of content in HTML only (sigh). When you export your content from Medium, you get HTML files for everything: posts (stories), people and tags you are following, highlights, and so on.

Medium - exported content

The image above shows what I got when I exported my Medium content through the settings page. Each of those directories has HTML files in it. I was really surprised: in this world of web2 (almost web3), where the primary formats for data exchange are JSON, YAML, XML, etc., why on Earth is Medium exporting content in HTML? Anyway, drama aside, my issue was that the exported data was in HTML and the import needed markdown.

So, I built a tool to convert exported HTML Medium posts/stories into Jekyll/Hugo compatible markdown.

Introducing medium-2-md

medium-2-md (yeah, that’s what I’m calling it) is a simple CLI tool that takes a directory of Medium posts' HTML files and converts them into markdown. It also downloads images and adds rich front matter to the converted markdown files so that they can be used directly in Jekyll or Hugo.

https://www.npmjs.com/package/medium-2-md

The CLI tool is written with node.js. It expects the input HTML files to have the same tags and attributes as the files contained in the posts directory shown in the picture above. That way it is able to extract all the information needed for the front matter.

It has not been tested for general purpose HTML to markdown conversion; it is only suitable for HTML files exported from Medium stories/posts.
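To make the idea of reading front matter fields out of a post's HTML a little more concrete, here is a minimal sketch. It is not the code medium-2-md uses: the sample markup, the extractMeta function and the regular expressions are all made up for illustration, and a real Medium export file may well use different tags and attributes.

```javascript
// Hypothetical sample markup; a real Medium export file may look different.
const sampleHtml = `
<html>
  <head><title>My first post</title></head>
  <body>
    <article>
      <time datetime="2018-05-01T10:00:00.000Z">May 1, 2018</time>
      <p>Hello world.</p>
    </article>
  </body>
</html>`;

// Hypothetical helper: reads the text of <title> and the datetime
// attribute of the first <time> element, or null when they are missing.
function extractMeta(html) {
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const time = html.match(/<time[^>]*\bdatetime="([^"]*)"/i);
  return {
    title: title ? title[1].trim() : null,
    date: time ? time[1] : null,
  };
}

console.log(extractMeta(sampleHtml));
```

Regular expressions are enough for a toy example like this; for anything beyond that, a proper HTML parser would be the safer choice.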

Usage

medium-2-md is available as an npm package, so it can be downloaded and used without going through the code. All you need to do is install it globally and then run it from the command line. Follow the steps below to use medium-2-md.

Step 1: Export and extract your Medium posts from your Medium account.

  • Go to https://medium.com/me/settings and scroll to Download your information. Click the download button. This will give you a medium-export.zip archive containing all your Medium content.
  • Extract the .zip archive downloaded in the previous step. It will have a sub-directory called posts.
  • Copy the path of this posts directory.

Step 2: Install node.js and medium-2-md on your system.

  • Download and install node.js - https://nodejs.org/en/download/.
  • Install medium-2-md package - npm i -g medium-2-md.

Note: The -g flag means that you are installing the package globally. It is essential in order to use the package directly from the command prompt/terminal.

Step 3: Run the following command to convert all your Medium posts (HTML) to markdown files.

medium-2-md convertLocal '<path of the posts directory>' -dfi

That’s it. The output markdown files will be stored in a sub-directory called md_<a big number> inside the posts directory itself. (By the way, that big number comes from the JavaScript Date.now() function; it is there to keep the output folders distinct in case we go crazy with it.)
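As a rough illustration of that naming scheme, a directory name of this shape could be produced as follows. The outputDirName helper is hypothetical and only mirrors the md_<number> pattern described above; it is not taken from the medium-2-md source.

```javascript
// Hypothetical helper: builds a name of the form md_<timestamp>.
// Date.now() returns the number of milliseconds since the Unix epoch,
// so two runs at different moments get different directory names.
function outputDirName(now = Date.now()) {
  return `md_${now}`;
}

console.log(outputDirName()); // md_ followed by the current time in milliseconds
```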

The converted markdown files will also have front matter which will have the title, description, published date and canonical URL of the original Medium post/story. Images will be downloaded inside a sub-directory in the output directory.
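For readers who have not seen front matter before, here is a small sketch of what such a block could look like when built from those four fields. The buildFrontMatter function, the key names (in particular canonical_url) and the sample values are assumptions for illustration; check a converted file for the exact keys the tool writes.

```javascript
// Hypothetical helper: renders a YAML front matter block from post metadata.
// JSON.stringify is used to quote the strings, since JSON strings are also
// valid double-quoted YAML scalars.
function buildFrontMatter(post) {
  const lines = [
    "---",
    `title: ${JSON.stringify(post.title)}`,
    `description: ${JSON.stringify(post.description)}`,
    `date: ${post.date}`,
    `canonical_url: ${JSON.stringify(post.canonicalUrl)}`,
    "---",
  ];
  return lines.join("\n") + "\n";
}

// Made-up sample values.
const frontMatter = buildFrontMatter({
  title: "My first post",
  description: "A short summary",
  date: "2018-05-01T10:00:00.000Z",
  canonicalUrl: "https://medium.com/@someone/my-first-post",
});

console.log(frontMatter);
```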

More details about the package are available in the readme file at the medium-2-md GitHub repository.

This tool helped save a lot of time when I was converting all my stories/posts to markdown. I tested the converted markdown in a test Hugo site and it worked like a charm. All previous posts in this blog were converted to markdown using medium-2-md.

A couple of things still need to be taken care of manually: code fences and tags. There is no tag-related information in the exported Medium files. Code blocks are also exported as plain text, so the markdown conversion does not produce code fences for them either.
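If you have many posts, adding tags by hand can be scripted. The sketch below is not part of medium-2-md; addTags is a hypothetical helper that assumes the file starts with a YAML front matter block delimited by --- lines and inserts a tags entry just before the closing delimiter.

```javascript
// Hypothetical helper: inserts a tags line into existing YAML front matter.
// Returns the input unchanged when no front matter block is found.
function addTags(markdown, tags) {
  const lines = markdown.split("\n");
  if (lines.length === 0 || lines[0].trim() !== "---") return markdown;

  // Find the closing delimiter, skipping the opening one at index 0.
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) return markdown;

  // A JSON array is also a valid YAML flow sequence.
  lines.splice(close, 0, `tags: ${JSON.stringify(tags)}`);
  return lines.join("\n");
}

const tagged = addTags('---\ntitle: "My first post"\n---\nHello world.', [
  "hugo",
  "markdown",
]);

console.log(tagged);
```

Code fences are harder to automate, because the export gives no reliable marker for where a code block starts or ends, so those are best fixed by reading through each post.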

Give it a try and feel free to provide your feedback using the issue tracker on the GitHub repository - https://github.com/gautamdhameja/medium-2-md.

Markdown, HTML, Hugo, Jekyll, Data Migration, Blogging