Now that wasn't too hard. I imported the old db into the database server, created a script that connects to that database using pymysql, loops through the old posts, and returns a data structure, then used some Django model bits to import the data I needed... ah well, might as well just show you:
# blog/oldblog.py
import datetime

import pymysql
import pymysql.cursors

from blog.models import Posts

# Map old blog user ids to new Django user ids.
usermap = {
    '2': '1',
    '3': '2',
}

# Connection settings for the old database (placeholder values here).
# DictCursor makes fetchall() return dicts keyed by column name.
settings = {
    'host': 'localhost',
    'user': 'olduser',
    'password': 'secret',
    'db': 'oldblog',
    'cursorclass': pymysql.cursors.DictCursor,
}

def runimport():
    connection = pymysql.connect(**settings)
    try:
        with connection.cursor() as cursor:
            sql = "SELECT ownerid, date, post, title FROM blog ORDER BY date DESC"
            cursor.execute(sql)
            result = cursor.fetchall()
    finally:
        connection.close()
    for record in result:
        # The old blog stored dates as unix timestamps; convert to datetime.
        record['date'] = datetime.datetime.utcfromtimestamp(float(record['date']))
        importrecord(record['title'], record['post'],
                     usermap[str(record['ownerid'])], record['date'])

def importrecord(title, content, author, date):
    try:
        p = Posts(title=title, content=content, author_id=author)
        p.save()
        # posttime has auto_now_add=True, so the first save stamps it with
        # "now"; set the real date and save again to overwrite it.
        p.posttime = date
        p.save()
        print("Post import successful: {}".format(title))
    except Exception:
        print("--Error: Could not import post: '{}'".format(title))
Then, from there, it's into the Django shell using the production server settings:
>>> from blog.oldblog import runimport,importrecord
>>> runimport()
Code's a little jank, and some of the metadata that was part of the old blog system was lost, but that's okay. I can take the hit. Also, a few records failed to import because they had no titles. No big loss there; they were from when I was first building the old blog from scratch in PHP, almost entirely "test post", "test post 2", "asdf" type posts. I also had to convert the unix timestamps I was using in the old blog (bad) to proper datetimes for the new one (good); super straightforward with datetime.
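One wrinkle worth flagging: utcfromtimestamp() returns a naive datetime, and Django will warn about naive datetimes whenever USE_TZ is on. The timezone-aware version is just as short (a sketch; the timestamp value is made up):

import datetime

ts = 1262304000.0  # made-up example timestamp
naive = datetime.datetime.utcfromtimestamp(ts)  # what the script does
# Timezone-aware equivalent, which keeps Django's USE_TZ happy:
aware = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)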
Now, this solution worked because I don't have a large dataset. If the blog was millions of lines of posts, I definitely wouldn't want to return the entire table in a single variable; I'd want to return one record at a time, and move the cursor down the dataset, allowing the script to free up memory as it moved along.
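With pymysql that's just a different cursor class: SSDictCursor is the unbuffered one, which streams rows from the server one at a time instead of buffering the whole result set in memory. Roughly, using the same placeholder connection settings as above:

import pymysql
import pymysql.cursors

def runimport_streaming():
    # SSDictCursor is pymysql's unbuffered cursor: rows come off the server
    # one at a time instead of all at once via fetchall().
    connection = pymysql.connect(
        host='localhost', user='olduser', password='secret', db='oldblog',
        cursorclass=pymysql.cursors.SSDictCursor)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT ownerid, date, post, title FROM blog ORDER BY date DESC")
            for record in cursor:  # iterate without loading the full table
                pass  # convert the date and call importrecord() as before
    finally:
        connection.close()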
There's also the issue of the static usermap. I did it this way because:
- It was quick and dirty
- I only had 2 users to import posts from
The right way to do it would be a nested SQL lookup against the 'users' table in the old blog, mapping each old user to the new user that way (a rough sketch of what that could look like is below). But I didn't maintain usernames across those databases; old usernames are our handles, new usernames are our names. We'd need a static map anyway.
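If the usernames had lined up, the mapping could have gone straight into the query. A rough sketch, where the old schema's 'users' table and its 'id'/'username' columns are my guess, and which assumes the new blog uses Django's stock auth User:

from django.contrib.auth.models import User

# Join the old users table so each row carries its author's username.
sql = ("SELECT b.ownerid, b.date, b.post, b.title, u.username "
       "FROM blog b JOIN users u ON u.id = b.ownerid "
       "ORDER BY b.date DESC")

def resolve_author(record):
    # Resolve the old username against the new auth table instead of
    # hardcoding a static id map.
    return User.objects.get(username=record['username'])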
Last issue: no meaningful error handling. The old posts that failed to import don't get handled, just skipped with a message. But, for reasons stated above, that's no big deal for something this quick.
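If this were going to run more than once, the least I'd do is collect the failures instead of just printing them, so they could be inspected or retried afterward. A minimal sketch:

from blog.models import Posts

failures = []  # (title, exception) pairs for anything that didn't import

def importrecord(title, content, author, date):
    try:
        p = Posts(title=title, content=content, author_id=author)
        p.save()
        p.posttime = date
        p.save()
        print("Post import successful: {}".format(title))
    except Exception as exc:
        # Keep the failed record and the reason around for a second pass.
        failures.append((title, exc))
        print("--Error: Could not import post: '{}' ({})".format(title, exc))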
Overall: Success. Quick and dirty works a lot of the time, especially for one-off import scripts like this one. Didn't take long to hammer out all the kinks, like the date change thing you may have noticed. The thing about Django's DateTimeField with auto_now_add=True is that even if you set the field before the first save, it gets overwritten with the current time when the record is inserted. So you've gotta save it, update the time, and then save it again. I'm not sure if there's a cleaner way around that, but this method seems to work.
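One candidate workaround, for what it's worth: QuerySet.update() issues a plain SQL UPDATE and bypasses save() entirely, so the auto_now_add machinery never runs. A sketch of how that would drop into importrecord() (I haven't run this against this exact model):

from blog.models import Posts

p = Posts(title=title, content=content, author_id=author)
p.save()  # posttime gets stamped "now" here by auto_now_add
# update() goes straight to SQL and skips save(), so the real date sticks:
Posts.objects.filter(pk=p.pk).update(posttime=date)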
btw, the new blog system makes posting longer/more technical posts much easier and more fun. So I'll probably be doing a lot more of them in the future.