Thursday, January 28, 2010

Mashing stuff up, seeking nirvana

Malcolm's Music

My former colleague and current friend Malcolm Gauld has, for quite a few months, been posting "favorite" songs (numbered consecutively, but not ranked), with random commentary, onto his Facebook status. I have enjoyed this series of updates quite a bit, in part because I really agree with Mal's taste in music, in part because I admire anyone who does something so diligently and well, and in part because I've been exposed to some great music because of these updates.

Recently, I realized that I have missed a good number of the status messages simply because I have way too many FB friends and Mal's status doesn't always make it onto my News Feed ("Home") page or into my awareness. So, I commented on one of his updates:
you got a place where you keep track of all these? the list just gets better and better!!!

His response was characteristic (which you'll appreciate if you know Mal):
The Process: Get coffee. Park self at computer. Start the day's work. Start iTunes. Set Shuffle on. Wait for "Song of the Day" to reveal itself. One of the few things in life (mine, anyway) that works every time!

That wasn't what I was asking, Mal! SO I responded:
Yeah, but do you keep track from day to day?
His disappointing and somewhat surprising response:
Not really. I just try not to repeat myself. (But the memory gets shaky.)
This response was a bit surprising only because Mal is known for his meticulous attention to details (about music, generally, but what he likes, in particular) and his extreme willingness to share his opinions and knowledge with others (including, for a long time, in a column he wrote for a newsletter as Headmaster of the Hyde School in Bath, Maine, where, BTW, I worked with him for a while.

Status History?

Me being the kind of person *I* am, I immediately began thinking of whether there was a way to retrieve all those status messages and compile them into a list, both for Mal's sake, and for mine. Predictably, I turned to Google for some answers, and typed in "facebook status updates history" and learned a few things from the pages I found there.


First, I went to a blog post from early 2008 in which Charles Hudson wondered aloud whether it would be possible to collect the status updates of others and "mine" them for interesting information. Then, I went to Yahoo Answers to look at the answer to the question "How do you delete Facebook's past Status History?," because the excerpt of the page on the Google search page suggested that the answer also included information about getting a list of status updates. But, alas, it did not. However, there was a link on that page to another question, "How do I view my past facebook statuses?," which reminded me that you can keep scrolling down on your own "wall" (or profile) page in Facebook and clicking "Older Posts," presumably continually until you reach some fictional end of Facebook's storage capacity (i.e. back to the beginning of your time on Facebook), although that reveals all the other posts (including who gave you what type of heart or other gift when and whose comment you liked when and every instance of someone tagging you on one of those trendy surveys that appear continuously when), as well as the status messages and everything else you post. In other words, clicking every possible "Older Posts" link on your profile and then Selecting All (Ctrl-A) and pasting that into a word processor and then tediously editing out everything not relevant to your own statuses could, in theory anyway, give you a list of prior status updates. But I wasn't going to suggest that to Mal. He and I have better things to do. (Well, at least HE does....)

Anyway, the question asked was whether there was any OTHER way to do this, and the "Best Answer - Chosen by Voters" (!) was "You can't."


But that answer was more than a year ago, and, well, me being one who doesn't take "No" for an answer, well, I wasn't going to accept that.

So I went back to the Google search. This time, I paid a little more attention to what was on the list. Lo! and Behold! The FIRST "hit" on the page was a link to information about a Facebook application purporting to give you a list of all of your status updates!


("That'll teach me to be so cursory in my examination of the hits on a Google search," I said to myself unconvincingly.)

So I followed that link. There actually wasn't any information there, at all. (Where did Google get the information? I wondered. Perhaps one gets a different page if one isn't perpetually signed into Facebook (as I am)...in any case, I got a dialog box asking me to "Allow" permission for the app to gain access to my information, including (it seems to me) anything they want for any purpose they want (!)...but the app had four stars, so I willingly gave my assent and was immediately taken to a page with....


,,,NOTHING!!!!

But I was not going to give up after getting so close! I decided this was just a glitch in the app, and so went to the little "Facebook" application button in the lower left of the screen

and found the "My status history" link on top of the "Recently Used" section, and clicked, and Voila! ...a list of all my status updates (in either newest to oldest or the opposite order), all 563 of them (I learned later), all the way back to July 23, 2008!!!

(This amazing little application was developed, I learned later, by Herbert 'Herby' Vojčík,a Slovakian computer programer. You can learn more about the app by visiting the application fan page.)

Excel

I quickly selected everything on the page (Ctrl-A), copied it to the clipboard (Ctrl-C), opened up a Word document, then thought better of it and closed the Word document and opened up an Excel spreadsheet (thinking, this is a DATABASE!!!) and pasted it in (Ctrl-V).


Mistake!!! I had selected the entire page contents, all the crap on the top and right and bottom of the page. I tried to delete it all, but couldn't figure out how to delete one small little "Submit" button


(still can't figure out where that came from on the page)

not to mention the stuff at the bottom. So I tried again. This time, I clicked at the beginning of the first status, scrolled down, and then (holding the shift key down), clicked at the end of the last status," then copied (Ctrl-C), then switched to Excel (and a new, blank, sheet), and pasted (Ctrl-V).

Voila!

I quickly saved the file, triumphant!

...

...

But as I looked at it, I wasn't satisfied. There were empty rows in between each message. I had to remove the empty rows. I figured if I sorted the sheet, the blanks would rise to the top (or sink to the bottom). So I clicked the sort button (A --> Z), and, sure enough, the blanks were down at the bottom (i.e., effectively gone). But all of the status messages were now sorted alphabetically as well....which removed the ORDER of the messages, which was crucial if this was going to be a database of any utility.

So I quickly undid the sort (Ctrl-Z) and tried to think of another solution. Surely there was a way to eliminate empty rows from your data!!?! I didn't know....but I know Excel well enough to be pretty confident that if you encounter something you've got to get done, there's a way to do it. I looked around on the various menus.

Finally, I found it! On the "Data" menu was a button that said "Remove Duplicates." I figured all those empty rows were duplicates of each other! And, well, if any of the status messages are duplicates, deleting them probably shouldn't matter (um, right?!?), so I went ahead and selected the whole sheet and hit the button.

Voila! It did leave one empty row (the second row, since it wasn't a duplicate of anything above it), which I quickly deleted manually.

I quickly saved the resulting sheet, again triumphant.

(or is that "demented" :-) )

Data Massaging

But something still wasn't right. Each status message had, at the end of it, a date and time. This wasn't a BAD thing of course (because it might be useful to know when a particular status message had been made, or to know what order they were made in), but the message and the timestamp were in the same cell. Thus, two types of data were mixed, which would make it next to impossible to sort by date or do anything else automatically that was related to the date and time.

I decided to do some data massaging and see if I could separate the text of the messages from the date and time. I certainly wasn't going to do this manually (563 rows, I discovered!!!), and, what's more, I NEVER do anything manually if I can figure out a way to get the computer to do it for me, and I knew (based on past experience with excel) that you could pretty much separate out anything in a cell (or combine cells, or do all kinds of other operations).

I noticed that the date and time in each cell were separated off from the rest of the status message by parentheses. So I knew I had a relatively easy way to distinguish "status message" from "date and time." But the length of each status message was variable. So I couldn't just cut the cells all at the same number of characters and call it a day. However, I could use Excel to COUNT how many characters the left parenthesis was from the beginning of the cell and separate the content at that number minus one (to separate at the space before the parens).

Excel formulas, especially the text function ones) seem incredibly complex and dense if you've never worked with them before, but I have pretty extensive experience going way back to VisiCalc, and have no fear of the RIGHT, LEFT, MID, LEN, FIND, etc. formulas. So I dived right in...

First, I needed to create a formula in column B that would copy only the status message, and not the date/time. Then, I would put a formula in column C that would copy just the date and time and leave the message. Simple, right?

To copy the status message, I knew I needed to use a formula beginning with =LEFT(A1, and then put a number or formula in that would indicate the correct number of characters to copy. I knew I could use the FIND function to return the number of characters before a left paren appears (FIND("(",A1) -- the left paren has to be in quotations because otherwise it will be treated as a left parens in the formula (and return an error) instead of the text I was looking for -- and subtract 1 (to go back to the space). So I put this into cell B1:

=LEFT(A1,(FIND("(",A1-1))))

THAT should work, I thought! I quickly "quick-filled" the formula into the rest of the rows, and, sure enough, my status messages (without date/time) appeared in column B. AWESOME!!!

Then, to get the date and time, I figured all I had to do was take what was left from the formula in Column B. The easiest way I could think of to do that was to use the RIGHT function, along with the number generated by the FIND function in the other formula and the LEN function to return the total length of the content in cell A1. Thus:

=RIGHT(A1,(FIND("(",A1-1)-LEN(A1)))

But that didn't work...it returned the dreaded #VALUE, which says basically that you've tried to compute something uncomputable, so I looked more carefully at what I'd done, and soon realized that the number returned by the FIND function was going to be less than the number returned by the LEN function, and you can't take the leftmost n characters where n is a negative number, so I reversed those, getting:

=LEFT(A1,(LEN(A1)-FIND("(",A1-1)))

That fixed the problem! Now, I had the date and time (in parentheses, with a blank space before it (easily stripped out later) in Column C.

Or at least that was the case in SOME of the rows. Most of them. But there were quite a few rows where Column C included a piece of the status message as well, and, on further look, Column B in those cases didn't contain the whole status message, but only the part up to where a piece was picked up in Column C. What was wrong?

I quickly realized that the rows where my formulas didn't work were ones where the status message itself included a left parenthesis. Those cells got cut at the character or space before the left parenthesis, NOT where I had intended them to be cut. Damn!!!

How was I going to write a formula that would vary depending on whether there were one or two (or more) left parentheses? I thought about it a bit and realized I could use an IF/THEN function, but I was stumped as to how I could know whether the left parens occured once or more than once in the cell.

Short Break

At this point, I was feeling a little fried, so I went to the kitchen and got myself a Diet Coke and a cutting board with some cheese, salami, and Chicago Flats (crackers, of sorts). I returned to my (home) office to find one of the cats starting to puke on my desk (which bothers me like almost anything else!), so I threw him off (so he could finish his puke on the floor), and cleaned up the mess he'd already made on the desk (and the mess he proceeded to make in four places on the floor - ugh).

Then, I washed my hands (more ugh) and returned to my computer, and started in on the lovely blue brie and smoked gouda, and took a very long swig of my Diet Coke. (Yum!)

Upon looking again at the data I was trying to "massage," I was hit by an obvious alternative to my previous (not quite good enough) solution to my problem: the date and time strings in the cells ALL had the same length! (Whoever wrote the code to extract these statuses had been smart enough not to mess with the fixed-length variable used by Facebook's database system, which thoughtfully writes "1:15 am," for example, as 01:15 and "1:15 pm" as 13:15. This is true for the months as well, so January is 01, February is 02, and so on. This is a fixed length also for the day of the week, Sun, Mon, Tue, etc. Smart computers!!!

Including the preceding space and both the left and the right parens, the length of the date and time is 32 characters. Thus, I could write two very easy formulas. In Column B, for the date and time:

=RIGHT(A1,32)

and for Column C, the message itself:

=LEFT(A1,LEN(A1)-32)

TRUE VOILA!!!! :-)

Nirvana

I was happy. I LOVE solving these sorts of puzzles (I know, it's strange). Now, all that was necessary was to convert the contents of columns B and C into values (instead of formulas...so I could do other things with them) - a simple matter of copying them into another column and using "Paste Special..." to paste only values - and to strip out the leading space and parens from the data and time column (simply using Find/Replace twice, once to replace " {" with "" (nothing) and again to replace ")" with "" (nothing). I knew I could proceed to separate the date and time (and even day of the week) into separate columns, too, but I had acheived my fundamental purpose, which was......

??

??

Well I haven't said (other than that I wanted to create a database, and I wanted to figure out a way to help Malcolm (remember Malcolm? This is a blog post about Malcolm....) get his list of daily favorite songs), but I had another motivation. I wanted the texts of all of MY status messages so I could paste them into WORDLE and make a "word cloud" of all those messages so I (and others) could see, visually, what I've been most concerned about, all these months. So, that is what I did. I simply copied the text of the messages (stripped, helpfully, of the data and time, which would have severely messed up my wordle) into the World engine, and VOILA!

Visually, this shows a few things:

  • I use the words "one" and "just" a whole lot
  • I am concerned with education, students, learning, thinking
  • I like, get, wonder, use, think, need, like, party, force, and love
  • Obama is the most important person in my life (er, um, or at least my status updates)
But, more importantly, it's just fun to have this, as well as that database of my updates going way back to July 23, 2008. Who knows, I just might keep it updated, and refer to it again, sometime, here, or in a Facebook status update.

And now, you can make a Wordle too, of YOUR status updates. Just follow these totally simple, straightforward, short, and easy directions. Or, if you decide to seek your own path to status-update-Wordle-mashup nirvana, drop me a line, or visit me on Facebook, and let me know how YOU got here.

:-)

No comments: