Wednesday, April 26, 2017

My Stolen R Google Scholar Account has been Stolen From Me!!

For my own stupid amusement in 2014 I created a Google Scholar account for the R programming language and blogged about it.  If you read that post, you'll see I mostly did it because I find it hilarious how many published papers screw up R's citation (probably including some of my own :).  

Now, to be totally clear, I've never contributed a single thing to the development of R and I'm not at all associated with the project.  I'm simply a fan who noticed it gets cited a lot and thought it could be interesting to give it a presence on Google Scholar.  So did I have any credible authority to appropriate this account?  Of course not.  Or at least I had no more right than any of the other ~7 billion humans on earth who never contributed to R.  Was it ethical of me to do this?  Let's call that a grey area.  On the one hand, it gave R some presence, but on the other hand, I controlled the account, which I'll admit is a little awkward.

So why am I writing this?  Well, as you can probably guess from the title, the account appears to have been taken away from me.  Silently taken away, I might add.  I'm not sure when, because I honestly haven't checked on it in years.  If you go to the account now, it links to CRAN as a homepage, and says that the email address associated with the account is registered to a Canadian University. Generally things seem above board, and probably this is the right, just, and equitable conclusion of my little spoof.  

I think the only remaining interest here is how did someone at that Canadian University convince Google to turn over the account?  And was it easier to do it that way than to simply email me?

So, chastened, I think I will no longer create novelty Google Scholar accounts.  Certainly not for the living (be they people or projects).  That said, if you are interested, I noticed that many notable deceased scientists, like Fisher, Pearson, Snedecor, Tukey, Lush, Haldane, or Henderson do not have accounts. That's a shame. 

Wednesday, April 6, 2016

Warning Signs



I hate gore.  I have nightmares when I see a gory movie, and as a consequence I almost never see anything even mildly messy.  However, I'm morbidly fascinated by industrial warning signs. What am I talking about?  See below.

This is a warning label for a power takeoff shaft (PTO) on a tractor.  You get the idea: this thing spins really fast and with a ton of torque, and if you get too close it will turn you into a loaf of human challah.

I can't stop looking at these things when I operate heavy equipment, which I guess is the point. It's hard to be complacent about safety when you can't take your eyes off the cartoonish stickman being snuffed out on the warning label.  

Here are just a few that have given me the creeps like the PTO image above.


And a whole scary series of pinch points:

And their close relatives, crush points:

And here's one I saw just the other day, bike endo man:

(I think this is basically an admission by the city that their streets suck, it should say something like "Warning, we can't be bothered to maintain this road, bike at your own risk")

But there's one thing that freaks me out when I'm around it that has a shockingly benign warning label: compressed air.  Here's the normal label:

Seriously!  That doesn't look dangerous at all!  Kind of looks like a rolling pin.  Where is the stick guy getting blown up?

I propose a new compressed gas warning sign.  We've all seen Jaws. After the shark spends almost 2 hrs eating everything in sight, they finally get it to bite down on a compressed air tank and Roy Scheider shoots it, causing ka-boom.  

There you go.  A simple three panel warning sign about the dangers of compressed air. Definitely don't put it in your mouth, especially if Scheider is gunning for you, or else, ka-boom.

Thursday, July 30, 2015

Drawing Truchet Tiles

In continuing my interest in geometric art, I've gotten interested in tiles and tiling patterns.  One of the simplest patterns is called the Truchet tile.  It consists of 4 types of tiles, each with one corner filled in with a right triangle.  Below is a picture of the 4 tiles that make up the Truchet set.

I wrote a small Python function that draws Truchet tiles using the matplotlib plotting library.  I've put it in my GitHub repository.  If you run the code it produces a bunch of Truchet tile patterns (below). You can also clone the repository, play around with the Python functions, and make up your own tiles.

Here is the first pattern.  Just a 20x20 grid of random Truchet tiles.  I've added in blue gridlines so you can see the individual tiles.
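A random grid like this is easy to sketch without any plotting at all. The snippet below is a minimal stand-in, not the code from my repository: it picks a tile number from 1 to 4 for each cell and prints the grid with Unicode triangle glyphs. The names `TILE_GLYPHS`, `random_truchet_grid`, and `render`, and the choice of which number maps to which corner, are assumptions made for illustration.

```python
import random

# Hypothetical numbering: which corner of the tile holds the filled triangle.
TILE_GLYPHS = {1: "◣", 2: "◢", 3: "◥", 4: "◤"}


def random_truchet_grid(n_rows=20, n_cols=20, seed=None):
    """Return n_rows lists of n_cols tile numbers, each drawn uniformly from 1-4."""
    rng = random.Random(seed)
    return [[rng.randint(1, 4) for _ in range(n_cols)] for _ in range(n_rows)]


def render(grid):
    """Render a grid as text, one glyph per tile, first row printed at the top."""
    return "\n".join("".join(TILE_GLYPHS[t] for t in row) for row in grid)


grid = random_truchet_grid(seed=1)
print(render(grid))
```

Passing a `seed` makes the pattern repeatable, which is handy if you stumble on one you like.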

Next is an arrangement that follows a particular pattern.  This is where it gets interesting: imagining patterns and drawing them.  If you number the four tiles above 1-4, and start at the lower left and work across by row to the upper right, this pattern is a row of 1s, then a row of 2s, then a row of 4s, and finally a row of 3s, and repeat.
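The row rule described above can be written down in a few lines. This is a rough sketch rather than the code from my repository; `ROW_CYCLE` and `striped_truchet_grid` are made-up names, and row 0 is taken to be the bottom row, matching the "start at the lower left" description.

```python
ROW_CYCLE = [1, 2, 4, 3]  # tile number used by each successive row, then repeat


def striped_truchet_grid(n_rows=20, n_cols=20):
    """Row 0 is the bottom row; every tile in a row shares one tile number."""
    return [[ROW_CYCLE[r % len(ROW_CYCLE)]] * n_cols for r in range(n_rows)]


grid = striped_truchet_grid()
# Print from the top row down so the bottom row (row 0) appears last.
for row in reversed(grid):
    print(" ".join(str(t) for t in row))
```

Swapping in a different `ROW_CYCLE`, or indexing by column as well as row, is a quick way to try out other patterns before drawing them.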

A variation on the traditional Truchet tile is the two tile set with semi-circles in the corners below.

Just sitting there those two tiles look kind of lame, but a grid of these is surprisingly cool.  Below is a random 20x20 grid, and again I've marked the gridlines so you can see what's going on.
That's pretty fascinating.  It's like a maze.

One can also apply patterns to these modified Truchet tiles.  One example is below.  I'll leave it to you to work out the pattern.

Finally, I invented my own new Truchet tile.  It's just a simple modification of the two semi-circle pattern, where a line is drawn that connects the semi-circles.  Below is a 20x20 random grid of these 2 modified tiles.  I think they look cool.  You can imagine all kinds of other modifications that might look neat.  My guess is that organic patterns, like things that look like leaves or flowers in the corners of the tiles might look really nice.  One day I may get around to mocking that up and posting here.

Wednesday, September 3, 2014

Unsolicited Email

In any profession one of the downsides of attending conferences or publishing papers is that your email address gets out into the world and you become the target of unsolicited junk email.  Most of what I receive is trying to sell me a scientific product or get me to attend a conference or publish in a journal that no one has ever heard of. I never open the bulk of it.

Recently I've received a ton of crap.  

Today, as a sort of silver-lining, I got my all-time favorite unsolicited email, and one that I actually opened and read completely.  Check it out below. The subject line was "Get all the DNA out of urine!".  YES!!

Best part: "Urine is a veritable gold mine...".  I couldn't agree more!

Anyway, I'm in a select group (I assume) of scientists chosen to be beta-testers of Zymo Research's Extract-ALL(TM) Urine DNA Kit, so I got that going for me, which is nice.

Thursday, July 10, 2014

Duke Divinity School says it answers what science cannot

I stumbled on an article in the Duke Chronicle with the title above.  I assumed it was probably a young journalist getting a little carried away with the headline, but read it anyway.  In fact, the headline was pretty factual.  Here are some interesting nuggets:

Duke Chapel
Students in the Divinity graduate programs come from a wide variety of backgrounds, but all of them come to seek further study in the field of faith. Each come having accepted the fundamentals of their Christian faith—just as a mathematics graduate student accepts the concept of numbers, or a medical student accepts chemistry, Hays said.
[Hays is the Dean of the Div. School.]

“There is no field at Duke that doesn’t take on presuppositions,” Myers said. “I don’t think the argument should be about the crazy claims that the Christian Church makes because we all have crazy presuppositions."

[Myers is a grad student at the Div. School]

“Science seeks to describe empirical phenomena in a material world,” Hays said. “It describes how things work. Science cannot answer questions about why it exists or for what purposes or how it came to be. Those are the questions that theology tries to address.”

[Hays again]

The study of what it means to be human is at the heart of humanities studies, and that is where religion plays a role, said Carnes, who wants to become a theology professor. 
“Humanities in general have something to do with what it means to be a human in a way that math and science can’t fully address,” she said.
[Carnes is another grad student at the Div. School]

I am a practicing scientist.  I do not have a very sophisticated understanding of philosophy, and I'll take it as given that there is some lack of "proof" for the theory of numbers or atoms, as Hays claims. However, I absolutely reject the reasoning that somehow a belief in 2+2=4 is essentially the same as a belief in the Christian god (or any other god). That's nuts! Chemistry and Christianity are just not on equal footing in terms of supporting evidence, no matter how intractable the ultimate proof of atomic theory may be. 

These little philosophical sleights of hand, like the one used by Hays, create a superficial notion that science and religious belief are equivalent in factuality and veracity. This kind of thinking just blows my mind. I also find this very disheartening because I have a hunch that these tricks are one of the more pernicious roots of the rejection of science. It's much easier to cast science aside if you pose it as a belief system or as equivalent to a religious belief system.  

Second, what does it mean to study why things exist or for what purpose [Why do carrots exist, and for what purpose are supernovae?], or what it means to be human?  I suppose in some sense I've been studying what it means to be human every day for the last 34 years. I'm sad to report I've had no breakthroughs :). Honestly, I'm not even sure I'd recognize a breakthrough if it occurred.  

Friday, June 13, 2014

Programming Geometric Art

A few weeks ago I was listening to the brilliant podcast 99% Invisible. The particular episode was all about quatrefoils, which are cloverleaf-shaped geometric designs. In the episode they point out that quatrefoils feature prominently in high-class objects, like gothic cathedrals and luxury products (e.g. Louis Vuitton bags). The episode really stuck with me. It was so interesting. These simple shapes can convey so much meaning, much of it subconsciously (and now for you more consciously, because you too are going to start seeing quatrefoils everywhere, especially for my friends at Duke!)

I have always been fascinated with kaleidoscopes and geometric designs. I love Arabic art and gothic patterns. There's something soothing about geometric designs: they have order and organization but at the same time an organic familiarity because similar processes arise in nature.  

This got me thinking. At a base level geometric art is just an algorithm. Draw a line here, move 10 degrees over, draw another line, repeat. I know a little about algorithms and I started to wonder if I could make anything pretty in R (R being an ironic choice because 99% of what I do in R is decidedly not pretty). At first I set out to make quatrefoils, but I never did figure it out (and if you do, paste your code in the comments section below).  Then I reset my sights on just making something interesting. 

Below are the first 5 pleasing things I've managed to produce along with the R code that draws them.  The first 4 were designs I made intentionally. It's a fun exercise. Find or imagine something you want to draw, then think about the algorithm that would make it, and finally translate that into code. It's one of those tasks that uses a lot of your brain.

Of the images below, I'm most proud of the one called "angles", because I got the process from idea to algorithm to code on the first try! All the others took some trial and error, with interesting squiggles and blobs emerging along the way.       

These pictures raise a lot of questions which are beyond my grasp of geometry and trig.  For example, in the mystic rose below why do all those internal circles arise? What is their radius? Why are there voids? There must be some way to derive answers to these things mathematically, and I'm sure if I meditated on it long enough I could probably figure it out. Unfortunately, for now I'm a little too busy to remedy my own ignorance.
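One possible answer to the circle question, as a minimal sketch (in Python rather than R, and not something I derived when making the picture): a chord joining two points on the unit circle lies at distance |cos(gap/2)| from the center, where gap is the angle between the two points. All chords with the same gap are therefore tangent to one circle, which would explain the rings. The function name `chord_distance` is made up for illustration.

```python
import math


def chord_distance(angle_i_deg, angle_j_deg):
    """Distance from the origin to the chord joining two points on the unit circle."""
    half_gap = math.radians(angle_j_deg - angle_i_deg) / 2
    return abs(math.cos(half_gap))


deg_rot = 15                # same spacing as deg.rot in the R code
n_points = 360 // deg_rot   # 0 and 360 degrees coincide, so 24 distinct points

# Gaps larger than 180 degrees repeat the smaller ones, so half the circle is enough.
radii = sorted(
    {round(chord_distance(0, k * deg_rot), 6) for k in range(1, n_points // 2 + 1)}
)
print(radii)
```

With a 15 degree spacing the gaps come in 12 sizes, so the sketch gives 12 candidate radii, the smallest being 0 (the diameters that pass straight through the center).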

Also, "mystic rose" isn't my name.  I did some googling and that seems to be the standard name for that object.  I'm a geneticist.  I'd have called it the "full diallel". :) (And a hellacious one at that. 625 crosses. You'd need to be pretty ambitious or foolish.)   

Finally, if you decide to run the R code below, it is best in regular old vanilla R, because it draws one line at a time and you can see the pattern emerge (and if you have an old slow computer like mine it will emerge at a nice human pace). When I plotted these in RStudio the image wouldn't print to the screen until it was done, which kind of kills the effect (I recommend running some of the code, it's mesmerizing).   

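##mystic rose
# convert degrees to radians
rad = function(x) (x*pi)/180
# empty plotting window covering the unit circle
plot(c(0,0), c(0,0), ty='n', xlim=c(-1,1), ylim=c(-1, 1))

# spacing, in degrees, between points on the circle
deg.rot = 15
# join every point on the circle to every other point
for (i in seq(0,360, deg.rot)) {
         for (j in seq(0, 360, deg.rot)) {
               lines(c(cos(rad(i)), cos(rad(j))), c(sin(rad(i)), sin(rad(j))), lwd=0.5)
         }
}
mystic rose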
# rows of circles: for each angle i, ten circles at height 10*sin(i), stepping outward in x
v = seq(0,360,45)
rad = function(x) (x*pi)/180
plot(0,0, ty='n', xlim=c(-20,20), ylim=c(-20,20))
for (i in v) {
    for (j in seq(1, 10, 1)){
        symbols(cos(rad(i))*j,sin(rad(i))*10, circles=1, add=TRUE)
    }
}


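##angles
rad = function(x) (x*pi)/180
v = seq(45,360,45)
steps = seq(0,1,.025)
plot(c(0,0), c(0,0), ty='n', xlim=c(-1,1), ylim=c(-1, 1))
for (i in v) {
         # the spoke 45 degrees further around
         next_angle = i + 45
         for (j in steps) {
                  # join a point j of the way out along one spoke to a point (1-j) of the way out along the next
                  lines(c(cos(rad(i))*j, cos(rad(next_angle))*(1-j)), c(sin(rad(i))*j, sin(rad(next_angle))*(1-j)), lwd=0.5)
         }
}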

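##betas
# shape parameters for a family of symmetric beta distributions
v = seq(2, 1502, 15)^.75
plot(0,0, xlim=c(0,1), ylim=c(-20,20), ty='n')
for (i in v) {
    # density above the axis and its mirror image below
    lines(seq(0,1,0.01), dbeta(seq(0,1,.01), shape1=i, shape2=i), lwd=.1)
    lines(seq(0,1,0.01), -dbeta(seq(0,1,.01), shape1=i, shape2=i), lwd=.1)
}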
betas also tells a story. it's drawn using a beta distribution. it shows the uncertainty around a proportion estimated to be at 50% frequency as a function of an ever increasing sample size. it moves really fast at first, but quickly hits diminishing returns.  at the end, adding hundreds of samples does very little to increase statistical power.
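The diminishing returns can be made concrete with the standard deviation of a symmetric Beta(a, a) distribution, which is 1 / (2 * sqrt(2a + 1)). Below is a minimal sketch in Python (not R), assuming the shape parameter stands in for sample size as the caption suggests. The shape values mirror the R code, seq(2, 1502, 15) raised to the power 0.75, and `beta_sd_symmetric` is a made-up name.

```python
import math


def beta_sd_symmetric(shape):
    """Standard deviation of Beta(shape, shape): 1 / (2 * sqrt(2 * shape + 1))."""
    return 1 / (2 * math.sqrt(2 * shape + 1))


# Same 101 shape values as the R vector v.
shapes = [(2 + 15 * k) ** 0.75 for k in range(101)]
sds = [beta_sd_symmetric(a) for a in shapes]

first_gain = sds[0] - sds[1]    # narrowing from the first step up in shape
last_gain = sds[-2] - sds[-1]   # narrowing from the final step up in shape
print(first_gain, last_gain)
```

The first number should come out far larger than the second, which is the "moves really fast at first" effect in the picture.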
##trig functions
v = seq(0,pi*.25, pi/10)
v2 = seq(0,pi*0.5, pi/200)
plot(c(0,0), c(0,0), ty='n', xlim=c(-120,120), ylim=c(-150, 150))
# the outer loop variable is not used inside; it only redraws each curve once per element of v
for(i in v) {
            for (cons1 in seq(0,100,10) ){
                        for (cons2 in seq(0,100,5)) {
                                    # parametric curve and its reflection through the origin
                                    xf = function (i) cons1*cos(i)+cos(3*i)+cos(2*i)+cos(4*i)
                                    yf = function(i) cons2*sin(i)+cons1*sin(3*i)
                                    lines(xf(v2), yf(v2), lwd=.05)
                                    lines(-xf(v2), -yf(v2),  lwd=.05)
                        }
            }
}
trig functions

Tuesday, April 22, 2014

Hacking Google Scholar (a little bit)

This post is about 2 things I really like, the R statistical programming language and Google Scholar, and my (tiny) effort to bring them together.  I've written quite a lot about R.  I use it all the time at my job, and have frequently used it in my published work.  I'll admit, I don't usually cite R when I use it. In fact, I don't even acknowledge that I've used it.  Usually I just report some statistical stuff and move on, never clarifying the actual software used to get the result (or whether I did it with a slide rule and pencil, which reminds me of the ad image below that isn't at all related to this post, but is probably the best thing I've seen in weeks.) 

However, you can cite R in your publications (and maybe I should).  In fact R has a handy function called citation whose only purpose is to inform you how to cite R.  Here's what it says. 

To cite R in publications use:

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 

That's it.  In older versions of R it had been R Development Core Team, but they dropped the Development part.

IBM ad from the 1950s
there are so many things I love about this ad: 
-white shirt, slick hair, and a tie - no exceptions
-no women (of course, these are engineers!)
-the phrase "routine repetitive figuring"
-the little atomic motif next to IBM's logo
-the earnest look on everyone's face (or maybe it's concern, they're all about to get laid off I think)
-the enormity of the computer
also, wouldn't it be nice if we still talked about computer power in engineer equivalents. not unlike horsepower.  imagine saying a 2 giga-engineer processor. more tangible than a petaflop, whatever that is.

What's interesting is that inevitably when people add this to their favorite citation manager like EndNote it gets mangled into the name R.C. Team (or for older citations R.D.C. Team).  

I really love this. I see it all the time in papers.  So for giggles I made a Google Scholar account for R.D.C. Team.  R.D.C. is doing pretty well with almost 14,000 citations as of 4/14.  

Actually there is some interesting stuff that can be gleaned from this novelty Google Scholar account. First, notice the amazing growth of citations for R. 2013 saw almost as many citations as all the past years combined! This fits with my own experience. I learned R in 2004 or 2005, and back then it was unusual to meet R users, especially outside of stats departments or those folks (like me) who worked on microarrays.  Now it is practically a standard. Where I work we commonly ask interviewees if they use R, mainly because the primary alternative, SAS, costs a fortune.  

Also, the bulk of citations are to R.D.C. and not the more recent R.C. Team. I'm not exactly sure when the D was dropped, but I think it was in the last couple of years.  Because so many citations come from 2013, this suggests that people are either using an old version of R and finding the old citation there, or more likely, just copying an out-of-date citation from an existing paper. 

Another interesting thing is the myriad ways in which people screw up the citation. Here's an example of the old standby, RDC Team.  Here it's mangled to R Development CT.  And if you're not into that whole brevity thing, you can spell out RDC's full name; here's one that mangled it to Team R Development Core.  

This leads me to my final thoughts. I don't feel bad for not citing R. I do cite specific R packages, but if I just compute a correlation or regression in R I'm not going to cite it.  In these cases I use R out of convenience, not out of necessity. It feels no different than citing Microsoft Excel for helping me organize my data, which seems ridiculous. That said, if you are going to the trouble of citing R, do it correctly!! Here's a nice page describing how.