Recently, there has been a lot of attention for the imposter syndrome. Even seasoned programmers admit they suffer from feelings of anxiety and low self-esteem. Some share their personal stories, which can be comforting for those suffering in silence. I focus on a method that helped me grow confidence in recent years. It is a simple, yet very effective way to deal with being overwhelmed by the many things a data scientist can acquaint him or herself with.
Two Faces of the Imposter Demon
I think imposterism can be broken into two, related, entities. The first is feeling you are falling short on a personal level. That is, you think you are not intelligent enough, you think you don’t have perseverance, or any other way to describe you are not up to the job. Most advice for overcoming imposterism focuses on this part. I do not. Rather, I focus on the second foe, the feeling that you don’t know enough. This can be very much related to the feeling of failing on a personal level, you might feel you don’t know enough because you are too slow a learner. However, I think it is helpful to approach it as objective as possible. The feeling of not knowing enough can be combated more actively that way. Not by learning as much you can, but by considering not knowing a choice, rather than an imperfection.
You can’t have it all
The field of data science is incredibly broad. Comprising, among many others, getting data out of computer systems, preparing data in databases, principles of distributed computing, building and interpreting statistical models, data visualization, building machine learning pipelines, text analysis, translating business problems into data problems and communicating results to stakeholders. To make matters worse, for each and every topic there are several, if not dozens, databases, languages, packages and tools. This means, by definition, no one is going to have mastery of everything the field comprises. And thus there are things you do not and never will know.
Learning new stuff
To stay effective you have to keep up with developments within the field. New packages will aid your data preparations, new tools might process data in a faster way and new machine learning models might give superior results. Just to name a few. I think a great deal of impostering comes from feeling you can’t keep up. There is a constant list in the back of your head with cool new stuff you still have to try out. This is where meta-learning comes into play, actively deciding what you will and will not learn. For my peace of mind it is crucial to decide the things I am not going to do. I keep a log (Google Sheets document) that has two simple tabs. The first a collector of stuff I come across in blogs and on twitter. These are things that do look interesting, but that need a more thorough look. I also add things that I come across in the daily job, such as a certain part of SQL I don’t fully grasp yet. Once in a while I empty the collector, trying to pick up the small stuff right away and moving the larger things either to second tab or to will-not-do. The second tab holds the larger things I am actually going to learn. With time at hand at work or at home I work on learning the things on the second tab. More about this later.
Define Yourself
So you cannot have it all, you have to choose. What can be of good help when choosing is to have a definition of your unique data science profile. Here is mine:
I have thorough knowledge of statistical models and know how to apply them. I am a good R programmer, both in interactive analysis and in software development. I know enough about databases to work effectively with them, if necessary I can do the full data preparation in SQL. I know enough math to understand new models and read text books, but I can’t derive and proof new stuff on my own. I have a good understanding of the principles of machine learning and can apply most of the algorithms in practice. My oral and written communication are quite good, which helps me in translating back and forth between data and business problems.
That’s it, focused on what I do well and where I am effective. Some things that are not in there; building a full data pipeline on an Hadoop cluster, telling data stories with d3.js, creating custom algorithms for a business, optimizing a database, effective use of python, and many more. If someone comes to me with one of these task, it is just “Sorry, I am not your guy”.
I used to feel that I had to know everything. For instance, I started to learn python because I thought a good data scientist should know it as well as R. Eventually, I realized I will never be good at python, because I will always use R as my bread-and-butter. I know enough python to cooperate in a project where it is used, but that’s it and that it will remain. Rather, I spend time and effort now in improving what I already do well. This is not because I think because R is superior to python. I just happen to know R and I am content with knowing R very well at the cost of not having access to all the wonderful work done in python. I will never learn d3.js, because I don’t know JavaScript and it will take me ages to learn. Rather, I might focus on learning Stan which is much more fitting to my profile. I think it is both effective and alleviating stress to go deep on the things you are good at and deliberately choose things you will not learn.
The meta-learning
I told you about the collector, now a few more words about the meta-learning tab. It has three simple columns. what it is I am going to learn and how I am going to do that are the first two obvious categories. The most important, however, is why I am going to learn it. For me there are only two valid reasons. Either I am very interested in the topic and I envision enjoying doing it, or it will allow me to do my current job more effectively. I stressed current there because scrolling the requirements of job openings is about the perfect way to feed your imposter monster. Focus on what you are doing now and have faith you will pick-up new skills if a future job demands it.
Meta-learning gives me focus, relaxation and efficiency. At its core it is defining yourself as a data scientist and deliberately choose what you are and, more importantly, what you are not going to learn. I experienced, that doing this with rigor actively fights imposterism. Now, what works for me might not work for you. Maybe a different system fits you better. However, I think everybody benefits from defining the data scientist he/she is and actively choose what not to learn.
comments powered by Disqus