Data cleaning is an essential step in preparing your data for analysis. While cleaning the data, every now and then there’s a need to create a new column in a Pandas dataframe, usually computed by a function that manipulates an existing column. A common way to achieve that is the apply() function. However, I want to address a couple of bottlenecks here:
Pandas: The Pandas library runs on a single thread and doesn’t parallelize tasks. Thus, if you are doing lots of computation or data manipulation on your Pandas dataframe, it can be pretty slow and can quickly become a bottleneck.
Apply(): The Pandas apply() function is slow! It does not take advantage of vectorization and essentially acts as just another Python loop. It returns a new Series or dataframe object, which carries significant overhead.
So now, you may ask: what should you do, and what should you use instead? I am going to share 4 techniques that are alternatives to the apply() function and that improve the performance of operations on a Pandas dataframe.
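The snippet below is a minimal sketch of one widely used alternative, vectorization. It is an assumption that vectorization is among the four techniques the post covers. The snippet computes the same column twice: once with a row-wise apply() and once with a single column-level operation. It then builds a conditional column with np.where instead of an apply() containing an if/else. The dataframe and its price/quantity columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names are illustrative only.
df = pd.DataFrame({"price": [120.0, 45.5, 300.0, 80.0],
                   "quantity": [2, 10, 1, 5]})

# Row-wise apply(): calls a Python function once per row.
df["total_apply"] = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Vectorized alternative: one operation over whole columns at once.
df["total_vec"] = df["price"] * df["quantity"]

# Conditional column with np.where instead of apply() plus if/else.
df["size_vec"] = np.where(df["total_vec"] > 300, "large", "small")

print(df)
```

Both versions produce the same totals. On large dataframes, the vectorized versions typically run much faster, because the work happens in compiled NumPy code rather than in one Python function call per row.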
Collocations are phrases or expressions containing multiple words that are highly likely to co-occur. For example: ‘social media’, ‘school holiday’, ‘machine learning’, ‘Universal Studios Singapore’, etc.
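This sketch shows one common way to surface collocations; it is an assumption and not necessarily the method used in the post. It scores adjacent word pairs by pointwise mutual information (PMI), computed as log2 of P(w1, w2) / (P(w1) P(w2)). Pairs that co-occur far more often than chance predicts get high scores. The function name pmi_bigrams, the min_count threshold, and the sample text are hypothetical.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring (most "collocated") pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sample text, already lower-cased and space-tokenized.
text = ("i love machine learning and social media . "
        "machine learning is fun . social media is noisy . "
        "i love the school holiday")
results = pmi_bigrams(text.split())
for pair, score in results:
    print(pair, round(score, 2))
```

Libraries such as NLTK offer the same idea through BigramCollocationFinder and BigramAssocMeasures.pmi, with additional association measures and frequency filters.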
For quite some time now, I have had a new love for salads. No, not the mundane ones like the one below.
These are quite boring and make you feel like you are forcing or punishing yourself on a so-called healthy diet. In fact, I never felt full eating just these, so I mostly ate Indian vegetarian food at the office. I craved vegetarian/vegan food from other cuisines, but most of my experiments ended in horrible disasters (I got chicken instead of tofu at a famous Thai place) or tasteless ones. However, restaurants in Singapore’s Central Business District (popularly known as the CBD) gave me a whole new penchant for salads. The best part about salads: you get Asian, European, Mexican, and probably 100 more varieties. Super filling, super tasty, and you can get them vegetarian or vegan, as you like! I am going to share photos of some of the best salads I have had, and an easy recipe for you to try at home.
Kids of 8-10 years of age are incredibly smart and riding high on the curve of curiosity and learning. That also makes teaching them equally challenging. Did I just write challenging? Did I not mention that I feel a strange pull toward anything challenging? Jokes apart, in June I came across an opportunity to teach Python/Scratch to kids in Singapore as part of the 10-week Code in the Community program run by Saturday Kids in collaboration with Google. This post is an account of my experiences and learnings throughout these 10 weeks with Saturday Kids.
Time series: a series of data points ordered in time. Pretty intuitive, isn’t it? Time series analysis helps businesses analyze past data and predict trends and seasonality, among numerous other use cases. Some examples of time series analysis in our day-to-day lives include:
Measuring the number of taxi rides
In this blog, we will be working with stock market data using Python 3, Pandas, and Matplotlib.
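This is a minimal sketch of typical first steps with Pandas on stock data: daily returns, a moving average, and weekly resampling. The prices are made up for illustration and are not real market data. The 3-day window is an arbitrary choice.

```python
import pandas as pd

# Hypothetical daily closing prices on business days (not real market data).
dates = pd.date_range("2021-01-04", periods=8, freq="B")
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0],
                  index=dates, name="close")

daily_return = close.pct_change()              # fractional change vs. previous day
rolling_mean = close.rolling(window=3).mean()  # 3-day moving average
weekly_close = close.resample("W").last()      # last close of each calendar week

print(pd.DataFrame({"close": close, "return": daily_return, "ma3": rolling_mean}))
print(weekly_close)
```

Any of these Series can then be plotted with Matplotlib, for example via close.plot(), which uses Matplotlib as its default plotting backend.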
Close your eyes and imagine that you live in a utopian world of perfect data. What do you see? What do you wish to see? Wait! Are you imagining a flawless, balanced dataset? A collection of data whose labels form a magnificent 1:1 ratio: 50% of this, 50% of that; not a bit to the left, nor a bit to the right. Just perfectly balanced, as all things should be. Now open your eyes, and come back to the real world.
Well, this blog is all about how to handle imbalanced datasets.
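One simple technique for handling imbalance is random oversampling. It duplicates minority-class rows, sampling with replacement, until each class matches the majority count. This sketch assumes a Pandas dataframe with a label column. The data and the helper name random_oversample are hypothetical, and the post does not necessarily use this exact technique.

```python
import pandas as pd

# Hypothetical toy dataset with a 6:2 class imbalance.
df = pd.DataFrame({"feature": [1, 2, 3, 4, 5, 6, 7, 8],
                   "label":   [0, 0, 0, 0, 0, 0, 1, 1]})

def random_oversample(data, label_col, random_state=0):
    """Resample every minority class with replacement up to the majority count."""
    counts = data[label_col].value_counts()
    target = counts.max()
    parts = []
    for cls, n in counts.items():
        group = data[data[label_col] == cls]
        if n < target:
            group = group.sample(n=target, replace=True, random_state=random_state)
        parts.append(group)
    # Shuffle so the classes are not stored in contiguous blocks.
    return (pd.concat(parts)
              .sample(frac=1, random_state=random_state)
              .reset_index(drop=True))

balanced = random_oversample(df, "label")
print(balanced["label"].value_counts())
```

Resample only the training split, after the train/test split, so duplicated rows do not leak into evaluation. Libraries such as imbalanced-learn provide RandomOverSampler and SMOTE for the same purpose.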
In my previous blog, I discussed how I landed an internship at Dentsu. I also mentioned that I worked on scouting and building a POC (proof of concept) for a cloud-agnostic, open-source API management tool/platform that could help in setting up API design, a gateway, a store, and analytics. In this blog, I will describe my work in much more detail.
We will be exploring four API Management platforms, namely:
My experience of hunting for and landing an internship in Singapore.
सुखदु:खे समे कृत्वा लाभालाभौ जयाजयौ |
ततो युद्धाय युज्यस्व नैवं पापमवाप्स्यसि ||
Chapter 2 Verse 38, Bhagavad Gita
Shree Krishna says: “Fight for the sake of duty, treating alike happiness and distress, loss and gain, victory and defeat. Fulfilling your duty and responsibility in this way, you will never incur sin.”
Arjuna was apprehensive that by killing his enemies, he would incur sin. Shree Krishna addresses this apprehension and advises him to do his duty (dharma) without attachment to the fruits of his actions. Such an attitude will release him from any sinful reactions.
This blog continues my NLP blog series. In the previous blogs, I discussed data pre-processing steps in R and recognizing the emotions present in TED Talks. In this blog, I am going to predict the ratings viewers gave to TED Talks. This requires multi-class classification and quite a bit of data cleaning and preprocessing. We will discuss each step in detail below. So, let’s dive in.
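This is a minimal sketch of multi-class classification in plain Python, using a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. Each document receives exactly one of several rating labels. The post’s own model and features are not specified here. The toy transcripts, the labels (styled after TED rating categories such as Funny, Informative, and Inspiring), and the helpers train_nb/predict_nb are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy transcripts and rating labels.
docs = [
    "a hilarious story with jokes and laughter",
    "jokes about everyday life made everyone laugh",
    "new research data explains how the brain works",
    "the data and research show clear evidence",
    "an inspiring journey of courage and hope",
    "hope and courage can change the world",
]
labels = ["Funny", "Funny", "Informative", "Informative", "Inspiring", "Inspiring"]
model = train_nb(docs, labels)
print(predict_nb(model, "jokes and laughter"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.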
This post continues my NLP blog series. You might want to check out my previous blog, in which I discussed data pre-processing in R. In this blog, I will determine the emotions in TED Talks. At the end, I will compute a heatmap of emotions and talks to aid in our visualization.
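The earlier posts in this series used R for preprocessing. This minimal Python sketch illustrates lexicon-based emotion detection. Each word is looked up in a word-to-emotion lexicon, and hits are counted per emotion. The result is a talks-by-emotions matrix, which is the kind of data a heatmap displays. The tiny lexicon and the transcript snippets are hypothetical; real work would use a curated lexicon such as the NRC Emotion Lexicon.

```python
from collections import Counter

# Tiny hypothetical word-to-emotion lexicon (illustrative entries only).
LEXICON = {
    "happy": ["joy"], "love": ["joy", "trust"], "fear": ["fear"],
    "afraid": ["fear"], "angry": ["anger"], "unfair": ["anger"],
    "surprise": ["surprise"], "sudden": ["surprise"], "believe": ["trust"],
}
EMOTIONS = ["anger", "fear", "joy", "surprise", "trust"]

def emotion_profile(text):
    """Count lexicon hits per emotion for one transcript."""
    counts = Counter()
    for word in text.lower().split():
        # Strip trailing punctuation before the lexicon lookup.
        for emotion in LEXICON.get(word.strip(".,!?"), []):
            counts[emotion] += 1
    return [counts[e] for e in EMOTIONS]

talks = {  # hypothetical transcript snippets
    "talk_a": "I love this work and I am happy to share it.",
    "talk_b": "People are afraid and angry because the system is unfair.",
}
# Rows are talks, columns are emotions: the matrix a heatmap would display.
matrix = {name: emotion_profile(text) for name, text in talks.items()}
print(EMOTIONS)
for name, row in matrix.items():
    print(name, row)
```

Raw counts favor longer talks, so normalizing each row (for example, dividing by the talk’s word count) usually makes the heatmap easier to compare across talks.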