Understand the core principles through hands-on experience

As a Python data scientist, you are undoubtedly familiar with data analysis in pandas, a Python package for working with table-like data, comparable to canonical tools such as Excel, SAS, and SQL, or data.table and dplyr in R.

To carry out an analysis in pandas, there is more than one way to go. Even for experienced data scientists and Python developers, a solution based on conventional methods may not be the optimal one, or may even be computationally inefficient.

This article aims to walk you through a simple use case and quantitatively compare different programming approaches. …

Understand and make use of Numba — a doped version of NumPy

I recently came across Numba, an open-source just-in-time (JIT) compiler for Python that can translate a subset of Python and NumPy functions into optimized machine code. In principle, JIT compilation through LLVM (Low Level Virtual Machine) should make Python code faster, as shown on the official Numba website. When I tried it on my own example, though, the speedup was at first far from obvious.

The Numba version of a function takes far longer than the NumPy one. Why?

As shown, the Numba run time was 600 times longer than NumPy's! Was I doing something wrong?

Before going into a detailed diagnosis, let's step back and go through some core concepts to better understand how Numba works…

Why and how to effectively deal with SettingWithCopyWarning

I have come across this apparently harmless yet annoying SettingWithCopyWarning message countless times. Like many, I took the easy way out, ignoring or simply suppressing the message with some unease. One day, someone with more curiosity and rigor came to ask me about the same warning, but with an even more mysterious symptom.

SettingWithCopyWarning for apparently no reason!

His assignment operation was as normal as any other common one on a data frame, so how on earth did this warning pop up? More curiously, I could not reproduce his observation when I recreated his data.
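The usual way to provoke the warning is chained indexing: assigning into a slice that pandas may have returned as a copy. A minimal sketch, assuming pandas (before 3.0, where copy-on-write changes this behavior) is installed; the frame and column names are invented for illustration.

```python
# Sketch: a chained-indexing assignment that can raise
# SettingWithCopyWarning, and the unambiguous .loc fix.
import warnings

import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"],
                   "pop": [2.1, 0.5, 0.3]})

# The first [] may return a copy, so this assignment can silently
# modify a temporary object instead of df.
subset = df[df["pop"] > 0.4]
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    subset["pop"] = 0  # may emit SettingWithCopyWarning

# The fix: do the selection and the assignment in one .loc call
# on the original frame.
df.loc[df["pop"] > 0.4, "pop"] = 0
```

With a single `.loc` call there is no intermediate object, so pandas knows the assignment targets `df` itself.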

Python for data scientists: How to pick up from yesterday

Many fellow Python data scientists who happened to use Matlab along the way might miss a handy feature there: saving the workspace. You did some analysis and have a bunch of intermediate results that you want to back up temporarily, to reuse later without re-running the whole analysis from scratch. In Matlab, it is as simple as save(bk_filename). How do we do it in Python?
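One possible answer, sketched below with the standard-library pickle module: gather the variables you care about into a dict and dump it to disk. The file name and variables here are illustrative assumptions, not from the article.

```python
# Sketch: a Matlab-like "save workspace" using pickle (stdlib only).
import pickle

# Some intermediate results we want to keep.
alpha = 0.05
results = [1, 4, 9]

bk_filename = "workspace_backup.pkl"

# Save selected variables to disk...
with open(bk_filename, "wb") as f:
    pickle.dump({"alpha": alpha, "results": results}, f)

# ...and restore them later, e.g. in a fresh session.
with open(bk_filename, "rb") as f:
    workspace = pickle.load(f)

restored_alpha = workspace["alpha"]
restored_results = workspace["results"]
```

Unlike Matlab's save, you choose explicitly which names go into the backup, which also keeps the file small and the restore predictable.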

What's on the table?

What do we have in our normal Python workspace? Let's do a quick check with dir().
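For instance, calling dir() with no arguments lists the names defined in the current scope; filtering out the underscore-prefixed entries leaves the user-defined ones. The variables below are invented for illustration.

```python
# Sketch: inspecting the current workspace with dir().
threshold = 0.5
labels = ["a", "b"]

# dir() with no arguments returns the names in the current scope;
# drop the _/__ internals to see only our own variables.
user_names = [name for name in dir() if not name.startswith("_")]
print(user_names)
```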

An Truong

Senior data scientist with a passion for code. Follow me for more practical data science tips from industry.
