Understand the core principles through hands-on experience

As a Python data scientist, you are undoubtedly familiar with data analysis with , a Python package for handling table-like data, similar to canonical tools such as , , or , in .

To carry out an analysis in , there is more than one way to go. Even for experienced data scientists and Python developers, a solution chosen from conventional methods is sometimes not the optimal one, and may even be computationally inefficient.

This article aims to walk you through a simple use case and quantitatively compare different programming approaches. …


Understand and make use of Numba — a doped version of Numpy

I’ve recently come across , an open-source just-in-time (JIT) compiler for Python that can translate a subset of Python and functions into optimized machine code. In principle, JIT compilation through the Low Level Virtual Machine (LLVM) should make Python code faster, as shown on the official Numba website. When I tried it on my own example, the benefit was not that obvious at first.

The Numba version of a function takes far longer to run than the NumPy one. Why?

As shown, I got a run time 600 times longer than with ! Was I doing something wrong?

Before going to a detailed diagnosis, let’s step back and go through some core concepts to better understand how work…


Why and how to effectively deal with SettingWithCopyWarning

I have come across this apparently harmless yet annoying warning message countless times. Like many, I took the easy way out and ignored or simply hid the message, though with some unease. One day, someone more curious and rigorous came to ask me about the same warning, but with an even more mysterious symptom.

SettingWithCopyWarning for apparently no reason!

His assignment operation was just as normal as any other common one on a data frame, so how on earth did this warning pop up? More curiously, I could not reproduce his observation when I recreated his data.
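For context, here is a minimal sketch of the pattern that commonly triggers this warning and two ways to make the intent explicit. It is not the specific case described above, and whether the warning actually appears depends on the pandas version and its copy-on-write settings. The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Pattern that often warns: assigning into the result of a filter, e.g.
#   subset = df[df["a"] > 1]
#   subset["b"] = 0
# pandas cannot tell whether the intent is to change df or only subset.

# Option 1: an explicit copy, when only the subset should change.
subset = df[df["a"] > 1].copy()
subset["b"] = 0

# Option 2: a single .loc assignment, when the original frame should change.
df.loc[df["a"] > 1, "b"] = 99

print(subset)
print(df)
```

Either form states the intent explicitly, which is generally preferable to silencing the warning.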


Python for data scientists: How to pick up from yesterday.

Many fellow Python data scientists, if they happened to pass by on the way, might miss a handy feature there for saving the workspace. You did some analysis and have a bunch of intermediate results that you want to back up temporarily and reuse later, without re-running the whole analysis from scratch. In , it is as simple as . How do we do it in ?

What is on the table?

What do we have in our normal python workspace? Let’s do a quick check with .
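As a rough illustration of the idea, the sketch below saves the picklable variables of a namespace with the standard-library `pickle` module and loads them back. This is one possible approach under stated assumptions, not necessarily the one the article goes on to use. The helper names `save_workspace` and `load_workspace` and the sample variables are hypothetical.

```python
import os
import pickle
import tempfile


def save_workspace(namespace, path):
    """Pickle every public, picklable variable in `namespace` to `path`."""
    saved = {}
    for name, value in list(namespace.items()):
        if name.startswith("_"):
            continue  # skip private and dunder names
        try:
            pickle.dumps(value)
        except Exception:
            continue  # skip modules, lambdas, open files, and similar objects
        saved[name] = value
    with open(path, "wb") as fh:
        pickle.dump(saved, fh)
    return sorted(saved)


def load_workspace(path):
    """Return the dictionary of variables stored by save_workspace."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a real workspace; in an interactive session one could pass globals().
workspace = {
    "threshold": 0.5,
    "scores": [0.1, 0.7, 0.9],
    "_private": 1,
    "helper": lambda x: x,  # not picklable, so it is skipped
}

path = os.path.join(tempfile.mkdtemp(), "workspace.pkl")
saved_names = save_workspace(workspace, path)
restored = load_workspace(path)
print(saved_names)
print(restored)
```

In a later session, `globals().update(load_workspace(path))` would bring the saved names back. Note that functions and classes are pickled by reference, so their definitions must be available again when loading, and pickle files should only be loaded from trusted sources.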

An Truong

Senior data scientist with a passion for code. Follow me for more practical tips on data science in industry.
