As a Python data scientist, you are undoubtedly familiar with data analysis in pandas, a Python package for working with table-like data, similar to canonical tools like
To carry out an analysis in pandas, there is more than one way to go. Even for experienced data scientists and Python developers, a solution based on conventional methods may sometimes be suboptimal or even computationally inefficient.
This article aims to walk you through a simple use case and quantitatively compare different programming approaches. …
I’ve recently come across Numba, an open source just-in-time (JIT) compiler for Python that can translate a subset of Python and NumPy functions into optimized machine code. In principle, JIT compilation through LLVM (Low Level Virtual Machine) should make Python code faster, as shown on the official Numba website. When I tried it with my own example, the benefit was not that obvious at first.
As shown, I got a Numba run time 600 times longer than with NumPy! Might I be doing something wrong?
Before going into a detailed diagnosis, let’s step back and go through some core concepts to better understand how
I have come across this apparently harmless yet annoying warning message, SettingWithCopyWarning, countless times. Like many, I took the easy way out and, with some unease, ignored or simply hid the message. One day, someone with more curiosity and rigor came to ask me about the same warning, but with an even more mysterious symptom.
His assignment operation was as ordinary as any other on a data frame, so how on earth did this warning pop up? Even more curiously, I could not reproduce his observation when I recreated his data.
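For readers who have not met this warning, here is a minimal sketch of the kind of situation where it tends to appear, together with two commonly recommended ways to make the intent explicit. The column names and values are invented for illustration and are not the data from the case described above. Whether the warning is actually emitted depends on the pandas version, so the sketch only shows the explicit patterns, not the warning itself.

```python
import pandas as pd

# Hypothetical toy data, not the data set discussed in the article.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Pattern 1: take an explicit copy of the subset before modifying it.
# The subset is then an independent DataFrame, and `df` stays untouched.
subset = df[df["group"] == "a"].copy()
subset["value"] = subset["value"] * 10

# Pattern 2: modify the original in one step with .loc
# (row selector and column label in a single indexing call).
df.loc[df["group"] == "b", "value"] = 0

print(subset)
print(df)
```

The first pattern is for when you want a separate table to work on; the second is for when you really mean to change the original data frame.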
Python for data scientists: How to pick up from yesterday.
Many fellow Python data scientists who have come across MATLAB along the way might miss a handy feature there for saving the workspace. You did some analysis and have a bunch of intermediate results that you want to back up temporarily and reuse later, without re-running the whole analysis from scratch. In MATLAB, it is as simple as save(bk_filename). How will we do it in
What do we have in our normal Python workspace? Let’s do a quick check with
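As a rough illustration of the idea of backing up a workspace, the sketch below saves the picklable variables of a namespace to a file with the standard `pickle` module and restores them later. The helper names `save_workspace` and `load_workspace` are hypothetical, chosen here for illustration, and this is not necessarily the approach the article itself takes.

```python
import os
import pickle
import tempfile
import types


def save_workspace(path, namespace):
    """Pickle the public, picklable variables found in `namespace`.

    Hypothetical helper: skips private names, modules, functions and classes.
    Returns the sorted list of variable names that were saved.
    """
    skipped_types = (types.ModuleType, types.FunctionType, type)
    saved = {}
    for name, value in namespace.items():
        if name.startswith("_") or isinstance(value, skipped_types):
            continue
        try:
            pickle.dumps(value)  # check that this object can be pickled
        except Exception:
            continue  # silently skip anything that cannot be pickled
        saved[name] = value
    with open(path, "wb") as fh:
        pickle.dump(saved, fh)
    return sorted(saved)


def load_workspace(path, namespace):
    """Restore the variables saved by `save_workspace` into `namespace`."""
    with open(path, "rb") as fh:
        namespace.update(pickle.load(fh))


# Demo with an explicit dictionary standing in for the workspace.
workspace = {"threshold": 0.5, "results": [1, 2, 3], "_tmp": "ignored", "pickle": pickle}
bk_filename = os.path.join(tempfile.gettempdir(), "workspace_backup.pkl")

saved_names = save_workspace(bk_filename, workspace)
restored = {}
load_workspace(bk_filename, restored)
print(saved_names, restored)
```

In an interactive session you could pass `globals()` instead of the demo dictionary. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.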