Recently my boss asked me to investigate areas where further improvements can increase work efficiency. I wrote down a list of potential time-saving improvements and this problem was one of the items at the top of my list.
One of the services we offer is product re-labelling. Customers send a reference table or list of current SKU and new SKU of each product(s) and pdf files of the new SKU labels. We would then print out the reference table and new product labels for the operatives to begin the relabelling process.
Sometimes, customers could have products in hundreds of variants or models, all with unique SKUs. The shipment arrives in hundreds of boxes where Amazon would repackage them in mixed SKU (i.e., a single box containing multiple SKUs). To combat this, we have built a template in Excel to generate Amazon compatible SKU labels and print them on demand. Combined with barcode scanners, it saves us a lot of time in many SKUs situation.
However, the current process of manual data entry into our re-labelling excel template is very time consuming:
Customers gets their SKU generated from Amazon seller central, in PDFs
We receive the PDF files, copy and paste the text content into a word processor (e.g. Notepad ++)
We use the find function to populate product description data in the reference table (Current vs New SKU)
Imagine if the customer is a successful fashion apparel seller on Amazon and has over a thousand SKUs of clothes and shoes in different sizes and colours. Yeah – you get the picture. Manual data entry is not only time-consuming, but prone to human error and could cost the business in the long run.
So, instead of potentially spending a whole day on just copy and pasting data, I've decided to future proof this and turn O(n) to O(1).
Let’s examine what my customer sent me:
45 PDF files, each with a single page and a single Amazon label, consisting of barcode, product description and product condition. I merged them in a single file using Smallpdf’s online merge PDF tool (No, not sponsored!).
Now our labels look much more organized.
Then, I wrangled with some data in python. The pdf file is first loaded with pymupdf and then into a pandas data frame. Next, data cleaning and processing is performed. Finally, I used xlsxwriter to export it to an excel file in the shape and format I wanted.
Full code available on github (in Jupyter Notebook format).
The last step is to copy over the SKU reference table given by our customer. The file consists of two columns: Original SKU(label barcode) and New SKU.
Wrote a VLOOKUP function, all done!