Overview: Data Prep, Cleansing, and Mapping

Identify sources.

Question: Where do I get the data from?

Depending on your spend analysis goals, the data you need will likely reside in more than one system, some within and some outside of the organization. The main source for your data is your accounting system which usually contains data relating to:
  • purchase orders (POs)
  • expensed payables, such as employee business expenses
  • invoices and payment transactions
  • general Ledger (GL) account numbers
  • master files such as the Vendor Master
  • Depending on the organization and its systems, you may need to source p-card transactions from the issuing bank and supplier contract data from other internal files.

    The Spendata Data Extraction Guide contains a description of typical procurement data available to companies and provides guidance on constructing a data request document that you can pass on to the individuals or departments concerned.

    Cleanse data.

    The Spendata data loading process walks you through loading the data, and through a review process to assure basic data quality. During this process the application calculates control totals, identifies issues with the data, and ask questions to resolve common ambiguities such as:
  • missing data
  • inconsistent formats
  • Family data.

    Question: Who am I buying from?

    Knowing what you spend with a single supplier is important for many reasons and is critical for negotiating, so the first task is to consolidate all the spend data for each key supplier. Since your data file contains purchasing data and vendor data, consolidation should be a simple process of rolling up the spending by vendor name, however, it rarely is. The problem analysts come across is that the spend is scattered among many different variants of a supplier name so any roll-up of spending is inaccurate, to say the least. Removing variations in vendor names and grouping together spend under a single name is a process we call "familying."

    A typical example of the same vendor being listed twice is highlighted in the table below. Imagine the number of different ways this vendor's name has appeared!

    Supplier TID Website Pay Terms Contact Address Phone Credit Limit Email
    Dunder Mifflin Corp 234661 www.dundermifflin.com 45 days S.Hudson Scranton,PA 1-800-555-1298 25000.00 shudson@dundermifflin.com
    Stark Industries 7826473 www.starkindustries.com 45 days Tom Stark New Rochelle, NY 1-888-782-7546 25000.00 tom.stark@starkindustries.com
    Wayne Shipping 4731121 www.wayneenterprises.com 15 days B. Wayne Gotham City, NY 1-888-782-7546 75000.00 rwayne@wayneenterprises.com
    Int'l Business Machines
    825473 www.ibm.com/ibm/us/en 30 days Arthur Foxe New Rochelle, NY 1-888-782-7546 125000.00 afoxe@us.ibm.com
    IBM 825490 www.ibm.com/ibm/us/en 30 days Jeff Heath Armonk, NY 1-888-782-7546 125000.00 jheath@us.ibm.com

    When you family vendors, you create a rule that says: whenever these supplier names appear in the vendor master file, family them under "IBM".

  • Int'l Business Machines
  • I.B.M.
  • International Business Machines
  • IBM
  • IBM, Inc.
  • Map Data.

    Question: What exactly did we buy and how do I classify it?

    Currently, all of the transactions you are interested in are assigned to general ledger account numbers. GL accounts tell us something about spend, but they relate more directly to the balance sheet and income statement rather than spending categories. For example, "cleaning services" would be charged against "facilities" in the general ledger. Though "facilities" may give us a clue as to what the expense was for, it doesn't tell us exactly what was purchased. So, the next step in your data preparation is to re-classify these transactions. This process is called "mapping" because you create rules that link or "map" spend transactions to descriptive spend categories represented by a "commodity structure." Commodity structures are hierarchical trees with each commodity having only one "parent."

    Mapping is the process of creating rules that link spending to your commodity structure. So, in a sense, mapping rules are a type of Rosetta Stone for your spend analysis effort. Their job is to translate accounting data into procurement terms, giving each transaction a home within the commodity structure.

    An example of a mapping rule might be:

    If Vendor = "IBM", and General Ledger code = "Temporary Labor", then set Commodity to "Operations/IT/Contract Programming".

    This one mapping rule might apply to hundreds, or even thousands, of transactions in your dataset depending on its size. And, once this rule is defined, all future transactions that match these criteria will automatically be mapped to the correct location within the commodity structure. In fact, by applying Spendata's best practices for mapping, you will never have to map large numbers of individual transactions. You will be able to map the vast majority of your spend quickly using a relatively small number of rules.