Feature Engineering
How to engineer a new feature called "tier change".
If you take a look at your investor data, it also contains a feature called prior tier that indicates each investor's previous ranking for that client. Think about the real-world meaning of changes between prior tier and invite tier. If the prior tier was participant and the invite tier is bookrunner, the investor is being promoted in rank and will likely receive a greater share of investment banking fees in the future. Conversely, if the prior tier was bookrunner and the invite tier is participant, the investor is being demoted in rank and will likely receive a smaller share of investment banking fees in the future. If you had to guess, how do you think a promotion or a demotion might change an investor's willingness to commit to a transaction?

By comparing prior tier to invite tier to identify changes, we can engineer a new feature that captures this relationship. We'll call it tier change.

NumPy has a useful function for conditional logic, the where function, which can create a new series based on relationships between existing series. We'll use the format you see here: first, the name of the existing DataFrame, and then the name of the new series we want to create. We'll set that equal to the where function. In the first argument of the where function, we'll write the condition to meet. In the second argument, we'll write the value to return if that condition is met. And in the third argument, we'll write the value to return if that condition is not met. Just like elif statements within an if statement, you can nest multiple where functions inside each other for more complex conditional logic. The syntax is similar to using the IF function in Excel.
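Here is a minimal sketch of the nested where pattern, using pandas and NumPy. The transcript doesn't show the actual data, so the DataFrame name `investors`, the column names `prior_tier` and `invite_tier`, the tier labels "Participant" and "Bookrunner", and the output labels "Promoted", "Demoted", and "Unchanged" are all hypothetical. The outer `np.where` tests for a promotion. Its third argument, the value returned when that condition is not met, is a second, nested `np.where` that tests for a demotion and otherwise returns "Unchanged".

```python
import numpy as np
import pandas as pd

# Hypothetical investor data: column names and tier labels are assumptions.
investors = pd.DataFrame({
    "prior_tier":  ["Participant", "Bookrunner", "Participant", "Bookrunner"],
    "invite_tier": ["Bookrunner", "Participant", "Participant", "Bookrunner"],
})

# np.where(condition, value_if_true, value_if_false), nested like an elif.
investors["tier_change"] = np.where(
    # Condition 1: participant -> bookrunner is a promotion.
    (investors["prior_tier"] == "Participant") & (investors["invite_tier"] == "Bookrunner"),
    "Promoted",
    np.where(
        # Condition 2: bookrunner -> participant is a demotion.
        (investors["prior_tier"] == "Bookrunner") & (investors["invite_tier"] == "Participant"),
        "Demoted",
        # Neither condition met: the tier did not change.
        "Unchanged",
    ),
)

print(investors)
```

In this sketch, any row that matches neither condition, including one where the prior tier is missing, falls through to "Unchanged". You may want to handle missing prior tiers separately before building the feature.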