
Many online sources now generate an immense amount of news every day. User demand for information keeps growing as well, so news must be classified for users to reach the information they care about quickly and effectively. A machine learning model for automated news classification could identify the topics of untracked news and/or make individual suggestions based on a user’s prior interests. Our aim is therefore to build models that take a news headline and short description as input and output a news category.

News articles often fall under multiple topic labels. An article about a transfer in ownership of a football club, for instance, might be labeled Sports, Business, and World News. Humans can recognize and assign multiple relevant labels to an article, but can a machine learning system achieve similar results? …


Flight delays have become a major problem for air transportation systems all over the world, and the aviation industry continues to suffer the associated economic losses. According to data from the Bureau of Transportation Statistics (BTS) of the United States, more than 20% of U.S. flights were delayed in 2018. These delays have a severe economic impact in the U.S., equivalent to 40.7 billion dollars per year. Passengers lose time and miss business opportunities or leisure activities, and airlines’ attempts to make up for delays lead to extra fuel consumption and a larger adverse environmental impact. …



“The CEO of Motion Picture Association of America (MPAA) J. Valenti mentioned that ‘No one can tell you how a movie is going to do in the marketplace. Not until the film opens in darkened theater and sparks fly up between the screen and the audience’”

The modern film industry, a business worth nearly 10 billion dollars per year, is fiercely competitive.

Each year in the United States, hundreds of films are released to domestic audiences in the hope that they will become the next “blockbuster.” Predicting how well a movie will perform at the box office is hard because there are so many factors involved in success.

The goal of this project is to develop a computational model that predicts movie revenues from public data extracted from the Boxofficemojo.com online movie database.

The first phase is web scraping: different types of features, described later, are extracted from Boxofficemojo.com. The second phase is data cleaning: after scraping, we clean the data, mainly based on the unavailability of some features. The third phase is exploratory data analysis, in which we create graphics to understand the data. The fourth phase is feature engineering, in which we create features for the machine learning model from raw text data. The fifth phase is model analysis, in which we apply one of the machine learning algorithms to our data set. …
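To make the phases more concrete, here is a minimal sketch of the cleaning, feature engineering, and modeling steps in plain Python. Everything in it is an assumption made for illustration: the records are made up rather than scraped, the field names (`budget`, `release_date`, `revenue`) and helper functions are hypothetical, dropping incomplete records is only one possible cleaning rule, and a one-variable least-squares line stands in for whichever algorithm the project actually uses. Scraping and exploratory plots are left out.

```python
from statistics import mean

# Made-up records standing in for scraped data (hypothetical fields).
# None marks a feature that was unavailable on the source page.
raw_movies = [
    {"title": "A", "budget": 10.0, "release_date": "2015-06-12", "revenue": 30.0},
    {"title": "B", "budget": 20.0, "release_date": "2016-12-16", "revenue": 50.0},
    {"title": "C", "budget": None, "release_date": "2017-03-03", "revenue": 12.0},
    {"title": "D", "budget": 40.0, "release_date": "2018-07-20", "revenue": 90.0},
]


def clean(records):
    """Data cleaning: drop records where any feature is unavailable."""
    return [r for r in records if all(v is not None for v in r.values())]


def add_features(records):
    """Feature engineering: derive a numeric release month from the raw date text."""
    return [
        {**r, "release_month": int(r["release_date"].split("-")[1])}
        for r in records
    ]


def fit_line(xs, ys):
    """Model: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


movies = add_features(clean(raw_movies))
slope, intercept = fit_line(
    [m["budget"] for m in movies],
    [m["revenue"] for m in movies],
)


def predict_revenue(budget):
    """Predict revenue for a given budget using the fitted line."""
    return slope * budget + intercept
```

Calling `predict_revenue(25.0)` would then give the fitted line's estimate for a hypothetical movie with that budget. A real model would use many more features and hold out data for evaluation.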

About

Cigdem Tuncer

Database Developer|SQL Developer|Data Science Candidate
