Crawling the Windows Phone Marketplace

I have been asked by a few people how sites like WP7AppList get their data.  The Windows Phone Marketplace, which you access on your PC via the Zune software, uses XML to send its data over the wire.  I wanted to share a couple of code snippets that might help an aspiring data junkie on their way.  This code works.  It may not be the most elegant solution, but it works, and I wanted to share it in case others want to see how to parse the XML or how to write LINQ queries against it.

Caveat – this is a geek enthusiast post.  I used Fiddler to inspect the XML and figure out how to parse it.  This was something I did over Christmas break to give me a project I could be excited about, and to learn more about parsing XML with LINQ.  I also wanted to do some large-database work, and this crawler throws off a ton of data.  I did not use any proprietary knowledge about how our backend systems work.  This is all done against the public XML feeds.

First up, we need some data structures to catch all of the inbound data.  You can use anonymous types with LINQ, but I liked having a measure of control, including the ability to handle null values and potential errors in the feed.

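//namespaces used by the snippets in this post
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public class ZestAppData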
{
    public string Title { get; set; }
    public string Id { get; set; }
    public DateTime ReleaseDate { get; set; }
    public DateTime Updated { get; set; }
    public string Version { get; set; }
    public string ShortDescription { get; set; }
    public decimal AverageUserRating { get; set; }
    public int UserRatingCount { get; set; }
    public string ImageId { get; set; }
    
    public IList<ZestCategory> Categories = new List<ZestCategory>();
    public IList<ZestPublisher> Publisher = new List<ZestPublisher>();
    public IList<ZestOffer> Offers = new List<ZestOffer>();
}

public class ZestCategory
{
    public string Id { get; set; }
    public string IsRoot { get; set; }
    public string Title { get; set; }
}

public class ZestOffer
{
    public string OfferId { get; set; }
    public string MediaInstanceId { get; set; }
    public decimal Price { get; set; }
    public string PriceCurrencyCode { get; set; }
    public string LicenseRight { get; set; }
    public List<string> PaymentType = new List<string>();
}

public class ZestPublisher
{
    public string Id { get; set; }
    public string Name { get; set; }
}
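One way to get that null handling without a ternary for every optional field is to use LINQ to XML's explicit conversion operators: casting a missing `XElement` to `string` or to a nullable number yields `null` instead of throwing. The sketch below is a hypothetical helper (`ZestXml`, `ReadString`, `ReadDecimal` and `ReadInt` are not part of the original crawler), and it assumes the Zest namespace shown in the settings that follow.

```csharp
using System.Xml.Linq;

// Hypothetical helpers, not from the original crawler.
public static class ZestXml
{
    // Same value as the post's zestns field.
    public static readonly XNamespace Zest = "http://schemas.zune.net/catalog/apps/2008/02";

    // (string)XElement returns null when the element is missing,
    // so ?? swaps in a fallback instead of throwing a NullReferenceException.
    public static string ReadString(XElement parent, string name, string fallback = "")
    {
        return (string)parent.Element(Zest + name) ?? fallback;
    }

    // (decimal?)XElement is null for a missing element; when the element exists
    // it is parsed with XmlConvert, which ignores the machine's culture settings.
    public static decimal ReadDecimal(XElement parent, string name, decimal fallback = 0m)
    {
        return (decimal?)parent.Element(Zest + name) ?? fallback;
    }

    // Same pattern for integer counts such as userRatingCount.
    public static int ReadInt(XElement parent, string name, int fallback = 0)
    {
        return (int?)parent.Element(Zest + name) ?? fallback;
    }
}
```

Inside GetAppEntries, `ShortDescription = ZestXml.ReadString(e, "shortDescription")` would then replace the explicit null check. A value that is present but malformed still throws a `FormatException`, which is usually something you want to hear about.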

 

You are also going to want to have a bunch of variables defined for the URLs where the XML is coming from, the XML namespaces, etc:

const string BaseAppsUrl = "http://catalog.zune.net";
const string BaseImageUrl = "http://image.catalog.zune.net";

const string ZestVersion = "/v3.2/";
const string ZestImageVersion = "/v3.0/";

const string BaseApps = "apps/";
const string BaseImage = "image/";

const string BaseAppsResource = "?clientType=WinMobile%207.0&store=Zest&orderby=downloadRank";
const string BaseCommentsResource = "/reviews/?store=Zest&chunkSize=10";
const string BaseImageResource = "?width=240&height=240";

ZestCrawlEntities ZestCrawlContext;

XNamespace ns = "http://www.w3.org/2005/Atom";
XNamespace zestns = "http://schemas.zune.net/catalog/apps/2008/02";

public string LangCode = "en-us"; //setting the default value

public List<string> ValidLangCodes = new List<string>(
    new string[] {  "en-us", "en-gb", "de-de",
                    "fr-fr", "es-es", "it-it",
                    "en-au", "de-at", "fr-be",
                    "fr-ca", "en-ca", "en-hk",
                    "en-in", "en-ie", "es-mx",
                    "en-nz", "en-sg", "de-ch",
                    "fr-ch" });

public string AppAfterMarkerUrl { get; set; }
public bool HasMoreApps = true;
public string AppsResponseString { get; set; }
public XElement ReturnedAppsXml;

 

Have a look at the ValidLangCodes list.  Those are the codes used in the URLs for country-specific data, so if you want the data for Mexico, use “es-mx”.  The first two letters are the language code, and the last two are the country code.  If an app is listed in the feed, it is active.  The list returned is ordered, meaning the first app is ranked #1.  I am pulling the list of all apps, ordered by download rank; that is the orderby clause in BaseAppsResource.
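To make the URL scheme concrete, here is a minimal sketch that builds the first-page URL for one market and splits a code into its language and country parts. `MarketUrls`, `Split` and `FirstPageUrl` are hypothetical names; the constant values are copied from the settings above, and only a few of the ValidLangCodes entries are included.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the constant values are copied from the post.
public static class MarketUrls
{
    const string BaseAppsUrl = "http://catalog.zune.net";
    const string ZestVersion = "/v3.2/";
    const string BaseApps = "apps/";
    const string BaseAppsResource = "?clientType=WinMobile%207.0&store=Zest&orderby=downloadRank";

    // A few entries from ValidLangCodes; add the rest as needed.
    static readonly HashSet<string> ValidLangCodes = new HashSet<string>(
        new[] { "en-us", "en-gb", "de-de", "es-mx", "fr-ca" },
        StringComparer.OrdinalIgnoreCase);

    // "es-mx" -> ("es", "mx"): language first, country second.
    public static (string Language, string Country) Split(string langCode)
    {
        string[] parts = langCode.Split('-');
        return (parts[0], parts[1]);
    }

    // First page of the download-ranked app list for one market.
    public static string FirstPageUrl(string langCode)
    {
        if (!ValidLangCodes.Contains(langCode))
            throw new ArgumentException("Unknown market code: " + langCode, nameof(langCode));

        return BaseAppsUrl + ZestVersion + langCode.ToLowerInvariant() + "/" + BaseApps + BaseAppsResource;
    }
}
```

`MarketUrls.FirstPageUrl("es-mx")` gives the Mexico URL; an unknown code fails fast instead of producing a request for a market that does not exist.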

ZestCrawlContext is my ADO.NET database model.  Create your own and store the data however you want.

Now that the data structures and settings are in place, you are going to need a way to get the XML from Microsoft’s servers.

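public void GetAppsResponse()
{
    string FullUrl;

    if (!String.IsNullOrEmpty(AppAfterMarkerUrl))
    {
        FullUrl = AppAfterMarkerUrl;
    }
    else
    {
        FullUrl = BaseAppsUrl + ZestVersion + LangCode + "/"
            + BaseApps + BaseAppsResource;
    }

    //retry a few times instead of forever, so a dead URL can't hang the crawl
    const int MaxAttempts = 5;

    for (int attempt = 1; attempt <= MaxAttempts; attempt++)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(FullUrl);
            request.KeepAlive = false;

            //dispose the response and reader so the connection is released before the next page
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                ReturnedAppsXml = XElement.Parse(reader.ReadToEnd());
                return;
            }
        }
        catch (Exception ex) when (ex is WebException || ex is IOException)
        {
            Console.WriteLine("yeah, your connection was likely aborted: " + ex.Message);
            //back off a little longer after each failure
            System.Threading.Thread.Sleep(TimeSpan.FromSeconds(2 * attempt));
        }
    }

    throw new InvalidOperationException("Could not download " + FullUrl + " after " + MaxAttempts + " attempts");
}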

 

Now comes the fun part.  Remember, the XML comes over the wire 100 entries at a time, so you have to parse each chunk, store the entries somewhere, and then request the next chunk.  The XML returned includes a “next” link that tells you how to request that next chunk.  (Note: yes, I know I am using Regex where I could be using String.Replace.)

public IEnumerable<ZestAppData> GetAppEntries()
{
    //first we have to parse the feed which came back
    IEnumerable<ZestAppData> entries =
        from e in ReturnedAppsXml.Elements(ns + "entry")
        select new ZestAppData
        {

            Title = e.Element(ns + "title").Value,

            Id = Regex.Replace(e.Element(ns + "id").Value, "(urn:uuid:)(.)", "$2"),

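            //parse with the invariant culture so a crawler running under, say, de-DE settings still reads "3.5" as 3.5
            ReleaseDate = DateTime.Parse(e.Element(zestns + "releaseDate").Value, CultureInfo.InvariantCulture),

            Updated = DateTime.Parse(e.Element(ns + "updated").Value, CultureInfo.InvariantCulture),

            ShortDescription = e.Element(zestns + "shortDescription") == null
                ? "" : e.Element(zestns + "shortDescription").Value,

            AverageUserRating = decimal.Parse(e.Element(zestns + "averageUserRating").Value, CultureInfo.InvariantCulture),

            UserRatingCount = int.Parse(e.Element(zestns + "userRatingCount").Value, CultureInfo.InvariantCulture),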

            Version = e.Element(zestns + "version").Value,

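            //some entries are missing the image id, so fall back to an empty string instead of throwing
            ImageId = Regex.Replace(
                (string)e.Elements(zestns + "image").Elements(zestns + "id").FirstOrDefault() ?? "",
                "(urn:uuid:)(.)", "$2"),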

            Categories = (
                from category in e.Elements(zestns + "categories").Elements(zestns + "category")
                select new ZestCategory
                {
                    Id = category.Element(zestns + "id").Value,
                    Title = category.Element(zestns + "title").Value,
                    IsRoot = category.Element(zestns + "isRoot").Value
                }).ToList(),

            Publisher = (
                from publisher in e.Elements(zestns + "publisher")
                select new ZestPublisher
                {
                    Id = publisher.Element(zestns + "id").Value,
                    Name = publisher.Element(zestns + "name").Value
                }).ToList(),

            Offers = (
                from offer in e.Elements(zestns + "offers").Elements(zestns + "offer")
                select new ZestOffer
                {
                    OfferId = offer.Element(zestns + "offerId").Value,
                    MediaInstanceId = offer.Element(zestns + "mediaInstanceId").Value,
                    Price = decimal.Parse(offer.Element(zestns + "price").Value, CultureInfo.InvariantCulture),
                    PriceCurrencyCode = offer.Element(zestns + "priceCurrencyCode").Value,
                    LicenseRight = offer.Element(zestns + "licenseRight").Value,
                    PaymentType = (
                        from paymenttype in offer.Elements(zestns + "paymentTypes").Elements()
                        select paymenttype.Value).ToList()
                }).ToList()
        };

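    //now I need to get the AfterMarkerUrl from the XML feed
    //((string)attribute returns null for a missing attribute, so a link without rel won't throw)
    string afterMarker =
        (from e in ReturnedAppsXml.Elements(ns + "link")
         where (string)e.Attribute("rel") == "next"
         select (string)e.Attribute("href")).FirstOrDefault();

    if (afterMarker != null)
    {
        AppAfterMarkerUrl = BaseAppsUrl + afterMarker;
    }
    else
    {
        HasMoreApps = false;
    }

    //materialize the query so feed errors surface here rather than wherever the caller enumerates it
    return entries.ToList();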
}
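The paging step and the `urn:uuid:` clean-up can be exercised offline against a hand-written feed. This sketch assumes, as the code above does, that the `next` link's `href` is a path relative to `BaseAppsUrl`; `FeedPaging`, `NextPageUrl` and `StripUrnUuid` are hypothetical names. `StripUrnUuid` is the `String`-based alternative to the `Regex` mentioned earlier.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helpers mirroring the paging and id clean-up in GetAppEntries.
public static class FeedPaging
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // Absolute URL of the next chunk, or null when this is the last page.
    // (string)attribute is null for a missing attribute, so a <link> without rel is skipped.
    public static string NextPageUrl(XElement feed, string baseUrl)
    {
        string href = feed.Elements(Atom + "link")
            .Where(link => (string)link.Attribute("rel") == "next")
            .Select(link => (string)link.Attribute("href"))
            .FirstOrDefault();
        return href == null ? null : baseUrl + href;
    }

    // Strips a leading "urn:uuid:". For ids like "urn:uuid:<guid>" this gives the
    // same result as Regex.Replace(id, "(urn:uuid:)(.)", "$2").
    public static string StripUrnUuid(string id)
    {
        const string Prefix = "urn:uuid:";
        return id.StartsWith(Prefix, StringComparison.Ordinal) ? id.Substring(Prefix.Length) : id;
    }
}
```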

Now you have everything you need to crawl the marketplace whenever you want.  The LINQ parsing is really, really fast; it is the crawling itself that can be a bit slow.  I crawl each market individually when my code runs, and I store an app list for each one.
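The post does not show the loop that ties the two methods together. Here is a minimal sketch of one way to structure it, crawling each market in turn and keeping each market's app list in rank order. The `fetch` delegate is a hypothetical stand-in for GetAppsResponse so the loop can run without the network, `MarketCrawler` and `CrawlAll` are illustrative names, and storage is left out (the post writes to its own ADO.NET model).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical driver loop; fetch stands in for GetAppsResponse.
public static class MarketCrawler
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // Returns the entry ids per market in feed order, so index 0 is the #1 app.
    public static Dictionary<string, List<string>> CrawlAll(
        IEnumerable<string> langCodes,
        Func<string, string> firstPageUrl,
        Func<string, XElement> fetch,
        string baseUrl)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (string lang in langCodes)
        {
            // Fresh paging state for every market.
            var ranked = new List<string>();
            string url = firstPageUrl(lang);

            while (url != null) // plays the role of HasMoreApps
            {
                XElement feed = fetch(url);
                ranked.AddRange(feed.Elements(Atom + "entry")
                    .Select(e => (string)e.Element(Atom + "id")));

                string next = feed.Elements(Atom + "link")
                    .Where(l => (string)l.Attribute("rel") == "next")
                    .Select(l => (string)l.Attribute("href"))
                    .FirstOrDefault();
                url = next == null ? null : baseUrl + next;
            }

            result[lang] = ranked;
        }
        return result;
    }
}
```

If you drive the post's own methods instead, clear AppAfterMarkerUrl and set HasMoreApps back to true before each market; otherwise the first request for the next market reuses the previous market's last “next” link.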

One of the mistakes I made was having ZestAppData.Updated store a full DateTime rather than just the date.  I only crawl once per day, so I don’t need the time of day.  The Zest feeds update at least daily; I think every couple of hours.
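If you only need the day, one option is to normalize the timestamp to UTC and keep just its date part. This is a minimal sketch that assumes the feed's `updated` values carry a `Z` or a UTC offset, as Atom timestamps normally do. `FeedDates.ParseDay` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for storing only the day an entry was updated.
public static class FeedDates
{
    // Parses a timestamp, converts it to UTC (treating offset-less values as UTC),
    // and drops the time of day.
    public static DateTime ParseDay(string value)
    {
        DateTime utc = DateTime.Parse(value, CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal | DateTimeStyles.AssumeUniversal);
        return utc.Date;
    }
}
```

Normalizing before truncating means a crawl run in Seattle and one run in London store the same day for the same entry.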

  • Stephan

    So even more script kiddies get to know how to rip the whole MS Marketplace for WP7.
    Just download all the XAPs, remove the certificate, and deploy on an unlocked device… Thanks

  • As you point out, this requires an unlocked phone. A script kiddie could probably do this a whole lot easier with Perl and curl.

  • With NoDo rolling out, this will be quite the expensive hobby! (i.e. No ChevronWP7 support)

  • Very nice.

    I think I’ll build a mini app tracker for my apps to collect their progress over time vs events/updates vs ad revenue.

    Thanks

  • If you want to send me an email with your AppId, I can pull the historical data for you from my DB. I started running the code back in late December.

  • Stephan

    $100 for all apps? Not that expensive, and an unlock will come… Why doesn’t MS fix this?

  • Sean

    Really appreciate this overview. I thought it would be interesting to combine the power of the PivotViewer with the Marketplace data. I’m still in the process of publishing it, but I posted a quick walkthrough of what it looks like at http://seanbriscoe.info/2011/03/25/wp7-marketplace-the-pivotviewer-match-made-in-heaven/

  • Thanks Brandon, this is very helpful.  

    I had a script based on this example running regularly for weeks as a daily task, but recently it started to fail.

    Stepping through, I saw the Linq select was choking on this app’s entry:
    http://social.zune.net/redirect?type=phoneApp&id=ecc794a8-aa58-e011-854c-00237de2db9e

    Looks like that particular entry is missing an image id so the select new ZestAppData failed.  So for now I skip images to keep collecting review data.  

  • Zhiyongyao

    very nice.

    Do you know how to get the version history of an app?

    yao

  • radioactiveplaydough

    Wow!  Thank you for sharing this information!  Is it also possible to get extra details like file size information using this method?  

  • Microsoft currently gives an official count of “more than 35,000” apps in the Windows Phone Marketplace. In the past, Microsoft has said that it doesn’t count extremely simple apps such as wallpapers or multiple versions (i.e. a paid game that also provides a “lite” version) as individual apps, which may explain the large discrepancy between the official number and the estimate.

  • Tushar Ghosh

    WOW, this is an excellent example of XML coding. I was trying to run almost the same kind of code with XML, but there was always something wrong, and I was fed up with it. Now I have the inspiration to get it running. Thank you for the post 🙂

  • Agostino

    Thank you really a lot!! 🙂

  • Agostino

    Excuse me, has the URL changed? I cannot connect to
    http://catalog.zune.net/v3.2/en-GB/apps?orderBy=releaseDate&chunkSize=10&clientType=WinMobile%207.0&store=Zest&store=&store=HTC